Apply review comments

mgroeber9110 2023-04-23 13:55:28 +02:00
parent 9521390ce4
commit 1195577355
2 changed files with 4 additions and 6 deletions


@@ -156,8 +156,6 @@ bool gpt_params_parse(int argc, char ** argv, gpt_params & params) {
             params.interactive = true;
         } else if (arg == "--embedding") {
             params.embedding = true;
-        } else if (arg == "--interactive-start") {
-            params.interactive = true;
         } else if (arg == "--interactive-first") {
             params.interactive_first = true;
         } else if (arg == "-ins" || arg == "--instruct") {


@@ -1,6 +1,6 @@
 # main
-This example shows how to run a LLaMA-like model for chatting or generating text. There are three basic modes of operation:
+This example shows how to run a LLaMA-like model for chatting or generating text. There are two basic modes of operation:
 - Text generation: starting from an initial prompt, produce additional text as predicted by the model
 - Chat: alternate between user input to complete a prompt and generating some text based on the input
@@ -21,7 +21,7 @@ The most basic application for an LLM is producing more text from a given prompt
 This will run until the model predicts an end-of-text token, or until ``--n_predict`` tokens have been generated (a value of ``-1`` means unlimited). If you want the model to keep going without ever predicting end-of-text on its own, use the ``--ignore-eos`` parameter.
-When generating "infinite" text, the model will at some point exhaust its context size, that is, the number of past tokens it can remember. When this happens, the oldest half of the context is forgotten, and the most recent half is used to seemlessly continue the text. Text generation will temporarily
+When generating "infinite" text, the model will at some point exhaust its context size, that is, the number of past tokens it can remember. When this happens, the oldest half of the context is forgotten, and the most recent half is used to seamlessly continue the text. Text generation will temporarily
 slow down until the context is recomputed. If you want the model to remember the initial prompt (rather than just continuing from the most recent text), you can pass ``--keep -1`` on the command line (to remember the full prompt), or give a specific number of tokens to remember. In theory, this should lead to more consistency across longer texts.
 So, a useful starting point for long text generation (using the default llama-7B model) would be:
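
The context-swap rule described in this hunk (keep the first ``--keep`` tokens, drop the oldest half of the rest, and continue from the most recent half) can be sketched roughly as follows. This is only an illustrative sketch, not the code used by this example; the function name, the plain `int` token type, and the parameter names are assumptions.

```cpp
// Hypothetical sketch of the context-swap rule described above (NOT the
// actual implementation): once the token history fills the context window,
// keep the first n_keep tokens (the --keep value, assumed here to be already
// resolved to a token count) plus the most recent half of the remaining
// tokens, then keep generating from that shortened history.
#include <vector>

std::vector<int> swap_context(const std::vector<int> & tokens, int n_ctx, int n_keep) {
    if ((int) tokens.size() < n_ctx) {
        return tokens;                                    // context not exhausted yet
    }
    const int n_left = (int) tokens.size() - n_keep;      // tokens eligible to be dropped

    std::vector<int> out(tokens.begin(), tokens.begin() + n_keep);  // remembered prompt part
    out.insert(out.end(), tokens.end() - n_left / 2, tokens.end()); // most recent half
    return out;
}
```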
@@ -59,7 +59,7 @@ runs a basic chat where the prompt primes the model to expect one of the dialog
 The following parameters control sampling, that is, how a token is randomly selected from the most probable candidates that are predicted by the model. For more background, see [https://huggingface.co/blog/how-to-generate](https://huggingface.co/blog/how-to-generate).
-- ``--seed``: the starting value of the randon number generator. If you use a positive value, a given prompt will produce the same output in each run. A negative value (the default) will usually produce somehwat different output in each run.
+- ``--seed``: the starting value of the random number generator. If you use a positive value, a given prompt will produce the same output in each run. A negative value (the default) will usually produce somewhat different output in each run.
 - ``--temp``: the "temperature" parameter of the softmax function. A higher temperature means less likely words are being picked more often. Conversely, a temperature of 0 will always pick the most likely next token, leading to identical outputs in each run.
-- ``--top_k``, ``--top_p``: restrict the selection of the final token candiates to the *k* most likely, or the tokens that combine to a probability mass of at least *p*.
+- ``--top_k``, ``--top_p``: restrict the selection of the final token candidates to the *k* most likely, or the tokens that combine to a probability mass of at least *p*.
 - ``--repeat_last_n``, ``--repeat_penalty``: reduces repetitions within the last *n* tokens by applying a penalty to repeated tokens.
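
To make the interplay of these flags concrete, here is a hedged, self-contained C++ sketch of a simplified sampler that applies a repeat penalty, temperature, top-k and top-p in sequence. It is not the sampler used by this example; the function signature, the exact order of steps, and the parameter handling are assumptions for illustration only.

```cpp
// Simplified sketch (NOT the real sampler): pick the next token id from raw
// logits by applying a repeat penalty, temperature, top-k and top-p in turn.
#include <algorithm>
#include <cmath>
#include <random>
#include <utility>
#include <vector>

int sample_token(std::vector<float> logits,
                 const std::vector<int> & last_tokens,  // recent window (--repeat_last_n)
                 float repeat_penalty,                   // --repeat_penalty
                 float temp,                             // --temp
                 int top_k,                              // --top_k
                 float top_p,                            // --top_p
                 std::mt19937 & rng) {                   // seeded from --seed
    // 1. Penalize tokens that already appeared in the recent window.
    for (int id : last_tokens) {
        if (logits[id] > 0.0f) logits[id] /= repeat_penalty;
        else                   logits[id] *= repeat_penalty;
    }

    // 2. A temperature of 0 means greedy sampling: always take the most likely token.
    if (temp <= 0.0f) {
        return (int) (std::max_element(logits.begin(), logits.end()) - logits.begin());
    }

    // 3. Softmax with temperature (a higher temperature flattens the distribution).
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<std::pair<float, int>> probs;            // (probability, token id)
    float sum = 0.0f;
    for (int id = 0; id < (int) logits.size(); ++id) {
        const float p = std::exp((logits[id] - max_logit) / temp);
        probs.push_back({p, id});
        sum += p;
    }
    for (auto & pr : probs) pr.first /= sum;

    // 4. top-k: keep only the k most likely candidates.
    std::sort(probs.begin(), probs.end(), std::greater<>());
    if (top_k > 0 && top_k < (int) probs.size()) probs.resize(top_k);

    // 5. top-p: keep the smallest prefix whose probability mass reaches p.
    float cum = 0.0f;
    for (size_t i = 0; i < probs.size(); ++i) {
        cum += probs[i].first;
        if (cum >= top_p) { probs.resize(i + 1); break; }
    }

    // 6. Draw one of the surviving candidates, weighted by their probabilities
    //    (std::discrete_distribution renormalizes the weights internally).
    std::vector<float> weights;
    for (const auto & pr : probs) weights.push_back(pr.first);
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return probs[dist(rng)].second;
}
```

Note how the ``temp <= 0`` branch reproduces the deterministic behaviour described for ``--temp``: with no randomness left, the same prompt always produces the same output.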