sampling : refactor + optimize penalties sampler (#10803)

* sampling : refactor + optimize penalties sampler

ggml-ci

* common : apply ignore_eos as logit bias

ggml-ci

* batched : remove penalties sampler

* params : allow penalty_last_n == -1 to be equal to context size

ggml-ci

* common : by default, move the penalties at the end of the sampling chain

ggml-ci

* common : ignore all EOG tokens

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* common : move back the penalties at the front of the sampling chain

ggml-ci

* readme : restore hint about --ignore-eos flag [no ci]

* llama : minor

ggml-ci

* webui : update

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
This commit is contained in:
Georgi Gerganov 2024-12-16 12:31:14 +02:00 committed by GitHub
parent 4ddd199f6f
commit 644fd71b44
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
17 changed files with 111 additions and 152 deletions

View file

@ -177,16 +177,11 @@ Example usage: `--temp 0`
- `--repeat-penalty N`: Control the repetition of token sequences in the generated text default: 1.0, 1.0 = disabled).
- `--repeat-last-n N`: Last n tokens to consider for penalizing repetition (default: 64, 0 = disabled, -1 = ctx-size).
- `--no-penalize-nl`: Disable penalization for newline tokens when applying the repeat penalty.
The `repeat-penalty` option helps prevent the model from generating repetitive or monotonous text. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. The default value is 1.
The `repeat-last-n` option controls the number of tokens in the history to consider for penalizing repetition. A larger value will look further back in the generated text to prevent repetitions, while a smaller value will only consider recent tokens. A value of 0 disables the penalty, and a value of -1 sets the number of tokens considered equal to the context size (`ctx-size`).
Use the `--no-penalize-nl` option to disable newline penalization when applying the repeat penalty. This option is particularly useful for generating chat conversations, dialogues, code, poetry, or any text where newline tokens play a significant role in structure and formatting. Disabling newline penalization helps maintain the natural flow and intended formatting in these specific use cases.
Example usage: `--repeat-penalty 1.15 --repeat-last-n 128 --no-penalize-nl`
### DRY Repetition Penalty
DRY (Don't Repeat Yourself) sampling is an effective technique for reducing repetition in generated text even across long contexts by penalizing tokens based on their recent usage patterns (original [PR link](https://github.com/oobabooga/text-generation-webui/pull/5677)).