llama : refactor sampling v2 (#9294)

- Add `struct llama_sampler` and `struct llama_sampler_i` - Add `llama_sampler_` API - Add `llama_sampler_chain_` API for chaining multiple samplers - Remove `LLAMA_API_INTERNAL` - Add `llama_perf_` API and remove old `llama_print_timings` and `llama_reset_timings`
2024-09-07 15:16:19 +03:00 · 2024-09-07 15:16:19 +03:00 · df270ef745
commit df270ef745
parent 947538acb8
48 changed files with 3497 additions and 2914 deletions
--- a/examples/server/README.md
+++ b/examples/server/README.md
@ -470,8 +470,6 @@ node index.js

    `frequency_penalty`: Repeat alpha frequency penalty. Default: `0.0`, which is disabled.

-    `penalty_prompt`: This will replace the `prompt` for the purpose of the penalty evaluation. Can be either `null`, a string or an array of numbers representing tokens. Default: `null`, which is to use the original `prompt`.
-
    `mirostat`: Enable Mirostat sampling, controlling perplexity during text generation. Default: `0`, where `0` is disabled, `1` is Mirostat, and `2` is Mirostat 2.0.

    `mirostat_tau`: Set the Mirostat target entropy, parameter tau. Default: `5.0`
@ -724,7 +722,6 @@ Example:
            "stopping_word": ""
        },
        "penalize_nl": true,
-        "penalty_prompt_tokens": [],
        "presence_penalty": 0.0,
        "prompt": "Say hello to llama.cpp",
        "repeat_last_n": 64,
@ -748,8 +745,7 @@ Example:
        "tfs_z": 1.0,
        "top_k": 40,
        "top_p": 0.949999988079071,
-        "typical_p": 1.0,
-        "use_penalty_prompt_tokens": false
+        "typical_p": 1.0
    }
 ]
 ```