llama : add llama_sampling API + move grammar in libllama

ggml-ci
2024-08-05 10:08:25 +03:00
commit f648ca2cee
parent b69a480af4
48 changed files with 2481 additions and 2590 deletions
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -470,8 +470,6 @@ node index.js

    `frequency_penalty`: Repeat alpha frequency penalty. Default: `0.0`, which is disabled.

-    `penalty_prompt`: This will replace the `prompt` for the purpose of the penalty evaluation. Can be either `null`, a string or an array of numbers representing tokens. Default: `null`, which is to use the original `prompt`.
-
    `mirostat`: Enable Mirostat sampling, controlling perplexity during text generation. Default: `0`, where `0` is disabled, `1` is Mirostat, and `2` is Mirostat 2.0.

    `mirostat_tau`: Set the Mirostat target entropy, parameter tau. Default: `5.0`
@@ -724,7 +722,6 @@ Example:
            "stopping_word": ""
        },
        "penalize_nl": true,
-        "penalty_prompt_tokens": [],
        "presence_penalty": 0.0,
        "prompt": "Say hello to llama.cpp",
        "repeat_last_n": 64,
@@ -748,8 +745,7 @@ Example:
        "tfs_z": 1.0,
        "top_k": 40,
        "top_p": 0.949999988079071,
-        "typical_p": 1.0,
-        "use_penalty_prompt_tokens": false
+        "typical_p": 1.0
    }
 ]
 ```