llama : add remove_space_prefix to llama_detokenize

This commit adds a new parameter to llama_detokenize that controls whether the leading space is removed from tokens that begin with a word boundary character.

The motivation for this change is that when llama_server returns completion_probabilities, the tokens are detokenized and the leading space of such boundary tokens is currently removed. With this change llama_server can set remove_space_prefix to false and the leading space will be preserved.

Resolves: https://github.com/ggerganov/llama.cpp/issues/11728
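
The effect of the flag can be illustrated with a small standalone sketch. This is not the llama.cpp implementation: `toy_detokenize` is a hypothetical helper, and it assumes each token piece has already had its word boundary marker converted to a plain leading space.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a detokenizer (NOT the llama.cpp API).
// Assumption: every piece already carries its word boundary as a plain ' '.
static std::string toy_detokenize(const std::vector<std::string> & pieces,
                                  bool remove_space_prefix) {
    std::string out;
    for (const auto & piece : pieces) {
        out += piece; // concatenate pieces verbatim
    }
    // Optionally drop a single leading space from the combined text.
    if (remove_space_prefix && !out.empty() && out[0] == ' ') {
        out.erase(0, 1);
    }
    return out;
}
```

Under these assumptions, `toy_detokenize({" Hello"}, true)` yields `"Hello"`, while passing `false` yields `" Hello"`. The second form is what a per-token probability listing needs, since the space is part of the token's text.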
2025-02-10 09:47:18 +01:00
commit cc1fd2fd0d
parent d7b31a9d84
7 changed files with 35 additions and 24 deletions
--- a/examples/server/server.cpp
+++ b/examples/server/server.cpp
@@ -2297,7 +2297,7 @@ struct server_context {
            for (size_t i = 0; i < std::min(n_vocab, n_probs); i++) {
                result.probs.push_back({
                    cur[i].id,
-                    common_detokenize(ctx, {cur[i].id}, special),
+                    common_detokenize(ctx, {cur[i].id}, special, /* remove_space_prefix */ false),
                    cur[i].p
                });
            }