llama : add remove_space_prefix to llama_detokenize

This commit adds a new parameter, remove_space_prefix, to llama_detokenize. It controls whether the leading space is removed from tokens that start with a word boundary character. It defaults to true, so existing callers keep the current behavior.

The motivation for this change is that when llama_server returns completion_probabilities, the tokens are detokenized, and the leading space of word-boundary tokens is currently removed. With this change, llama_server can set remove_space_prefix to false so that the leading space is preserved.

Resolves: https://github.com/ggerganov/llama.cpp/issues/11728
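A minimal sketch of the behavior described above follows. It assumes a SentencePiece-style vocabulary in which a word-boundary token carries the "▁" (U+2581) marker, rendered as a space. `toy_vocab` and its `detokenize` are hypothetical stand-ins, not llama.cpp's `llama_vocab`, and the `special` argument is omitted for brevity. The point is that the same single token renders differently depending on the flag.

```cpp
// Hypothetical toy vocabulary, NOT llama.cpp's implementation: it only
// illustrates what a remove_space_prefix flag changes when detokenizing.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed SentencePiece-style word-boundary marker U+2581, UTF-8 encoded.
static const std::string kBoundary = "\xE2\x96\x81";

struct toy_vocab {
    std::vector<std::string> pieces; // token id -> piece text

    // Concatenates the pieces, turning each boundary marker into a space.
    // When remove_space_prefix is true (the default, mirroring the new
    // header), a single leading space on the result is dropped.
    std::string detokenize(const std::vector<int> & tokens,
                           bool remove_space_prefix = true) const {
        std::string text;
        for (int id : tokens) {
            assert(id >= 0 && static_cast<std::size_t>(id) < pieces.size());
            const std::string & piece = pieces[id];
            std::size_t pos = 0;
            while (pos < piece.size()) {
                if (piece.compare(pos, kBoundary.size(), kBoundary) == 0) {
                    text += ' ';              // boundary marker -> space
                    pos += kBoundary.size();
                } else {
                    text += piece[pos++];     // copy ordinary bytes as-is
                }
            }
        }
        if (remove_space_prefix && !text.empty() && text[0] == ' ') {
            text.erase(0, 1);                 // strip the leading space
        }
        return text;
    }
};
```

With pieces `{"▁Paris", "▁is", "!"}`, detokenizing token 0 alone yields "Paris" by default but " Paris" when `remove_space_prefix` is false. That per-token leading space is what a caller such as llama_server needs to keep when reporting token probabilities. Because the flag defaults to true, calls that omit it keep the old output.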
2025-02-10 09:47:18 +01:00 · 2025-02-10 09:47:18 +01:00 · cc1fd2fd0d
commit cc1fd2fd0d
parent d7b31a9d84
7 changed files with 35 additions and 24 deletions
--- a/src/llama-vocab.h
+++ b/src/llama-vocab.h
@@ -111,11 +111,13 @@ struct llama_vocab {
                         char * text,
                      int32_t   text_len_max,
                         bool   remove_special,
-                         bool   unparse_special) const;
+                         bool   unparse_special,
+                         bool   remove_space_prefix = true) const;

    std::string detokenize(
            const std::vector<llama_token> & tokens,
-                                      bool   special) const;
+                                      bool   special,
+                                      bool   remove_space_prefix = true) const;

    void print_info() const;