llama : add support for Tekken pre-tokenizer (#8579)

* llama : Added support for Tekken pre-tokenizer (#8577)

Removed unneeded `vocab.tokenizer_clean_spaces` assignment

* llama : fix order of pre-tokenizers

* Tekken pre-tokenizer no longer uses clean_up_tokenization_spaces

* Updated chkhsh for Tekken tokenizer

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-20 09:43:51 -04:00 · 940362224d
commit 940362224d
parent 69b9945b44
4 changed files with 18 additions and 0 deletions
--- a/include/llama.h
+++ b/include/llama.h
@@ -92,6 +92,7 @@ extern "C" {
        LLAMA_VOCAB_PRE_TYPE_CHATGLM4       = 17,
        LLAMA_VOCAB_PRE_TYPE_VIKING         = 18,
        LLAMA_VOCAB_PRE_TYPE_JAIS           = 19,
+        LLAMA_VOCAB_PRE_TYPE_TEKKEN         = 20,
    };

    // note: these values should be synchronized with ggml_rope