llama : support StableLM 2 1.6B (#5052)

* llama : support StableLM 2 1.6B

* convert : fix Qwen's set_vocab wrongly naming all special tokens [PAD{id}]
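The fix means only vocab ids with no real token get a [PAD{id}] placeholder name, while genuine special tokens keep their own text. A minimal sketch of that logic (function and parameter names here are illustrative, not the actual convert.py API):

```python
# Hedged sketch: pad only the gaps in the vocab with [PAD{id}] names,
# instead of wrongly renaming every special token.
def build_token_list(vocab: dict[str, int],
                     special: dict[str, int],
                     size: int) -> list[str]:
    # Map id -> token text; special tokens keep their real names.
    reverse = {i: t for t, i in vocab.items()}
    reverse.update({i: t for t, i in special.items()})
    # Any id missing from both maps gets a placeholder name.
    return [reverse.get(i, f"[PAD{i}]") for i in range(size)]
```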

* convert : refactor Qwen's set_vocab to use it for StableLM 2 too

* nix : add tiktoken to llama-python-extra

* convert : use presence of tokenizer.json to determine StableLM tokenizer loader

It's a less arbitrary heuristic than the vocab size.
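The heuristic above can be sketched as a small check on the model directory (names are hypothetical, not the actual convert.py code):

```python
# Hedged sketch: pick the tokenizer loader from the presence of
# tokenizer.json rather than guessing from the vocab size.
from pathlib import Path

def pick_tokenizer_loader(model_dir: str) -> str:
    # tokenizer.json -> HF fast/BPE-style tokenizer (e.g. StableLM 2)
    if (Path(model_dir) / "tokenizer.json").is_file():
        return "gpt2"
    # otherwise fall back to a sentencepiece tokenizer.model
    return "sentencepiece"
```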
compilade 2024-01-22 06:21:52 -05:00 committed by GitHub
parent 152d9d05e0
commit d6bd4d46dd
3 changed files with 79 additions and 46 deletions


@@ -73,6 +73,7 @@ let
       ps: [
         ps.numpy
         ps.sentencepiece
+        ps.tiktoken
         ps.torchWithoutCuda
         ps.transformers
       ]