Add more tokenizer tests (#3742)

* Add more tokenizer tests

* Add starcoder

* Update test vocab files

* Restrict BPE tokenizer tests to Unicode planes

* Update comment

* Comment cosmetics

* Remove bloom vocab/test
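The commit message does not describe how the BPE tests were restricted. As a hypothetical illustration only, a round-trip test over Unicode planes could enumerate code points one plane at a time and skip the surrogate block U+D800 to U+DFFF, since those code points have no valid UTF-8 encoding. The function names `plane_codepoints` and `failing_codepoints` are invented for this sketch, and the byte-level UTF-8 encoder in the usage note stands in for a real BPE tokenizer.

```python
# Hypothetical sketch, not the code from this commit: round-trip every
# code point in selected Unicode planes through an encode/decode pair.

# UTF-16 surrogates (U+D800..U+DFFF) are not Unicode scalar values and
# cannot be encoded in UTF-8, so they are excluded from the test range.
SURROGATES = range(0xD800, 0xE000)


def plane_codepoints(planes):
    """Yield every non-surrogate code point in the given planes (0..16)."""
    for plane in planes:
        start = plane * 0x10000
        for cp in range(start, start + 0x10000):
            if cp in SURROGATES:
                continue
            yield cp


def failing_codepoints(planes, encode, decode):
    """Return the code points whose single-character string does not
    survive decode(encode(text)) unchanged."""
    failures = []
    for cp in plane_codepoints(planes):
        text = chr(cp)
        if decode(encode(text)) != text:
            failures.append(cp)
    return failures


# Stand-in "tokenizer": one token per UTF-8 byte (assumption for the demo).
def byte_encode(text):
    return list(text.encode("utf-8"))


def byte_decode(ids):
    return bytes(ids).decode("utf-8")
```

For example, `failing_codepoints([0, 1], byte_encode, byte_decode)` checks the Basic Multilingual Plane and the Supplementary Multilingual Plane; a real test would pass the tokenizer's own encode and decode functions instead of the byte-level stand-ins.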
Galunid 2023-10-24 09:17:17 +02:00 committed by GitHub
parent 469c9addef
commit daab3d7f45
GPG key ID: 4AEE18F83AFDEB23
6 changed files with 16 additions and 3 deletions

4 binary files not shown.