llama : sanitize invalid tokens (#9357)

* common : do not add null tokens during warmup

ggml-ci

* llama : check that the input tokens are valid

ggml-ci

* tests : fix batch size of bert model

ggml-ci
This commit is contained in:
Georgi Gerganov 2024-09-08 00:33:13 +03:00 committed by GitHub
parent e536426ded
commit faf69d4237
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
3 changed files with 26 additions and 4 deletions

View file

@ -9,8 +9,11 @@ Feature: llama.cpp server
And a model alias bert-bge-small
And 42 as server seed
And 2 slots
And 1024 as batch size
And 1024 as ubatch size
# the bert-bge-small model has context size of 512
# since the generated prompts are as big as the batch size, we need to set the batch size to 512
# ref: https://huggingface.co/BAAI/bge-small-en-v1.5/blob/5c38ec7c405ec4b44b94cc5a9bb96e735b38267a/config.json#L20
And 512 as batch size
And 512 as ubatch size
And 2048 KV cache size
And embeddings extraction
Then the server is starting