server: tests: add truncated prompt tests, better kv cache size (#5933)

* server: tests: add truncated prompt tests, better size

* server, tests : update regex

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Pierrick Hymbert 2024-03-09 10:30:04 +01:00 committed by GitHub
parent c2101a2e90
commit fd72d2d2a5
4 changed files with 81 additions and 23 deletions

@@ -6,8 +6,8 @@ Feature: Parallel
Given a server listening on localhost:8080
And a model file tinyllamas/stories260K.gguf from HF repo ggml-org/models
And 42 as server seed
-And 512 as batch size
-And 64 KV cache size
+And 128 as batch size
+And 256 KV cache size
And 2 slots
And continuous batching
Then the server is starting
@@ -76,6 +76,7 @@ Feature: Parallel
| disabled | 128 |
| enabled | 64 |
Scenario: Multi users with total number of tokens to predict exceeds the KV Cache size #3969
Given a prompt:
"""