server: tests: add truncated prompt tests, better kv cache size (#5933)

* server: tests: add truncated prompt tests, better size

* server, tests : update regex

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-09 10:30:04 +01:00
commit fd72d2d2a5
parent c2101a2e90
4 changed files with 81 additions and 23 deletions
--- a/examples/server/tests/features/parallel.feature
+++ b/examples/server/tests/features/parallel.feature
@@ -6,8 +6,8 @@ Feature: Parallel
    Given a server listening on localhost:8080
    And   a model file tinyllamas/stories260K.gguf from HF repo ggml-org/models
    And   42 as server seed
-    And   512 as batch size
-    And   64 KV cache size
+    And   128 as batch size
+    And   256 KV cache size
    And   2 slots
    And   continuous batching
    Then  the server is starting
@@ -76,6 +76,7 @@ Feature: Parallel
      | disabled  | 128       |
      | enabled   | 64        |

+
  Scenario:  Multi users with total number of tokens to predict exceeds the KV Cache size #3969
    Given a prompt:
      """