Server: add tests for batch size, different seeds (#6950)

2024-05-01 17:52:55 +02:00
commit 3ea0d36000
parent 1613ef8d8e
2 changed files with 155 additions and 79 deletions
--- a/examples/server/tests/features/results.feature
+++ b/examples/server/tests/features/results.feature
@@ -7,44 +7,16 @@ Feature: Results
    And   a model file tinyllamas/split/stories15M-00001-of-00003.gguf from HF repo ggml-org/models
    And   a model file test-model-00001-of-00003.gguf
    And   128 as batch size
-    And   256 KV cache size
+    And   1024 KV cache size
    And   128 max tokens to predict
-
-  Scenario Outline: Multi users completion
-    Given <n_slots> slots
    And   continuous batching
+
+  Scenario Outline: consistent results with same seed
+    Given <n_slots> slots
    Then  the server is starting
    Then  the server is healthy

-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
+    Given 4 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 42

    Given concurrent completion requests
    Then the server is busy
@@ -55,3 +27,55 @@ Feature: Results
      | n_slots |
      | 1       |
      | 2       |
+
+  Scenario Outline: different results with different seed
+    Given <n_slots> slots
+    Then  the server is starting
+    Then  the server is healthy
+
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 42
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 43
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 44
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 45
+
+    Given concurrent completion requests
+    Then the server is busy
+    Then the server is idle
+    And  all slots are idle
+    Then all predictions are different
+    Examples:
+      | n_slots |
+      | 1       |
+      | 2       |
+
+  Scenario Outline: consistent results with same seed and varying batch size
+    Given 4 slots
+    And   <temp> temperature
+    # And   0 as draft
+    Then  the server is starting
+    Then  the server is healthy
+
+    Given 1 prompts "Write a very long story about AI." with seed 42
+    And   concurrent completion requests
+    # Then the server is busy # Not all slots will be utilized.
+    Then  the server is idle
+    And   all slots are idle
+
+    Given <n_parallel> prompts "Write a very long story about AI." with seed 42
+    And   concurrent completion requests
+    # Then the server is busy # Not all slots will be utilized.
+    Then the server is idle
+    And  all slots are idle
+
+    Then all predictions are equal
+    Examples:
+      | n_parallel | temp |
+      |  1         | 0.0  |
+      |  2         | 0.0  |
+      |  4         | 0.0  |
+      |  1         | 1.0  |
+      # FIXME: These tests fail on master. The problem seems to be the unified KV cache.
+      # See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
+      # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574 .
+      # |  2         | 1.0  |
+      # |  4         | 1.0  |
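
The step definitions behind `all predictions are equal` and `all predictions are different` are not shown in this hunk. As a rough illustration of what those two assertions imply, here is a minimal sketch, assuming each completion is available as a plain string. The helper names `all_equal` and `all_different` are hypothetical and are not taken from the repository.

```python
from itertools import combinations


def all_equal(predictions):
    """Return True when every completion text matches the first one.

    Hypothetical helper: mirrors the intent of 'all predictions are equal'.
    An empty or single-element list counts as equal.
    """
    return all(p == predictions[0] for p in predictions[1:])


def all_different(predictions):
    """Return True when no two completion texts are identical.

    Hypothetical helper: mirrors the intent of 'all predictions are different'.
    Every pair is compared, so one repeated text is enough to fail.
    """
    return all(a != b for a, b in combinations(predictions, 2))
```

Under these assumptions, the "same seed" scenarios would pass only if `all_equal` holds over the collected completions, and the "different seed" scenario only if `all_different` holds.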