commit d38eef468f (parent 60f685ff7a)
Jan Boon, 2024-03-30 23:23:21 +08:00

@@ -1,5 +1,5 @@
 @llama.cpp
 @server
 @slotsave
 Feature: llama.cpp server slot management
   Background: Server startup
@@ -15,34 +15,44 @@ Feature: llama.cpp server slot management
     Then the server is healthy
   Scenario: Save and Restore Slot
     # First prompt in slot 1 should be fully processed
     Given a user prompt "What is the capital of France?"
     And using slot id 1
     And a completion request with no api error
-    Then 24 tokens are predicted matching Lily
+    Then 24 tokens are predicted matching (Lily|cake)
     And 22 prompt tokens are processed
     When the slot 1 is saved with filename "slot1.bin"
     Then the server responds with status code 200
     # Since we have cache, this should only process the last tokens
     Given a user prompt "What is the capital of Germany?"
     And a completion request with no api error
     Then 24 tokens are predicted matching Thank
     And 7 prompt tokens are processed
-    When the slot 2 is restored with filename "slot1.bin"
+    # Loading the original cache into slot 0,
+    # we should only be processing 1 prompt token and get the same output
+    When the slot 0 is restored with filename "slot1.bin"
     Then the server responds with status code 200
     Given a user prompt "What is the capital of France?"
-    And using slot id 2
+    And using slot id 0
     And a completion request with no api error
-    Then 24 tokens are predicted matching Lily
+    Then 24 tokens are predicted matching (Lily|cake)
     And 1 prompt tokens are processed
+    # Verify that slot 1 was not corrupted during the slot 0 load: same check as above
+    Given a user prompt "What is the capital of Germany?"
+    And using slot id 1
+    And a completion request with no api error
+    Then 24 tokens are predicted matching Thank
+    And 1 prompt tokens are processed
   Scenario: Erase Slot
     Given a user prompt "What is the capital of France?"
     And using slot id 1
     And a completion request with no api error
-    Then 24 tokens are predicted matching Lily
+    Then 24 tokens are predicted matching (Lily|cake)
     And 22 prompt tokens are processed
     When the slot 1 is erased
     Then the server responds with status code 200
     Given a user prompt "What is the capital of France?"
     And a completion request with no api error
-    Then 24 tokens are predicted matching Lily
+    Then 24 tokens are predicted matching (Lily|cake)
     And 22 prompt tokens are processed
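For context, the "saved", "restored", and "erased" steps in these scenarios drive the server's slot-management endpoints (`POST /slots/{id}?action=save|restore|erase`, with save/restore taking a `filename` in the JSON body). A minimal sketch of the requests the test harness would issue; the base URL and helper name are illustrative, not part of this change:

```python
import json

BASE = "http://localhost:8080"  # assumed server address; adjust to your setup


def slot_request(slot_id, action, filename=None):
    """Build the (method, url, body) triple for a slot-management call.

    Mirrors the feature steps above: ``save`` and ``restore`` carry a
    filename in the JSON body, ``erase`` sends no body.
    """
    url = f"{BASE}/slots/{slot_id}?action={action}"
    body = json.dumps({"filename": filename}) if filename is not None else None
    return ("POST", url, body)


# The three operations exercised by the scenarios:
save = slot_request(1, "save", "slot1.bin")      # save slot 1's KV cache
restore = slot_request(0, "restore", "slot1.bin")  # load it into slot 0
erase = slot_request(1, "erase")                   # drop slot 1's cache
```

Each call is expected to return HTTP 200 on success, which is what the `Then the server responds with status code 200` steps assert.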