diff --git a/examples/server/tests/features/slotsave.feature b/examples/server/tests/features/slotsave.feature
index 37eefd5c0..9f1e58d23 100644
--- a/examples/server/tests/features/slotsave.feature
+++ b/examples/server/tests/features/slotsave.feature
@@ -1,5 +1,5 @@
 @llama.cpp
-@server
+@slotsave
 Feature: llama.cpp server slot management
 
   Background: Server startup
@@ -15,34 +15,44 @@ Feature: llama.cpp server slot management
     Then the server is healthy
 
   Scenario: Save and Restore Slot
+    # First prompt in slot 1 should be fully processed
     Given a user prompt "What is the capital of France?"
     And using slot id 1
     And a completion request with no api error
-    Then 24 tokens are predicted matching Lily
+    Then 24 tokens are predicted matching (Lily|cake)
     And 22 prompt tokens are processed
     When the slot 1 is saved with filename "slot1.bin"
     Then the server responds with status code 200
+    # Since we have cache, this should only process the last tokens
     Given a user prompt "What is the capital of Germany?"
     And a completion request with no api error
     Then 24 tokens are predicted matching Thank
     And 7 prompt tokens are processed
-    When the slot 2 is restored with filename "slot1.bin"
+    # Loading the original cache into slot 0,
+    # we should only be processing 1 prompt token and get the same output
+    When the slot 0 is restored with filename "slot1.bin"
     Then the server responds with status code 200
     Given a user prompt "What is the capital of France?"
-    And using slot id 2
+    And using slot id 0
     And a completion request with no api error
-    Then 24 tokens are predicted matching Lily
+    Then 24 tokens are predicted matching (Lily|cake)
+    And 1 prompt tokens are processed
+    # For verification that slot 1 was not corrupted during slot 0 load, same thing
+    Given a user prompt "What is the capital of Germany?"
+    And using slot id 1
+    And a completion request with no api error
+    Then 24 tokens are predicted matching Thank
     And 1 prompt tokens are processed
 
   Scenario: Erase Slot
     Given a user prompt "What is the capital of France?"
     And using slot id 1
     And a completion request with no api error
-    Then 24 tokens are predicted matching Lily
+    Then 24 tokens are predicted matching (Lily|cake)
     And 22 prompt tokens are processed
     When the slot 1 is erased
     Then the server responds with status code 200
     Given a user prompt "What is the capital of France?"
     And a completion request with no api error
-    Then 24 tokens are predicted matching Lily
+    Then 24 tokens are predicted matching (Lily|cake)
     And 22 prompt tokens are processed