commit d38eef468f (parent 60f685ff7a)
Jan Boon, 2024-03-30 23:23:21 +08:00

@@ -1,5 +1,5 @@
 @llama.cpp
 @server
 @slotsave
 Feature: llama.cpp server slot management
   Background: Server startup
@@ -15,34 +15,44 @@ Feature: llama.cpp server slot management
     Then the server is healthy
   Scenario: Save and Restore Slot
     # First prompt in slot 1 should be fully processed
     Given a user prompt "What is the capital of France?"
     And using slot id 1
     And a completion request with no api error
-    Then 24 tokens are predicted matching Lily
+    Then 24 tokens are predicted matching (Lily|cake)
     And 22 prompt tokens are processed
     When the slot 1 is saved with filename "slot1.bin"
     Then the server responds with status code 200
     # Since we have cache, this should only process the last tokens
     Given a user prompt "What is the capital of Germany?"
     And a completion request with no api error
     Then 24 tokens are predicted matching Thank
     And 7 prompt tokens are processed
-    When the slot 2 is restored with filename "slot1.bin"
+    # Loading the original cache into slot 0,
+    # we should only be processing 1 prompt token and get the same output
+    When the slot 0 is restored with filename "slot1.bin"
     Then the server responds with status code 200
     Given a user prompt "What is the capital of France?"
-    And using slot id 2
+    And using slot id 0
     And a completion request with no api error
-    Then 24 tokens are predicted matching Lily
+    Then 24 tokens are predicted matching (Lily|cake)
     And 1 prompt tokens are processed
+    # Verify that slot 1 was not corrupted during the slot 0 load: same check as above
+    Given a user prompt "What is the capital of Germany?"
+    And using slot id 1
+    And a completion request with no api error
+    Then 24 tokens are predicted matching Thank
+    And 1 prompt tokens are processed
   Scenario: Erase Slot
     Given a user prompt "What is the capital of France?"
     And using slot id 1
     And a completion request with no api error
-    Then 24 tokens are predicted matching Lily
+    Then 24 tokens are predicted matching (Lily|cake)
     And 22 prompt tokens are processed
     When the slot 1 is erased
     Then the server responds with status code 200
     Given a user prompt "What is the capital of France?"
     And a completion request with no api error
-    Then 24 tokens are predicted matching Lily
+    Then 24 tokens are predicted matching (Lily|cake)
     And 22 prompt tokens are processed
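For context, the "saved", "restored", and "erased" steps in these scenarios drive the server's slot-management endpoints (`POST /slots/{id}?action=save|restore|erase`, with save/restore taking a `filename` in the JSON body). A minimal sketch of the requests the test harness would issue; the base URL and helper name are illustrative, not part of this change:

```python
import json

BASE = "http://localhost:8080"  # assumed server address; adjust to your setup


def slot_request(slot_id, action, filename=None):
    """Build the (method, url, body) triple for a slot-management call.

    Mirrors the feature steps above: ``save`` and ``restore`` carry a
    filename in the JSON body, ``erase`` sends no body.
    """
    url = f"{BASE}/slots/{slot_id}?action={action}"
    body = json.dumps({"filename": filename}) if filename is not None else None
    return ("POST", url, body)


# The three operations exercised by the scenarios:
save = slot_request(1, "save", "slot1.bin")      # save slot 1's KV cache
restore = slot_request(0, "restore", "slot1.bin")  # load it into slot 0
erase = slot_request(1, "erase")                   # drop slot 1's cache
```

Each call is expected to return HTTP 200 on success, which is what the `Then the server responds with status code 200` steps assert.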