Server: add tests for batch size, different seeds (#6950)
This commit is contained in:
parent
1613ef8d8e
commit
3ea0d36000
2 changed files with 155 additions and 79 deletions
|
@ -7,44 +7,16 @@ Feature: Results
|
|||
And a model file tinyllamas/split/stories15M-00001-of-00003.gguf from HF repo ggml-org/models
|
||||
And a model file test-model-00001-of-00003.gguf
|
||||
And 128 as batch size
|
||||
And 256 KV cache size
|
||||
And 1024 KV cache size
|
||||
And 128 max tokens to predict
|
||||
|
||||
Scenario Outline: Multi users completion
|
||||
Given <n_slots> slots
|
||||
And continuous batching
|
||||
|
||||
Scenario Outline: consistent results with same seed
|
||||
Given <n_slots> slots
|
||||
Then the server is starting
|
||||
Then the server is healthy
|
||||
|
||||
Given 42 as seed
|
||||
And a prompt:
|
||||
"""
|
||||
Write a very long story about AI.
|
||||
"""
|
||||
|
||||
Given 42 as seed
|
||||
And a prompt:
|
||||
"""
|
||||
Write a very long story about AI.
|
||||
"""
|
||||
|
||||
Given 42 as seed
|
||||
And a prompt:
|
||||
"""
|
||||
Write a very long story about AI.
|
||||
"""
|
||||
|
||||
Given 42 as seed
|
||||
And a prompt:
|
||||
"""
|
||||
Write a very long story about AI.
|
||||
"""
|
||||
|
||||
Given 42 as seed
|
||||
And a prompt:
|
||||
"""
|
||||
Write a very long story about AI.
|
||||
"""
|
||||
Given 4 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 42
|
||||
|
||||
Given concurrent completion requests
|
||||
Then the server is busy
|
||||
|
@ -55,3 +27,55 @@ Feature: Results
|
|||
| n_slots |
|
||||
| 1 |
|
||||
| 2 |
|
||||
|
||||
Scenario Outline: different results with different seed
|
||||
Given <n_slots> slots
|
||||
Then the server is starting
|
||||
Then the server is healthy
|
||||
|
||||
Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 42
|
||||
Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 43
|
||||
Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 44
|
||||
Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 45
|
||||
|
||||
Given concurrent completion requests
|
||||
Then the server is busy
|
||||
Then the server is idle
|
||||
And all slots are idle
|
||||
Then all predictions are different
|
||||
Examples:
|
||||
| n_slots |
|
||||
| 1 |
|
||||
| 2 |
|
||||
|
||||
Scenario Outline: consistent results with same seed and varying batch size
|
||||
Given 4 slots
|
||||
And <temp> temperature
|
||||
# And 0 as draft
|
||||
Then the server is starting
|
||||
Then the server is healthy
|
||||
|
||||
Given 1 prompts "Write a very long story about AI." with seed 42
|
||||
And concurrent completion requests
|
||||
# Then the server is busy # Not all slots will be utilized.
|
||||
Then the server is idle
|
||||
And all slots are idle
|
||||
|
||||
Given <n_parallel> prompts "Write a very long story about AI." with seed 42
|
||||
And concurrent completion requests
|
||||
# Then the server is busy # Not all slots will be utilized.
|
||||
Then the server is idle
|
||||
And all slots are idle
|
||||
|
||||
Then all predictions are equal
|
||||
Examples:
|
||||
| n_parallel | temp |
|
||||
| 1 | 0.0 |
|
||||
| 2 | 0.0 |
|
||||
| 4 | 0.0 |
|
||||
| 1 | 1.0 |
|
||||
# FIXME: These tests fail on master. The problem seems to be the unified KV cache.
|
||||
# See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
|
||||
# and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574 .
|
||||
# | 2 | 1.0 |
|
||||
# | 4 | 1.0 |
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue