server: tests: keep only the PHI-2 test

Author: Pierrick HYMBERT
Date:   2024-03-02 20:53:00 +01:00
Parent: 2cdd21e26b
Commit: a6ea72541f


@@ -7,8 +7,7 @@ Feature: Passkey / Self-extend with context shift
     Given a server listening on localhost:8080
 
   # Generates a long text of junk and inserts a secret passkey number inside it.
-  # We process the entire prompt using batches of n_batch and shifting the cache
-  # when it is full and then we query the LLM for the secret passkey.
+  # Then we query the LLM for the secret passkey.
   # see #3856 and #4810
   Scenario Outline: Passkey
     Given a model file <hf_file> from HF repo <hf_repo>
@@ -17,6 +16,7 @@ Feature: Passkey / Self-extend with context shift
     And <n_predicted> server max tokens to predict
     And 42 as seed
     And <n_ctx> KV cache size
+    And 1 slots
     And <n_ga> group attention factor to extend context size through self-extend
     And <n_ga_w> group attention width to extend context size through self-extend
     # Can be override with N_GPU_LAYERS
@@ -47,7 +47,7 @@ Feature: Passkey / Self-extend with context shift
     Examples:
       | hf_repo                         | hf_file                     | n_ctx_train | ngl | n_ctx | n_batch | n_ga | n_ga_w | n_junk | i_pos | passkey | n_predicted | re_content |
-      | TheBloke/phi-2-GGUF             | phi-2.Q4_K_M.gguf           | 2048        | 5   | 8192  | 512     | 16   | 512    | 250    | 50    | 42      | 1           | 42         |
-      | TheBloke/Llama-2-7B-GGUF        | llama-2-7b.Q2_K.gguf        | 4096        | 3   | 16384 | 512     | 4    | 512    | 500    | 300   | 1234    | 5           | 1234       |
-      | TheBloke/Mixtral-8x7B-v0.1-GGUF | mixtral-8x7b-v0.1.Q2_K.gguf | 4096        | 2   | 16384 | 512     | 4    | 512    | 500    | 100   | 0987    | 5           | 0987       |
+      | TheBloke/phi-2-GGUF             | phi-2.Q4_K_M.gguf           | 2048        | 5   | 8192  | 512     | 4    | 512    | 250    | 50    | 42      | 1           | 42         |
+      #| TheBloke/Llama-2-7B-GGUF        | llama-2-7b.Q2_K.gguf        | 4096        | 3   | 16384 | 512     | 4    | 512    | 500    | 300   | 1234    | 5           | 1234       |
+      #| TheBloke/Mixtral-8x7B-v0.1-GGUF | mixtral-8x7b-v0.1.Q2_K.gguf | 32768       | 2   | 16384 | 512     | 4    | 512    | 500    | 100   | 0987    | 5           | 0987       |
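The PHI-2 row also drops `n_ga` from 16 to 4, which is consistent with the rule of thumb that self-extend stretches the usable window to roughly `n_ga * n_ctx_train`. A minimal sketch of that arithmetic, assuming this scaling relationship holds; `min_group_attention_factor` is a hypothetical helper for illustration, not part of the test suite:

```python
# Assumption: with self-extend, effective context ~= n_ga * n_ctx_train,
# so the smallest sufficient group-attention factor is ceil(n_ctx / n_ctx_train).

def min_group_attention_factor(n_ctx: int, n_ctx_train: int) -> int:
    """Smallest integer factor that stretches the trained context to n_ctx."""
    return -(-n_ctx // n_ctx_train)  # ceiling division

# Values from the PHI-2 example row above.
n_ctx_train = 2048   # training context of phi-2
n_ctx = 8192         # requested KV cache size in the test

print(min_group_attention_factor(n_ctx, n_ctx_train))  # -> 4
```

Under this reading, the previous factor of 16 was larger than needed for an 8192-token cache, and 4 is the minimum that covers it.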