server: tests: keep only the PHI-2 test
parent 2cdd21e26b
commit a6ea72541f
1 changed file with 5 additions and 5 deletions
@@ -7,8 +7,7 @@ Feature: Passkey / Self-extend with context shift
 Given a server listening on localhost:8080
 # Generates a long text of junk and inserts a secret passkey number inside it.
-# We process the entire prompt using batches of n_batch and shifting the cache
-# when it is full and then we query the LLM for the secret passkey.
+# Then we query the LLM for the secret passkey.
 # see #3856 and #4810
 Scenario Outline: Passkey
 Given a model file <hf_file> from HF repo <hf_repo>
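The comment above says the scenario generates a long junk text with a secret passkey hidden inside it. As a rough illustration, a builder for such a prompt might look like the following sketch; the junk sentence, the helper name, and the exact wording are assumptions, not the test suite's actual step code.

```python
def build_passkey_prompt(n_junk: int, i_pos: int, passkey: int) -> str:
    """Build a long junk prompt with a secret passkey hidden at chunk i_pos.

    Hypothetical helper: the real suite's generator may differ.
    """
    junk = ("The grass is green. The sky is blue. The sun is yellow. "
            "Here we go. There and back again.")
    chunks = [junk] * n_junk
    # Hide the secret at chunk index i_pos, then ask for it back at the end.
    chunks.insert(i_pos, f"The pass key is {passkey}. Remember it. "
                         f"{passkey} is the pass key.")
    chunks.append("What is the pass key?")
    return "\n".join(chunks)

# Values taken from the phi-2 row of the Examples table below.
prompt = build_passkey_prompt(n_junk=250, i_pos=50, passkey=42)
```

With 250 junk chunks the prompt far exceeds phi-2's 2048-token trained context, which is what makes the scenario exercise context extension at all.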
@@ -17,6 +16,7 @@ Feature: Passkey / Self-extend with context shift
 And <n_predicted> server max tokens to predict
 And 42 as seed
 And <n_ctx> KV cache size
+And 1 slots
 And <n_ga> group attention factor to extend context size through self-extend
 And <n_ga_w> group attention width to extend context size through self-extend
 # Can be override with N_GPU_LAYERS
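The `<n_ga>` / `<n_ga_w>` steps configure self-extend (grouped attention, see #3856 and #4810), which stretches a model's trained context by compressing distant positions. A minimal sketch of the idea, assuming a simple floor-division grouping outside a neighbor window of width `w` — an illustration of the scheme only, not llama.cpp's actual implementation:

```python
def self_extend_pos(p: int, g: int, w: int) -> int:
    """Map a raw KV position p to an effective attention position.

    Positions inside the neighbor window of width w keep their exact
    values; older positions are compressed by grouping every g of them
    together. Simplified illustration of the self-extend idea.
    """
    if p < w:
        return p
    return w + (p - w) // g

# With the phi-2 row's values (n_ga=4, n_ga_w=512), the last slot of the
# 8192-token KV cache maps to a much smaller effective position.
effective = self_extend_pos(8191, g=4, w=512)
```

The grouping is what lets an 8192-slot cache present effective positions closer to what the model saw during training.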
@@ -47,7 +47,7 @@ Feature: Passkey / Self-extend with context shift
 Examples:
 | hf_repo                         | hf_file                     | n_ctx_train | ngl | n_ctx | n_batch | n_ga | n_ga_w | n_junk | i_pos | passkey | n_predicted | re_content |
-| TheBloke/phi-2-GGUF             | phi-2.Q4_K_M.gguf           | 2048        | 5   | 8192  | 512     | 16   | 512    | 250    | 50    | 42      | 1           | 42         |
-| TheBloke/Llama-2-7B-GGUF        | llama-2-7b.Q2_K.gguf        | 4096        | 3   | 16384 | 512     | 4    | 512    | 500    | 300   | 1234    | 5           | 1234       |
-| TheBloke/Mixtral-8x7B-v0.1-GGUF | mixtral-8x7b-v0.1.Q2_K.gguf | 4096        | 2   | 16384 | 512     | 4    | 512    | 500    | 100   | 0987    | 5           | 0987       |
+| TheBloke/phi-2-GGUF             | phi-2.Q4_K_M.gguf           | 2048        | 5   | 8192  | 512     | 4    | 512    | 250    | 50    | 42      | 1           | 42         |
+#| TheBloke/Llama-2-7B-GGUF       | llama-2-7b.Q2_K.gguf        | 4096        | 3   | 16384 | 512     | 4    | 512    | 500    | 300   | 1234    | 5           | 1234       |
+#| TheBloke/Mixtral-8x7B-v0.1-GGUF | mixtral-8x7b-v0.1.Q2_K.gguf | 32768      | 2   | 16384 | 512     | 4    | 512    | 500    | 100   | 0987    | 5           | 0987       |
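Each Examples row also carries the expected answer: `n_predicted` caps the completion length and `re_content` is matched against the model's output. A hedged sketch of that final check, with illustrative names — the suite's real step code may differ:

```python
import re

def passkey_recalled(completion: str, re_content: str) -> bool:
    # The scenario passes when the completion matches re_content,
    # i.e. the model recalled the passkey hidden in the junk context.
    return re.search(re_content, completion) is not None

# phi-2 row: a 1-token completion containing "42" satisfies re_content "42".
ok = passkey_recalled("42", "42")
```

Keeping only the small phi-2 row (and commenting out Llama-2 and Mixtral, per the commit title) makes the CI run cheap while still covering the self-extend path.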