server : add lora hotswap endpoint (WIP) (#8857)
* server : add lora hotswap endpoint * handle lora_no_apply * fix build * updae docs * clean up struct def * fix build * add LoRA test * fix style
This commit is contained in:
parent
641f5dd2a6
commit
1e6f6554aa
9 changed files with 251 additions and 92 deletions
36
examples/server/tests/features/lora.feature
Normal file
36
examples/server/tests/features/lora.feature
Normal file
|
@ -0,0 +1,36 @@
|
|||
@llama.cpp
|
||||
@lora
|
||||
Feature: llama.cpp server
|
||||
|
||||
Background: Server startup
|
||||
Given a server listening on localhost:8080
|
||||
And a model url https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/stories15M_MOE-F16.gguf
|
||||
And a model file stories15M_MOE-F16.gguf
|
||||
And a model alias stories15M_MOE
|
||||
And a lora adapter file from https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/moe_shakespeare15M.gguf
|
||||
And 42 as server seed
|
||||
And 1024 as batch size
|
||||
And 1024 as ubatch size
|
||||
And 2048 KV cache size
|
||||
And 64 max tokens to predict
|
||||
And 0.0 temperature
|
||||
Then the server is starting
|
||||
Then the server is healthy
|
||||
|
||||
Scenario: Completion LoRA disabled
|
||||
Given switch off lora adapter 0
|
||||
Given a prompt:
|
||||
"""
|
||||
Look in thy glass
|
||||
"""
|
||||
And a completion request with no api error
|
||||
Then 64 tokens are predicted matching little|girl|three|years|old
|
||||
|
||||
Scenario: Completion LoRA enabled
|
||||
Given switch on lora adapter 0
|
||||
Given a prompt:
|
||||
"""
|
||||
Look in thy glass
|
||||
"""
|
||||
And a completion request with no api error
|
||||
Then 64 tokens are predicted matching eye|love|glass|sun
|
Loading…
Add table
Add a link
Reference in a new issue