* server : add lora hotswap endpoint
* handle lora_no_apply
* fix build
* update docs
* clean up struct def
* fix build
* add LoRA test
* fix style
36 lines, 1.1 KiB, Gherkin
@llama.cpp
@lora
Feature: llama.cpp server

  Background: Server startup
    Given a server listening on localhost:8080
    And   a model url https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/stories15M_MOE-F16.gguf
    And   a model file stories15M_MOE-F16.gguf
    And   a model alias stories15M_MOE
    And   a lora adapter file from https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/moe_shakespeare15M.gguf
    And   42 as server seed
    And   1024 as batch size
    And   1024 as ubatch size
    And   2048 KV cache size
    And   64 max tokens to predict
    And   0.0 temperature
    Then  the server is starting
    Then  the server is healthy

  Scenario: Completion LoRA disabled
    Given switch off lora adapter 0
    Given a prompt:
    """
    Look in thy glass
    """
    And   a completion request with no api error
    Then  64 tokens are predicted matching little|girl|three|years|old

  Scenario: Completion LoRA enabled
    Given switch on lora adapter 0
    Given a prompt:
    """
    Look in thy glass
    """
    And   a completion request with no api error
    Then  64 tokens are predicted matching eye|love|glass|sun
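
The two scenarios rely on steps that switch adapter 0 on or off and then check the completion against an alternation pattern. Below is a minimal Python sketch of the logic such steps might use. It is not the project's actual step code: the helper names `lora_toggle_payload` and `matches_expected` are hypothetical, and the request body shape (a JSON list of `id`/`scale` entries, sent to something like `POST /lora-adapters`) is an assumption that this file does not confirm.

```python
import json
import re


def lora_toggle_payload(adapter_id: int, enabled: bool) -> str:
    """Build a JSON body for a hot-swap request (assumed shape).

    Assumption: the server takes a list of {"id", "scale"} objects,
    where scale 1.0 applies the adapter and 0.0 disables it.
    """
    scale = 1.0 if enabled else 0.0
    return json.dumps([{"id": adapter_id, "scale": scale}])


def matches_expected(content: str, pattern: str) -> bool:
    """Return True if any alternative in the pattern occurs in the text.

    Patterns such as "eye|love|glass|sun" are treated as regular
    expressions and searched case-insensitively.
    """
    return re.search(pattern, content, re.IGNORECASE) is not None


if __name__ == "__main__":
    # Body that would disable adapter 0, as in "Completion LoRA disabled".
    print(lora_toggle_payload(0, enabled=False))
    # Check a made-up completion against the "LoRA enabled" pattern.
    print(matches_expected("Look in thy glass, and tell the sun", "eye|love|glass|sun"))
```

With the seed fixed at 42 and the temperature at 0.0, output should be repeatable enough that a pattern check of this kind can distinguish the base model's story-like continuation from the adapter's Shakespearean one.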