server : allow using LoRA adapters per-request (#10994)

* slot.can_batch_with
* lora per request
* test: force disable cache prompt
* move can_batch_with check
* fix condition
* add slow test with llama 8b
* update docs
* move lora change task to queue
* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* lora_base
* remove redundant check

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-02 15:05:18 +01:00
commit 0da5d86026
parent a45433ba20
8 changed files with 235 additions and 59 deletions
--- a/examples/server/tests/requirements.txt
+++ b/examples/server/tests/requirements.txt
@@ -5,3 +5,4 @@ numpy~=1.26.4
 openai~=1.55.3
 prometheus-client~=0.20.0
 requests~=2.32.3
+wget~=3.2