server : allow using LoRA adapters per-request (#10994)
* slot.can_batch_with * lora per request * test: force disable cache prompt * move can_batch_with check * fix condition * add slow test with llama 8b * update docs * move lora change task to queue * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * lora_base * remove redundant check --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
This commit is contained in:
parent
a45433ba20
commit
0da5d86026
8 changed files with 235 additions and 59 deletions
|
@ -44,6 +44,12 @@ To run with stdout/stderr display in real time (verbose output, but useful for d
|
|||
DEBUG=1 ./tests.sh -s -v -x
|
||||
```
|
||||
|
||||
To run single test unit:
|
||||
|
||||
```shell
|
||||
./tests.sh unit/test_{name of test case here}.py -v -x
|
||||
```
|
||||
|
||||
Hint: You can compile and run test in single command, useful for local developement:
|
||||
|
||||
```shell
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue