update help of llama-bench

jianyuzh 2024-02-02 11:34:41 +08:00
parent 3a59a81eab
commit 796d91af35


@@ -23,19 +23,23 @@ usage: ./llama-bench [options]
 options:
   -h, --help
   -m, --model <filename>              (default: models/7B/ggml-model-q4_0.gguf)
   -p, --n-prompt <n>                  (default: 512)
   -n, --n-gen <n>                     (default: 128)
   -b, --batch-size <n>                (default: 512)
-  --memory-f32 <0|1>                  (default: 0)
-  -t, --threads <n>                   (default: 16)
-  -ngl N, --n-gpu-layers <n>          (default: 99)
-  -mg i, --main-gpu <i>               (default: 0)
-  -mmq, --mul-mat-q <0|1>             (default: 1)
-  -ts, --tensor_split <ts0/ts1/..>
+  -ctk <t>, --cache-type-k <t>        (default: f16)
+  -ctv <t>, --cache-type-v <t>        (default: f16)
+  -t, --threads <n>                   (default: 112)
+  -ngl, --n-gpu-layers <n>            (default: 99)
+  -sm, --split-mode <none|layer|row>  (default: layer)
+  -mg, --main-gpu <i>                 (default: 0)
+  -nkvo, --no-kv-offload <0|1>        (default: 0)
+  -mmp, --mmap <0|1>                  (default: 1)
+  -mmq, --mul-mat-q <0|1>             (default: 1)
+  -ts, --tensor_split <ts0/ts1/..>    (default: 0)
   -r, --repetitions <n>               (default: 5)
   -o, --output <csv|json|md|sql>      (default: md)
   -v, --verbose                       (default: 0)
 
 Multiple values can be given for each parameter by separating them with ',' or by specifying the parameter multiple times.
 ```
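As a usage sketch of the new flags (not part of the commit; the model path is the default from the help text, adjust as needed), comma-separated values run one benchmark per value:

```shell
# Compare f16 vs q8_0 KV-cache types across both GPU split modes;
# llama-bench runs every combination and reports each result.
./llama-bench -m models/7B/ggml-model-q4_0.gguf \
  -ctk f16,q8_0 -ctv f16,q8_0 \
  -sm layer,row
```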
@@ -51,6 +55,10 @@ Each test is repeated the number of times given by `-r`, and the results are averaged.
 
 For a description of the other options, see the [main example](../main/README.md).
 
+Note:
+
+- When using the SYCL backend, the benchmark may hang in some cases. Please set `--mmp 0`.
+
 ## Examples
 
 ### Text generation with different models
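A sketch of the SYCL workaround added by this commit (model path is the help text's default and only illustrative): `--mmp 0` disables memory-mapped model loading, which avoids the hang described in the note.

```shell
# Work around the SYCL hang by disabling mmap (-mmp 0), offloading all layers
./llama-bench -m models/7B/ggml-model-q4_0.gguf -ngl 99 -mmp 0
```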