server : --n-predict option document and cap to max value (#5549)

* server: document --n-predict

* server: ensure client request cannot override n_predict if set

* server: fix print usage LF in new --n-predict option
This commit is contained in:
Pierrick Hymbert 2024-02-18 17:30:09 +01:00 committed by GitHub
parent 66c1968f7a
commit 36376abe05
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
2 changed files with 15 additions and 1 deletions

View file

@ -39,6 +39,7 @@ see https://github.com/ggerganov/llama.cpp/issues/1437
- `--mmproj MMPROJ_FILE`: Path to a multimodal projector file for LLaVA.
- `--grp-attn-n`: Set the group attention factor to extend context size through self-extend(default: 1=disabled), used together with group attention width `--grp-attn-w`
- `--grp-attn-w`: Set the group attention width to extend context size through self-extend(default: 512), used together with group attention factor `--grp-attn-n`
- `-n, --n-predict`: Set the maximum tokens to predict (default: -1)
## Build