server : add option to time limit the generation phase (#9865)

ggml-ci
This commit is contained in:
Georgi Gerganov 2024-10-12 16:14:27 +03:00 committed by GitHub
parent 1bde94dd02
commit edc265661c
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
2 changed files with 46 additions and 6 deletions

View file

@ -374,6 +374,8 @@ node index.js
`min_keep`: If greater than 0, force samplers to return N possible tokens at minimum. Default: `0`
`t_max_predict_ms`: Set a time limit in milliseconds for the prediction (a.k.a. text-generation) phase. The timeout will trigger if the generation takes more than the specified time (measured since the first token was generated) and if a new-line character has already been generated. Useful for FIM applications. Default: `0`, which is disabled.
`image_data`: An array of objects to hold base64-encoded image `data` and its `id`s to be reference in `prompt`. You can determine the place of the image in the prompt as in the following: `USER:[img-12]Describe the image in detail.\nASSISTANT:`. In this case, `[img-12]` will be replaced by the embeddings of the image with id `12` in the following `image_data` array: `{..., "image_data": [{"data": "<BASE64_STRING>", "id": 12}]}`. Use `image_data` only with multimodal models, e.g., LLaVA.
`id_slot`: Assign the completion task to an specific slot. If is -1 the task will be assigned to a Idle slot. Default: `-1`