server: allow filtering llama server response fields (#10940)
* llama_server_response_fields * llama_server_response_fields_fix_issues * params fixes * fix * clarify docs * change to "response_fields" --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
This commit is contained in:
parent
30caac3a68
commit
09fe2e7613
4 changed files with 63 additions and 1 deletions
|
@ -450,6 +450,8 @@ These words will not be included in the completion, so make sure to add them to
|
|||
|
||||
`post_sampling_probs`: Returns the probabilities of top `n_probs` tokens after applying sampling chain.
|
||||
|
||||
`response_fields`: A list of response fields, for example: `"response_fields": ["content", "generation_settings/n_predict"]`. If the specified field is missing, it will simply be omitted from the response without triggering an error.
|
||||
|
||||
**Response format**
|
||||
|
||||
- Note: In streaming mode (`stream`), only `content`, `tokens` and `stop` will be returned until end of completion. Responses are sent using the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html) standard. Note: the browser's `EventSource` interface cannot be used due to its lack of `POST` request support.
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue