server : clarify /slots endpoint, add is_processing (#10162)

* server : clarify /slots endpoint, add is_processing

* fix tests
Xuan Son Nguyen 2024-11-04 16:33:29 +01:00 committed by GitHub
parent 6a066b9978
commit 9e0ecfb697
3 changed files with 18 additions and 19 deletions


@@ -692,7 +692,10 @@ Given a ChatML-formatted json description in `messages`, it returns the predicte
### GET `/slots`: Returns the current slots processing state
-This endpoint can be disabled with `--no-slots`
+> [!WARNING]
+> This endpoint is intended for debugging and may be modified in future versions. For security reasons, we strongly advise against enabling it in production environments.
+
+This endpoint is disabled by default and can be enabled with `--slots`
If the query parameter `?fail_on_no_slot=1` is set, this endpoint responds with status code 503 if no slots are available.
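
As an illustration of the `fail_on_no_slot` behaviour described above, here is a minimal Python sketch of a client-side availability probe. It assumes a server started with `--slots` and reachable at a base URL you supply; the helper names `slots_probe_url` and `has_free_slot` are hypothetical and not part of the server. Only the 503 case is interpreted, and any other HTTP error is re-raised rather than guessed at.

```python
import urllib.error
import urllib.request


def slots_probe_url(base_url: str) -> str:
    """Build the /slots URL with fail_on_no_slot=1 (hypothetical helper)."""
    return base_url.rstrip("/") + "/slots?fail_on_no_slot=1"


def has_free_slot(base_url: str, timeout: float = 5.0) -> bool:
    """Return True on HTTP 200, False on HTTP 503 (no slot available).

    Any other HTTP error (for example, if the endpoint is not enabled)
    is re-raised so the caller can decide how to handle it.
    """
    try:
        with urllib.request.urlopen(slots_probe_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 503:
            return False
        raise


# Example call (the address is an assumption; adjust to your deployment):
#   has_free_slot("http://localhost:8080")
```

Because the probe relies only on the status code, it does not need to parse the response body and keeps working if debug fields in the payload change.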
@@ -709,6 +712,7 @@ Example:
"grammar": "",
"id": 0,
"ignore_eos": false,
+"is_processing": false,
"logit_bias": [],
"min_p": 0.05000000074505806,
"mirostat": 0,
@@ -741,7 +745,6 @@ Example:
"temperature"
],
"seed": 42,
-"state": 1,
"stop": [
"\n"
],
@@ -755,10 +758,6 @@ Example:
]
```
-Possible values for `slot[i].state` are:
-- `0`: SLOT_STATE_IDLE
-- `1`: SLOT_STATE_PROCESSING
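
For clients that previously read the numeric `state` field, the following Python sketch shows one possible way to consume the boolean `is_processing` field instead, falling back to the old encoding (`1` meaning `SLOT_STATE_PROCESSING`) when talking to an older server. The function names and the sample payload are made up for illustration; the only assumption about the response is that it is a JSON array of slot objects carrying an `id`, as in the example above.

```python
import json


def slot_is_busy(slot: dict) -> bool:
    """Prefer the new boolean field; fall back to the legacy numeric state."""
    if "is_processing" in slot:
        return bool(slot["is_processing"])
    # Legacy servers: state 1 was SLOT_STATE_PROCESSING, 0 was SLOT_STATE_IDLE.
    return slot.get("state", 0) == 1


def busy_slot_ids(slots_json: str) -> list:
    """Return the ids of all slots that are currently processing."""
    slots = json.loads(slots_json)
    return [slot["id"] for slot in slots if slot_is_busy(slot)]


# Hypothetical payload mixing the new and the legacy field, for illustration only.
sample = (
    '[{"id": 0, "is_processing": false},'
    ' {"id": 1, "is_processing": true},'
    ' {"id": 2, "state": 1}]'
)
print(busy_slot_ids(sample))  # prints [1, 2]
```

Since the endpoint is flagged as a debugging aid that may change, keeping the field access in one small helper limits how much client code has to be touched on the next change.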
### GET `/metrics`: Prometheus compatible metrics exporter
This endpoint is only accessible if `--metrics` is set.