server: Add "tokens per second" information in the backend (#10548)

* add cmake rvv support

* add timings

* remove space

* update readme

* fix

* fix code

* remove empty line

* add test

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
haopeng 2024-12-02 21:45:54 +08:00 committed by GitHub
parent 991f8aabee
commit 64ed2091b2
5 changed files with 44 additions and 1 deletion

@@ -416,6 +416,8 @@ node index.js
`samplers`: The order the samplers should be applied in. An array of strings representing sampler type names. If a sampler is not set, it will not be used. If a sampler is specified more than once, it will be applied multiple times. Default: `["dry", "top_k", "typ_p", "top_p", "min_p", "xtc", "temperature"]` - these are all the available values.
`timings_per_token`: Include prompt processing and text generation speed information in each response. Default: `false`
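The two options above can be illustrated with a minimal sketch that builds a request body and turns a speed report into tokens per second. It assumes a `POST /completion` endpoint and a `timings` object with `prompt_n`/`prompt_ms` and `predicted_n`/`predicted_ms` fields. These field names are assumptions, not confirmed by this diff. The helper names `build_completion_request`, `tokens_per_second`, and `summarize_timings` are hypothetical.

```python
import json


def build_completion_request(prompt: str, n_predict: int = 64) -> str:
    """Build a JSON body for the (assumed) POST /completion endpoint."""
    body = {
        "prompt": prompt,
        "n_predict": n_predict,
        # Samplers run in this order; any sampler left out is not applied.
        "samplers": ["top_k", "top_p", "temperature"],
        # Ask the server to attach speed information to every response.
        "timings_per_token": True,
    }
    return json.dumps(body)


def tokens_per_second(n_tokens: int, elapsed_ms: float) -> float:
    """Convert a token count and elapsed milliseconds into tokens per second."""
    if elapsed_ms <= 0:
        return 0.0
    return n_tokens * 1000.0 / elapsed_ms


def summarize_timings(timings: dict) -> dict:
    # Hypothetical helper; the timings field names are assumed, not taken from this diff.
    return {
        "prompt_tps": tokens_per_second(timings.get("prompt_n", 0), timings.get("prompt_ms", 0.0)),
        "gen_tps": tokens_per_second(timings.get("predicted_n", 0), timings.get("predicted_ms", 0.0)),
    }


sample = {"prompt_n": 12, "prompt_ms": 60.0, "predicted_n": 32, "predicted_ms": 800.0}
print(summarize_timings(sample))  # {'prompt_tps': 200.0, 'gen_tps': 40.0}
```

This computes the rate from raw counts and durations instead of relying on any precomputed per-second field, so the arithmetic is visible.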
**Response format**
- Note: When using streaming mode (`stream`), only `content` and `stop` will be returned until end of completion.
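In streaming mode, a client accumulates `content` from each chunk and watches `stop`. The sketch below assumes each event arrives as a server-sent-events line of the form `data: {json}`, and that the final chunk may carry extra fields such as `timings`. Both are assumptions, and `collect_stream` is a hypothetical helper name.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, bool, Optional[dict]]:
    """Join streamed `content` pieces and record `stop` plus the last `timings` seen."""
    text_parts = []
    stopped = False
    timings = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines and other SSE fields
        chunk = json.loads(line[len("data: "):])
        text_parts.append(chunk.get("content", ""))
        if chunk.get("stop"):
            stopped = True
        if "timings" in chunk:
            timings = chunk["timings"]  # keep the most recent speed report
    return "".join(text_parts), stopped, timings


stream = [
    'data: {"content": "Hello", "stop": false}',
    "",
    'data: {"content": ", world", "stop": false}',
    'data: {"content": "", "stop": true, "timings": {"predicted_n": 2, "predicted_ms": 50.0}}',
]
text, stopped, timings = collect_stream(stream)
print(text, stopped, timings)  # Hello, world True {'predicted_n': 2, 'predicted_ms': 50.0}
```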