server: bench: minor fixes (#10765)

* server/bench:
- support the OpenAI streaming standard output terminated with `[DONE]\n\n`
- export k6 raw results in CSV
- fix too many idle TCP connections in tcp_wait
- add a metric for the time to emit the first token
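As an illustration of the CSV export above, k6 can write raw metric samples with its built-in `csv` output, which can then be post-processed with standard tools. This is a hypothetical sketch, not the PR's actual code; it assumes k6's default CSV column layout, whose first columns are `metric_name,timestamp,metric_value`.

```shell
# Export raw k6 samples to CSV with k6's built-in csv output:
#   k6 run --out csv=k6-results.csv script.js
# Hypothetical post-processing: mean of one metric.
# Sample data is inlined here so the snippet is self-contained.
printf '%s\n' \
  'metric_name,timestamp,metric_value' \
  'http_req_duration,1700000000,120.5' \
  'http_req_duration,1700000001,79.5' > k6-results.csv

awk -F, '$1 == "http_req_duration" { sum += $3; n++ } END { printf "%.1f\n", sum/n }' k6-results.csv
# prints 100.0
```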

* server/bench:
- fix the case where Prometheus is not started
- wait for the server to be ready before starting the bench
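The readiness wait mentioned above can be sketched as a small shell helper. This is a hypothetical sketch, not the PR's actual implementation; it assumes the server exposes a `/health` endpoint (which `llama-server` does) and that the helper name and retry parameter are illustrative.

```shell
# Hypothetical helper: poll the server's health endpoint until it answers,
# or give up after a number of retries (one attempt per second).
wait_for_server() {
  url=$1
  retries=${2:-60}
  i=0
  while [ "$i" -lt "$retries" ]; do
    if curl -sf "$url" >/dev/null 2>&1; then
      echo "server ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "server did not become ready" >&2
  return 1
}

# Usage (hypothetical): wait_for_server http://localhost:8080/health
```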
Pierrick Hymbert 2025-01-02 18:06:12 +01:00 committed by GitHub
parent 0da5d86026
commit 2f0ee84b9b
3 changed files with 39 additions and 15 deletions


@@ -6,10 +6,10 @@ Benchmark is using [k6](https://k6.io/).
SSE is not supported by default in k6, you have to build k6 with the [xk6-sse](https://github.com/phymbert/xk6-sse) extension.
-Example:
+Example (assuming golang >= 1.21 is installed):
```shell
go install go.k6.io/xk6/cmd/xk6@latest
-xk6 build master \
+$GOPATH/bin/xk6 build master \
--with github.com/phymbert/xk6-sse
```
@@ -33,7 +33,7 @@ The server must answer OAI Chat completion requests on `http://localhost:8080/v1
Example:
```shell
-server --host localhost --port 8080 \
+llama-server --host localhost --port 8080 \
--model ggml-model-q4_0.gguf \
--cont-batching \
--metrics \