server: concurrency fix + monitoring - add /metrics prometheus compatible endpoint (#5708)

* server: monitoring - add /metrics prometheus compatible endpoint

* server: concurrency issue, when 2 tasks are waiting for results, only one calling thread is notified

* server: metrics - move to a dedicated struct
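
The concurrency item above describes a common condition-variable pitfall. The relevant diff is not shown in this section, so what follows is only a minimal sketch of the general pattern, assuming one shared condition variable and waiters that each wait for a different task id. `ResultQueue`, `recv`, `send` and `run_two_waiters` are hypothetical names, not the server's actual code.

```cpp
#include <condition_variable>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a server's result queue; names are illustrative.
struct ResultQueue {
    std::mutex mtx;
    std::condition_variable cv;
    std::map<int, std::string> results; // task id -> result

    // Block until the result for `task_id` is available, then take it.
    std::string recv(int task_id) {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [&] { return results.count(task_id) > 0; });
        std::string r = std::move(results[task_id]);
        results.erase(task_id);
        return r;
    }

    // Publish a result. notify_all() wakes every waiter so each one re-checks
    // its own predicate. With notify_one(), the single woken thread may be
    // waiting for a different task id; it goes back to sleep and the thread
    // that owns this result is never woken.
    void send(int task_id, std::string result) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            results[task_id] = std::move(result);
        }
        cv.notify_all();
    }
};

// Two threads wait for different task ids; results arrive in reverse order.
std::vector<std::string> run_two_waiters() {
    ResultQueue q;
    std::vector<std::string> out(2);
    std::thread t0([&] { out[0] = q.recv(0); });
    std::thread t1([&] { out[1] = q.recv(1); });
    q.send(1, "result-1");
    q.send(0, "result-0");
    t0.join();
    t1.join();
    return out;
}
```

In this sketch both waiters always receive their own result. Replacing `notify_all()` with `notify_one()` can leave one of them blocked, which matches the symptom the commit message describes.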
Pierrick Hymbert 2024-02-25 13:49:43 +01:00 committed by GitHub
parent 1289408817
commit d52d7819b8
7 changed files with 191 additions and 8 deletions
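
For readers unfamiliar with what "prometheus compatible" means for the `/metrics` endpoint named in the commit message: Prometheus scrapes a plain-text body made of `# HELP` and `# TYPE` comment lines followed by `name value` samples. A minimal sketch of rendering that format is given below. The `Metric` struct, `render_prometheus` and the metric name in the usage note are hypothetical and are not taken from this commit.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical metric description; field and metric names are illustrative.
struct Metric {
    std::string name;  // e.g. "demo_tokens_total"
    std::string help;  // one-line human-readable description
    std::string type;  // "counter" or "gauge"
    double      value; // current sample value
};

// Render metrics in the Prometheus text exposition format:
//   # HELP <name> <help>
//   # TYPE <name> <type>
//   <name> <value>
std::string render_prometheus(const std::vector<Metric> & metrics) {
    std::ostringstream out;
    for (const auto & m : metrics) {
        out << "# HELP " << m.name << " " << m.help << "\n"
            << "# TYPE " << m.name << " " << m.type << "\n"
            << m.name << " " << m.value << "\n";
    }
    return out.str();
}
```

Calling `render_prometheus({{"demo_tokens_total", "Number of tokens processed.", "counter", 42.0}})` yields three lines ending with `demo_tokens_total 42`. An HTTP handler would typically return such a body with content type `text/plain; version=0.0.4`.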

@@ -13,6 +13,7 @@ Feature: llama.cpp server
 And 1 slots
 And embeddings extraction
 And 32 server max tokens to predict
+And prometheus compatible metrics exposed
 Then the server is starting
 Then the server is healthy
@@ -25,6 +26,7 @@ Feature: llama.cpp server
 And <n_predict> max tokens to predict
 And a completion request with no api error
 Then <n_predicted> tokens are predicted matching <re_content>
+And prometheus metrics are exposed
 Examples: Prompts
 | prompt | n_predict | re_content | n_predicted |