server : refactored the task processing logic (#5065)

* server: add llama_server_queue struct

* server: add llama_server_response_event

* server: add comments

* server: move all mutexes away from server.cpp

* server: correct multitask response

* server: only add back deferred tasks when one slot is available

* server: fix a race condition caused by "request_completion"
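
The queue and deferred-task items above can be pictured with a minimal sketch. It assumes one mutex guarding a pending queue and a deferred list, with deferred tasks moved back only when a slot is reported free. Every name here (sketch_task, sketch_queue, post, defer, on_slot_available, try_pop) is hypothetical and chosen for illustration; this is not the llama_server_queue code from the commit, whose diff is not shown here.

```cpp
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical task type; the real task struct carries more than an id.
struct sketch_task {
    int id;
};

// Hypothetical queue, not the llama.cpp API: one mutex owns all shared state,
// so callers never have to lock anything themselves.
struct sketch_queue {
    std::mutex mutex;                   // guards both containers below
    std::deque<sketch_task> pending;    // tasks ready to be processed
    std::vector<sketch_task> deferred;  // tasks parked until a slot is free

    // Append a new task to the back of the pending queue.
    void post(sketch_task task) {
        std::lock_guard<std::mutex> lock(mutex);
        pending.push_back(task);
    }

    // Park a task that could not be served because no slot was free.
    void defer(sketch_task task) {
        std::lock_guard<std::mutex> lock(mutex);
        deferred.push_back(task);
    }

    // Called when a slot becomes free: move every deferred task back to
    // the pending queue, preserving the order in which they were deferred.
    void on_slot_available() {
        std::lock_guard<std::mutex> lock(mutex);
        for (const sketch_task & task : deferred) {
            pending.push_back(task);
        }
        deferred.clear();
    }

    // Pop the oldest pending task into `out`; returns false if none is pending.
    bool try_pop(sketch_task & out) {
        std::lock_guard<std::mutex> lock(mutex);
        if (pending.empty()) {
            return false;
        }
        out = pending.front();
        pending.pop_front();
        return true;
    }
};
```

In this sketch a deferred task stays invisible to try_pop until on_slot_available is called, which is one way to avoid re-queuing work that would immediately be deferred again.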
Xuan Son Nguyen 2024-01-26 13:42:20 +01:00 committed by GitHub
parent 413e7b0559
commit 48c857aa10
5 changed files with 876 additions and 692 deletions
