Update examples/server/README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Pierrick Hymbert 2024-02-25 21:43:34 +01:00 committed by GitHub
parent 42d781e264
commit e647ed4ada
GPG key ID: B5690EEEBB952194


```diff
@@ -5,14 +5,13 @@ Fast, lightweight, pure C/C++ HTTP server based on [httplib](https://github.com/
 Set of LLM REST APIs and a simple web front end to interact with llama.cpp.
 
 **Features:**
 
-* SOTA LLM inference performance with GGUF quantized models on GPU and CPU
-* [OpenAI API](https://github.com/openai/openai-openapi) compatibles chat completions and embeddings routes
+* LLM inference of F16 and quantum models on GPU and CPU
+* [OpenAI API](https://github.com/openai/openai-openapi) compatible chat completions and embeddings routes
+* Parallel decoding with multi-user support
 * Continuous batching
-* KV cache attention
-* Embedding
-* Multimodal
-* API Key security
-* Production ready monitoring endpoints
+* Multimodal (wip)
+* API key security
+* Monitoring endpoints
 
 The project is under active development, and we are [looking for feedback and contributors](https://github.com/ggerganov/llama.cpp/issues/4216).
```
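For context on the "OpenAI API compatible chat completions" feature in the revised list, a request to the server can be sketched as follows. This is a minimal illustration, not part of the commit: the `http://localhost:8080` base URL reflects the server example's default port, and the `model` value is a placeholder, since a single loaded model is assumed.

```python
import json

# Minimal request body for the OpenAI-compatible chat completions route
# (a sketch; field names follow the OpenAI Chat Completions schema).
payload = {
    "model": "placeholder",  # hypothetical value; the route serves the loaded model
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

body = json.dumps(payload)

# With a running server, the body would be POSTed like so:
#   curl http://localhost:8080/v1/chat/completions \
#        -H "Content-Type: application/json" \
#        -d "$body"
print(body)
```

Because the route mirrors the OpenAI schema, existing OpenAI client libraries can typically be pointed at the server by overriding their base URL.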