From e18281940ec7be7b32bb15779999b53557b7307b Mon Sep 17 00:00:00 2001
From: Ujjawal Panchal <31011628+Ujjawal-K-Panchal@users.noreply.github.com>
Date: Sun, 21 Jul 2024 14:18:16 +0530
Subject: [PATCH] docfix: server readme: quantum models -> quantized models.

---
 examples/server/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/server/README.md b/examples/server/README.md
index e477d1501..93dcdaa58 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -5,7 +5,7 @@ Fast, lightweight, pure C/C++ HTTP server based on [httplib](https://github.com/
 Set of LLM REST APIs and a simple web front end to interact with llama.cpp.
 
 **Features:**
- * LLM inference of F16 and quantum models on GPU and CPU
+ * LLM inference of F16 and quantized models on GPU and CPU
  * [OpenAI API](https://github.com/openai/openai-openapi) compatible chat completions and embeddings routes
  * Parallel decoding with multi-user support
  * Continuous batching
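
The feature list touched by this hunk advertises OpenAI-compatible chat completion routes. As a minimal client-side sketch (assuming a server already running locally, e.g. via `./llama-server -m model.gguf`, listening on the default `http://localhost:8080`, and serving the `/v1/chat/completions` path), a request might look like:

```python
# Minimal sketch: calling the llama.cpp server's OpenAI-compatible
# chat completions route. The URL and model name below are assumptions
# for illustration; adjust them to match your local setup.
import json
import urllib.request

url = "http://localhost:8080/v1/chat/completions"  # assumed default address

payload = {
    # Placeholder model name; the server answers with its loaded model.
    "model": "gguf-model",
    "messages": [
        {"role": "user", "content": "Say hello in one sentence."},
    ],
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# The response follows the OpenAI chat completion schema,
# so the reply text sits under choices[0].message.content.
print(body["choices"][0]["message"]["content"])
```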