From 734cf1096bfc270d650305b21f1594e124ae0d4f Mon Sep 17 00:00:00 2001
From: Kyle Mistele
Date: Sat, 27 Jan 2024 12:03:29 -0600
Subject: [PATCH] fix(doc): update container tag from `server` to `server-cuda`
 for README example on running server container with CUDA

---
 examples/server/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/server/README.md b/examples/server/README.md
index 0a6229624..ef8071845 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -70,7 +70,7 @@ You can consume the endpoints with Postman or NodeJS with axios library. You can
 docker run -p 8080:8080 -v /path/to/models:/models ggerganov/llama.cpp:server -m models/7B/ggml-model.gguf -c 512 --host 0.0.0.0 --port 8080
 # or, with CUDA:
-docker run -p 8080:8080 -v /path/to/models:/models --gpus all ggerganov/llama.cpp:server -m models/7B/ggml-model.gguf -c 512 --host 0.0.0.0 --port 8080 --n-gpu-layers 99
+docker run -p 8080:8080 -v /path/to/models:/models --gpus all ggerganov/llama.cpp:server-cuda -m models/7B/ggml-model.gguf -c 512 --host 0.0.0.0 --port 8080 --n-gpu-layers 99
 ```
 
 ## Testing with CURL