diff --git a/examples/server/README.md b/examples/server/README.md index e6cbdefe4..0a6229624 100644 --- a/examples/server/README.md +++ b/examples/server/README.md @@ -70,7 +70,7 @@ You can consume the endpoints with Postman or NodeJS with axios library. You can docker run -p 8080:8080 -v /path/to/models:/models ggerganov/llama.cpp:server -m models/7B/ggml-model.gguf -c 512 --host 0.0.0.0 --port 8080 # or, with CUDA: -docker run -p 8080:8080 -v /path/to/models:/models --gpus all ggerganov/llama.cpp:server -m models/7B/ggml-model.gguf -c 512 --host 0.0.0.0 --port 8080 --n-gpu-layers 1 +docker run -p 8080:8080 -v /path/to/models:/models --gpus all ggerganov/llama.cpp:server -m models/7B/ggml-model.gguf -c 512 --host 0.0.0.0 --port 8080 --n-gpu-layers 99 ``` ## Testing with CURL