fix(doc): update container tag from server to server-cuda for README example on running server container with CUDA

Kyle Mistele 2024-01-27 12:03:29 -06:00
parent 7298e97947
commit 734cf1096b


@@ -70,7 +70,7 @@ You can consume the endpoints with Postman or NodeJS with axios library. You can
docker run -p 8080:8080 -v /path/to/models:/models ggerganov/llama.cpp:server -m models/7B/ggml-model.gguf -c 512 --host 0.0.0.0 --port 8080
# or, with CUDA:
-docker run -p 8080:8080 -v /path/to/models:/models --gpus all ggerganov/llama.cpp:server -m models/7B/ggml-model.gguf -c 512 --host 0.0.0.0 --port 8080 --n-gpu-layers 99
+docker run -p 8080:8080 -v /path/to/models:/models --gpus all ggerganov/llama.cpp:server-cuda -m models/7B/ggml-model.gguf -c 512 --host 0.0.0.0 --port 8080 --n-gpu-layers 99
```
## Testing with CURL