From 5e498be6485afd7bcb9b3a8d87f75023e3676e52 Mon Sep 17 00:00:00 2001
From: Kyle Mistele
Date: Sat, 27 Jan 2024 00:00:30 -0600
Subject: [PATCH] doc: add information about running with docker to the server
 README

---
 examples/server/README.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/examples/server/README.md b/examples/server/README.md
index fd3034b99..e6cbdefe4 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -65,6 +65,14 @@ server.exe -m models\7B\ggml-model.gguf -c 2048
 
 The above command will start a server that by default listens on `127.0.0.1:8080`. You can consume the endpoints with Postman or NodeJS with axios library. You can visit the web front end at the same url.
 
+### Docker:
+```bash
+docker run -p 8080:8080 -v /path/to/models:/models ggerganov/llama.cpp:server -m models/7B/ggml-model.gguf -c 512 --host 0.0.0.0 --port 8080
+
+# or, with CUDA:
+docker run -p 8080:8080 -v /path/to/models:/models --gpus all ggerganov/llama.cpp:server -m models/7B/ggml-model.gguf -c 512 --host 0.0.0.0 --port 8080 --n-gpu-layers 1
+```
+
 ## Testing with CURL
 
 Using [curl](https://curl.se/). On Windows `curl.exe` should be available in the base OS.