Added readme for server example
This commit is contained in:
parent
f01c6cbc7e
commit
197bb66339
2 changed files with 101 additions and 1 deletion
100  examples/server/README.md  Normal file
@@ -0,0 +1,100 @@
## llama.cpp/example/server

This example lets you run a llama.cpp HTTP server that you can interact with from a web page or consume as an API.

It doesn't require external dependencies.

## Limitations

* Only tested on Windows and Linux.
* Only CMake builds are supported.
* Only one context at a time.
* Only Vicuna is supported for interaction.

## Endpoints

You can interact with the following API endpoints:

`POST hostname:port/setting-context`

`POST hostname:port/set-message`

`GET hostname:port/completion`

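The typical flow is to set the conversation context first, then submit the user message, and finally request the completion. The sketch below shows that order in minimal form; it is based on the fuller Node.js example later in this README, and the local host, port, and parameter values are assumptions for a default run.

```javascript
// Minimal sketch of the endpoint call order (see the full example below).
// Assumes the server is running locally on port 8080; parameter and field
// names follow the Node.js example further down in this README.
const axios = require('axios');

async function quickTest() {
    // 1. Define the conversation context and sampling parameters.
    await axios.post("http://127.0.0.1:8080/setting-context", {
        context: [
            { role: "system", content: "You are a helpful assistant." }
        ],
        temperature: 0.2,
        n_predict: 256
    });

    // 2. Send the user message to be answered.
    const msg = await axios.post("http://127.0.0.1:8080/set-message", {
        message: " Hello, what can you do?"
    });

    // 3. Fetch the completion (non-streaming; pass stream=true to stream tokens).
    if (msg.data.can_inference) {
        const result = await axios.get("http://127.0.0.1:8080/completion");
        console.log(result.data.content);
    }
}

quickTest();
```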
## Usage

### Get Code

```bash
git clone https://github.com/FSSRepo/llama.cpp.git
cd llama.cpp
```

### Build

```bash
mkdir build
cd build
cmake ..
cmake --build . --config Release
```

### Run

Model tested: [Vicuna](https://huggingface.co/chharlesonfire/ggml-vicuna-7b-4bit/blob/main/ggml-vicuna-7b-q4_0.bin)

```bash
server -m ggml-vicuna-7b-q4_0.bin --keep -1 --ctx_size 2048
```
### Test the endpoints with Node.js

You need to have [Node.js](https://nodejs.org/en) installed.

```bash
mkdir llama-client
cd llama-client
npm init
npm install axios
```

Create an `index.js` file and put the following inside it:
```javascript
const axios = require('axios');

async function Test() {
    let result = await axios.post("http://127.0.0.1:8080/setting-context", {
        context: [
            { role: "system", content: "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions." },
            { role: "user", content: "Hello, Assistant." },
            { role: "assistant", content: "Hello. How may I help you today?" },
            { role: "user", content: "Please tell me the largest city in Europe." },
            { role: "assistant", content: "Sure. The largest city in Europe is Moscow, the capital of Russia." }
        ],
        batch_size: 64,
        temperature: 0.2,
        top_k: 40,
        top_p: 0.9,
        n_predict: 2048,
        threads: 5
    });
    result = await axios.post("http://127.0.0.1:8080/set-message", {
        message: ' What is linux?'
    });
    if (result.data.can_inference) {
        result = await axios.get("http://127.0.0.1:8080/completion?stream=true", { responseType: 'stream' });
        result.data.on('data', (data) => {
            // token by token completion
            let dat = JSON.parse(data.toString());
            process.stdout.write(dat.content);
        });

        /*
        Wait for the entire completion instead (takes a long time to respond):

        result = await axios.get("http://127.0.0.1:8080/completion");
        console.log(result.data.content);
        */
    }
}

Test();
```
And run it:

```bash
node .
```
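For reference, the non-streaming `GET /completion` call appears to return a JSON body with the generated text and a token count; the field names can be seen in the server-side change at the bottom of this commit. A rough sketch of the shape, with placeholder values rather than real output:

```javascript
// Illustrative sketch only: approximate shape of the non-streaming
// GET /completion response body. Field names come from the server code
// changed in this commit; the values here are placeholders.
const exampleCompletionResponse = {
    content: "...the generated completion text...", // same field the streaming chunks carry
    total_tokens: 0                                 // number of tokens predicted
};
```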
@@ -730,7 +730,7 @@ int main(int argc, char ** argv) {
             { "content", completion.c_str() },
             { "total_tokens", llama->tokens_completion }
         };
-        printf("\nCompletion finished: %i tokens predicted.\n", llama->tokens_completion);
+        printf("\rCompletion finished: %i tokens predicted.\n", llama->tokens_completion);
         res.set_content(data.dump(), "application/json");
     }
 });