SimpleChat: Add n_predict (equivalent of max_tokens) for llama.cpp server

The /completions endpoint of examples/server doesn't take max_tokens;
instead it takes the internal n_predict. For now, send the same from
the client side; maybe later max_tokens support can be added to the
/completions endpoint handling.
HanishKVC 2024-05-24 23:16:55 +05:30
parent 8f172b9070
commit b3afd6c86a
2 changed files with 5 additions and 0 deletions
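As a rough sketch of what the commit enables, a client can send both caps in one payload so the limit applies whichever field the endpoint honors. The URL, prompt handling, and helper name below are assumptions for illustration, not part of SimpleChat:

    // Sketch only: cap the response length on either endpoint style.
    // examples/server's /completions reads n_predict, while OpenAI-style
    // endpoints read max_tokens; sending both keeps one payload portable.
    async function requestCompletion(prompt) {
        const resp = await fetch("http://localhost:8080/completions", {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({
                prompt: prompt,
                max_tokens: 1024,  // used by OpenAI-compatible endpoints
                n_predict: 1024,   // used by examples/server /completions
                frequency_penalty: 1.2,
                presence_penalty: 1.2,
            }),
        });
        return await resp.json();
    }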


@@ -174,6 +174,10 @@ Set max_tokens to 1024, so that a relatively large previous response doesn't eat up
the space available for the next query-response. However, don't forget that the server
should also be started with a model context size of 1k or more, to be on the safe side.
The /completions endpoint of examples/server doesn't take max_tokens; instead it takes the
internal n_predict. For now, the same is set here on the client side; maybe later max_tokens
can be added to the /completions endpoint handling code on the server side.
Frequency and presence penalty fields are set to 1.2 in the set of fields sent to the server
along with the user query, so that the model is partly steered away from repeating text in
its response.
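Since the server must be started with a large enough context, a client could sanity-check the cap before sending. The sketch below uses a crude ~4-characters-per-token estimate; the helper name and the heuristic are assumptions, not part of this commit:

    // Sketch only: warn when the response cap plus an estimated prompt
    // size may overflow the server's context window. The ~4 chars/token
    // heuristic is a rough assumption, not a real tokenizer.
    function fitsContext(promptText, nPredict, ctxSize) {
        const estPromptTokens = Math.ceil(promptText.length / 4);
        return estPromptTokens + nPredict <= ctxSize;
    }

    const chatSoFar = "...accumulated chat text...";
    if (!fitsContext(chatSoFar, 1024, 2048)) {
        console.warn("response cap may overflow the server context window");
    }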


@@ -578,6 +578,7 @@ class Me {
"max_tokens": 1024,
"frequency_penalty": 1.2,
"presence_penalty": 1.2,
"n_predict": 1024
};
}
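For context, one hedged sketch of how such a shared fields object might be merged into each request body on the client; the spread-merge and surrounding names are illustrative, not the actual SimpleChat code:

    // Illustrative only: fold the shared fields into every query's body.
    const apiRequestFields = {
        "max_tokens": 1024,
        "frequency_penalty": 1.2,
        "presence_penalty": 1.2,
        "n_predict": 1024  // so /completions also caps the response
    };

    function buildRequestBody(messages) {
        // messages: [{ role, content }, ...] for chat-style endpoints
        return JSON.stringify({ messages: messages, ...apiRequestFields });
    }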