SimpleChat: Add n_predict (equiv max_tokens) for llama.cpp server
The /completions endpoint of examples/server doesn't take max_tokens; instead it takes the internal n_predict. For now, add the same on the client side; maybe later, max_tokens handling can be added to the /completions endpoint on the server side.
This commit is contained in:
parent 8f172b9070
commit b3afd6c86a
2 changed files with 5 additions and 0 deletions
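Before the diff, a rough sketch of what the change is about. This is an assumption-laden illustration, not code from this commit: the base URL, prompt, and the response's "content" field are guesses at the server's JSON shape at the time; only the max_tokens vs n_predict distinction comes from the change below.

    // Minimal sketch: POSTing directly to examples/server's /completions.
    let resp = await fetch("http://127.0.0.1:8080/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
            prompt: "Hello there, ",
            max_tokens: 1024,   // OpenAI-style field; /completions ignores it
            n_predict: 1024,    // the internal field the server honours
        }),
    });
    let data = await resp.json();
    console.log(data.content);  // assumed response field name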
@@ -174,6 +174,10 @@ Set max_tokens to 1024, so that a relatively large previous reponse doesnt eat u
 available wrt next query-response. However dont forget that the server when started should
 also be started with a model context size of 1k or more, to be on safe side.
 
+The /completions endpoint of examples/server doesnt take max_tokens, instead it takes the
+internal n_predict, for now add the same here on the client side, maybe later add max_tokens
+to /completions endpoint handling code on server side.
+
 Frequency and presence penalty fields are set to 1.2 in the set of fields sent to server
 along with the user query. So that the model is partly set to try avoid repeating text in
 its response.
@@ -578,6 +578,7 @@ class Me {
             "max_tokens": 1024,
             "frequency_penalty": 1.2,
             "presence_penalty": 1.2,
+            "n_predict": 1024
         };
     }
 
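A hedged usage sketch of the object extended above: the names apiRequestOptions and userQuery are illustrative, not identifiers from simplechat.js; only the four fields mirror the diff. Carrying both max_tokens and n_predict lets the same body work against OpenAI-style endpoints and examples/server's /completions alike.

    // Illustrative only; variable names are assumptions, fields mirror the diff.
    let apiRequestOptions = {
        "max_tokens": 1024,
        "frequency_penalty": 1.2,
        "presence_penalty": 1.2,
        "n_predict": 1024
    };
    let userQuery = "What is the capital of France?";
    // Spread the shared options into a per-request body.
    let body = JSON.stringify({ prompt: userQuery, ...apiRequestOptions });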