llama : improve infill support and special token detection (#9798)

* llama : improve infill support

ggml-ci

* llama : add more FIM token strings

ggml-ci

* server : update prompt on slot restore (#9800)

* gguf : deprecate old FIM token KVs
Georgi Gerganov 2024-10-12 08:21:51 +03:00 committed by GitHub
parent 943d20b411
commit 11ac9800af
12 changed files with 601 additions and 427 deletions

@@ -526,7 +526,7 @@ Takes a prefix and a suffix and returns the predicted completion as stream.
- `input_prefix`: Set the prefix of the code to infill.
- `input_suffix`: Set the suffix of the code to infill.
-It also accepts all the options of `/completion` except `stream` and `prompt`.
+It also accepts all the options of `/completion`.
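
To illustrate the documented request shape, here is a minimal Python sketch that builds (but does not send) an `/infill` request carrying `input_prefix`, `input_suffix`, and extra `/completion` options. The helper name `build_infill_request`, the base URL `http://localhost:8080`, and the choice of `n_predict` and `stream` as example options are assumptions made for illustration, not part of this commit.

```python
import json
import urllib.request


def build_infill_request(prefix, suffix, base_url="http://localhost:8080",
                         **completion_options):
    """Build a POST /infill request without sending it.

    `base_url` is an assumed local server address; adjust it to your setup.
    `completion_options` are passed through unchanged as extra JSON fields,
    mirroring the README statement that /infill accepts /completion options.
    """
    payload = {"input_prefix": prefix, "input_suffix": suffix}
    payload.update(completion_options)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/infill",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical example: ask the model to fill in the body of `add`.
req = build_infill_request(
    "def add(a, b):\n    return ",
    "\n\nprint(add(1, 2))\n",
    n_predict=32,
    stream=False,
)
body = json.loads(req.data)
```

Passing `req` to `urllib.request.urlopen` would send it to a running server; the response format is not shown here because it depends on the server version and on whether `stream` is enabled.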
### **GET** `/props`: Get server global properties.