557 commits

Author SHA1 Message Date
Olivier Chafik
b2d17287aa update readme section about common model tool call formats
./build/bin/test-chat ../minja/build/tests/*.jinja 2>/dev/null
2025-02-04 14:27:38 +00:00
Olivier Chafik
39c1d8163b return thoughts in reasoning_content field 2025-02-04 11:37:09 +00:00
ochafik
1f5ec59809 ensure deepseek r1 thoughts parsed even w/o tool calls 2025-02-04 04:48:08 +00:00
ochafik
b6e14a4101 fix mistral expectation 2025-02-04 04:26:49 +00:00
ochafik
812544ab8b server: check that content is null when we get tool_calls 2025-02-04 04:14:15 +00:00
ochafik
86994db697 fix spaces 2025-02-04 03:47:52 +00:00
ochafik
78b47bb0e9 fix test_calc_result 2025-02-04 03:46:26 +00:00
ochafik
326e7002b3 update test_calc_result 2025-02-04 03:13:13 +00:00
Olivier Chafik
30ea3591c9 update to minja's new api 2025-02-03 23:53:27 +00:00
Olivier Chafik
bc6d910f6d Merge branch 'master' into r1-toolcall 2025-02-03 23:51:31 +00:00
Olivier Chafik
cde3833239
tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616)
* tool-call: allow `--jinja --chat-template chatml`

* fix double bos issue (drop bos/eos tokens from jinja template)

* add missing try catch around jinja parsing to default to chatml

* Simplify default chatml logic
2025-02-03 23:49:27 +00:00
Xuan-Son Nguyen
b3451785ac
server : (webui) revert hacky solution from #11626 (#11634) 2025-02-04 00:10:52 +01:00
Woof Dog
1d1e6a90bc
server : (webui) allow typing and submitting during llm response (#11626) 2025-02-03 23:16:27 +01:00
Olivier Chafik
c6214ee9d6 rm unneeded vocab 2025-02-03 19:59:50 +00:00
Olivier Chafik
7dc271fb37 tool-calls: add deepseek r1 template + accommodate broken official template slightly better 2025-02-03 19:59:33 +00:00
Olivier Chafik
569610ee77 tool-calls: accommodate variety of wrong tool call opening tags both Qwen 32B and 7B distills like to spit out 2025-02-03 18:57:55 +00:00
Olivier Chafik
4cb0e1d873 Merge branch 'jinja-chatml' into r1-toolcall 2025-02-03 17:15:14 +00:00
Daniel Bevenius
5598f475be
server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622)
This commit removes the CPPHTTPLIB_NO_EXCEPTIONS define from the server
code.

The motivation for this is that, when using a debug build, an unhandled
exception would crash the server and terminate the process. When
CPPHTTPLIB_NO_EXCEPTIONS is set, cpp_httplib will not call the exception
handler, which would normally return a 500 error to the client. This
caused tests to fail when using a debug build.

Fixes: https://github.com/ggerganov/llama.cpp/issues/11613
2025-02-03 16:45:38 +01:00
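
The behaviour this commit message describes (an exception handler turning a thrown exception into a 500 response instead of letting it terminate the process) can be illustrated in isolation. The sketch below deliberately does not use cpp-httplib; the `response` type, `handler_fn`, and `dispatch` are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical minimal response type; this is NOT cpp-httplib's response class.
struct response {
    int status;
    std::string body;
};

using handler_fn = std::function<void(response &)>;

// Runs a route handler and converts any escaping exception into a 500 response,
// so the process keeps running instead of terminating on an unhandled exception.
inline response dispatch(const handler_fn & handler) {
    assert(static_cast<bool>(handler)); // a callable handler is required

    response res{200, ""};
    try {
        handler(res);
    } catch (const std::exception & e) {
        res.status = 500;
        res.body   = std::string("internal error: ") + e.what();
    } catch (...) {
        res.status = 500;
        res.body   = "internal error: unknown exception";
    }
    return res;
}
```

Calling `dispatch` with a handler that throws `std::runtime_error("boom")` yields a response with status 500 rather than an aborted process, which is the property the failing debug-build tests depended on.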
Olivier Chafik
5d18d76b69 fix double bos issue (drop bos/eos tokens from jinja template) 2025-02-03 13:59:16 +00:00
ochafik
a76073cf88 minimize diffs 2025-02-03 10:58:52 +00:00
ochafik
77ae97e7d6 Update test_tool_call.py 2025-02-03 10:28:30 +00:00
mashdragon
d92cb67e37
server : (webui) Fix Shift+Enter handling (#11609)
* Fix Shift+Enter handling

`exact` on the Enter handler means the message is not sent when Shift+Enter is pressed anyway

* build index.html.gz

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-02-03 10:42:55 +01:00
ochafik
1e9acd2d31 tool-call: allow --jinja --chat-template chatml 2025-02-03 04:07:11 +00:00
ochafik
5e6f2a21ae add deepseek models to server tool call section in readme 2025-02-03 02:44:42 +00:00
ochafik
19bea4ecc3 tell DS R1 not to overthink (weather test) 2025-02-03 02:24:30 +00:00
ochafik
ae9d5812a7 tool-calls: add DeepSeek R1 Qwen 7B to server test_hello_world 2025-02-03 02:24:30 +00:00
ochafik
04be723b33 tool-call: fix command-r7b parsing when response is multiline 2025-02-03 02:24:30 +00:00
ochafik
08716281f2 rename tests 2025-02-03 02:24:30 +00:00
ochafik
c80cb30938 update logs 2025-02-03 02:24:30 +00:00
ochafik
28345877e4 server/oai: ensure content is null when there are tool calls 2025-02-03 02:24:30 +00:00
ochafik
130ca222c9 DeepSeek R1: parse thoughts / return in separate field in API (non streamed mode) 2025-02-03 02:24:30 +00:00
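
As a rough illustration of what "parse thoughts / return in separate field" could look like, the sketch below splits a model output into thoughts and visible content. It assumes the thoughts are wrapped in `<think>`...`</think>` tags; `parsed_message` and `split_thoughts` are hypothetical names, and the `reasoning_content` member only mirrors the field name mentioned elsewhere in this log. It is not the parser used in the repository.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for illustration; not the repository's actual struct.
struct parsed_message {
    std::string reasoning_content; // text found between <think> and </think>
    std::string content;           // everything outside the thought block
};

// Splits a complete (non-streamed) output into thoughts and visible content.
inline parsed_message split_thoughts(const std::string & output) {
    static const std::string open_tag  = "<think>";
    static const std::string close_tag = "</think>";

    parsed_message msg;
    const std::size_t open  = output.find(open_tag);
    const std::size_t close = output.find(close_tag);
    if (open == std::string::npos || close == std::string::npos || close < open) {
        // No well-formed thought block: treat the whole output as regular content.
        msg.content = output;
        return msg;
    }

    const std::size_t start = open + open_tag.size();
    msg.reasoning_content = output.substr(start, close - start);
    // Keep any text before the opening tag together with the text after the closing tag.
    msg.content = output.substr(0, open) + output.substr(close + close_tag.size());
    return msg;
}
```

The sketch only handles a complete string, which matches the "non streamed mode" wording of the commit; splitting incrementally while tokens arrive would need extra state.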
ochafik
87de852b7f pass vocab to common_chat_params_init 2025-02-03 02:24:30 +00:00
Olivier Chafik
bfcce4d693
tool-call: support Command R7B (+ return tool_plan "thoughts" in API) (#11585)
* `tool-call`: support Command R7B (w/ tool_plan return)

* `tool-call`: cleaner preservation of tokens + warn when likely bad chat template override

* `tool-call`: test cleanup / handle lazy grammar triggers
2025-02-02 09:25:38 +00:00
Olivier Chafik
a83f528688
tool-call: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539)
* An empty tool_call_id is better than none!

* sync: minja (tool call name optional https://github.com/google/minja/pull/36)

* Force-disable parallel_tool_calls if template doesn't support it

* More debug logs

* Llama 3.x tools: accept / trigger on more varied spaced outputs

* Fix empty content for functionary v3.2 tool call

* Add proper tool call docs to server README

* readme: function calling *is* supported now

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-31 14:15:25 +00:00
Olivier Chafik
b1bcd309fc
fix stop regression (#11543) 2025-01-31 13:48:31 +00:00
Olivier Chafik
5783575c9d
Fix chatml fallback for unsupported builtin templates (when --jinja not enabled) (#11533) 2025-01-31 08:24:29 +00:00
Olivier Chafik
4a2b196d03
server : fix --jinja when there's no tools or schema (typo was forcing JSON) (#11531) 2025-01-31 10:12:40 +02:00
Daniel Bevenius
a2df2787b3
server : update help metrics processing/deferred (#11512)
This commit updates the help text for the metrics `requests_processing`
and `requests_deferred` to be more grammatically correct.

Currently the returned metrics look like this:
```console
# HELP llamacpp:requests_processing Number of request processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of request deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
```

With this commit, the metrics will look like this:
```console
# HELP llamacpp:requests_processing Number of requests processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of requests deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
```
This is also consistent with the description of the metrics in the
server examples [README.md](https://github.com/ggerganov/llama.cpp/tree/master/examples/server#get-metrics-prometheus-compatible-metrics-exporter).
2025-01-31 06:04:53 +01:00
Olivier Chafik
8b576b6c55
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-30 19:13:58 +00:00
Daniel Bevenius
4314e56c4f
server : use lambda instead of std::bind (#11507)
This commit replaces the two usages of `std::bind` with lambdas for
the callback functions for `callback_new_task` and
`callback_update_slots`.

The motivation for this change is consistency with the rest of the code
in server.cpp (lambdas are used for all other callbacks/handlers). Lambdas
are also more readable (though this is perhaps subjective), and they are
recommended over `std::bind` in modern C++.

Ref: https://github.com/LithoCoders/dailycpp/blob/master/EffectiveModernC%2B%2B/chapter6/Item34_Prefer_lambdas_to_std::bind.md
2025-01-30 11:05:00 +01:00
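
To make the `std::bind` versus lambda comparison concrete, the sketch below registers the same member-function callback both ways. The `task`, `task_queue`, and `server_ctx` types and their members are hypothetical stand-ins written for this example; they are not the actual definitions in server.cpp.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-ins for illustration only; not the real server.cpp types.
struct task {
    int id;
};

struct task_queue {
    std::function<void(task)> callback_new_task;

    void on_new_task(std::function<void(task)> cb) {
        callback_new_task = std::move(cb);
    }

    void post(task t) {
        assert(static_cast<bool>(callback_new_task)); // a callback must be registered first
        callback_new_task(t);
    }
};

struct server_ctx {
    int last_task_id = -1;

    void process_single_task(task t) {
        last_task_id = t.id;
    }
};

// Before: std::bind with a placeholder for the task argument.
inline void register_with_bind(task_queue & q, server_ctx & ctx) {
    q.on_new_task(std::bind(&server_ctx::process_single_task, &ctx, std::placeholders::_1));
}

// After: a lambda that captures the context by reference and forwards the task.
inline void register_with_lambda(task_queue & q, server_ctx & ctx) {
    q.on_new_task([&ctx](task t) { ctx.process_single_task(t); });
}
```

Both registrations behave the same when a task is posted; the lambda version spells out the call and its argument at the registration site, which is the readability argument the commit message makes.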
Isaac McFadyen
496e5bf46b
server : (docs) added response format for /apply-template [no ci] (#11503) 2025-01-30 10:11:53 +01:00
Daniel Bevenius
e0449763a4
server : update json snippets in README.md [no ci] (#11492)
This commit updates some of the JSON snippets in the README.md file and
removes the `json` language tag from the code blocks.

The motivation for this change is that invalid JSON in a code snippet is
highlighted in red, which can make it somewhat difficult to read and a
little distracting.
2025-01-30 05:48:14 +01:00
Nigel Bosch
eb7cf15a80
server : add /apply-template endpoint for additional use cases of Minja functionality (#11489)
* add /apply-template endpoint to server

* remove unnecessary line

* add /apply-template documentation

* return only "prompt" field in /apply-template

* use suggested idea instead of my overly verbose way
2025-01-29 19:45:44 +01:00
Daniel Bevenius
e51c47b401
server : update auto gen files comments [no ci] (#11484)
* server : update auto gen files comments

This commit updates the 'auto generated files' comments in server.cpp
and removes `deps.sh` from the comment.

The motivation for this change is that `deps.sh` was removed in
Commit 91c36c269b ("server : (web ui)
Various improvements, now use vite as bundler (#10599)").

* squash! server : update auto gen files comments [no ci]

Move comments about file generation to README.md.

* squash! server : update auto gen files comments [no ci]

Remove the comments in server.cpp that mention that information
can be found in the README.md file.
2025-01-29 16:34:18 +01:00
peidaqi
cf8cc856d7
server : Fixed wrong function name in llamacpp server unit test (#11473)
The test_completion_stream_with_openai_library() function actually runs with stream=False by default, while test_completion_with_openai_library() runs with stream=True
2025-01-29 00:03:42 +01:00
Xuan Son Nguyen
49b0e3cec4
server : fix cleaning up stream task (#11418)
* server : fix cleaning up stream task

* one more spot
2025-01-25 16:36:44 +01:00
stduhpf
c07e87f38b
server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364)
* webui : put DeepSeek R1 CoT in a collapsible <details> element

* webui: refactor split

* webui: don't use regex to split cot and response

* webui: format+qol

* webui: no loading icon if the model isn't generating

* ui fix, add configs

* add jsdoc types

* only filter </think> for assistant msg

* build

* update build

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-24 09:02:38 +01:00
Xuan Son Nguyen
5845661640
server : add more clean up when cancel_tasks is called (#11340)
* server : add more clean up when cancel_tasks is called

* fix recv_with_timeout

* std::remove_if

* fix std::remove_if
2025-01-23 13:56:05 +01:00
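
The `std::remove_if` bullets in this commit refer to the standard algorithm for dropping matching elements from a container. How the commit itself uses it is not shown in this log, so the sketch below only illustrates the general erase-remove idiom; `task_result` and `remove_results_for` are hypothetical names made up for the example.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Hypothetical result record for illustration; not the server's actual type.
struct task_result {
    int id_task;
};

// Drops every queued result that belongs to one of the cancelled task ids.
inline void remove_results_for(std::vector<task_result> & results,
                               const std::unordered_set<int> & cancelled) {
    // std::remove_if only shifts the kept elements to the front and returns the
    // new logical end; erase() is what actually shrinks the container.
    results.erase(
        std::remove_if(results.begin(), results.end(),
                       [&cancelled](const task_result & r) {
                           return cancelled.count(r.id_task) > 0;
                       }),
        results.end());
}
```

A common mistake with this algorithm is calling `std::remove_if` without the surrounding `erase`, which leaves stale elements at the tail of the container; whether that was the issue fixed here is not stated in the log.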
Diego Devesa
12c2bdf2de
server : fix draft context not being released (#11354) 2025-01-22 17:44:40 +01:00
Jiří Podivín
96f4053934
Adding logprobs to /v1/completions (#11344)
Signed-off-by: Jiri Podivin <jpodivin@redhat.com>
2025-01-22 12:51:32 +01:00