llama.cpp

Author	SHA1	Message	Date
Olivier Chafik	b2d17287aa	update readme section about common model tool call formats ./build/bin/test-chat ../minja/build/tests/*.jinja 2>/dev/null	2025-02-04 14:27:38 +00:00
Olivier Chafik	39c1d8163b	return thoughts in reasoning_content field	2025-02-04 11:37:09 +00:00
ochafik	1f5ec59809	ensure deepseek r1 thoughts parsed even w/o tool calls	2025-02-04 04:48:08 +00:00
ochafik	b6e14a4101	fix mistral expectation	2025-02-04 04:26:49 +00:00
ochafik	812544ab8b	server: check that content is null when we get tool_calls	2025-02-04 04:14:15 +00:00
ochafik	86994db697	fix spaces	2025-02-04 03:47:52 +00:00
ochafik	78b47bb0e9	fix test_calc_result	2025-02-04 03:46:26 +00:00
ochafik	326e7002b3	update test_calc_result	2025-02-04 03:13:13 +00:00
Olivier Chafik	30ea3591c9	update to minja's new api	2025-02-03 23:53:27 +00:00
Olivier Chafik	bc6d910f6d	Merge branch 'master' into r1-toolcall	2025-02-03 23:51:31 +00:00
Olivier Chafik	cde3833239	`tool-call`: allow `--chat-template chatml` w/ `--jinja`, default to chatml upon parsing issue, avoid double bos (#11616 ) * tool-call: allow `--jinja --chat-template chatml` * fix double bos issue (drop bos/eos tokens from jinja template) * add missing try catch around jinja parsing to default to chatml * Simplify default chatml logic	2025-02-03 23:49:27 +00:00
Xuan-Son Nguyen	b3451785ac	server : (webui) revert hacky solution from #11626 (#11634 )	2025-02-04 00:10:52 +01:00
Woof Dog	1d1e6a90bc	server : (webui) allow typing and submitting during llm response (#11626 )	2025-02-03 23:16:27 +01:00
Olivier Chafik	c6214ee9d6	rm unneeded vocab	2025-02-03 19:59:50 +00:00
Olivier Chafik	7dc271fb37	tool-calls: add deepseek r1 template + accommodate broken official template slightly better	2025-02-03 19:59:33 +00:00
Olivier Chafik	569610ee77	tool-calls: accommodate variety of wrong tool call opening tags both Qwen 32B and 7B distills like to spit out	2025-02-03 18:57:55 +00:00
Olivier Chafik	4cb0e1d873	Merge branch 'jinja-chatml' into r1-toolcall	2025-02-03 17:15:14 +00:00
Daniel Bevenius	5598f475be	server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622 ) This commit removes the CPPHTTPLIB_NO_EXCEPTIONS define from the server code. The motivation for this is that when using a debug build the server would crash when an exception was thrown and terminate the server process, as it was unhandled. When CPPHTTPLIB_NO_EXCEPTIONS is set cpp_httplib will not call the exception handler, which would normally return a 500 error to the client. This caused tests to fail when using a debug build. Fixes: https://github.com/ggerganov/llama.cpp/issues/11613	2025-02-03 16:45:38 +01:00
Olivier Chafik	5d18d76b69	fix double bos issue (drop bos/eos tokens from jinja template)	2025-02-03 13:59:16 +00:00
ochafik	a76073cf88	minimize diffs	2025-02-03 10:58:52 +00:00
ochafik	77ae97e7d6	Update test_tool_call.py	2025-02-03 10:28:30 +00:00
mashdragon	d92cb67e37	server : (webui) Fix Shift+Enter handling (#11609 ) * Fix Shift+Enter handling `exact` on the Enter handler means the message is not sent when Shift+Enter is pressed anyway * build index.html.gz --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-02-03 10:42:55 +01:00
ochafik	1e9acd2d31	tool-call: allow `--jinja --chat-template chatml`	2025-02-03 04:07:11 +00:00
ochafik	5e6f2a21ae	add deepseek models to server tool call section in readme	2025-02-03 02:44:42 +00:00
ochafik	19bea4ecc3	tell DS R1 not to overthink (weather test)	2025-02-03 02:24:30 +00:00
ochafik	ae9d5812a7	tool-calls: add DeepSeek R1 Qwen 7B to server test_hello_world	2025-02-03 02:24:30 +00:00
ochafik	04be723b33	tool-call: fix command-r7b parsing when response is multiline	2025-02-03 02:24:30 +00:00
ochafik	08716281f2	rename tests	2025-02-03 02:24:30 +00:00
ochafik	c80cb30938	update logs	2025-02-03 02:24:30 +00:00
ochafik	28345877e4	server/oai: ensure content is null when there are tool calls	2025-02-03 02:24:30 +00:00
ochafik	130ca222c9	DeepSeek R1: parse thoughts / return in separate field in API (non streamed mode)	2025-02-03 02:24:30 +00:00
ochafik	87de852b7f	pass vocab to common_chat_params_init	2025-02-03 02:24:30 +00:00
Olivier Chafik	bfcce4d693	`tool-call`: support Command R7B (+ return tool_plan "thoughts" in API) (#11585 ) * `tool-call`: support Command R7B (w/ tool_plan return) * `tool-call`: cleaner preservation of tokens + warn when likely bad chat template override * `tool-call`: test cleanup / handle lazy grammar triggers	2025-02-02 09:25:38 +00:00
Olivier Chafik	a83f528688	`tool-call`: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539 ) * An empty tool_call_id is better than none! * sync: minja (tool call name optional https://github.com/google/minja/pull/36) * Force-disable parallel_tool_calls if template doesn't support it * More debug logs * Llama 3.x tools: accept / trigger on more varied spaced outputs * Fix empty content for functionary v3.2 tool call * Add proper tool call docs to server README * readme: function calling is supported now * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-01-31 14:15:25 +00:00
Olivier Chafik	b1bcd309fc	fix stop regression (#11543 )	2025-01-31 13:48:31 +00:00
Olivier Chafik	5783575c9d	Fix chatml fallback for unsupported builtin templates (when --jinja not enabled) (#11533 )	2025-01-31 08:24:29 +00:00
Olivier Chafik	4a2b196d03	server : fix --jinja when there's no tools or schema (typo was forcing JSON) (#11531 )	2025-01-31 10:12:40 +02:00
Daniel Bevenius	a2df2787b3	server : update help metrics processing/deferred (#11512 ) This commit updates the help text for the metrics `requests_processing` and `requests_deferred` to be more grammatically correct. Currently the returned metrics look like this: ```console \# HELP llamacpp:requests_processing Number of request processing. \# TYPE llamacpp:requests_processing gauge llamacpp:requests_processing 0 \# HELP llamacpp:requests_deferred Number of request deferred. \# TYPE llamacpp:requests_deferred gauge llamacpp:requests_deferred 0 ``` With this commit, the metrics will look like this: ```console \# HELP llamacpp:requests_processing Number of requests processing. \# TYPE llamacpp:requests_processing gauge llamacpp:requests_processing 0 \# HELP llamacpp:requests_deferred Number of requests deferred. \# TYPE llamacpp:requests_deferred gauge llamacpp:requests_deferred 0 ``` This is also consistent with the description of the metrics in the server examples [README.md](https://github.com/ggerganov/llama.cpp/tree/master/examples/server#get-metrics-prometheus-compatible-metrics-exporter).	2025-01-31 06:04:53 +01:00
Olivier Chafik	8b576b6c55	Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639 ) --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-01-30 19:13:58 +00:00
Daniel Bevenius	4314e56c4f	server : use lambda instead of std::bind (#11507 ) This commit replaces the two usages of `std::bind` with lambdas for the callback functions for `callback_new_task` and `callback_update_slots`. The motivation for this change is consistency with the rest of the code in server.cpp (lambdas are used for all other callbacks/handlers). Lambdas are also more readable (perhaps this is subjective), and they are recommended over `std::bind` in modern C++. Ref: https://github.com/LithoCoders/dailycpp/blob/master/EffectiveModernC%2B%2B/chapter6/Item34_Prefer_lambdas_to_std::bind.md	2025-01-30 11:05:00 +01:00
Isaac McFadyen	496e5bf46b	server : (docs) added response format for /apply-template [no ci] (#11503 )	2025-01-30 10:11:53 +01:00
Daniel Bevenius	e0449763a4	server : update json snippets in README.md [no ci] (#11492 ) This commit updates some of the JSON snippets in the README.md file and removes the `json` language tag from the code blocks. The motivation for this change is that invalid JSON in a code snippet is highlighted in red, which can make it somewhat difficult to read and a little distracting.	2025-01-30 05:48:14 +01:00
Nigel Bosch	eb7cf15a80	server : add /apply-template endpoint for additional use cases of Minja functionality (#11489 ) * add /apply-template endpoint to server * remove unnecessary line * add /apply-template documentation * return only "prompt" field in /apply-template * use suggested idea instead of my overly verbose way	2025-01-29 19:45:44 +01:00
Daniel Bevenius	e51c47b401	server : update auto gen files comments [no ci] (#11484 ) * server : update auto gen files comments This commit updates the 'auto generated files' comments in server.cpp and removes `deps.sh` from the comment. The motivation for this change is that `deps.sh` was removed in Commit `91c36c269b` ("server : (web ui) Various improvements, now use vite as bundler (#10599)"). * squash! server : update auto gen files comments [no ci] Move comments about file generation to README.md. * squash! server : update auto gen files comments [no ci] Remove the comments in server.cpp that mention that information can be found in the README.md file.	2025-01-29 16:34:18 +01:00
peidaqi	cf8cc856d7	server : Fixed wrong function name in llamacpp server unit test (#11473 ) The test_completion_stream_with_openai_library() function is actually with stream=False by default, and test_completion_with_openai_library() with stream=True	2025-01-29 00:03:42 +01:00
Xuan Son Nguyen	49b0e3cec4	server : fix cleaning up stream task (#11418 ) * server : fix cleaning up stream task * one more spot	2025-01-25 16:36:44 +01:00
stduhpf	c07e87f38b	server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364 ) * webui : put DeepSeek R1 CoT in a collapsible <details> element * webui: refactor split * webui: don't use regex to split cot and response * webui: format+qol * webui: no loading icon if the model isn't generating * ui fix, add configs * add jsdoc types * only filter </think> for assistant msg * build * update build --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-01-24 09:02:38 +01:00
Xuan Son Nguyen	5845661640	server : add more clean up when cancel_tasks is called (#11340 ) * server : add more clean up when cancel_tasks is called * fix recv_with_timeout * std::remove_if * fix std::remove_if	2025-01-23 13:56:05 +01:00
Diego Devesa	12c2bdf2de	server : fix draft context not being released (#11354 )	2025-01-22 17:44:40 +01:00
Jiří Podivín	96f4053934	Adding logprobs to /v1/completions (#11344 ) Signed-off-by: Jiri Podivin <jpodivin@redhat.com>	2025-01-22 12:51:32 +01:00


557 commits