llama.cpp

Author	SHA1	Message	Date
Xuan Son Nguyen	2d51c459c6	code style changes on test	2025-01-30 11:52:31 +01:00
Olivier Chafik	8ef37a3c07	Merge remote-tracking branch 'origin/master' into tool-call	2025-01-30 10:50:02 +00:00
Olivier Chafik	3d804dec76	sync: minja (#11499)	2025-01-30 10:30:27 +00:00
mgroeber9110	ffd0821c57	vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496)	2025-01-30 12:10:59 +02:00
Daniel Bevenius	4314e56c4f	server : use lambda instead of std::bind (#11507) This commit replaces the two usages of `std::bind` with lambdas for the callback functions `callback_new_task` and `callback_update_slots`. The motivation for this change is consistency with the rest of the code in server.cpp (lambdas are used for all other callbacks/handlers). Lambdas are also more readable (perhaps this is subjective), and they are recommended over `std::bind` in modern C++. Ref: https://github.com/LithoCoders/dailycpp/blob/master/EffectiveModernC%2B%2B/chapter6/Item34_Prefer_lambdas_to_std::bind.md	2025-01-30 11:05:00 +01:00
Isaac McFadyen	496e5bf46b	server : (docs) added response format for /apply-template [no ci] (#11503)	2025-01-30 10:11:53 +01:00
Guspan Tanadi	7919256c57	readme : reference examples relative links (#11505)	2025-01-30 06:58:02 +01:00
ochafik	9591af1fc5	increase http timeout to 12	2025-01-30 04:50:59 +00:00
ochafik	7635912f73	llama 3.2 1b now fails the weather tool call?	2025-01-30 04:49:52 +00:00
ochafik	b831a6e0d3	rm unused llama_param	2025-01-30 04:49:02 +00:00
Daniel Bevenius	e0449763a4	server : update json snippets in README.md [no ci] (#11492) This commit updates some of the JSON snippets in the README.md file and removes the `json` language tag from the code blocks. The motivation for this change is that invalid JSON in a code snippet is highlighted in red, which can make it somewhat difficult to read and can be a little distracting.	2025-01-30 05:48:14 +01:00
ochafik	18450e690f	debug logs are back	2025-01-30 04:34:14 +00:00
ochafik	81547e6f9b	nits	2025-01-30 04:20:06 +00:00
ochafik	f8e14bffc3	split chat handler vs. parser around enum again	2025-01-30 04:11:05 +00:00
ochafik	590c97931a	Update tests readme + add raw output to verbose log	2025-01-30 00:43:30 +00:00
ochafik	774557cfb4	llama 3.1: allow `{name:` & `{function:` syntax even w/ builtin tools (70B model just likes that!)	2025-01-30 00:43:06 +00:00
ochafik	d86a1ae80d	Unify content + message in server_task_result_cmpl_final (+ avoid string copy)	2025-01-30 00:13:12 +00:00
ochafik	77c60e662e	Avoid passing tools twice in generic handler (now that minja passes them automatically when needed)	2025-01-30 00:09:56 +00:00
ochafik	a810c37c76	Partial revert of LLAMA_CACHE=tmp (unless set explicitly in env)	2025-01-29 23:16:18 +00:00
ochafik	cbecb35619	Add tool call to hot topics	2025-01-29 22:44:46 +00:00
ochafik	64545ac9d5	Somehow /* bad inside block comments, ok fine.	2025-01-29 22:38:52 +00:00
ochafik	2b2456978a	Add cli mode to test-chat to generate template summaries markdown	2025-01-29 22:33:16 +00:00
ochafik	84bc083faf	Remove server tests LLAMA_CACHE override (tests are serial, and the cache is easier to prefill w/ scripts/fetch_server_test_models.py)	2025-01-29 21:43:14 +00:00
ochafik	bc8a61138f	nits	2025-01-29 21:42:12 +00:00
ochafik	36c776f329	Finish renaming of chat inputs vs. params [skip ci]	2025-01-29 21:29:45 +00:00
ochafik	ed7c622d78	Rename: common/chat.*, common_chat_{inputs -> params}	2025-01-29 21:18:49 +00:00
ochafik	6e676c8030	sync: minja	2025-01-29 20:31:28 +00:00
ochafik	ba27e98582	Unify llama 3.x chat handling again (allow `{"type": "function", "name": ...` prefix)	2025-01-29 19:47:28 +00:00
Nigel Bosch	eb7cf15a80	server : add /apply-template endpoint for additional use cases of Minja functionality (#11489) * add /apply-template endpoint to server * remove unnecessary line * add /apply-template documentation * return only "prompt" field in /apply-template * use suggested idea instead of my overly verbose way	2025-01-29 19:45:44 +01:00
ochafik	7b5e0803c8	Move templates/ under models/	2025-01-29 18:16:35 +00:00
ochafik	682026f84b	Create meta-llama-Llama-3.1-8B-Instruct.jinja	2025-01-29 18:09:59 +00:00
ochafik	babdefc4dd	Merge remote-tracking branch 'origin/master' into tool-call	2025-01-29 17:54:57 +00:00
ochafik	0f8af536c9	nits	2025-01-29 17:50:44 +00:00
ochafik	77dd67c28c	tool-calls: disable crashing tests	2025-01-29 17:36:18 +00:00
Rémy Oudompheng	66ee4f297c	vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360) * vulkan: initial support for IQ3_S * vulkan: initial support for IQ3_XXS * vulkan: initial support for IQ2_XXS * vulkan: initial support for IQ2_XS * vulkan: optimize Q3_K by removing branches * vulkan: implement dequantize variants for coopmat2 * vulkan: initial support for IQ2_S * vulkan: vertically realign code * port failing dequant callbacks from mul_mm * Fix array length mismatches * vulkan: avoid using workgroup size before it is referenced * tests: increase timeout for Vulkan llvmpipe backend --------- Co-authored-by: Jeff Bolz <jbolz@nvidia.com>	2025-01-29 18:29:39 +01:00
ochafik	76f6ab19ad	Update test_tool_call.py	2025-01-29 17:04:30 +00:00
ochafik	41eec4622b	rm unused templates, rename one	2025-01-29 16:50:54 +00:00
ochafik	40cc3f2fde	Merge branch 'tool-call' of github.com:ochafik/llama.cpp into tool-call	2025-01-29 16:45:59 +00:00
Olivier Chafik	384f54a135	Split bulk of tool call tests to slow lane	2025-01-29 16:13:45 +00:00
Olivier Chafik	923c805d04	rm dead code + nits	2025-01-29 15:57:58 +00:00
Daniel Bevenius	e51c47b401	server : update auto gen files comments [no ci] (#11484) * server : update auto gen files comments This commit updates the 'auto generated files' comments in server.cpp and removes `deps.sh` from the comment. The motivation for this change is that `deps.sh` was removed in Commit `91c36c269b` ("server : (web ui) Various improvements, now use vite as bundler (#10599)"). * squash! server : update auto gen files comments [no ci] Move comments about file generation to README.md. * squash! server : update auto gen files comments [no ci] Remove the comments in server.cpp that mention that information can be found in the README.md file.	2025-01-29 16:34:18 +01:00
Jeff Bolz	2711d0215f	vulkan: Catch pipeline creation failure and print an error message (#11436) * vulkan: Catch pipeline creation failure and print an error message Also, fix some warnings from my on-demand compile change. * vulkan: fix pipeline creation logging	2025-01-29 09:26:50 -06:00
Eric Curtin	f0d4b29edf	Parse https://ollama.com/library/ syntax (#11480) People search for ollama models using the web UI; this change allows one to copy the URL from the browser and have it be compatible with llama-run. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-29 11:23:10 +00:00
Georgi Gerganov	815857791d	sync : ggml	2025-01-29 11:25:29 +02:00
William Tambellini	1a0e87d291	ggml : add option to not print stack on abort (ggml/1081) * Add option to not print stack on abort Add option/envvar to disable stack printing on abort. Also link some unittests with Threads to fix link errors on ubuntu/g++11. * Update ggml/src/ggml.c --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-01-29 11:24:53 +02:00
issixx	d2e518e9b4	ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. (ggml/1065) some threads kept looping and failed to terminate properly after an abort during CPU execution. Co-authored-by: issi <issi@gmail.com>	2025-01-29 11:24:51 +02:00
Daniel Bevenius	b636228c0a	embedding : enable --no-warmup option (#11475) This commit enables the `--no-warmup` option for llama-embeddings. The motivation for this change is to allow the user to disable the warmup when running the program.	2025-01-29 10:38:54 +02:00
Molly Sophia	325afb370a	llama: fix missing k_cache store for rwkv6qwen2 (#11445) Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-01-29 12:07:21 +08:00
ochafik	4a1e8e9f91	refactor test-chat-handler	2025-01-29 04:00:01 +00:00
ochafik	18d5a1b2ca	nits	2025-01-29 02:15:34 +00:00

4955 commits