Commit graph

4935 commits

Author SHA1 Message Date
ochafik
cbecb35619 Add tool call to hot topics 2025-01-29 22:44:46 +00:00
ochafik
64545ac9d5 Somehow /* bad inside block comments, ok fine. 2025-01-29 22:38:52 +00:00
ochafik
2b2456978a Add cli mode to test-chat to generate template summaries markdown 2025-01-29 22:33:16 +00:00
ochafik
84bc083faf Remove server tests LLAMA_CACHE override (tests are serial, and the cache is easier to prefill w/ scripts/fetch_server_test_models.py) 2025-01-29 21:43:14 +00:00
ochafik
bc8a61138f nits 2025-01-29 21:42:12 +00:00
ochafik
36c776f329 Finish renaming of chat inputs vs. params [skip ci] 2025-01-29 21:29:45 +00:00
ochafik
ed7c622d78 Rename: common/chat.*, common_chat_{inputs -> params} 2025-01-29 21:18:49 +00:00
ochafik
6e676c8030 sync: minja 2025-01-29 20:31:28 +00:00
ochafik
ba27e98582 Unify llama 3.x chat handling again (allow {"type": "function", "name": ... prefix) 2025-01-29 19:47:28 +00:00
ochafik
7b5e0803c8 Move templates/ under models/ 2025-01-29 18:16:35 +00:00
ochafik
682026f84b Create meta-llama-Llama-3.1-8B-Instruct.jinja 2025-01-29 18:09:59 +00:00
ochafik
babdefc4dd Merge remote-tracking branch 'origin/master' into tool-call 2025-01-29 17:54:57 +00:00
ochafik
0f8af536c9 nits 2025-01-29 17:50:44 +00:00
ochafik
77dd67c28c tool-calls: disable crashing tests 2025-01-29 17:36:18 +00:00
Rémy Oudompheng
66ee4f297c
vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)
* vulkan: initial support for IQ3_S

* vulkan: initial support for IQ3_XXS

* vulkan: initial support for IQ2_XXS

* vulkan: initial support for IQ2_XS

* vulkan: optimize Q3_K by removing branches

* vulkan: implement dequantize variants for coopmat2

* vulkan: initial support for IQ2_S

* vulkan: vertically realign code

* port failing dequant callbacks from mul_mm

* Fix array length mismatches

* vulkan: avoid using workgroup size before it is referenced

* tests: increase timeout for Vulkan llvmpipe backend

---------

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-01-29 18:29:39 +01:00
ochafik
76f6ab19ad Update test_tool_call.py 2025-01-29 17:04:30 +00:00
ochafik
41eec4622b rm unused templates, rename one 2025-01-29 16:50:54 +00:00
ochafik
40cc3f2fde Merge branch 'tool-call' of github.com:ochafik/llama.cpp into tool-call 2025-01-29 16:45:59 +00:00
Olivier Chafik
384f54a135 Split bulk of tool call tests to slow lane 2025-01-29 16:13:45 +00:00
Olivier Chafik
923c805d04 rm dead code + nits 2025-01-29 15:57:58 +00:00
Daniel Bevenius
e51c47b401
server : update auto gen files comments [no ci] (#11484)
* server : update auto gen files comments

This commit updates the 'auto generated files' comments in server.cpp
and removes `deps.sh` from the comment.

The motivation for this change is that `deps.sh` was removed in
Commit 91c36c269b ("server : (web ui)
Various improvements, now use vite as bundler (#10599)").

* squash! server : update auto gen files comments [no ci]

Move comments about file generation to README.md.

* squash! server : update auto gen files comments [no ci]

Remove the comments in server.cpp that mention that information
can be found in the README.md file.
2025-01-29 16:34:18 +01:00
Jeff Bolz
2711d0215f
vulkan: Catch pipeline creation failure and print an error message (#11436)
* vulkan: Catch pipeline creation failure and print an error message

Also, fix some warnings from my on-demand compile change.

* vulkan: fix pipeline creation logging
2025-01-29 09:26:50 -06:00
Eric Curtin
f0d4b29edf
Parse https://ollama.com/library/ syntax (#11480)
People search for ollama models using the web UI; this change
allows one to copy the URL from the browser and have it work
with llama-run (see the sketch after this entry).

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-29 11:23:10 +00:00
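The URL handling described in the entry above lends itself to a small illustration. The following is a hypothetical C++ sketch (the function name and exact behaviour are assumptions, not the actual llama-run code) of stripping the https://ollama.com/library/ prefix to recover a model name:

```cpp
// Hypothetical sketch, not the actual llama-run implementation: map a browser
// URL such as https://ollama.com/library/granite-code to the bare model name
// that a puller would resolve (e.g. "granite-code" or "llama3.2:3b").
#include <optional>
#include <string>

static std::optional<std::string> parse_ollama_library_url(const std::string & url) {
    const std::string prefix = "https://ollama.com/library/";
    if (url.rfind(prefix, 0) != 0) {
        return std::nullopt;                // not an ollama.com library URL
    }
    std::string name = url.substr(prefix.size());
    if (name.empty()) {
        return std::nullopt;                // URL had no model name after the prefix
    }
    return name;
}
```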
Georgi Gerganov
815857791d
sync : ggml 2025-01-29 11:25:29 +02:00
William Tambellini
1a0e87d291
ggml : add option to not print stack on abort (ggml/1081)
* Add option to not print stack on abort

Add option/envvar to disable stack printing on abort.
Also link some unittests with Threads to fix link errors on
ubuntu/g++11.

* Update ggml/src/ggml.c

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-29 11:24:53 +02:00
issixx
d2e518e9b4
ggml-cpu : fix ggml_graph_compute_thread not terminating on abort (ggml/1065)
Some threads kept looping and failed to terminate properly after an abort during CPU execution.

Co-authored-by: issi <issi@gmail.com>
2025-01-29 11:24:51 +02:00
Daniel Bevenius
b636228c0a
embedding : enable --no-warmup option (#11475)
This commit enables the `--no-warmup` option for llama-embeddings.

The motivation for this change is to allow the user to disable the
warmup when running the program.
2025-01-29 10:38:54 +02:00
Molly Sophia
325afb370a
llama: fix missing k_cache store for rwkv6qwen2 (#11445)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-01-29 12:07:21 +08:00
ochafik
4a1e8e9f91 refactor test-chat-handler 2025-01-29 04:00:01 +00:00
ochafik
18d5a1b2ca nits 2025-01-29 02:15:34 +00:00
ochafik
47be437356 Test fireworks v2 template 2025-01-29 01:51:07 +00:00
ochafik
4cdbb8c53f Revert breaking minja change 2025-01-29 01:50:49 +00:00
ochafik
64263910d8 Fix firefunction w/ jinja: requires two variables, use the chat handlers everywhere templates are used 2025-01-29 01:15:44 +00:00
ochafik
d603d067d5 sync: minja 2025-01-28 23:49:04 +00:00
ochafik
4f257550a2 minja: sync on https://github.com/google/minja/pull/33 2025-01-28 23:46:51 +00:00
Emreerdog
794fe23f29
cmake: add hints for locating ggml on Windows using Llama find-package (#11466) 2025-01-28 19:22:06 -04:00
peidaqi
cf8cc856d7
server : Fixed wrong function name in llamacpp server unit test (#11473)
The test_completion_stream_with_openai_library() function actually uses stream=False by default, and test_completion_with_openai_library() uses stream=True
2025-01-29 00:03:42 +01:00
Xuan-Son Nguyen
d0c08040b6
ci : fix build CPU arm64 (#11472)
* ci : fix build CPU arm64

* failed, trying ubuntu 22

* vulkan: ubuntu 24

* vulkan : jammy --> noble
2025-01-29 00:02:56 +01:00
uvos
be5ef7963f
HIP: Suppress transformation warning in softmax.cu
Loops with bounds not known at compile time cannot be unrolled.
When ncols_template == 0, the bounds of the loop are not constexpr, so LLVM cannot unroll the loops here (see the sketch after this entry).
2025-01-28 23:06:32 +01:00
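To make the unrolling constraint from the entry above concrete, here is a minimal C++ sketch of the pattern (an assumed illustration only, not the actual softmax.cu kernel):

```cpp
// Minimal sketch of the pattern described above (not the actual softmax.cu code):
// ncols_template == 0 is used to mean "column count only known at runtime", so the
// loop bound is a compile-time constant only in the ncols_template > 0
// instantiations; only those can be fully unrolled, which is why an unconditional
// unroll request triggers a transformation warning for the 0 case.
template <int ncols_template>
void scale_row(float * row, float scale, int ncols_runtime) {
    const int ncols = ncols_template == 0 ? ncols_runtime : ncols_template;
    // A `#pragma unroll` here is only honored when ncols is a compile-time
    // constant, i.e. when ncols_template != 0.
    for (int i = 0; i < ncols; ++i) {
        row[i] *= scale;
    }
}

// scale_row<32>(row, s, 32);  // bound known at compile time: unrollable
// scale_row<0>(row, s, n);    // bound known only at runtime: not unrollable
```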
Nikita Sarychev
cae9fb4361
HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (#11080)
This disables the workaround on fixed rocblas versions (>= 4.0.0) to eliminate the runtime cost and unnecessary VRAM allocation of loading all Tensile objects.
2025-01-28 16:42:20 +01:00
ochafik
cad1448ac7 Disable test-chat-handler on win32 like the other grammar-related tests 2025-01-28 14:46:37 +00:00
Eric Curtin
7fee2889e6
Add github protocol pulling and http:// (#11465)
These are added as pulling protocols to llama-run.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-28 14:45:41 +00:00
ochafik
cd63ba435e beef up test-chat-handler w/ delta expectations 2025-01-28 14:40:23 +00:00
Nuno
d7d1eccacc
docker: allow installing pip packages system-wide (#11437)
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2025-01-28 14:17:25 +00:00
someone13574
4bf3119d61
cmake : don't fail on GGML_CPU=OFF (#11457) 2025-01-28 15:15:34 +01:00
ochafik
ba10b47ae5 Add missing link dep for windows build 2025-01-28 10:52:14 +00:00
ochafik
b5a74d1a24 Simplify parser defs (incremental parsing for streaming will need more thinking) 2025-01-28 10:48:11 +00:00
Nuno
f643120bad
docker: add perplexity and bench commands to full image (#11438)
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2025-01-28 10:42:32 +00:00
ochafik
ec4aeaf18a Revert "Allow tool use + streaming"
This reverts commit 62717145f7.
2025-01-28 10:29:17 +00:00
Akarshan Biswas
6e84b0ab8e
SYCL : SOFTMAX F16 mask support and other fixes (#11261)
Implemented ggml_sycl_op_soft_max() F16 src1 (mask) support, for which a pragma deprecation warning was added in #5021.
To do this, it had to be decoupled from ggml_sycl_op_flatten, which always considered src1 to be of fp32 type (many OP functions depend on it).

* SYCL: SOFTMAX F16 mask support and other fixes

* test-backend-ops: Add F16 mask test cases
2025-01-28 09:56:58 +00:00