Commit graph

4772 commits

Author SHA1 Message Date
Jeff Bolz
1b598b3058
vulkan: use smaller combined allocations to avoid fragmentation (#11551) 2025-02-06 07:02:18 +01:00
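The idea behind capping combined allocations is general: packing buffer requests into chunks of a bounded size means freed memory comes back in uniform, reusable units instead of leaving holes in one huge region. A host-side sketch of the concept (illustrative only, with a hypothetical chunk size; not the Vulkan backend's actual code):

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch: pack buffer requests into chunks capped at a fixed
// size instead of one arbitrarily large combined allocation, so the heap
// fragments less as buffers come and go.
struct chunk_pool {
    static constexpr size_t CHUNK_SIZE = 64u * 1024 * 1024;  // hypothetical cap
    std::vector<std::vector<std::byte>> chunks;  // stand-in for device memory
    size_t used = CHUNK_SIZE;                    // forces a chunk on first use

    std::byte * alloc(size_t size) {
        if (size > CHUNK_SIZE) {       // oversize requests get dedicated memory
            chunks.emplace_back(size);
            used = CHUNK_SIZE;         // next small request opens a fresh chunk
            return chunks.back().data();
        }
        if (used + size > CHUNK_SIZE) {
            chunks.emplace_back(CHUNK_SIZE);
            used = 0;
        }
        std::byte * p = chunks.back().data() + used;
        used += size;
        return p;
    }
};
```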
Charles Duffy
902368a06b
metal : avoid breaking build when metal API predates TARGET_OS_VISION (#11690)
Avoids breakage in the nix flake build introduced by b0569130c5
2025-02-06 09:52:31 +08:00
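The usual fix for this class of breakage is to test that the macro is defined before testing its value, since SDKs that predate visionOS do not define `TARGET_OS_VISION` at all. A sketch of the pattern (not necessarily the exact guard this commit uses):

```cpp
#include <TargetConditionals.h>

// On SDKs that predate visionOS, TARGET_OS_VISION is not defined, so check
// for definition before checking its value to stay buildable everywhere.
#if defined(TARGET_OS_VISION) && TARGET_OS_VISION
// visionOS-specific Metal code path
#else
// fallback for older SDKs and non-vision platforms
#endif
```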
Matvey Soloviev
c3db0480bb
readme : add link to Autopen under UIs (#11684)
Autopen (https://github.com/blackhole89/autopen) is a graphical text editor that uses llama.cpp to tokenize the buffer on the fly, score the buffer, visualise token logits, and allow you to switch back and forth between different possible completions at any point. It hopefully meets the criteria for inclusion, as the dependency on llama.cpp is stated prominently.
2025-02-06 01:55:25 +01:00
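The tokenize-then-score loop Autopen describes maps onto llama.cpp's C API roughly as below. The exact signatures have shifted between releases, so treat this as a sketch of the early-2025 API rather than a drop-in program:

```cpp
#include "llama.h"
#include <string>
#include <vector>

// Sketch: tokenize a text buffer and fetch logits with llama.cpp.
// Signatures follow the early-2025 C API and may differ in other releases.
std::vector<llama_token> tokenize_buffer(const llama_vocab * vocab, const std::string & text) {
    // A call with a null buffer returns the negated required token count.
    int n = -llama_tokenize(vocab, text.c_str(), (int32_t) text.size(),
                            nullptr, 0, /*add_special=*/true, /*parse_special=*/false);
    std::vector<llama_token> tokens(n);
    llama_tokenize(vocab, text.c_str(), (int32_t) text.size(),
                   tokens.data(), n, true, false);
    return tokens;
}

float * score_buffer(llama_context * ctx, std::vector<llama_token> & tokens) {
    // Evaluate the buffer in one batch; the logits at the last position give
    // the distribution over next tokens (what Autopen visualises).
    llama_batch batch = llama_batch_get_one(tokens.data(), (int32_t) tokens.size());
    if (llama_decode(ctx, batch) != 0) {
        return nullptr;
    }
    return llama_get_logits(ctx);
}
```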
Olivier Chafik
d1a064070f revert tool example backfill change - command r7b just needs the right template 2025-02-05 16:33:37 +00:00
Olivier Chafik
994301da12 use existing string_strip 2025-02-05 16:33:16 +00:00
Olivier Chafik
33efcb3c59 Update README.md 2025-02-05 16:20:11 +00:00
Olivier Chafik
098629df15 disable some failing chatml tests 2025-02-05 16:15:19 +00:00
Olivier Chafik
0917e0a80d fix --think arg env 2025-02-05 16:15:09 +00:00
Olivier Chafik
39b50c37dc Update README.md 2025-02-05 15:53:48 +00:00
Olivier Chafik
e6d9b52480 align Command R7B w/ --think / reasoning_content behaviour 2025-02-05 15:47:37 +00:00
Olivier Chafik
3841a163ef fix compiler warning about parens 2025-02-05 13:05:27 +00:00
ochafik
f3e9f8b62a fix test_thoughts 2025-02-05 12:34:27 +00:00
ochafik
d20c2ce4e7 Merge branch 'r1-toolcall' of github.com:ochafik/llama.cpp into r1-toolcall 2025-02-05 12:16:42 +00:00
ochafik
9d7c3cc51b --think to force any model to return reasoning_content (or just parse <think> for deepseek r1) 2025-02-05 12:16:37 +00:00
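What `--think` amounts to for DeepSeek R1-style output is splitting a leading `<think>…</think>` span out of the generated text and returning it as `reasoning_content` rather than `content`. A minimal sketch of that parse (illustrative, not the server's actual implementation):

```cpp
#include <string>
#include <utility>

// Split a leading <think>...</think> block out of model output: the inside
// becomes reasoning_content, the remainder becomes the visible content.
static std::pair<std::string, std::string> split_thoughts(const std::string & out) {
    const std::string open  = "<think>";
    const std::string close = "</think>";
    const size_t b = out.find(open);
    const size_t e = out.find(close);
    if (b == std::string::npos || e == std::string::npos || e < b) {
        return {"", out};  // no thoughts: everything is normal content
    }
    std::string reasoning = out.substr(b + open.size(), e - b - open.size());
    std::string content   = out.substr(e + close.size());
    return {reasoning, content};
}
```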
Georgi Gerganov
d774ab3acc
metal : adjust support conditions for norm operators (#11671)
cont #11659

ggml-ci
2025-02-05 10:57:42 +02:00
Johannes Gäßler
fa62da9b2d
CUDA: support for mat. mul. with ne03 != ne13 (#11656) 2025-02-05 08:58:31 +01:00
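In ggml's shape convention, `ne03` and `ne13` are the outermost batch dimensions of the two matmul operands, so supporting `ne03 != ne13` means broadcasting the smaller batch across the larger. The CPU path expresses this with a ratio, roughly as in this sketch:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of ggml-style batch broadcasting for mul_mat: src1's batch dims must
// be integer multiples of src0's, and each src1 batch index maps down by the
// ratio. (Index names follow ggml; the inner kernel itself is elided.)
void mul_mat_batched(int64_t ne02, int64_t ne03,   // src0 batch dims
                     int64_t ne12, int64_t ne13) { // src1 batch dims
    assert(ne12 % ne02 == 0 && ne13 % ne03 == 0);
    const int64_t r2 = ne12 / ne02;
    const int64_t r3 = ne13 / ne03;
    for (int64_t i13 = 0; i13 < ne13; ++i13) {
        for (int64_t i12 = 0; i12 < ne12; ++i12) {
            const int64_t i03 = i13 / r3;  // src0 batch index being broadcast
            const int64_t i02 = i12 / r2;
            // ... multiply src0[i02, i03] by src1[i12, i13] here ...
            (void) i02; (void) i03;
        }
    }
}
```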
SAMI
1ec208083c
llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644)
* Added quantization for the visual projector
* Added README
* Fixed the clip quantize implementation in the file
* Fixed minor gcc lint warnings
* Removed trailing whitespace
2025-02-05 10:45:40 +03:00
Olivier Chafik
1f1f06aa26
Merge branch 'master' into r1-toolcall 2025-02-05 01:10:45 +00:00
Olivier Chafik
9f4cc8f8d3
sync: minja (#11641)
* `sync`: minja

182de30cda

https://github.com/google/minja/pull/46

https://github.com/google/minja/pull/45
2025-02-05 01:00:12 +00:00
Johannes Gäßler
fd08255d0d
CUDA: non-contiguous (RMS) norm support (#11659)
* CUDA: non-contiguous (RMS) norm support

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-04 22:21:42 +01:00
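RMS norm scales each row by the reciprocal root-mean-square of its elements, y_i = x_i / sqrt(mean(x^2) + eps); the new part here is walking the input through byte strides instead of assuming contiguous rows. A CPU-side sketch of the strided computation (the actual change is a CUDA kernel):

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Strided RMS norm over one row: read elements via a byte stride so the
// input does not have to be contiguous. Sketch of what the kernel computes.
void rms_norm_row(const char * x_row, size_t nb0,  // element byte stride
                  float * y, int64_t ne0, float eps) {
    float sum2 = 0.0f;
    for (int64_t i = 0; i < ne0; ++i) {
        const float v = *(const float *)(x_row + i*nb0);
        sum2 += v*v;
    }
    const float scale = 1.0f / std::sqrt(sum2/ne0 + eps);
    for (int64_t i = 0; i < ne0; ++i) {
        y[i] = scale * *(const float *)(x_row + i*nb0);
    }
}
```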
fxzjshm
3ec9fd4b77
HIP: force max threads per block to be 1024 (#11621)
Some old or vendor-forked versions of llvm still use 256. Explicitly set it to 1024 to align with upstream llvm.

Signed-off-by: fxzjshm <fxzjshm@163.com>
2025-02-04 19:18:38 +01:00
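The limit in question is the compiler's assumed maximum threads per block: upstream llvm assumes 1024 for AMD targets, while some older or vendor forks assume 256, which breaks kernels launched with larger blocks. One common way to pin it down in HIP C++ is a per-kernel `__launch_bounds__` attribute (hedged sketch; the commit may instead set a compiler flag):

```cpp
#include <hip/hip_runtime.h>

// Pin the kernel's max block size to 1024 regardless of the compiler's
// default assumption, matching upstream llvm behaviour.
__global__ void __launch_bounds__(1024) scale_kernel(float * x, float s, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] *= s;
    }
}

// Launch with the full 1024-thread blocks the attribute permits:
//   scale_kernel<<<(n + 1023)/1024, 1024>>>(x, s, n);
```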
Olivier Chafik
5d60cebbcc Update test_tool_call.py 2025-02-04 17:48:29 +00:00
Xuan-Son Nguyen
3962fc1a79
server : add try..catch to places not covered by set_exception_handler (#11620)
* server : add try..catch to places not covered by set_exception_handler

* log_server_request: rm try catch, add reminder
2025-02-04 18:25:42 +01:00
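`set_exception_handler` only covers exceptions thrown inside the request handlers it wraps; code paths outside it need their own try..catch so a stray exception does not kill the process. A generic sketch of the pattern (illustrative, not the server's exact code):

```cpp
#include <cstdio>
#include <exception>
#include <functional>

// Wrap a task so an escaping exception is logged and turned into a clean
// error instead of terminating the whole server process.
static bool run_guarded(const char * name, const std::function<void()> & task) {
    try {
        task();
        return true;
    } catch (const std::exception & e) {
        std::fprintf(stderr, "error in %s: %s\n", name, e.what());
    } catch (...) {
        std::fprintf(stderr, "unknown error in %s\n", name);
    }
    return false;
}
```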
Radoslav Gerganov
1bef571f6a
arg : list RPC devices first when using --list-devices (#11655)
List devices in the same order as they appear when evaluating the model
and splitting tensors across devices, i.e. RPC devices come first in the
list.

ref #11435
2025-02-04 18:16:20 +02:00
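The ordering rule is simple enough to state in code: keep the list stable but move every RPC device ahead of the rest, matching the order used when splitting tensors. A sketch with a hypothetical `is_rpc` flag:

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct device_info {
    std::string name;
    bool is_rpc;  // hypothetical flag: true for RPC backend devices
};

// List RPC devices first, preserving relative order within each group,
// so --list-devices matches the order used when splitting tensors.
void sort_for_listing(std::vector<device_info> & devs) {
    std::stable_partition(devs.begin(), devs.end(),
                          [](const device_info & d) { return d.is_rpc; });
}
```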
Olivier Chafik
933f7a186e Merge branch 'master' into r1-toolcall 2025-02-04 15:56:25 +00:00
Olivier Chafik
db288b60cb
tool-call: command r7b fix for normal responses (#11608)
* fix command r7b normal response regex + add to server test

* test multiline non-tool-call responses in test-chat
2025-02-04 15:48:53 +00:00
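The multiline test matters because `.` in a regex does not match newlines by default, so a response pattern written with `.*` silently stops matching once the model emits a line break; `[\s\S]*` (or an explicit dotall mode) is the usual fix. A small illustration of the failure mode (generic, not the commit's actual pattern):

```cpp
#include <cassert>
#include <regex>
#include <string>

int main() {
    const std::string multiline = "line one\nline two";

    // '.' stops at '\n', so this fails to cover a multiline response...
    assert(!std::regex_match(multiline, std::regex("(.*)")));

    // ...while [\s\S] matches any character, including newlines.
    assert(std::regex_match(multiline, std::regex(R"(([\s\S]*))")));
    return 0;
}
```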
Olivier Chafik
b2d17287aa update readme section about common model tool call formats
./build/bin/test-chat ../minja/build/tests/*.jinja 2>/dev/null
2025-02-04 14:27:38 +00:00
Olivier Chafik
39c1d8163b return thoughts in reasoning_content field 2025-02-04 11:37:09 +00:00
Shelby Jenkins
106045e7bb
readme : add llm_client Rust crate to readme bindings (#11628)
[This crate](https://github.com/ShelbyJenkins/llm_client) has been in a usable state for quite a while, so I figured it is fair to add it now.

It installs from crates.io, and automatically downloads the llama.cpp repo and builds it for the target platform - with the goal being the easiest user experience possible.

It also integrates model presets and chooses the largest quant given the target's available VRAM. So a user just has to specify one of the presets (I manually add the most popular models), and it will download from Hugging Face.

So, it's like a Rust Ollama, but it's not really for chatting. It makes heavy use of llama.cpp's grammar system to do structured output for decision making and control flow tasks.
2025-02-04 13:20:55 +02:00
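The quant-selection idea, "the largest quant that fits the available VRAM", is a one-pass maximisation over known file sizes. A hedged sketch of the shape of that logic (names and fields are made up for illustration):

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct quant_preset {
    std::string name;
    uint64_t size_bytes;  // hypothetical known size of this quant
};

// Pick the largest preset that still fits the available VRAM; returns -1 when
// even the smallest does not fit. (A real selector would also reserve
// headroom for the KV cache.)
int pick_quant(const std::vector<quant_preset> & presets, uint64_t vram_free) {
    int best = -1;
    for (int i = 0; i < (int) presets.size(); ++i) {
        if (presets[i].size_bytes <= vram_free &&
            (best < 0 || presets[i].size_bytes > presets[best].size_bytes)) {
            best = i;
        }
    }
    return best;
}
```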
Jhen-Jie Hong
f117d84b48
swift : fix llama-vocab api usage (#11645)
* swiftui : fix vocab api usage

* batched.swift : fix vocab api usage
2025-02-04 13:15:24 +02:00
Jhen-Jie Hong
534c46b53c
metal : use residency set for other platforms (#11648) 2025-02-04 13:07:18 +02:00
Georgi Gerganov
387a1598ca
authors : update 2025-02-04 13:04:10 +02:00
Georgi Gerganov
7c9e0ca520
sync : ggml 2025-02-04 12:59:21 +02:00
Christian Kastner
8f8290ada9
cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)
This makes git an optional dependency, and is useful when ggml is built
not from git but from a tarball or a distribution source package.

This conditional also affects GGML_BUILD_COMMIT. Nothing seems to be
using it, though, so there doesn't seem to be much value in factoring it
out, or even requiring it.
2025-02-04 12:59:15 +02:00
ochafik
d1b66910c5 r1: revert making <|tool▁calls▁begin|> optional as somehow sampling triggers us on "<|tool▁call▁begin|><", which is already invalid per the grammar 2025-02-04 10:38:03 +00:00
ochafik
0db9881285 Fix r1 grammar since we made <|tool▁calls▁begin|> optional (triggering on just <|tool▁call▁begin|> for 7B's sake) 2025-02-04 10:30:10 +00:00
ochafik
b5b117fa1c Merge branch 'sync-minja-4' into r1-toolcall 2025-02-04 09:45:27 +00:00
Georgi Gerganov
b34aedd558
ci : do not stale-close roadmap issues 2025-02-04 09:31:01 +02:00
ochafik
21f207156f Update chat.cpp 2025-02-04 05:16:23 +00:00
ochafik
438ce0b8a1 fix test-chat 2025-02-04 04:51:36 +00:00
ochafik
1f5ec59809 ensure deepseek r1 thoughts parsed even w/o tool calls 2025-02-04 04:48:08 +00:00
ochafik
b6e14a4101 fix mistral expectation 2025-02-04 04:26:49 +00:00
ochafik
d44eb95c67 tool-call: ensure we don't return content when there are tool calls / warn 2025-02-04 04:18:49 +00:00
ochafik
812544ab8b server: check that content is null when we get tool_calls 2025-02-04 04:14:15 +00:00
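These two commits enforce the OpenAI-style invariant that an assistant message carries either `content` or `tool_calls`, not both. With the nlohmann JSON library the server uses, a check along these lines would do it (sketch, not the actual server code):

```cpp
#include <nlohmann/json.hpp>
#include <stdexcept>

using json = nlohmann::json;

// Sketch: enforce that an assistant message has either tool_calls or content.
// When tool calls are present, content must be null (or absent).
void validate_message(const json & msg) {
    const bool has_tool_calls =
        msg.contains("tool_calls") && !msg["tool_calls"].empty();
    const bool has_content =
        msg.contains("content") && !msg["content"].is_null();
    if (has_tool_calls && has_content) {
        throw std::runtime_error("expected content to be null when tool_calls are present");
    }
}
```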
ochafik
d43e4f6c22 Merge branch 'sync-minja-4' into r1-toolcall 2025-02-04 04:05:02 +00:00
ochafik
f12e3507f7 Update chat.cpp 2025-02-04 04:02:18 +00:00
ochafik
56a14ddc83 fix mistral chat test: need empty tokens 2025-02-04 04:01:35 +00:00
ochafik
b1527292b6 Update test-chat.cpp 2025-02-04 03:56:03 +00:00
ochafik
09caa63451 sync: minja
182de30cda
2025-02-04 03:52:59 +00:00
ochafik
86994db697 fix spaces 2025-02-04 03:47:52 +00:00