ochafik
f3e9f8b62a
fix test_thoughts
2025-02-05 12:34:27 +00:00
ochafik
d20c2ce4e7
Merge branch 'r1-toolcall' of github.com:ochafik/llama.cpp into r1-toolcall
2025-02-05 12:16:42 +00:00
ochafik
9d7c3cc51b
--think to force any model to return reasoning_content (or just parse <think> for deepseek r1)
2025-02-05 12:16:37 +00:00
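For illustration only, a minimal invocation sketch of the new flag (assuming it is exposed on llama-server; the model path is a placeholder):
./build/bin/llama-server -m models/DeepSeek-R1-Distill-Qwen-7B.gguf --think
# per the commit message, --think forces reasoning to be returned in reasoning_content;
# for DeepSeek R1 the <think> tags can be parsed even without it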
Olivier Chafik
1f1f06aa26
Merge branch 'master' into r1-toolcall
2025-02-05 01:10:45 +00:00
Olivier Chafik
9f4cc8f8d3
sync : minja (#11641)
...
* `sync`: minja
182de30cda
https://github.com/google/minja/pull/46
https://github.com/google/minja/pull/45
2025-02-05 01:00:12 +00:00
Johannes Gäßler
fd08255d0d
CUDA: non-contiguous (RMS) norm support (#11659)
...
* CUDA: non-contiguous (RMS) norm support
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-04 22:21:42 +01:00
fxzjshm
3ec9fd4b77
HIP: force max threads per block to be 1024 (#11621)
...
Some old or vendor-forked versions of llvm still use 256. Explicitly set it to 1024 to align with upstream llvm.
Signed-off-by: fxzjshm <fxzjshm@163.com>
2025-02-04 19:18:38 +01:00
Olivier Chafik
5d60cebbcc
Update test_tool_call.py
2025-02-04 17:48:29 +00:00
Xuan-Son Nguyen
3962fc1a79
server : add try..catch to places not covered by set_exception_handler (#11620)
...
* server : add try..catch to places not covered by set_exception_handler
* log_server_request: rm try catch, add reminder
2025-02-04 18:25:42 +01:00
Radoslav Gerganov
1bef571f6a
arg : list RPC devices first when using --list-devices (#11655)
...
List devices in the same order as they appear when evaluating the model
and splitting tensors across devices, i.e. RPC devices come first in the
list.
ref #11435
2025-02-04 18:16:20 +02:00
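For context, a hedged example of the affected listing (the RPC endpoints and binary path are placeholders):
./build/bin/llama-server --rpc 192.168.1.10:50052,192.168.1.11:50052 --list-devices
# RPC devices should now appear before local devices, matching the order used when splitting tensors across devices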
Olivier Chafik
933f7a186e
Merge branch 'master' into r1-toolcall
2025-02-04 15:56:25 +00:00
Olivier Chafik
db288b60cb
tool-call : command r7b fix for normal responses (#11608)
...
* fix command r7b normal response regex + add to server test
* test multiline non-tool-call responses in test-chat
2025-02-04 15:48:53 +00:00
Olivier Chafik
b2d17287aa
update readme section about common model tool call formats
...
./build/bin/test-chat ../minja/build/tests/*.jinja 2>/dev/null
2025-02-04 14:27:38 +00:00
Olivier Chafik
39c1d8163b
return thoughts in reasoning_content field
2025-02-04 11:37:09 +00:00
Shelby Jenkins
106045e7bb
readme : add llm_client Rust crate to readme bindings (#11628)
...
[This crate](https://github.com/ShelbyJenkins/llm_client) has been in a usable state for quite a while, so I figured it's fair to add it now.
It installs from crates.io, and automatically downloads the llama.cpp repo and builds it for the target platform, with the goal being the easiest possible user experience.
It also integrates model presets and chooses the largest quant that fits in the target's available VRAM. So a user just has to specify one of the presets (I manually add the most popular models), and it will download from Hugging Face.
So it's like a Rust Ollama, but it's not really for chatting. It makes heavy use of llama.cpp's grammar system to do structured output for decision making and control flow tasks.
2025-02-04 13:20:55 +02:00
Jhen-Jie Hong
f117d84b48
swift : fix llama-vocab api usage (#11645)
...
* swiftui : fix vocab api usage
* batched.swift : fix vocab api usage
2025-02-04 13:15:24 +02:00
Jhen-Jie Hong
534c46b53c
metal : use residency set for other platforms (#11648)
2025-02-04 13:07:18 +02:00
Georgi Gerganov
387a1598ca
authors : update
2025-02-04 13:04:10 +02:00
Georgi Gerganov
7c9e0ca520
sync : ggml
2025-02-04 12:59:21 +02:00
Christian Kastner
8f8290ada9
cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)
...
This makes git an optional dependency, which is useful when ggml is built not from git but from a tarball or a distribution source package.
This conditional also affects GGML_BUILD_COMMIT. Nothing seems to be using it, though, so there doesn't seem to be much value in factoring it out, or even requiring it.
2025-02-04 12:59:15 +02:00
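As a sketch of the intended use (the number shown is arbitrary), a packager building ggml from a tarball could pass the build number on the command line instead of relying on git:
cmake -B build -DGGML_BUILD_NUMBER=1234
cmake --build build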
ochafik
d1b66910c5
r1: revert making <|tool▁calls▁begin|> optional, as sampling somehow triggers us on "<|tool▁call▁begin|><", which is already invalid per the grammar
2025-02-04 10:38:03 +00:00
ochafik
0db9881285
Fix r1 grammar since we made <|tool▁calls▁begin|> optional (triggering on just <|tool▁call▁begin|> for 7B's sake)
2025-02-04 10:30:10 +00:00
ochafik
b5b117fa1c
Merge branch 'sync-minja-4' into r1-toolcall
2025-02-04 09:45:27 +00:00
Georgi Gerganov
b34aedd558
ci : do not stale-close roadmap issues
2025-02-04 09:31:01 +02:00
ochafik
21f207156f
Update chat.cpp
2025-02-04 05:16:23 +00:00
ochafik
438ce0b8a1
fix test-chat
2025-02-04 04:51:36 +00:00
ochafik
1f5ec59809
ensure deepseek r1 thoughts parsed even w/o tool calls
2025-02-04 04:48:08 +00:00
ochafik
b6e14a4101
fix mistral expectation
2025-02-04 04:26:49 +00:00
ochafik
d44eb95c67
tool-call: ensure we don't return content when there are tool calls / warn
2025-02-04 04:18:49 +00:00
ochafik
812544ab8b
server: check that content is null when we get tool_calls
2025-02-04 04:14:15 +00:00
ochafik
d43e4f6c22
Merge branch 'sync-minja-4' into r1-toolcall
2025-02-04 04:05:02 +00:00
ochafik
f12e3507f7
Update chat.cpp
2025-02-04 04:02:18 +00:00
ochafik
56a14ddc83
fix mistral chat test: need empty tokens
2025-02-04 04:01:35 +00:00
ochafik
b1527292b6
Update test-chat.cpp
2025-02-04 03:56:03 +00:00
ochafik
09caa63451
sync : minja
...
182de30cda
2025-02-04 03:52:59 +00:00
ochafik
86994db697
fix spaces
2025-02-04 03:47:52 +00:00
ochafik
78b47bb0e9
fix test_calc_result
2025-02-04 03:46:26 +00:00
ochafik
326e7002b3
update test_calc_result
2025-02-04 03:13:13 +00:00
ochafik
f0154a6479
Fix / test models/templates/llama-cpp-deepseek-r1.jinja
2025-02-04 03:09:15 +00:00
ochafik
a682d1216d
fix / test parsing of r1 parser
2025-02-04 02:23:31 +00:00
ochafik
9a6847c857
move trigger_words init inside non-llguidance branch
2025-02-04 01:13:01 +00:00
ochafik
18a11f43f0
tool-call: r1: fix grammar
2025-02-04 01:12:44 +00:00
ochafik
e84ee88f50
r1: fix inadvertent newline in grammar before <|tool▁call▁end|>
2025-02-04 00:36:38 +00:00
Olivier Chafik
ce28224de8
tool-call: r1: add one more approximate trigger, "<|tool calls begin|>"
2025-02-04 00:28:40 +00:00
Olivier Chafik
bff549deb6
simplify hack to fix original template's backfill from minja
2025-02-04 00:14:48 +00:00
Olivier Chafik
bbd45bf6a2
sync: minja
2025-02-04 00:14:15 +00:00
Olivier Chafik
30ea3591c9
update to minja's new api
2025-02-03 23:53:27 +00:00
Olivier Chafik
11c1f0c7d4
actually we want eos_token in the template to infer tool call examples; it's explicitly skipped in the new template options
2025-02-03 23:52:28 +00:00
Olivier Chafik
bc6d910f6d
Merge branch 'master' into r1-toolcall
2025-02-03 23:51:31 +00:00
Olivier Chafik
cde3833239
tool-call : allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616)
...
* tool-call: allow `--jinja --chat-template chatml`
* fix double bos issue (drop bos/eos tokens from jinja template)
* add missing try catch around jinja parsing to default to chatml
* Simplify default chatml logic
2025-02-03 23:49:27 +00:00
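A hedged usage sketch of the flags this change targets (binary and model path are placeholders):
./build/bin/llama-server -m models/model.gguf --jinja --chat-template chatml
# per the PR notes, a chat template that fails to parse now falls back to chatml instead of erroring,
# and bos/eos tokens are dropped from the jinja template to avoid a double bos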