llama.cpp

Author	SHA1	Message	Date
ochafik	cc2c712cf9	Merge remote-tracking branch 'origin/master' into r1-toolcall	2025-02-08 14:35:10 +00:00
Christian Fillion	7ee953a64a	llama : add llama_sampler_init for safe usage of llama_sampler_free (#11727) The C API in llama.h claims users can implement `llama_sampler_i` to create custom `llama_sampler`. The sampler chain takes ownership and calls `llama_sampler_free` on them. However, `llama_sampler_free` is hard-coded to use `delete`. This is undefined behavior if the object wasn't also allocated via `new` from libllama's C++ runtime. Callers in C and C-compatible languages do not use C++'s `new` operator. C++ callers may not be sharing the same heap as libllama.	2025-02-07 11:33:27 +02:00
Daniel Bevenius	b7552cfcbc	common : add default embeddings presets (#11677) * common : add default embeddings presets This commit adds default embeddings presets for the following models: bge-small-en-v1.5, e5-small-v2, gte-small. These can be used with llama-embedding and llama-server. For example, with llama-embedding: `./build/bin/llama-embedding --embd-gte-small-default -p "Hello, how are you?"` And with llama-server: `./build/bin/llama-server --embd-gte-small-default` And the embeddings endpoint can then be called with a POST request: `curl --request POST --url http://localhost:8080/embeddings --header "Content-Type: application/json" --data '{"input": "Hello, how are you?"}'` I'm not sure if these are the most common embedding models but hopefully this can be a good starting point for discussion and further improvements. Refs: https://github.com/ggerganov/llama.cpp/issues/10932	2025-02-07 09:15:22 +01:00
Olivier Chafik	d1a064070f	revert tool example backfill change - command r7b just needs the right template	2025-02-05 16:33:37 +00:00
Olivier Chafik	994301da12	use existing string_strip	2025-02-05 16:33:16 +00:00
Olivier Chafik	0917e0a80d	fix --think arg env	2025-02-05 16:15:09 +00:00
Olivier Chafik	e6d9b52480	align Command R7B w/ --think / reasoning_content behaviour	2025-02-05 15:47:37 +00:00
Olivier Chafik	3841a163ef	fix compiler warning about parens	2025-02-05 13:05:27 +00:00
ochafik	f3e9f8b62a	fix test_thoughts	2025-02-05 12:34:27 +00:00
ochafik	d20c2ce4e7	Merge branch 'r1-toolcall' of github.com:ochafik/llama.cpp into r1-toolcall	2025-02-05 12:16:42 +00:00
ochafik	9d7c3cc51b	--think to force any model to return reasoning_content (or just parse <think> for deepseek r1)	2025-02-05 12:16:37 +00:00
Olivier Chafik	1f1f06aa26	Merge branch 'master' into r1-toolcall	2025-02-05 01:10:45 +00:00
Olivier Chafik	9f4cc8f8d3	`sync`: minja (#11641) * `sync`: minja `182de30cda` https://github.com/google/minja/pull/46 https://github.com/google/minja/pull/45	2025-02-05 01:00:12 +00:00
Radoslav Gerganov	1bef571f6a	arg : list RPC devices first when using --list-devices (#11655) List devices in the same order as they appear when evaluating the model and splitting tensors across devices, i.e. RPC devices come first in the list. ref #11435	2025-02-04 18:16:20 +02:00
Olivier Chafik	933f7a186e	Merge branch 'master' into r1-toolcall	2025-02-04 15:56:25 +00:00
Olivier Chafik	db288b60cb	`tool-call`: command r7b fix for normal responses (#11608) * fix command r7b normal response regex + add to server test * test multiline non-tool-call responses in test-chat	2025-02-04 15:48:53 +00:00
Olivier Chafik	39c1d8163b	return thoughts in reasoning_content field	2025-02-04 11:37:09 +00:00
ochafik	d1b66910c5	r1: revert making <｜tool▁calls▁begin｜> optional as somehow sampling triggers us on "<｜tool▁call▁begin｜><", which is already invalid per the grammar	2025-02-04 10:38:03 +00:00
ochafik	0db9881285	Fix r1 grammar since we made <｜tool▁calls▁begin｜> optional (triggering on just <｜tool▁call▁begin｜> for 7B's sake)	2025-02-04 10:30:10 +00:00
ochafik	b5b117fa1c	Merge branch 'sync-minja-4' into r1-toolcall	2025-02-04 09:45:27 +00:00
ochafik	21f207156f	Update chat.cpp	2025-02-04 05:16:23 +00:00
ochafik	438ce0b8a1	fix test-chat	2025-02-04 04:51:36 +00:00
ochafik	1f5ec59809	ensure deepseek r1 thoughts parsed even w/o tool calls	2025-02-04 04:48:08 +00:00
ochafik	d44eb95c67	tool-call: ensure we don't return content when there are tool calls / warn	2025-02-04 04:18:49 +00:00
ochafik	d43e4f6c22	Merge branch 'sync-minja-4' into r1-toolcall	2025-02-04 04:05:02 +00:00
ochafik	f12e3507f7	Update chat.cpp	2025-02-04 04:02:18 +00:00
ochafik	56a14ddc83	fix mistral chat test: need empty tokens	2025-02-04 04:01:35 +00:00
ochafik	09caa63451	`sync`: minja `182de30cda`	2025-02-04 03:52:59 +00:00
ochafik	f0154a6479	Fix / test models/templates/llama-cpp-deepseek-r1.jinja	2025-02-04 03:09:15 +00:00
ochafik	a682d1216d	fix / test parsing of r1 parser	2025-02-04 02:23:31 +00:00
ochafik	9a6847c857	move trigger_words init inside non-llguidance branch	2025-02-04 01:13:01 +00:00
ochafik	18a11f43f0	tool-call: r1: fix grammar	2025-02-04 01:12:44 +00:00
ochafik	e84ee88f50	r1: fix inadvertent newline in grammar before <｜tool▁call▁end｜>	2025-02-04 00:36:38 +00:00
Olivier Chafik	ce28224de8	tool-call: r1: add one more trigger approx "<｜tool calls begin｜>"	2025-02-04 00:28:40 +00:00
Olivier Chafik	bff549deb6	simplify hack to fix original template's backfill from minja	2025-02-04 00:14:48 +00:00
Olivier Chafik	bbd45bf6a2	sync: minja	2025-02-04 00:14:15 +00:00
Olivier Chafik	30ea3591c9	update to minja's new api	2025-02-03 23:53:27 +00:00
Olivier Chafik	11c1f0c7d4	actually we want eos_token in the template to infer tool call examples, explicitly skipped in new template options	2025-02-03 23:52:28 +00:00
Olivier Chafik	cde3833239	`tool-call`: allow `--chat-template chatml` w/ `--jinja`, default to chatml upon parsing issue, avoid double bos (#11616) * tool-call: allow `--jinja --chat-template chatml` * fix double bos issue (drop bos/eos tokens from jinja template) * add missing try catch around jinja parsing to default to chatml * Simplify default chatml logic	2025-02-03 23:49:27 +00:00
Olivier Chafik	108da907f0	sync: minja https://github.com/google/minja/pull/46	2025-02-03 23:31:49 +00:00
Olivier Chafik	1c302e18ba	simpler hacky fixes for original broken template (+ fix minja example syntax polyfill)	2025-02-03 20:34:44 +00:00
Olivier Chafik	c6214ee9d6	rm unneeded vocab	2025-02-03 19:59:50 +00:00
Olivier Chafik	7dc271fb37	tool-calls: add deepseek r1 template + accommodate broken official template slightly better	2025-02-03 19:59:33 +00:00
Olivier Chafik	0be7f652e9	Merge branch 'jinja-chatml' into r1-toolcall	2025-02-03 19:35:54 +00:00
Olivier Chafik	d73448de1c	Simplify default chatml logic	2025-02-03 19:22:53 +00:00
Olivier Chafik	569610ee77	tool-calls: accommodate variety of wrong tool call opening tags both Qwen 32B and 7B distills like to spit out	2025-02-03 18:57:55 +00:00
Olivier Chafik	c397bd1f5f	tweak delta logic	2025-02-03 17:57:38 +00:00
Olivier Chafik	df3474e2c2	tool-calls: r1: add missing <｜tool▁calls▁end｜> to grammar!	2025-02-03 17:33:14 +00:00
Olivier Chafik	08271b5505	Merge branch 'jinja-chatml' into r1-toolcall	2025-02-03 17:32:38 +00:00
Olivier Chafik	b2dd490926	add missing try catch around jinja parsing to default to chatml	2025-02-03 17:32:12 +00:00
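
The POST request to the embeddings endpoint described in commit b7552cfcbc can also be issued from a script. The following is a minimal sketch in Python, not part of the repository: the helper names `build_embeddings_request` and `fetch_embeddings` are hypothetical, and it assumes a llama-server instance started with one of the embeddings presets is listening on http://localhost:8080, as in the commit message. The response format is not described in the log, so the sketch returns the parsed JSON unchanged.

```python
import json
import urllib.request


def build_embeddings_request(text, base_url="http://localhost:8080"):
    """Build (without sending) the POST request shown in the commit message.

    Hypothetical helper; the URL, header, and payload mirror the curl example.
    """
    payload = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embeddings(text, base_url="http://localhost:8080"):
    """Send the request and return the decoded JSON body as-is.

    Assumes a running llama-server; the response schema is not specified
    in the commit log, so no fields are extracted here.
    """
    request = build_embeddings_request(text, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the request separately from sending it makes it possible to inspect the URL, method, and body without a running server; `fetch_embeddings("Hello, how are you?")` would perform the actual call.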

441 commits