llama.cpp

Author	SHA1	Message	Date
Olivier Chafik	ce28224de8	tool-call: r1: add one more trigger approx "<｜tool calls begin｜>"	2025-02-04 00:28:40 +00:00
Olivier Chafik	bff549deb6	simplify hack to fix original template's backfill from minja	2025-02-04 00:14:48 +00:00
Olivier Chafik	bbd45bf6a2	sync: minja	2025-02-04 00:14:15 +00:00
Olivier Chafik	30ea3591c9	update to minja's new api	2025-02-03 23:53:27 +00:00
Olivier Chafik	11c1f0c7d4	actually we want eos_token in the template to infer tool call examples, explicitly skipped in new template options	2025-02-03 23:52:28 +00:00
Olivier Chafik	108da907f0	sync: minja https://github.com/google/minja/pull/46	2025-02-03 23:31:49 +00:00
Olivier Chafik	1c302e18ba	simpler hacky fixes for original broken template (+ fix minja example syntax polyfill)	2025-02-03 20:34:44 +00:00
Olivier Chafik	c6214ee9d6	rm unneeded vocab	2025-02-03 19:59:50 +00:00
Olivier Chafik	7dc271fb37	tool-calls: add deepseek r1 template + accommodate broken official template slightly better	2025-02-03 19:59:33 +00:00
Olivier Chafik	0be7f652e9	Merge branch 'jinja-chatml' into r1-toolcall	2025-02-03 19:35:54 +00:00
Olivier Chafik	d73448de1c	Simplify default chatml logic	2025-02-03 19:22:53 +00:00
Olivier Chafik	569610ee77	tool-calls: accommodate variety of wrong tool call opening tags both Qwen 32B and 7B distills like to spit out	2025-02-03 18:57:55 +00:00
Olivier Chafik	c397bd1f5f	tweak delta logic	2025-02-03 17:57:38 +00:00
Olivier Chafik	df3474e2c2	tool-calls: r1: add missing <｜tool▁calls▁end｜> to grammar!	2025-02-03 17:33:14 +00:00
Olivier Chafik	08271b5505	Merge branch 'jinja-chatml' into r1-toolcall	2025-02-03 17:32:38 +00:00
Olivier Chafik	b2dd490926	add missing try catch around jinja parsing to default to chatml	2025-02-03 17:32:12 +00:00
Olivier Chafik	4cb0e1d873	Merge branch 'jinja-chatml' into r1-toolcall	2025-02-03 17:15:14 +00:00
Olivier Chafik	2b3c4829a3	fix build / rm diff	2025-02-03 16:34:43 +00:00
Olivier Chafik	aa98e59038	fix bad merge	2025-02-03 14:01:49 +00:00
Olivier Chafik	5d18d76b69	fix double bos issue (drop bos/eos tokens from jinja template)	2025-02-03 13:59:16 +00:00
Olivier Chafik	cf83623a47	fix typo	2025-02-03 13:58:46 +00:00
ochafik	a76073cf88	minimize diffs	2025-02-03 10:58:52 +00:00
ochafik	1e9acd2d31	tool-call: allow `--jinja --chat-template chatml`	2025-02-03 04:07:11 +00:00
ochafik	04be723b33	tool-call: fix command-r7b parsing when response is multiline	2025-02-03 02:24:30 +00:00
ochafik	73d08d49cf	tool-call: allow `--jinja --chat-template chatml`	2025-02-03 02:24:30 +00:00
ochafik	c80cb30938	update logs	2025-02-03 02:24:30 +00:00
ochafik	04d511b5b5	Avoid double bos w/ jinja	2025-02-03 02:24:30 +00:00
ochafik	130ca222c9	DeepSeek R1: parse thoughts / return in separate field in API (non streamed mode)	2025-02-03 02:24:30 +00:00
ochafik	87de852b7f	pass vocab to common_chat_params_init	2025-02-03 02:24:30 +00:00
ochafik	d3b60b8ad8	minja: enhance backfill of templates w/o tools description (use example tool call delta!)	2025-02-03 01:03:04 +00:00
Eric Curtin	84ec8a58f7	Name colors (#11573 ) It's more descriptive, use #define's so we can use compile-time concatenations. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-02-02 15:14:48 +00:00
Olivier Chafik	bfcce4d693	`tool-call`: support Command R7B (+ return tool_plan "thoughts" in API) (#11585 ) * `tool-call`: support Command R7B (w/ tool_plan return) * `tool-call`: cleaner preservation of tokens + warn when likely bad chat template override * `tool-call`: test cleanup / handle lazy grammar triggers	2025-02-02 09:25:38 +00:00
Olivier Chafik	69804487e0	Fix exotic ci env that lacks ostringstream::str (#11581 )	2025-02-02 09:10:15 +00:00
Michał Moskal	ff227703d6	sampling : support for llguidance grammars (#10224 ) * initial porting of previous LLG patch * update for new APIs * build: integrate llguidance as an external project * use '%llguidance' as marker to enable llg lark syntax * add some docs * clarify docs * code style fixes * remove llguidance.h from .gitignore * fix tests when llg is enabled * pass vocab not model to llama_sampler_init_llg() * copy test-grammar-integration.cpp to test-llguidance.cpp * clang fmt * fix ref-count bug * build and run test * gbnf -> lark syntax * conditionally include llguidance test based on LLAMA_LLGUIDANCE flag * rename llguidance test file to test-grammar-llguidance.cpp * add gh action for llg test * align tests with LLG grammar syntax and JSON Schema spec * llama_tokenizer() in fact requires valid utf8 * update llg * format file * add $LLGUIDANCE_LOG_LEVEL support * fix whitespace * fix warning * include <cmath> for INFINITY * add final newline * fail llama_sampler_init_llg() at runtime * Link gbnf_to_lark.py script; fix links; refer to llg docs for lexemes * simplify #includes * improve doc string for LLAMA_LLGUIDANCE * typo in merge * bump llguidance to 0.6.12	2025-02-02 09:55:32 +02:00
Olivier Chafik	cfd74c86db	`sync`: minja (`418a2364b5`) (#11574 )	2025-02-01 12:24:51 +00:00
Olivier Chafik	a83f528688	`tool-call`: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539 ) * An empty tool_call_id is better than none! * sync: minja (tool call name optional https://github.com/google/minja/pull/36) * Force-disable parallel_tool_calls if template doesn't support it * More debug logs * Llama 3.x tools: accept / trigger on more varied spaced outputs * Fix empty content for functionary v3.2 tool call * Add proper tool call docs to server README * readme: function calling is supported now * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-01-31 14:15:25 +00:00
Steve Grubb	1bd3047a93	common: Add missing va_end (#11529 ) The va_copy man page states that va_end must be called to revert whatever the copy did. For some implementations, not calling va_end has no consequences. For others it could leak memory.	2025-01-31 07:58:55 +02:00
Olivier Chafik	8b576b6c55	Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639 ) --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-01-30 19:13:58 +00:00
Olivier Chafik	3d804dec76	sync: minja (#11499 )	2025-01-30 10:30:27 +00:00
Daniel Bevenius	b636228c0a	embedding : enable --no-warmup option (#11475 ) This commit enables the `--no-warmup` option for the llama-embeddings. The motivation for this change is to allow the user to disable the warmup when running the program.	2025-01-29 10:38:54 +02:00
Olivier Chafik	c64d2becb1	`minja`: sync at `0f5f7f2b37` (#11352 )	2025-01-22 16:16:27 +00:00
Olivier Chafik	a94f3b2727	`common`: utils to split / join / repeat strings (from json converter) (#11342 ) * Factor string_join, string_split, string_repeat into common * json: refactor to surface a versatile builder * Update common.cpp	2025-01-22 09:51:44 +00:00
Olivier Chafik	6171c9d258	Add Jinja template support (#11016 ) * Copy minja from `58f0ca6dd7` * Add --jinja and --chat-template-file flags * Add missing <optional> include * Avoid print in get_hf_chat_template.py * No designated initializers yet * Try and work around msvc++ non-macro max resolution quirk * Update test_chat_completion.py * Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template * Refactor test-chat-template * Test templates w/ minja * Fix deprecation * Add --jinja to llama-run * Update common_chat_format_example to use minja template wrapper * Test chat_template in e2e test * Update utils.py * Update test_chat_completion.py * Update run.cpp * Update arg.cpp * Refactor common_chat_* functions to accept minja template + use_jinja option * Attempt to fix linkage of LLAMA_CHATML_TEMPLATE * Revert LLAMA_CHATML_TEMPLATE refactor * Normalize newlines in test-chat-templates for windows tests * Forward decl minja::chat_template to avoid eager json dep * Flush stdout in chat template before potential crash * Fix copy elision warning * Rm unused optional include * Add missing optional include to server.cpp * Disable jinja test that has a cryptic windows failure * minja: fix vigogne (https://github.com/google/minja/pull/22) * Apply suggestions from code review Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Finish suggested renamings * Move chat_templates inside server_context + remove mutex * Update --chat-template-file w/ recent change to --chat-template * Refactor chat template validation * Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr) * Warn against missing eos / bos tokens when jinja template references them * rename: common_chat_template[s] * reinstate assert on chat_templates.template_default * Update minja to `b8437df626` * Update minja to https://github.com/google/minja/pull/25 * Update minja from https://github.com/google/minja/pull/27 * rm unused optional header --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-01-21 13:18:51 +00:00
Georgi Gerganov	80d0d6b4b7	common : add -hfd option for the draft model (#11318 ) * common : add -hfd option for the draft model * cont : fix env var * cont : more fixes	2025-01-20 22:29:43 +02:00
LostRuins Concedo	6390a998bf	tts : add guide tokens support (#11186 ) * Added the ability to use guide tokens for OuteTTS, greatly improving TTS recitation accuracy over long input sequences. * applied linting suggestions, updated to latest llama_vocab changes, added a safety check, added newline to guide token start	2025-01-18 12:20:57 +02:00
Radoslav Gerganov	667d72846c	rpc : early register backend devices (#11262 ) Early register RPC devices and do not propagate RPC specifics in the llama model structures. ref: #10609	2025-01-17 10:57:09 +02:00
Xuan Son Nguyen	84a44815f7	cli : auto activate conversation mode if chat template is available (#11214 ) * cli : auto activate conversation mode if chat template is detected * add warn on bad template * update readme (writing with the help of chatgpt) * update readme (2) * do not activate -cnv for non-instruct models	2025-01-13 20:18:12 +01:00
Xuan Son Nguyen	00b4c3da62	common : support tag-based --hf-repo like on ollama (#11195 ) * common : support tag-based hf_repo like on ollama * fix build * various fixes * small fixes * fix style * fix windows build? * move common_get_hf_file to common.cpp * fix complain with noreturn	2025-01-13 13:56:23 +01:00
Xuan Son Nguyen	9a483999a6	llama : fix chat template gguf key (#11201 )	2025-01-12 13:45:14 +01:00
Georgi Gerganov	afa8a9ec9b	llama : add `llama_vocab`, functions -> methods, naming (#11110 ) * llama : functions -> methods (#11110) * llama : add struct llama_vocab to the API (#11156) ggml-ci * hparams : move vocab params to llama_vocab (#11159) ggml-ci * vocab : more pimpl (#11165) ggml-ci * vocab : minor tokenization optimizations (#11160) ggml-ci Co-authored-by: Diego Devesa <slarengh@gmail.com> * lora : update API names (#11167) ggml-ci * llama : update API names to use correct prefix (#11174) * llama : update API names to use correct prefix ggml-ci * cont ggml-ci * cont ggml-ci * minor [no ci] * vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174) ggml-ci * vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174) ggml-ci --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-01-12 11:32:42 +02:00

407 commits