Commit graph

4828 commits

Author SHA1 Message Date
Olivier Chafik
dbf841b0d2 Push laziness down to grammar impl 2025-01-22 01:25:54 +00:00
Olivier Chafik
77f4098c83 Delete update_jinja_goldens.py 2025-01-21 14:41:59 +00:00
Olivier Chafik
f6e73dac43 Remove examples/agent (moved to https://gist.github.com/ochafik/9246d289b7d38d49e1ee2755698d6c79) 2025-01-21 14:41:56 +00:00
Olivier Chafik
b49d0521e9 rm tests/test-minja from makefile 2025-01-21 14:12:38 +00:00
Olivier Chafik
fec0260366 Merge remote-tracking branch 'origin/master' into tool-call 2025-01-21 13:44:58 +00:00
Olivier Chafik
6171c9d258
Add Jinja template support (#11016)
* Copy minja from 58f0ca6dd7

* Add --jinja and --chat-template-file flags

* Add missing <optional> include

* Avoid print in get_hf_chat_template.py

* No designated initializers yet

* Try and work around msvc++ non-macro max resolution quirk

* Update test_chat_completion.py

* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template

* Refactor test-chat-template

* Test templates w/ minja

* Fix deprecation

* Add --jinja to llama-run

* Update common_chat_format_example to use minja template wrapper

* Test chat_template in e2e test

* Update utils.py

* Update test_chat_completion.py

* Update run.cpp

* Update arg.cpp

* Refactor common_chat_* functions to accept minja template + use_jinja option

* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE

* Revert LLAMA_CHATML_TEMPLATE refactor

* Normalize newlines in test-chat-templates for windows tests

* Forward decl minja::chat_template to avoid eager json dep

* Flush stdout in chat template before potential crash

* Fix copy elision warning

* Rm unused optional include

* Add missing optional include to server.cpp

* Disable jinja test that has a cryptic windows failure

* minja: fix vigogne (https://github.com/google/minja/pull/22)

* Apply suggestions from code review

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Finish suggested renamings

* Move chat_templates inside server_context + remove mutex

* Update --chat-template-file w/ recent change to --chat-template

* Refactor chat template validation

* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)

* Warn against missing eos / bos tokens when jinja template references them

* rename: common_chat_template[s]

* reinstate assert on chat_templates.template_default

* Update minja to b8437df626

* Update minja to https://github.com/google/minja/pull/25

* Update minja from https://github.com/google/minja/pull/27

* rm unused optional header

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-21 13:18:51 +00:00
Xuan Son Nguyen
e28245f35f
export-lora : fix tok_embd tensor (#11330) 2025-01-21 14:07:12 +01:00
Radoslav Gerganov
6da5bec81c
rpc : better caching of the base buffer pointer (#11331)
There is no need to use a map; just store the base pointer in the buffer
context.
2025-01-21 15:06:41 +02:00
Eric Curtin
2e2f8f093c
linenoise.cpp refactoring (#11301)
More RAII mainly

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-21 09:32:35 +00:00
Georgi Gerganov
2139667ec4
metal : fix out-of-bounds write (#11314)
ggml-ci
2025-01-21 08:48:13 +02:00
ochafik
c606255948 Merge branch 'jinja' into tool-call 2025-01-21 03:49:30 +00:00
ochafik
9d8ebd62c6 Update minja from https://github.com/google/minja/pull/27 2025-01-21 03:18:06 +00:00
ochafik
ba8dd66fdf Merge branch 'jinja' into tool-call 2025-01-21 01:43:14 +00:00
ochafik
ff2cce57ad Update minja to https://github.com/google/minja/pull/25 2025-01-21 01:26:19 +00:00
ochafik
56aa93c266 fix std imports for gcc build 2025-01-21 00:08:22 +00:00
ochafik
7ea6a06cde Merge branch 'jinja' into tool-call 2025-01-20 23:59:24 +00:00
ochafik
8347da907d Update minja to b8437df626 2025-01-20 23:59:15 +00:00
ochafik
b110374714 apply renames from jinja branch 2025-01-20 23:59:01 +00:00
ochafik
9bab6939cd Merge branch 'jinja' into tool-call 2025-01-20 23:55:12 +00:00
ochafik
8a7c89e60c reinstate assert on chat_templates.template_default 2025-01-20 23:44:42 +00:00
ochafik
ee475d2f51 rename: common_chat_template[s] 2025-01-20 23:42:07 +00:00
ochafik
8348c605ac Warn against missing eos / bos tokens when jinja template references them 2025-01-20 23:00:47 +00:00
ochafik
54a669e09e Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr) 2025-01-20 22:50:08 +00:00
ochafik
099f983949 Merge remote-tracking branch 'origin/master' into jinja 2025-01-20 21:58:04 +00:00
ochafik
154bfaaa39 Refactor chat template validation 2025-01-20 21:54:34 +00:00
ochafik
8c84aefd4d Update --chat-template-file w/ recent change to --chat-template 2025-01-20 21:48:31 +00:00
ochafik
c9e8fdd70e Move chat_templates inside server_context + remove mutex 2025-01-20 21:25:18 +00:00
ochafik
db9dd0c1ac Finish suggested renamings 2025-01-20 21:06:18 +00:00
Olivier Chafik
153e852411
Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-20 20:55:52 +00:00
Georgi Gerganov
80d0d6b4b7
common : add -hfd option for the draft model (#11318)
* common : add -hfd option for the draft model

* cont : fix env var

* cont : more fixes
2025-01-20 22:29:43 +02:00
Jeff Bolz
aea8ddd516
vulkan: fix coopmat2 validation failures (#11284)
mul mat and flash attention shaders were loading f32 types directly into
A/B matrices, which happens to work but is technically invalid usage.
For FA, we can load into an Accumulator matrix and convert; this is not
in the inner loop and is cheap enough. For mul mat, it's more efficient
to do the conversion in a separate pass and have the input(s) be f16.

coopmat2 requires SPIR-V 1.6 (related to the use of LocalSizeId).
LocalSizeId requires maintenance4 to be enabled, and SPIR-V 1.6 requires Vulkan 1.3.
2025-01-20 10:38:32 -06:00
Georgi Gerganov
9f7add1cde
examples : fix add_special conditions (#11311) 2025-01-20 16:36:08 +02:00
Christopher Nielsen
90d987b105
mmap: add include for cerrno (#11296)
ggml-ci

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-20 16:02:43 +02:00
Michael Podvitskiy
a4251edd6f
cmake: fix shell command quoting in build-info script (#11309) 2025-01-20 16:02:15 +02:00
Xuan Son Nguyen
ec7f3ac9ab
llama : add support for Deepseek-R1-Qwen distill model (#11310)
* llama : add support for Deepseek-R1-Qwen distill model

* coding style
2025-01-20 14:35:07 +01:00
Georgi Gerganov
ef6dada60c
cont : fix whitespaces (#11305) 2025-01-20 09:29:32 +02:00
Kyle Bruene
ae3c1db2f9
llama : re-add LLM_ARCH_PHIMOE (#11305)
Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.
2025-01-20 09:21:01 +02:00
Georgi Gerganov
92bc493917
tests : increase timeout when sanitizers are enabled (#11300)
* tests : increase timeout when sanitizers are enabled

* tests : add DEFAULT_HTTP_TIMEOUT
2025-01-19 20:22:30 +02:00
Georgi Gerganov
b9daaffe02
simple-chat : fix BOS being added to each message (#11278) 2025-01-19 18:12:09 +02:00
Nicolò Scipione
99487b57d4
SYCL: Introducing memory host pool (#11251)
* Implement host pool for matrix_info

Create a new memory pool on the host to store the memory locations of the
matrix_info structures needed to launch gemm_batch from oneMKL/oneMath.
Remove complex support from gemm_batch since it is not used in llama.cpp.

* Remove unnecessary headers and cast

* Reorder member variable to avoid warning on initialization

* Formatting

* Remove unused variable

* Address PR review feedback - remove warning

---------

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-01-19 21:33:34 +08:00
ochafik
0401a83b9b agent: add --greedy, --top-p, --top-k options 2025-01-19 02:07:06 +00:00
ochafik
c207fdcde6 Merge branch 'jinja' into tool-call 2025-01-18 18:05:11 +00:00
ochafik
cc50356470 minja: fix vigogne (https://github.com/google/minja/pull/22) 2025-01-18 17:55:04 +00:00
ochafik
e3c475cd12 Disable jinja test that has a cryptic windows failure 2025-01-18 14:55:27 +00:00
ochafik
d6f058da8c Merge branch 'jinja' into tool-call 2025-01-18 14:54:57 +00:00
Eric Curtin
a1649cc13f
Adding linenoise.cpp to llama-run (#11252)
This is a fork of linenoise that is C++17 compatible. I intend to add
it to llama-run so we can do things like traverse prompt history via
the up and down arrows:

https://github.com/ericcurtin/linenoise.cpp

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-18 14:42:31 +00:00
Georgi Gerganov
4dd34ff831
cmake : add sanitizer flags for llama.cpp (#11279)
* cmake : add sanitizer flags for llama.cpp

ggml-ci

* tests : fix compile warnings

ggml-ci

* cmake : move sanitizer flags to llama_add_compile_flags

ggml-ci

* cmake : move llama.cpp compile flags to top level lists

ggml-ci

* cmake : apply only sanitizer flags at top level

ggml-ci

* tests : fix gguf context use in same_tensor_data

* gguf-test: tensor data comparison

* dummy : trigger ggml-ci

* unicode : silence gcc warnings

ggml-ci

* ci : use sanitizer builds only in Debug mode

ggml-ci

* cmake : add status messages [no ci]

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-01-18 16:18:15 +02:00
Xuan Son Nguyen
f30f099228
server : implement cancellable request (#11285)
* server : implement cancellable request

* fix typo

* httplib 0.18.5

* fix i underflow
2025-01-18 14:12:05 +01:00
ochafik
0e74c9dabe Add missing optional include to server.cpp 2025-01-18 11:58:00 +00:00
ochafik
fc60802b6e Rm unused optional include 2025-01-18 11:35:54 +00:00