ochafik
9d8ebd62c6
Update minja from https://github.com/google/minja/pull/27
2025-01-21 03:18:06 +00:00
ochafik
ff2cce57ad
Update minja to https://github.com/google/minja/pull/25
2025-01-21 01:26:19 +00:00
ochafik
8347da907d
Update minja to b8437df626
2025-01-20 23:59:15 +00:00
ochafik
8a7c89e60c
reinstate assert on chat_templates.template_default
2025-01-20 23:44:42 +00:00
ochafik
ee475d2f51
rename: common_chat_template[s]
2025-01-20 23:42:07 +00:00
ochafik
8348c605ac
Warn against missing eos / bos tokens when jinja template references them
2025-01-20 23:00:47 +00:00
ochafik
54a669e09e
Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
2025-01-20 22:50:08 +00:00
ochafik
099f983949
Merge remote-tracking branch 'origin/master' into jinja
2025-01-20 21:58:04 +00:00
ochafik
154bfaaa39
Refactor chat template validation
2025-01-20 21:54:34 +00:00
ochafik
8c84aefd4d
Update --chat-template-file w/ recent change to --chat-template
2025-01-20 21:48:31 +00:00
ochafik
c9e8fdd70e
Move chat_templates inside server_context + remove mutex
2025-01-20 21:25:18 +00:00
ochafik
db9dd0c1ac
Finish suggested renamings
2025-01-20 21:06:18 +00:00
Olivier Chafik
153e852411
Apply suggestions from code review
...
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-20 20:55:52 +00:00
Georgi Gerganov
80d0d6b4b7
common : add -hfd option for the draft model ( #11318 )
...
* common : add -hfd option for the draft model
* cont : fix env var
* cont : more fixes
2025-01-20 22:29:43 +02:00
Jeff Bolz
aea8ddd516
vulkan: fix coopmat2 validation failures ( #11284 )
...
mul mat and flash attention shaders were loading f32 types directly into
A/B matrices, which happens to work but is technically invalid usage.
For FA, we can load it as an Accumulator matrix and convert; this is not
in the inner loop and is cheap enough. For mul mat, it's more efficient
to do the conversion in a separate pass and have the input(s) be f16.
coopmat2 requires SPIR-V 1.6 (related to the use of LocalSizeId). LocalSizeId
requires maintenance4 to be enabled, and SPIR-V 1.6 requires Vulkan 1.3.
2025-01-20 10:38:32 -06:00
Georgi Gerganov
9f7add1cde
examples : fix add_special conditions ( #11311 )
2025-01-20 16:36:08 +02:00
Christopher Nielsen
90d987b105
mmap: add include for cerrno ( #11296 )
...
ggml-ci
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-20 16:02:43 +02:00
Michael Podvitskiy
a4251edd6f
cmake: fix shell command quoting in build-info script ( #11309 )
2025-01-20 16:02:15 +02:00
Xuan Son Nguyen
ec7f3ac9ab
llama : add support for Deepseek-R1-Qwen distill model ( #11310 )
...
* llama : add support for Deepseek-R1-Qwen distill model
* coding style
2025-01-20 14:35:07 +01:00
Georgi Gerganov
ef6dada60c
cont : fix whitespaces ( #11305 )
2025-01-20 09:29:32 +02:00
Kyle Bruene
ae3c1db2f9
llama : re-add LLM_ARCH_PHIMOE ( #11305 )
...
Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.
2025-01-20 09:21:01 +02:00
Georgi Gerganov
92bc493917
tests : increase timeout when sanitizers are enabled ( #11300 )
...
* tests : increase timeout when sanitizers are enabled
* tests : add DEFAULT_HTTP_TIMEOUT
2025-01-19 20:22:30 +02:00
Georgi Gerganov
b9daaffe02
simple-chat : fix BOS being added to each message ( #11278 )
2025-01-19 18:12:09 +02:00
Nicolò Scipione
99487b57d4
SYCL: Introducing memory host pool ( #11251 )
...
* Implement host pool for matrix_info
Creating a new memory pool on the host to store memory location for
matrix_info needed to launch gemm_batch from oneMKL/oneMath.
Removing complex support in gemm_batch since it is not used in llama.cpp
* Remove unnecessary headers and cast
* Reorder member variable to avoid warning on initialization
* Formatting
* Remove unused variable
* Address PR review feedback - remove warning
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-01-19 21:33:34 +08:00
ochafik
cc50356470
minja: fix vigogne ( https://github.com/google/minja/pull/22 )
2025-01-18 17:55:04 +00:00
ochafik
e3c475cd12
Disable jinja test that has a cryptic windows failure
2025-01-18 14:55:27 +00:00
Eric Curtin
a1649cc13f
Adding linenoise.cpp to llama-run ( #11252 )
...
This is a fork of linenoise that is C++17 compatible. I intend to add
it to llama-run so we can do things like traverse prompt
history via the up and down arrows:
https://github.com/ericcurtin/linenoise.cpp
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-18 14:42:31 +00:00
Georgi Gerganov
4dd34ff831
cmake : add sanitizer flags for llama.cpp ( #11279 )
...
* cmake : add sanitizer flags for llama.cpp
ggml-ci
* tests : fix compile warnings
ggml-ci
* cmake : move sanitizer flags to llama_add_compile_flags
ggml-ci
* cmake : move llama.cpp compile flags to top level lists
ggml-ci
* cmake : apply only sanitizer flags at top level
ggml-ci
* tests : fix gguf context use in same_tensor_data
* gguf-test: tensor data comparison
* dummy : trigger ggml-ci
* unicode : silence gcc warnings
ggml-ci
* ci : use sanitizer builds only in Debug mode
ggml-ci
* cmake : add status messages [no ci]
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-01-18 16:18:15 +02:00
Xuan Son Nguyen
f30f099228
server : implement cancellable request ( #11285 )
...
* server : implement cancellable request
* fix typo
* httplib 0.18.5
* fix i underflow
2025-01-18 14:12:05 +01:00
ochafik
0e74c9dabe
Add missing optional include to server.cpp
2025-01-18 11:58:00 +00:00
ochafik
fc60802b6e
Rm unused optional include
2025-01-18 11:35:54 +00:00
Georgi Gerganov
f26c874179
scripts : restore hf.sh ( #11288 )
...
ggml-ci
2025-01-18 13:18:32 +02:00
ochafik
5074e6fecd
Fix copy elision warning
2025-01-18 10:48:03 +00:00
ochafik
33322e823e
Flush stdout in chat template before potential crash
2025-01-18 10:38:21 +00:00
ochafik
e63520f37a
Forward decl minja::chat_template to avoid eager json dep
2025-01-18 10:37:56 +00:00
LostRuins Concedo
6390a998bf
tts : add guide tokens support ( #11186 )
...
* Added the ability to use guide tokens for OuteTTS, greatly improving TTS recitation accuracy over long input sequences.
* applied linting suggestions, updated to latest llama_vocab changes, added a safety check, added newline to guide token start
2025-01-18 12:20:57 +02:00
Jeff Bolz
44e18ef939
vulkan: fix coopmat2 flash attention for non-contiguous inputs ( #11281 )
...
Add code similar to mul_mm_cm2 to force alignment of strides, to avoid
a performance regression.
Add noncontiguous FA tests in test-backend-ops.
Fixes #11268 .
2025-01-18 09:26:50 +01:00
ochafik
ee1e10e21e
Normalize newlines in test-chat-templates for windows tests
2025-01-18 02:52:40 +00:00
ochafik
d5fa351a24
Revert LLAMA_CHATML_TEMPLATE refactor
2025-01-18 01:04:12 +00:00
ochafik
81c0d437a5
Attempt to fix linkage of LLAMA_CHATML_TEMPLATE
2025-01-18 00:56:19 +00:00
ochafik
40db78963b
Merge remote-tracking branch 'origin/master' into jinja
2025-01-18 00:44:37 +00:00
ochafik
b75d0622e4
Refactor common_chat_* functions to accept minja template + use_jinja option
2025-01-18 00:43:38 +00:00
codezjx
3edfa7d375
llama.android: add field formatChat to control whether to parse special tokens when sending a message ( #11270 )
2025-01-17 14:57:56 +02:00
Radoslav Gerganov
667d72846c
rpc : early register backend devices ( #11262 )
...
Register RPC backend devices early and do not propagate RPC specifics into
the llama model structures.
ref: #10609
2025-01-17 10:57:09 +02:00
Georgi Gerganov
a133566d34
vocab : fix double-eos check ( #11273 )
...
ggml-ci
2025-01-17 09:28:00 +02:00
David Renshaw
960ec65273
llama : fix deprecation message: vocabable -> vocab ( #11269 )
2025-01-17 08:12:01 +01:00
musoles
7a689c415e
README : added kalavai to infrastructure list ( #11216 )
2025-01-17 01:10:49 +01:00
Jeff Bolz
bd38ddea01
vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl ( #11166 )
...
* vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl
Shaders are based on cpy.cu.
* vulkan: support copy from q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl to f32
* ggml: copy q->f32 assumes some contiguity in the destination
2025-01-16 22:47:10 +01:00
Jeff Bolz
466300fe14
vulkan: optimize coopmat2 q4_k/q5_k dequant functions. ( #11206 )
...
Do masking on whole dwords, fetch all scales at once.
2025-01-16 22:23:49 +01:00
Jeff Bolz
206bc53422
vulkan: optimize coopmat2 q2_k dequant function ( #11130 )
2025-01-16 22:16:39 +01:00