Xuan Son Nguyen
e28245f35f
export-lora : fix tok_embd tensor ( #11330 )
2025-01-21 14:07:12 +01:00
Radoslav Gerganov
6da5bec81c
rpc : better caching of the base buffer pointer ( #11331 )
There is no need to use a map; just store the base pointer in the buffer
context.
2025-01-21 15:06:41 +02:00
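The idea in the commit above can be sketched as follows. This is an illustrative sketch only, not the actual llama.cpp RPC types: `rpc_buffer_context` and `fetch_base_from_server` are hypothetical names standing in for the real buffer context and the remote round-trip.

```cpp
#include <cstdint>

// Sketch: instead of a shared map from buffer -> base pointer, each buffer
// context caches its own base pointer, fetched from the server at most once.
struct rpc_buffer_context {
    uint64_t remote_id = 0;       // handle of the buffer on the remote server
    void *   base      = nullptr; // cached base pointer

    void * get_base() {
        if (base == nullptr) {
            base = fetch_base_from_server(remote_id); // remote call, done once
        }
        return base;              // subsequent calls reuse the cached value
    }

    // stand-in for the actual RPC round-trip
    static void * fetch_base_from_server(uint64_t id) {
        return reinterpret_cast<void *>(static_cast<uintptr_t>(0x1000 + id));
    }
};
```

Caching the pointer in the context itself removes both the map lookup and the need to keep a separate cache in sync with buffer lifetimes.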
Eric Curtin
2e2f8f093c
linenoise.cpp refactoring ( #11301 )
More RAII mainly
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-21 09:32:35 +00:00
Georgi Gerganov
2139667ec4
metal : fix out-of-bounds write ( #11314 )
ggml-ci
2025-01-21 08:48:13 +02:00
ochafik
c606255948
Merge branch 'jinja' into tool-call
2025-01-21 03:49:30 +00:00
ochafik
9d8ebd62c6
Update minja from https://github.com/google/minja/pull/27
2025-01-21 03:18:06 +00:00
ochafik
ba8dd66fdf
Merge branch 'jinja' into tool-call
2025-01-21 01:43:14 +00:00
ochafik
ff2cce57ad
Update minja to https://github.com/google/minja/pull/25
2025-01-21 01:26:19 +00:00
ochafik
56aa93c266
fix std imports for gcc build
2025-01-21 00:08:22 +00:00
ochafik
7ea6a06cde
Merge branch 'jinja' into tool-call
2025-01-20 23:59:24 +00:00
ochafik
8347da907d
Update minja to b8437df626
2025-01-20 23:59:15 +00:00
ochafik
b110374714
apply renames from jinja branch
2025-01-20 23:59:01 +00:00
ochafik
9bab6939cd
Merge branch 'jinja' into tool-call
2025-01-20 23:55:12 +00:00
ochafik
8a7c89e60c
reinstate assert on chat_templates.template_default
2025-01-20 23:44:42 +00:00
ochafik
ee475d2f51
rename: common_chat_template[s]
2025-01-20 23:42:07 +00:00
ochafik
8348c605ac
Warn against missing eos / bos tokens when jinja template references them
2025-01-20 23:00:47 +00:00
ochafik
54a669e09e
Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
2025-01-20 22:50:08 +00:00
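The guard described in the two commits above can be sketched as follows. The names here are illustrative, not the exact llama.cpp API: the point is that a vocab may report "no BOS/EOS token" as a null token id, and querying that token's attributes would throw, so the code checks first.

```cpp
#include <cstdint>
#include <string>

using token_id = int32_t;
constexpr token_id TOKEN_NULL = -1; // assumed sentinel for "no such token"

// Sketch: never query attributes/text of a missing token; return empty
// (and, per the follow-up commit, the caller can warn when the template
// references a token the vocab does not define).
inline std::string token_to_text_or_empty(token_id tok) {
    if (tok == TOKEN_NULL) {
        return ""; // guard: avoids the throw in token attribute lookup
    }
    return "<tok:" + std::to_string(tok) + ">"; // stand-in for the real lookup
}
```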
ochafik
099f983949
Merge remote-tracking branch 'origin/master' into jinja
2025-01-20 21:58:04 +00:00
ochafik
154bfaaa39
Refactor chat template validation
2025-01-20 21:54:34 +00:00
ochafik
8c84aefd4d
Update --chat-template-file w/ recent change to --chat-template
2025-01-20 21:48:31 +00:00
ochafik
c9e8fdd70e
Move chat_templates inside server_context + remove mutex
2025-01-20 21:25:18 +00:00
ochafik
db9dd0c1ac
Finish suggested renamings
2025-01-20 21:06:18 +00:00
Olivier Chafik
153e852411
Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-20 20:55:52 +00:00
Georgi Gerganov
80d0d6b4b7
common : add -hfd option for the draft model ( #11318 )
* common : add -hfd option for the draft model
* cont : fix env var
* cont : more fixes
2025-01-20 22:29:43 +02:00
Jeff Bolz
aea8ddd516
vulkan: fix coopmat2 validation failures ( #11284 )
mul mat and flash attention shaders were loading f32 types directly into
A/B matrices, which happens to work but is technically invalid usage.
For FA, we can load it as an Accumulator matrix and convert; this is
not in the inner loop and is cheap enough. For mul mat, it's more
efficient to do this conversion in a separate pass and have the input(s)
be f16.
coopmat2 requires SPIR-V 1.6 (related to the use of LocalSizeId). LocalSizeId
requires maintenance4 be enabled, and SPIR-V 1.6 requires Vulkan 1.3.
2025-01-20 10:38:32 -06:00
Georgi Gerganov
9f7add1cde
examples : fix add_special conditions ( #11311 )
2025-01-20 16:36:08 +02:00
Christopher Nielsen
90d987b105
mmap: add include for cerrno ( #11296 )
ggml-ci
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-20 16:02:43 +02:00
Michael Podvitskiy
a4251edd6f
cmake: fix shell command quoting in build-info script ( #11309 )
2025-01-20 16:02:15 +02:00
Xuan Son Nguyen
ec7f3ac9ab
llama : add support for Deepseek-R1-Qwen distill model ( #11310 )
* llama : add support for Deepseek-R1-Qwen distill model
* coding style
2025-01-20 14:35:07 +01:00
Georgi Gerganov
ef6dada60c
cont : fix whitespaces ( #11305 )
2025-01-20 09:29:32 +02:00
Kyle Bruene
ae3c1db2f9
llama : re-add LLM_ARCH_PHIMOE ( #11305 )
Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.
2025-01-20 09:21:01 +02:00
Georgi Gerganov
92bc493917
tests : increase timeout when sanitizers are enabled ( #11300 )
* tests : increase timeout when sanitizers are enabled
* tests : add DEFAULT_HTTP_TIMEOUT
2025-01-19 20:22:30 +02:00
Georgi Gerganov
b9daaffe02
simple-chat : fix BOS being added to each message ( #11278 )
2025-01-19 18:12:09 +02:00
Nicolò Scipione
99487b57d4
SYCL: Introducing memory host pool ( #11251 )
* Implement host pool for matrix_info
Creating a new memory pool on the host to store the memory locations for
matrix_info needed to launch gemm_batch from oneMKL/oneMath.
Removing complex support in gemm_batch since it is not used in llama.cpp
* Remove unnecessary headers and cast
* Reorder member variable to avoid warning on initialization
* Formatting
* Remove unused variable
* Address PR review feedback - remove warning
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-01-19 21:33:34 +08:00
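The host-pool idea above can be sketched generically; this is not the actual SYCL implementation, and `host_pool` is a hypothetical name. The point is to stop allocating a fresh host buffer for matrix_info on every gemm_batch launch and instead reuse one growing allocation across calls.

```cpp
#include <cstddef>
#include <vector>

// Sketch: one reusable host buffer; a real allocation happens only when a
// request exceeds the current capacity.
struct host_pool {
    std::vector<unsigned char> buf;
    std::size_t alloc_calls = 0; // counts actual (re)allocations, for clarity

    void * get(std::size_t bytes) {
        if (bytes > buf.size()) {
            buf.resize(bytes); // grow only when a larger request arrives
            alloc_calls++;
        }
        return buf.data();     // otherwise reuse the existing allocation
    }
};
```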
ochafik
0401a83b9b
agent: add --greedy, --top-p, --top-k options
2025-01-19 02:07:06 +00:00
ochafik
c207fdcde6
Merge branch 'jinja' into tool-call
2025-01-18 18:05:11 +00:00
ochafik
cc50356470
minja: fix vigogne ( https://github.com/google/minja/pull/22 )
2025-01-18 17:55:04 +00:00
ochafik
e3c475cd12
Disable jinja test that has a cryptic windows failure
2025-01-18 14:55:27 +00:00
ochafik
d6f058da8c
Merge branch 'jinja' into tool-call
2025-01-18 14:54:57 +00:00
Eric Curtin
a1649cc13f
Adding linenoise.cpp to llama-run ( #11252 )
This is a fork of linenoise that is C++17 compatible. I intend to add
it to llama-run so we can do things like traverse prompt
history via the up and down arrows:
https://github.com/ericcurtin/linenoise.cpp
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-18 14:42:31 +00:00
Georgi Gerganov
4dd34ff831
cmake : add sanitizer flags for llama.cpp ( #11279 )
* cmake : add sanitizer flags for llama.cpp
ggml-ci
* tests : fix compile warnings
ggml-ci
* cmake : move sanitizer flags to llama_add_compile_flags
ggml-ci
* cmake : move llama.cpp compile flags to top level lists
ggml-ci
* cmake : apply only sanitizer flags at top level
ggml-ci
* tests : fix gguf context use in same_tensor_data
* gguf-test: tensor data comparison
* dummy : trigger ggml-ci
* unicode : silence gcc warnings
ggml-ci
* ci : use sanitizer builds only in Debug mode
ggml-ci
* cmake : add status messages [no ci]
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-01-18 16:18:15 +02:00
Xuan Son Nguyen
f30f099228
server : implement cancellable request ( #11285 )
* server : implement cancellable request
* fix typo
* httplib 0.18.5
* fix i underflow
2025-01-18 14:12:05 +01:00
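A cancellable request loop in the spirit of the commit above can be sketched like this; it is illustrative, not the server's actual code. The assumed shape: between generation steps the server polls whether the HTTP client is still connected and stops early instead of finishing the whole completion.

```cpp
#include <functional>

// Sketch: `is_alive` stands in for the connection-liveness check; the loop
// cancels as soon as the client disconnects.
inline int generate(int max_steps, const std::function<bool()> & is_alive) {
    int produced = 0;
    for (int i = 0; i < max_steps; i++) {
        if (!is_alive()) {
            break;      // client went away: cancel instead of wasting compute
        }
        produced++;     // stand-in for decoding one token
    }
    return produced;
}
```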
ochafik
0e74c9dabe
Add missing optional include to server.cpp
2025-01-18 11:58:00 +00:00
ochafik
fc60802b6e
Rm unused optional include
2025-01-18 11:35:54 +00:00
ochafik
76893f5880
Merge branch 'jinja' into tool-call
2025-01-18 11:26:56 +00:00
Georgi Gerganov
f26c874179
scripts : restore hf.sh ( #11288 )
ggml-ci
2025-01-18 13:18:32 +02:00
ochafik
5074e6fecd
Fix copy elision warning
2025-01-18 10:48:03 +00:00
ochafik
33322e823e
Flush stdout in chat template before potential crash
2025-01-18 10:38:21 +00:00
ochafik
e63520f37a
Forward decl minja::chat_template to avoid eager json dep
2025-01-18 10:37:56 +00:00
LostRuins Concedo
6390a998bf
tts : add guide tokens support ( #11186 )
* Added the ability to use guide tokens for OuteTTS, greatly improving TTS recitation accuracy over long input sequences.
* applied linting suggestions, updated to latest llama_vocab changes, added a safety check, added newline to guide token start
2025-01-18 12:20:57 +02:00
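The guide-token idea above can be sketched as follows; this is an illustrative stand-in, not the OuteTTS implementation, and all names here are hypothetical. The assumed mechanism: a pre-tokenized transcript supplies "guide" tokens, and at each word boundary the sampler's choice is overridden with the next guide token, keeping long recitations on script.

```cpp
#include <cstddef>
#include <vector>

// Sketch: wherever the sampled stream hits a word-boundary token, substitute
// the next scripted guide token; elsewhere keep the freely sampled token.
inline std::vector<int> sample_with_guides(const std::vector<int> & sampled,
                                           const std::vector<int> & guides,
                                           int word_boundary_token) {
    std::vector<int> out;
    std::size_t g = 0;
    for (int tok : sampled) {
        if (tok == word_boundary_token && g < guides.size()) {
            out.push_back(guides[g++]); // force the scripted word token
        } else {
            out.push_back(tok);         // keep the freely sampled token
        }
    }
    return out;
}
```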