Commit graph

4809 commits

Author SHA1 Message Date
ochafik
8a7c89e60c reinstate assert on chat_templates.template_default 2025-01-20 23:44:42 +00:00
ochafik
ee475d2f51 rename: common_chat_template[s] 2025-01-20 23:42:07 +00:00
ochafik
8348c605ac Warn against missing eos / bos tokens when jinja template references them 2025-01-20 23:00:47 +00:00
ochafik
54a669e09e Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr) 2025-01-20 22:50:08 +00:00
ochafik
099f983949 Merge remote-tracking branch 'origin/master' into jinja 2025-01-20 21:58:04 +00:00
ochafik
154bfaaa39 Refactor chat template validation 2025-01-20 21:54:34 +00:00
ochafik
8c84aefd4d Update --chat-template-file w/ recent change to --chat-template 2025-01-20 21:48:31 +00:00
ochafik
c9e8fdd70e Move chat_templates inside server_context + remove mutex 2025-01-20 21:25:18 +00:00
ochafik
db9dd0c1ac Finish suggested renamings 2025-01-20 21:06:18 +00:00
Olivier Chafik
153e852411
Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-20 20:55:52 +00:00
Georgi Gerganov
80d0d6b4b7
common : add -hfd option for the draft model (#11318)
* common : add -hfd option for the draft model

* cont : fix env var

* cont : more fixes
2025-01-20 22:29:43 +02:00
Jeff Bolz
aea8ddd516
vulkan: fix coopmat2 validation failures (#11284)
mul mat and flash attention shaders were loading f32 types directly into
A/B matrices, which happens to work but is technically invalid usage.
For FA, we can load it as an Accumulator matrix and convert; this is not
in the inner loop and is cheap enough. For mul mat, it's more efficient
to do this conversion in a separate pass and have the input(s) be f16.

coopmat2 requires SPIR-V 1.6 (related to using LocalSizeId). LocalSizeId
requires maintenance4 to be enabled, and SPIR-V 1.6 requires Vulkan 1.3.
2025-01-20 10:38:32 -06:00
Georgi Gerganov
9f7add1cde
examples : fix add_special conditions (#11311) 2025-01-20 16:36:08 +02:00
Christopher Nielsen
90d987b105
mmap: add include for cerrno (#11296)
ggml-ci

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-20 16:02:43 +02:00
Michael Podvitskiy
a4251edd6f
cmake: fix shell command quoting in build-info script (#11309) 2025-01-20 16:02:15 +02:00
Xuan Son Nguyen
ec7f3ac9ab
llama : add support for Deepseek-R1-Qwen distill model (#11310)
* llama : add support for Deepseek-R1-Qwen distill model

* coding style
2025-01-20 14:35:07 +01:00
Georgi Gerganov
ef6dada60c
cont : fix whitespaces (#11305) 2025-01-20 09:29:32 +02:00
Kyle Bruene
ae3c1db2f9
llama : re-add LLM_ARCH_PHIMOE (#11305)
Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.
2025-01-20 09:21:01 +02:00
Georgi Gerganov
92bc493917
tests : increase timeout when sanitizers are enabled (#11300)
* tests : increase timeout when sanitizers are enabled

* tests : add DEFAULT_HTTP_TIMEOUT
2025-01-19 20:22:30 +02:00
Georgi Gerganov
b9daaffe02
simple-chat : fix BOS being added to each message (#11278) 2025-01-19 18:12:09 +02:00
Nicolò Scipione
99487b57d4
SYCL: Introducing memory host pool (#11251)
* Implement host pool for matrix_info

Creating a new memory pool on the host to store the memory locations
for the matrix_info needed to launch gemm_batch from oneMKL/oneMath.
Removing complex support in gemm_batch since it is not used in llama.cpp.

* Remove unnecessary headers and cast

* Reorder member variable to avoid warning on initialization

* Formatting

* Remove unused variable

* Address PR review feedback - remove warning

---------

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-01-19 21:33:34 +08:00
ochafik
0401a83b9b agent: add --greedy, --top-p, --top-k options 2025-01-19 02:07:06 +00:00
ochafik
c207fdcde6 Merge branch 'jinja' into tool-call 2025-01-18 18:05:11 +00:00
ochafik
cc50356470 minja: fix vigogne (https://github.com/google/minja/pull/22) 2025-01-18 17:55:04 +00:00
ochafik
e3c475cd12 Disable jinja test that has a cryptic windows failure 2025-01-18 14:55:27 +00:00
ochafik
d6f058da8c Merge branch 'jinja' into tool-call 2025-01-18 14:54:57 +00:00
Eric Curtin
a1649cc13f
Adding linenoise.cpp to llama-run (#11252)
This is a fork of linenoise that is C++17 compatible. I intend to add
it to llama-run so we can do things like traverse prompt history via
the up and down arrows:

https://github.com/ericcurtin/linenoise.cpp

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-18 14:42:31 +00:00
Georgi Gerganov
4dd34ff831
cmake : add sanitizer flags for llama.cpp (#11279)
* cmake : add sanitizer flags for llama.cpp

ggml-ci

* tests : fix compile warnings

ggml-ci

* cmake : move sanitizer flags to llama_add_compile_flags

ggml-ci

* cmake : move llama.cpp compile flags to top level lists

ggml-ci

* cmake : apply only sanitizer flags at top level

ggml-ci

* tests : fix gguf context use in same_tensor_data

* gguf-test: tensor data comparison

* dummy : trigger ggml-ci

* unicode : silence gcc warnings

ggml-ci

* ci : use sanitizer builds only in Debug mode

ggml-ci

* cmake : add status messages [no ci]

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-01-18 16:18:15 +02:00
Xuan Son Nguyen
f30f099228
server : implement cancellable request (#11285)
* server : implement cancellable request

* fix typo

* httplib 0.18.5

* fix i underflow
2025-01-18 14:12:05 +01:00
ochafik
0e74c9dabe Add missing optional include to server.cpp 2025-01-18 11:58:00 +00:00
ochafik
fc60802b6e Rm unused optional include 2025-01-18 11:35:54 +00:00
ochafik
76893f5880 Merge branch 'jinja' into tool-call 2025-01-18 11:26:56 +00:00
Georgi Gerganov
f26c874179
scripts : restore hf.sh (#11288)
ggml-ci
2025-01-18 13:18:32 +02:00
ochafik
5074e6fecd Fix copy elision warning 2025-01-18 10:48:03 +00:00
ochafik
33322e823e Flush stdout in chat template before potential crash 2025-01-18 10:38:21 +00:00
ochafik
e63520f37a Forward decl minja::chat_template to avoid eager json dep 2025-01-18 10:37:56 +00:00
LostRuins Concedo
6390a998bf
tts : add guide tokens support (#11186)
* Added the ability to use guide tokens for OuteTTS, greatly improving TTS recitation accuracy over long input sequences.

* applied linting suggestions, updated to latest llama_vocab changes, added a safety check, added newline to guide token start
2025-01-18 12:20:57 +02:00
Jeff Bolz
44e18ef939
vulkan: fix coopmat2 flash attention for non-contiguous inputs (#11281)
Add code similar to mul_mm_cm2 to force alignment of strides, to avoid
a performance regression.

Add noncontiguous FA tests in test-backend-ops.

Fixes #11268.
2025-01-18 09:26:50 +01:00
ochafik
ee1e10e21e Normalize newlines in test-chat-templates for windows tests 2025-01-18 02:52:40 +00:00
ochafik
acf7c240d8 tools: run tool call slow tests when SLOW_TESTS=1 (+ prefetch models) 2025-01-18 02:39:37 +00:00
ochafik
259d9e4511 tools: greedy sampling in tests 2025-01-18 02:39:10 +00:00
ochafik
2ceabee0f8 Fix fetch_server_test_models.py (avoid conv trap) 2025-01-18 01:36:46 +00:00
ochafik
045edd1d7e Merge branch 'jinja' into tool-call 2025-01-18 01:04:57 +00:00
ochafik
d5fa351a24 Revert LLAMA_CHATML_TEMPLATE refactor 2025-01-18 01:04:12 +00:00
ochafik
138a4ba83f Merge branch 'jinja' into tool-call 2025-01-18 00:59:10 +00:00
ochafik
81c0d437a5 Attempt to fix linkage of LLAMA_CHATML_TEMPLATE 2025-01-18 00:56:19 +00:00
ochafik
40db78963b Merge remote-tracking branch 'origin/master' into jinja 2025-01-18 00:44:37 +00:00
ochafik
b75d0622e4 Refactor common_chat_* functions to accept minja template + use_jinja option 2025-01-18 00:43:38 +00:00
ochafik
3c7784c51c Refactor common_chat_* functions to accept minja template + use_jinja option 2025-01-18 00:13:16 +00:00
codezjx
3edfa7d375
llama.android: add field formatChat to control whether to parse special tokens when sending a message (#11270) 2025-01-17 14:57:56 +02:00