Commit graph

4828 commits

Author SHA1 Message Date
Olivier Chafik
dbf841b0d2 Push laziness down to grammar impl 2025-01-22 01:25:54 +00:00
Olivier Chafik
77f4098c83 Delete update_jinja_goldens.py 2025-01-21 14:41:59 +00:00
Olivier Chafik
f6e73dac43 Remove examples/agent (moved to https://gist.github.com/ochafik/9246d289b7d38d49e1ee2755698d6c79) 2025-01-21 14:41:56 +00:00
Olivier Chafik
b49d0521e9 rm tests/test-minja from makefile 2025-01-21 14:12:38 +00:00
Olivier Chafik
fec0260366 Merge remote-tracking branch 'origin/master' into tool-call 2025-01-21 13:44:58 +00:00
Olivier Chafik
6171c9d258
Add Jinja template support (#11016)
* Copy minja from 58f0ca6dd7

* Add --jinja and --chat-template-file flags

* Add missing <optional> include

* Avoid print in get_hf_chat_template.py

* No designated initializers yet

* Try and work around msvc++ non-macro max resolution quirk

* Update test_chat_completion.py

* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template

* Refactor test-chat-template

* Test templates w/ minja

* Fix deprecation

* Add --jinja to llama-run

* Update common_chat_format_example to use minja template wrapper

* Test chat_template in e2e test

* Update utils.py

* Update test_chat_completion.py

* Update run.cpp

* Update arg.cpp

* Refactor common_chat_* functions to accept minja template + use_jinja option

* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE

* Revert LLAMA_CHATML_TEMPLATE refactor

* Normalize newlines in test-chat-templates for windows tests

* Forward decl minja::chat_template to avoid eager json dep

* Flush stdout in chat template before potential crash

* Fix copy elision warning

* Rm unused optional include

* Add missing optional include to server.cpp

* Disable jinja test that has a cryptic windows failure

* minja: fix vigogne (https://github.com/google/minja/pull/22)

* Apply suggestions from code review

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Finish suggested renamings

* Move chat_templates inside server_context + remove mutex

* Update --chat-template-file w/ recent change to --chat-template

* Refactor chat template validation

* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)

* Warn against missing eos / bos tokens when jinja template references them

* rename: common_chat_template[s]

* reinstate assert on chat_templates.template_default

* Update minja to b8437df626

* Update minja to https://github.com/google/minja/pull/25

* Update minja from https://github.com/google/minja/pull/27

* rm unused optional header

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-21 13:18:51 +00:00
Xuan Son Nguyen
e28245f35f
export-lora : fix tok_embd tensor (#11330) 2025-01-21 14:07:12 +01:00
Radoslav Gerganov
6da5bec81c
rpc : better caching of the base buffer pointer (#11331)
There is no need to use a map; just store the base pointer in the buffer
context.
2025-01-21 15:06:41 +02:00
Eric Curtin
2e2f8f093c
linenoise.cpp refactoring (#11301)
More RAII mainly

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-21 09:32:35 +00:00
Georgi Gerganov
2139667ec4
metal : fix out-of-bounds write (#11314)
ggml-ci
2025-01-21 08:48:13 +02:00
ochafik
c606255948 Merge branch 'jinja' into tool-call 2025-01-21 03:49:30 +00:00
ochafik
9d8ebd62c6 Update minja from https://github.com/google/minja/pull/27 2025-01-21 03:18:06 +00:00
ochafik
ba8dd66fdf Merge branch 'jinja' into tool-call 2025-01-21 01:43:14 +00:00
ochafik
ff2cce57ad Update minja to https://github.com/google/minja/pull/25 2025-01-21 01:26:19 +00:00
ochafik
56aa93c266 fix std imports for gcc build 2025-01-21 00:08:22 +00:00
ochafik
7ea6a06cde Merge branch 'jinja' into tool-call 2025-01-20 23:59:24 +00:00
ochafik
8347da907d Update minja to b8437df626 2025-01-20 23:59:15 +00:00
ochafik
b110374714 apply renames from jinja branch 2025-01-20 23:59:01 +00:00
ochafik
9bab6939cd Merge branch 'jinja' into tool-call 2025-01-20 23:55:12 +00:00
ochafik
8a7c89e60c reinstate assert on chat_templates.template_default 2025-01-20 23:44:42 +00:00
ochafik
ee475d2f51 rename: common_chat_template[s] 2025-01-20 23:42:07 +00:00
ochafik
8348c605ac Warn against missing eos / bos tokens when jinja template references them 2025-01-20 23:00:47 +00:00
ochafik
54a669e09e Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr) 2025-01-20 22:50:08 +00:00
ochafik
099f983949 Merge remote-tracking branch 'origin/master' into jinja 2025-01-20 21:58:04 +00:00
ochafik
154bfaaa39 Refactor chat template validation 2025-01-20 21:54:34 +00:00
ochafik
8c84aefd4d Update --chat-template-file w/ recent change to --chat-template 2025-01-20 21:48:31 +00:00
ochafik
c9e8fdd70e Move chat_templates inside server_context + remove mutex 2025-01-20 21:25:18 +00:00
ochafik
db9dd0c1ac Finish suggested renamings 2025-01-20 21:06:18 +00:00
Olivier Chafik
153e852411
Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-20 20:55:52 +00:00
Georgi Gerganov
80d0d6b4b7
common : add -hfd option for the draft model (#11318)
* common : add -hfd option for the draft model

* cont : fix env var

* cont : more fixes
2025-01-20 22:29:43 +02:00
Jeff Bolz
aea8ddd516
vulkan: fix coopmat2 validation failures (#11284)
mul mat and flash attention shaders were loading f32 types directly into
A/B matrices, which happens to work but is technically invalid usage.
For FA, we can load into an Accumulator matrix and convert; this is not
in the inner loop and is cheap enough. For mul mat, it's more efficient
to do the conversion in a separate pass and have the input(s) be f16.

coopmat2 requires SPIR-V 1.6 (related to the use of LocalSizeId).
LocalSizeId requires maintenance4 to be enabled, and SPIR-V 1.6 requires Vulkan 1.3.
2025-01-20 10:38:32 -06:00
Georgi Gerganov
9f7add1cde
examples : fix add_special conditions (#11311) 2025-01-20 16:36:08 +02:00
Christopher Nielsen
90d987b105
mmap: add include for cerrno (#11296)
ggml-ci

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-20 16:02:43 +02:00
Michael Podvitskiy
a4251edd6f
cmake: fix shell command quoting in build-info script (#11309) 2025-01-20 16:02:15 +02:00
Xuan Son Nguyen
ec7f3ac9ab
llama : add support for Deepseek-R1-Qwen distill model (#11310)
* llama : add support for Deepseek-R1-Qwen distill model

* coding style
2025-01-20 14:35:07 +01:00
Georgi Gerganov
ef6dada60c
cont : fix whitespaces (#11305) 2025-01-20 09:29:32 +02:00
Kyle Bruene
ae3c1db2f9
llama : re-add LLM_ARCH_PHIMOE (#11305)
Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.
2025-01-20 09:21:01 +02:00
Georgi Gerganov
92bc493917
tests : increase timeout when sanitizers are enabled (#11300)
* tests : increase timeout when sanitizers are enabled

* tests : add DEFAULT_HTTP_TIMEOUT
2025-01-19 20:22:30 +02:00
Georgi Gerganov
b9daaffe02
simple-chat : fix BOS being added to each message (#11278) 2025-01-19 18:12:09 +02:00
Nicolò Scipione
99487b57d4
SYCL: Introducing memory host pool (#11251)
* Implement host pool for matrix_info

Create a new memory pool on the host to store the memory locations of the
matrix_info structures needed to launch gemm_batch from oneMKL/oneMath.
Remove complex support from gemm_batch since it is not used in llama.cpp.

* Remove unnecessary headers and cast

* Reorder member variable to avoid warning on initialization

* Formatting

* Remove unused variable

* Address PR review feedback - remove warning

---------

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-01-19 21:33:34 +08:00
ochafik
0401a83b9b agent: add --greedy, --top-p, --top-k options 2025-01-19 02:07:06 +00:00
ochafik
c207fdcde6 Merge branch 'jinja' into tool-call 2025-01-18 18:05:11 +00:00
ochafik
cc50356470 minja: fix vigogne (https://github.com/google/minja/pull/22) 2025-01-18 17:55:04 +00:00
ochafik
e3c475cd12 Disable jinja test that has a cryptic windows failure 2025-01-18 14:55:27 +00:00
ochafik
d6f058da8c Merge branch 'jinja' into tool-call 2025-01-18 14:54:57 +00:00
Eric Curtin
a1649cc13f
Adding linenoise.cpp to llama-run (#11252)
This is a fork of linenoise that is C++17 compatible. I intend to add
it to llama-run so we can do things like traverse prompt history via
the up and down arrows:

https://github.com/ericcurtin/linenoise.cpp

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-18 14:42:31 +00:00
Georgi Gerganov
4dd34ff831
cmake : add sanitizer flags for llama.cpp (#11279)
* cmake : add sanitizer flags for llama.cpp

ggml-ci

* tests : fix compile warnings

ggml-ci

* cmake : move sanitizer flags to llama_add_compile_flags

ggml-ci

* cmake : move llama.cpp compile flags to top level lists

ggml-ci

* cmake : apply only sanitizer flags at top level

ggml-ci

* tests : fix gguf context use in same_tensor_data

* gguf-test: tensor data comparison

* dummy : trigger ggml-ci

* unicode : silence gcc warnings

ggml-ci

* ci : use sanitizer builds only in Debug mode

ggml-ci

* cmake : add status messages [no ci]

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-01-18 16:18:15 +02:00
Xuan Son Nguyen
f30f099228
server : implement cancellable request (#11285)
* server : implement cancellable request

* fix typo

* httplib 0.18.5

* fix i underflow
2025-01-18 14:12:05 +01:00
ochafik
0e74c9dabe Add missing optional include to server.cpp 2025-01-18 11:58:00 +00:00
ochafik
fc60802b6e Rm unused optional include 2025-01-18 11:35:54 +00:00