ochafik
b110374714
apply renames from jinja branch
2025-01-20 23:59:01 +00:00
ochafik
9bab6939cd
Merge branch 'jinja' into tool-call
2025-01-20 23:55:12 +00:00
ochafik
8a7c89e60c
reinstate assert on chat_templates.template_default
2025-01-20 23:44:42 +00:00
ochafik
ee475d2f51
rename: common_chat_template[s]
2025-01-20 23:42:07 +00:00
ochafik
8348c605ac
Warn against missing eos / bos tokens when jinja template references them
2025-01-20 23:00:47 +00:00
ochafik
54a669e09e
Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
2025-01-20 22:50:08 +00:00
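A minimal sketch of the guard pattern the two commits above describe, assuming the vocab API reports a missing bos/eos as LLAMA_TOKEN_NULL; the helper name and its conversion callback are hypothetical, not the actual patch.

```cpp
// Sketch only: resolve a special token to text for the chat-template context.
// If the vocab does not define the token, warn and fall back to an empty string
// instead of letting the lookup throw deep inside the vocab implementation.
#include <cstdio>
#include <functional>
#include <string>

#include "llama.h"  // assumed available: provides llama_token, LLAMA_TOKEN_NULL

static std::string special_token_or_empty(
        llama_token tok,
        const char * name,
        const std::function<std::string(llama_token)> & to_piece) {
    if (tok == LLAMA_TOKEN_NULL) {
        fprintf(stderr, "warning: chat template references %s, but the vocab does not define it\n", name);
        return "";
    }
    return to_piece(tok);
}
```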
ochafik
099f983949
Merge remote-tracking branch 'origin/master' into jinja
2025-01-20 21:58:04 +00:00
ochafik
154bfaaa39
Refactor chat template validation
2025-01-20 21:54:34 +00:00
ochafik
8c84aefd4d
Update --chat-template-file w/ recent change to --chat-template
2025-01-20 21:48:31 +00:00
ochafik
c9e8fdd70e
Move chat_templates inside server_context + remove mutex
2025-01-20 21:25:18 +00:00
ochafik
db9dd0c1ac
Finish suggested renamings
2025-01-20 21:06:18 +00:00
Olivier Chafik
153e852411
Apply suggestions from code review
...
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-20 20:55:52 +00:00
Georgi Gerganov
80d0d6b4b7
common : add -hfd option for the draft model (#11318)
...
* common : add -hfd option for the draft model
* cont : fix env var
* cont : more fixes
2025-01-20 22:29:43 +02:00
Jeff Bolz
aea8ddd516
vulkan: fix coopmat2 validation failures (#11284)
...
mul mat and flash attention shaders were loading f32 types directly into
A/B matrices, which happens to work but is technically invalid usage. For FA,
we can load it as an Accumulator matrix and convert; this is not in the inner
loop and is cheap enough. For mul mat, it's more efficient to do this
conversion in a separate pass and have the input(s) be f16.
coopmat2 requires SPIR-V 1.6 (related to the use of LocalSizeId). LocalSizeId
requires maintenance4 to be enabled, and SPIR-V 1.6 requires Vulkan 1.3.
2025-01-20 10:38:32 -06:00
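A hedged sketch of the device-capability gate implied by the note above (SPIR-V 1.6 needs Vulkan 1.3, LocalSizeId needs maintenance4), using only core Vulkan calls; the real ggml-vulkan logic also checks the cooperative-matrix extensions and is more involved.

```cpp
// Sketch only: can this physical device run SPIR-V 1.6 coopmat2 shaders?
#include <vulkan/vulkan.h>

static bool device_supports_coopmat2_prereqs(VkPhysicalDevice dev) {
    VkPhysicalDeviceProperties props{};
    vkGetPhysicalDeviceProperties(dev, &props);
    if (props.apiVersion < VK_API_VERSION_1_3) {
        return false; // SPIR-V 1.6 requires Vulkan 1.3
    }

    // LocalSizeId in the shaders requires the maintenance4 feature to be enabled.
    VkPhysicalDeviceMaintenance4Features maint4{};
    maint4.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MAINTENANCE_4_FEATURES;

    VkPhysicalDeviceFeatures2 feats2{};
    feats2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
    feats2.pNext = &maint4;
    vkGetPhysicalDeviceFeatures2(dev, &feats2);

    return maint4.maintenance4 == VK_TRUE;
}
```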
Georgi Gerganov
9f7add1cde
examples : fix add_special conditions (#11311)
2025-01-20 16:36:08 +02:00
Christopher Nielsen
90d987b105
mmap: add include for cerrno (#11296)
...
ggml-ci
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-20 16:02:43 +02:00
Michael Podvitskiy
a4251edd6f
cmake: fix shell command quoting in build-info script (#11309)
2025-01-20 16:02:15 +02:00
Xuan Son Nguyen
ec7f3ac9ab
llama : add support for Deepseek-R1-Qwen distill model (#11310)
...
* llama : add support for Deepseek-R1-Qwen distill model
* coding style
2025-01-20 14:35:07 +01:00
Georgi Gerganov
ef6dada60c
cont : fix whitespaces (#11305)
2025-01-20 09:29:32 +02:00
Kyle Bruene
ae3c1db2f9
llama : re-add LLM_ARCH_PHIMOE (#11305)
...
Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.
2025-01-20 09:21:01 +02:00
Georgi Gerganov
92bc493917
tests : increase timeout when sanitizers are enabled (#11300)
...
* tests : increase timeout when sanitizers are enabled
* tests : add DEFAULT_HTTP_TIMEOUT
2025-01-19 20:22:30 +02:00
Georgi Gerganov
b9daaffe02
simple-chat : fix BOS being added to each message (#11278)
2025-01-19 18:12:09 +02:00
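A sketch of the idea behind this fix (and the add_special conditions fix above): only the first chunk of a multi-turn prompt should get the BOS token, so add_special is tied to "is this the first message" rather than always true. It assumes a tokenizer entry point of the form llama_tokenize(vocab, text, len, out, n, add_special, parse_special); the helper itself is hypothetical.

```cpp
// Sketch only: tokenize one turn, adding BOS only for the first turn.
#include <string>
#include <vector>

#include "llama.h"  // assumed available

static std::vector<llama_token> tokenize_turn(const llama_vocab * vocab, const std::string & text, bool is_first_turn) {
    const bool add_special = is_first_turn; // BOS at most once per conversation
    // A negative return value is the number of tokens that would be needed.
    const int n = -llama_tokenize(vocab, text.c_str(), (int32_t) text.size(), nullptr, 0, add_special, /*parse_special=*/true);
    std::vector<llama_token> tokens(n);
    llama_tokenize(vocab, text.c_str(), (int32_t) text.size(), tokens.data(), n, add_special, /*parse_special=*/true);
    return tokens;
}
```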
Nicolò Scipione
99487b57d4
SYCL: Introducing host memory pool (#11251)
...
* Implement host pool for matrix_info
Creating a new memory pool on the host to store the memory locations for
matrix_info needed to launch gemm_batch from oneMKL/oneMath.
Removing complex support in gemm_batch since it is not used in llama.cpp
* Remove unnecessary headers and cast
* Reorder member variable to avoid warning on initialization
* Formatting
* Remove unused variable
* Address PR review feedback - remove warning
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-01-19 21:33:34 +08:00
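A generic, hedged sketch of the host-pool idea described above: keep one reusable host allocation for the per-call matrix_info metadata instead of allocating on every gemm_batch launch. Types and names here are hypothetical; the real backend allocates SYCL host memory.

```cpp
// Sketch only: a trivially simple growing host buffer that gets reused across calls.
#include <cstddef>
#include <cstdlib>

struct host_pool {
    void * ptr  = nullptr;
    size_t size = 0;

    // Return a host buffer of at least `bytes`, growing (and then reusing) as needed.
    void * alloc(size_t bytes) {
        if (bytes > size) {
            std::free(ptr);
            ptr  = std::malloc(bytes);
            size = bytes;
        }
        return ptr;
    }

    ~host_pool() { std::free(ptr); }
};
```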
ochafik
0401a83b9b
agent: add --greedy, --top-p, --top-k options
2025-01-19 02:07:06 +00:00
ochafik
c207fdcde6
Merge branch 'jinja' into tool-call
2025-01-18 18:05:11 +00:00
ochafik
cc50356470
minja: fix vigogne (https://github.com/google/minja/pull/22)
2025-01-18 17:55:04 +00:00
ochafik
e3c475cd12
Disable jinja test that has a cryptic windows failure
2025-01-18 14:55:27 +00:00
ochafik
d6f058da8c
Merge branch 'jinja' into tool-call
2025-01-18 14:54:57 +00:00
Eric Curtin
a1649cc13f
Adding linenoise.cpp to llama-run (#11252)
...
This is a fork of linenoise that is C++17 compatible. I intend to add
it to llama-run so we can do things like traverse prompt history via
the up and down arrows:
https://github.com/ericcurtin/linenoise.cpp
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-18 14:42:31 +00:00
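A hedged sketch of how a REPL like llama-run can use linenoise-style line editing for prompt history. It follows the classic linenoise C API (linenoise(), linenoiseHistoryAdd(), linenoiseFree()); the C++17 fork linked above may expose a somewhat different interface, and the header name is an assumption.

```cpp
// Sketch only: read lines with history, so up/down arrows recall earlier prompts.
#include <cstdio>
#include <string>

#include "linenoise.h"  // assumed header name from the fork

int main() {
    while (char * line = linenoise("> ")) {  // returns NULL on EOF (Ctrl-D)
        std::string input(line);
        linenoiseHistoryAdd(line);           // remember this entry for arrow-key recall
        linenoiseFree(line);
        if (input == "/quit") {
            break;
        }
        printf("you said: %s\n", input.c_str());
    }
    return 0;
}
```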
Georgi Gerganov
4dd34ff831
cmake : add sanitizer flags for llama.cpp (#11279)
...
* cmake : add sanitizer flags for llama.cpp
ggml-ci
* tests : fix compile warnings
ggml-ci
* cmake : move sanitizer flags to llama_add_compile_flags
ggml-ci
* cmake : move llama.cpp compile flags to top level lists
ggml-ci
* cmake : apply only sanitizer flags at top level
ggml-ci
* tests : fix gguf context use in same_tensor_data
* gguf-test: tensor data comparison
* dummy : trigger ggml-ci
* unicode : silence gcc warnings
ggml-ci
* ci : use sanitizer builds only in Debug mode
ggml-ci
* cmake : add status messages [no ci]
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-01-18 16:18:15 +02:00
Xuan Son Nguyen
f30f099228
server : implement cancellable request (#11285)
...
* server : implement cancellable request
* fix typo
* httplib 0.18.5
* fix i underflow
2025-01-18 14:12:05 +01:00
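A generic sketch of the cancellation idea in this server change: the generation loop periodically asks "is the client still connected?" and stops early if not. The callbacks below stand in for whatever the HTTP layer (httplib) actually provides; they are illustrative, not the server's real interface.

```cpp
// Sketch only: a generation task that bails out when the client disconnects.
#include <functional>
#include <string>

struct generation_task {
    std::function<bool()> is_connection_alive;        // supplied by the HTTP layer
    std::function<std::string()> next_token;          // produces one token per call
    std::function<void(const std::string &)> send;    // streams a token to the client

    void run(int max_tokens) {
        for (int i = 0; i < max_tokens; i++) {
            if (is_connection_alive && !is_connection_alive()) {
                return; // client went away: free the slot instead of finishing the request
            }
            send(next_token());
        }
    }
};
```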
ochafik
0e74c9dabe
Add missing optional include to server.cpp
2025-01-18 11:58:00 +00:00
ochafik
fc60802b6e
Rm unused optional include
2025-01-18 11:35:54 +00:00
ochafik
76893f5880
Merge branch 'jinja' into tool-call
2025-01-18 11:26:56 +00:00
Georgi Gerganov
f26c874179
scripts : restore hf.sh (#11288)
...
ggml-ci
2025-01-18 13:18:32 +02:00
ochafik
5074e6fecd
Fix copy elision warning
2025-01-18 10:48:03 +00:00
ochafik
33322e823e
Flush stdout in chat template before potential crash
2025-01-18 10:38:21 +00:00
ochafik
e63520f37a
Forward decl minja::chat_template to avoid eager json dep
2025-01-18 10:37:56 +00:00
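A sketch of the forward-declaration pattern this commit refers to: declare the template class in the common header so that including it does not pull in minja (and therefore nlohmann::json), and include the full definition only in the .cpp that renders templates. Member names loosely follow the commits above; this is not the real header.

```cpp
// common.h (sketch): avoid an eager json dependency via a forward declaration.
#pragma once
#include <memory>

namespace minja { class chat_template; }  // forward declaration only, no heavy include

struct common_chat_templates {
    // unique_ptr members are fine with an incomplete type as long as the
    // destructor is defined in a .cpp that has seen the full class definition.
    std::unique_ptr<minja::chat_template> template_default;
    std::unique_ptr<minja::chat_template> template_tool_use;

    ~common_chat_templates();  // defined out of line, after including the minja header
};
```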
LostRuins Concedo
6390a998bf
tts : add guide tokens support (#11186)
...
* Added the ability to use guide tokens for OuteTTS, greatly improving TTS recitation accuracy over long input sequences.
* Applied linting suggestions, updated to the latest llama_vocab changes, added a safety check, and added a newline to the guide-token start
2025-01-18 12:20:57 +02:00
Jeff Bolz
44e18ef939
vulkan: fix coopmat2 flash attention for non-contiguous inputs (#11281)
...
Add code similar to mul_mm_cm2 to force alignment of strides, to avoid
a performance regression.
Add noncontiguous FA tests in test-backend-ops.
Fixes #11268.
2025-01-18 09:26:50 +01:00
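A hedged sketch of the stride-alignment trick the commit message mentions: pick the largest supported alignment that evenly divides every row stride, so the shader can still assume aligned loads when the inputs are not contiguous. This is loosely modelled on the mul_mm alignment approach referenced above and is not the actual ggml-vulkan code.

```cpp
// Sketch only: choose an alignment that divides all given strides.
#include <cstdint>
#include <initializer_list>

static uint32_t pick_alignment(std::initializer_list<uint64_t> strides) {
    for (uint32_t align : {128u, 64u, 32u, 16u, 8u, 4u, 2u}) {
        bool ok = true;
        for (uint64_t s : strides) {
            if (s % align != 0) { ok = false; break; }
        }
        if (ok) {
            return align;
        }
    }
    return 1; // fall back to unaligned access
}
```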
ochafik
ee1e10e21e
Normalize newlines in test-chat-templates for windows tests
2025-01-18 02:52:40 +00:00
ochafik
acf7c240d8
tools: run tool call slow tests when SLOW_TESTS=1 (+ prefetch models)
2025-01-18 02:39:37 +00:00
ochafik
259d9e4511
tools: greedy sampling in tests
2025-01-18 02:39:10 +00:00
ochafik
2ceabee0f8
Fix fetch_server_test_models.py (avoid conv trap)
2025-01-18 01:36:46 +00:00
ochafik
045edd1d7e
Merge branch 'jinja' into tool-call
2025-01-18 01:04:57 +00:00
ochafik
d5fa351a24
Revert LLAMA_CHATML_TEMPLATE refactor
2025-01-18 01:04:12 +00:00
ochafik
138a4ba83f
Merge branch 'jinja' into tool-call
2025-01-18 00:59:10 +00:00
ochafik
81c0d437a5
Attempt to fix linkage of LLAMA_CHATML_TEMPLATE
2025-01-18 00:56:19 +00:00
ochafik
40db78963b
Merge remote-tracking branch 'origin/master' into jinja
2025-01-18 00:44:37 +00:00
ochafik
b75d0622e4
Refactor common_chat_* functions to accept minja template + use_jinja option
2025-01-18 00:43:38 +00:00