ochafik
9d8ebd62c6
Update minja from https://github.com/google/minja/pull/27
2025-01-21 03:18:06 +00:00
ochafik
ff2cce57ad
Update minja to https://github.com/google/minja/pull/25
2025-01-21 01:26:19 +00:00
ochafik
8347da907d
Update minja to b8437df626
2025-01-20 23:59:15 +00:00
ochafik
8a7c89e60c
reinstate assert on chat_templates.template_default
2025-01-20 23:44:42 +00:00
ochafik
ee475d2f51
rename: common_chat_template[s]
2025-01-20 23:42:07 +00:00
ochafik
8348c605ac
Warn against missing eos / bos tokens when jinja template references them
2025-01-20 23:00:47 +00:00
ochafik
54a669e09e
Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
2025-01-20 22:50:08 +00:00
ochafik
099f983949
Merge remote-tracking branch 'origin/master' into jinja
2025-01-20 21:58:04 +00:00
ochafik
154bfaaa39
Refactor chat template validation
2025-01-20 21:54:34 +00:00
ochafik
8c84aefd4d
Update --chat-template-file w/ recent change to --chat-template
2025-01-20 21:48:31 +00:00
ochafik
c9e8fdd70e
Move chat_templates inside server_context + remove mutex
2025-01-20 21:25:18 +00:00
ochafik
db9dd0c1ac
Finish suggested renamings
2025-01-20 21:06:18 +00:00
Olivier Chafik
153e852411
Apply suggestions from code review
...
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-20 20:55:52 +00:00
Georgi Gerganov
80d0d6b4b7
common : add -hfd option for the draft model ( #11318 )
...
* common : add -hfd option for the draft model
* cont : fix env var
* cont : more fixes
2025-01-20 22:29:43 +02:00
Jeff Bolz
aea8ddd516
vulkan: fix coopmat2 validation failures ( #11284 )
...
mul mat and flash attention shaders were loading f32 types directly into
A/B matrices, which happens to work but is technically invalid usage.
For FA, we can load it as an Accumulator matrix and convert; this is not
in the inner loop and is cheap enough. For mul mat, it's more efficient
to do the conversion in a separate pass and have the input(s) be f16.
coopmat2 requires SPIR-V 1.6 (related to the use of LocalSizeId). LocalSizeId
requires maintenance4 to be enabled, and SPIR-V 1.6 requires Vulkan 1.3.
2025-01-20 10:38:32 -06:00
Georgi Gerganov
9f7add1cde
examples : fix add_special conditions ( #11311 )
2025-01-20 16:36:08 +02:00
Christopher Nielsen
90d987b105
mmap: add include for cerrno ( #11296 )
...
ggml-ci
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-20 16:02:43 +02:00
Michael Podvitskiy
a4251edd6f
cmake: fix shell command quoting in build-info script ( #11309 )
2025-01-20 16:02:15 +02:00
Xuan Son Nguyen
ec7f3ac9ab
llama : add support for Deepseek-R1-Qwen distill model ( #11310 )
...
* llama : add support for Deepseek-R1-Qwen distill model
* coding style
2025-01-20 14:35:07 +01:00
Georgi Gerganov
ef6dada60c
cont : fix whitespaces ( #11305 )
2025-01-20 09:29:32 +02:00
Kyle Bruene
ae3c1db2f9
llama : re-add LLM_ARCH_PHIMOE ( #11305 )
...
Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.
2025-01-20 09:21:01 +02:00
Georgi Gerganov
92bc493917
tests : increase timeout when sanitizers are enabled ( #11300 )
...
* tests : increase timeout when sanitizers are enabled
* tests : add DEFAULT_HTTP_TIMEOUT
2025-01-19 20:22:30 +02:00
Georgi Gerganov
b9daaffe02
simple-chat : fix BOS being added to each message ( #11278 )
2025-01-19 18:12:09 +02:00
Nicolò Scipione
99487b57d4
SYCL: Introducing memory host pool ( #11251 )
...
* Implement host pool for matrix_info
Creating a new memory pool on the host to store memory location for
matrix_info needed to launch gemm_batch from oneMKL/oneMath.
Removing complex support in gemm_batch since it is not used in llama.cpp
* Remove unnecessary headers and cast
* Reorder member variable to avoid warning on initialization
* Formatting
* Remove unused variable
* Address PR review feedback - remove warning
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-01-19 21:33:34 +08:00
ochafik
cc50356470
minja: fix vigogne ( https://github.com/google/minja/pull/22 )
2025-01-18 17:55:04 +00:00
ochafik
e3c475cd12
Disable jinja test that has a cryptic windows failure
2025-01-18 14:55:27 +00:00
Eric Curtin
a1649cc13f
Adding linenoise.cpp to llama-run ( #11252 )
...
This is a fork of linenoise that is C++17 compatible. I intend to add
it to llama-run so we can do things like traverse prompt
history via the up and down arrows:
https://github.com/ericcurtin/linenoise.cpp
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-18 14:42:31 +00:00
Georgi Gerganov
4dd34ff831
cmake : add sanitizer flags for llama.cpp ( #11279 )
...
* cmake : add sanitizer flags for llama.cpp
ggml-ci
* tests : fix compile warnings
ggml-ci
* cmake : move sanitizer flags to llama_add_compile_flags
ggml-ci
* cmake : move llama.cpp compile flags to top level lists
ggml-ci
* cmake : apply only sanitizer flags at top level
ggml-ci
* tests : fix gguf context use in same_tensor_data
* gguf-test: tensor data comparison
* dummy : trigger ggml-ci
* unicode : silence gcc warnings
ggml-ci
* ci : use sanitizer builds only in Debug mode
ggml-ci
* cmake : add status messages [no ci]
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-01-18 16:18:15 +02:00
Xuan Son Nguyen
f30f099228
server : implement cancellable request ( #11285 )
...
* server : implement cancellable request
* fix typo
* httplib 0.18.5
* fix i underflow
2025-01-18 14:12:05 +01:00
ochafik
0e74c9dabe
Add missing optional include to server.cpp
2025-01-18 11:58:00 +00:00
ochafik
fc60802b6e
Rm unused optional include
2025-01-18 11:35:54 +00:00
Georgi Gerganov
f26c874179
scripts : restore hf.sh ( #11288 )
...
ggml-ci
2025-01-18 13:18:32 +02:00
ochafik
5074e6fecd
Fix copy elision warning
2025-01-18 10:48:03 +00:00
ochafik
33322e823e
Flush stdout in chat template before potential crash
2025-01-18 10:38:21 +00:00
ochafik
e63520f37a
Forward decl minja::chat_template to avoid eager json dep
2025-01-18 10:37:56 +00:00
LostRuins Concedo
6390a998bf
tts : add guide tokens support ( #11186 )
...
* Added the ability to use guide tokens for OuteTTS, greatly improving TTS recitation accuracy over long input sequences.
* applied linting suggestions, updated to latest llama_vocab changes, added a safety check, added newline to guide token start
2025-01-18 12:20:57 +02:00
Jeff Bolz
44e18ef939
vulkan: fix coopmat2 flash attention for non-contiguous inputs ( #11281 )
...
Add code similar to mul_mm_cm2 to force alignment of strides, to avoid
a performance regression.
Add noncontiguous FA tests in test-backend-ops.
Fixes #11268 .
2025-01-18 09:26:50 +01:00
ochafik
ee1e10e21e
Normalize newlines in test-chat-templates for windows tests
2025-01-18 02:52:40 +00:00
ochafik
d5fa351a24
Revert LLAMA_CHATML_TEMPLATE refactor
2025-01-18 01:04:12 +00:00
ochafik
81c0d437a5
Attempt to fix linkage of LLAMA_CHATML_TEMPLATE
2025-01-18 00:56:19 +00:00
ochafik
40db78963b
Merge remote-tracking branch 'origin/master' into jinja
2025-01-18 00:44:37 +00:00
ochafik
b75d0622e4
Refactor common_chat_* functions to accept minja template + use_jinja option
2025-01-18 00:43:38 +00:00
codezjx
3edfa7d375
llama.android: add field formatChat to control whether to parse special tokens when sending a message ( #11270 )
2025-01-17 14:57:56 +02:00
Radoslav Gerganov
667d72846c
rpc : early register backend devices ( #11262 )
...
Register RPC backend devices early and do not propagate RPC specifics into
the llama model structures.
ref: #10609
2025-01-17 10:57:09 +02:00
Georgi Gerganov
a133566d34
vocab : fix double-eos check ( #11273 )
...
ggml-ci
2025-01-17 09:28:00 +02:00
David Renshaw
960ec65273
llama : fix deprecation message: vocabable -> vocab ( #11269 )
2025-01-17 08:12:01 +01:00
musoles
7a689c415e
README : added kalavai to infrastructure list ( #11216 )
2025-01-17 01:10:49 +01:00
Jeff Bolz
bd38ddea01
vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl ( #11166 )
...
* vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl
Shaders are based on cpy.cu.
* vulkan: support copy from q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl to f32
* ggml: copy q->f32 assumes some contiguity in the destination
2025-01-16 22:47:10 +01:00
Jeff Bolz
466300fe14
vulkan: optimize coopmat2 q4_k/q5_k dequant functions. ( #11206 )
...
Do masking on whole dwords, fetch all scales at once.
2025-01-16 22:23:49 +01:00
Jeff Bolz
206bc53422
vulkan: optimize coopmat2 q2_k dequant function ( #11130 )
2025-01-16 22:16:39 +01:00