Xuan Son Nguyen
e28245f35f
export-lora : fix tok_embd tensor ( #11330 )
2025-01-21 14:07:12 +01:00
Radoslav Gerganov
6da5bec81c
rpc : better caching of the base buffer pointer ( #11331 )
There is no need to use a map; just store the base pointer in the buffer
context.
2025-01-21 15:06:41 +02:00
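The idea in the commit above can be sketched as follows. This is an illustrative sketch only, not the actual llama.cpp RPC types: `rpc_buffer_context` and `fetch_base_from_server` are hypothetical names standing in for the real buffer context and the remote round-trip.

```cpp
#include <cstdint>

// Sketch: instead of a shared map from buffer -> base pointer, each buffer
// context caches its own base pointer, fetched from the server at most once.
struct rpc_buffer_context {
    uint64_t remote_id = 0;       // handle of the buffer on the remote server
    void *   base      = nullptr; // cached base pointer

    void * get_base() {
        if (base == nullptr) {
            base = fetch_base_from_server(remote_id); // remote call, done once
        }
        return base;              // subsequent calls reuse the cached value
    }

    // stand-in for the actual RPC round-trip
    static void * fetch_base_from_server(uint64_t id) {
        return reinterpret_cast<void *>(static_cast<uintptr_t>(0x1000 + id));
    }
};
```

Caching the pointer in the context itself removes both the map lookup and the need to keep a separate cache in sync with buffer lifetimes.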
Eric Curtin
2e2f8f093c
linenoise.cpp refactoring ( #11301 )
More RAII mainly
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-21 09:32:35 +00:00
Georgi Gerganov
2139667ec4
metal : fix out-of-bounds write ( #11314 )
ggml-ci
2025-01-21 08:48:13 +02:00
ochafik
c606255948
Merge branch 'jinja' into tool-call
2025-01-21 03:49:30 +00:00
ochafik
9d8ebd62c6
Update minja from https://github.com/google/minja/pull/27
2025-01-21 03:18:06 +00:00
ochafik
ba8dd66fdf
Merge branch 'jinja' into tool-call
2025-01-21 01:43:14 +00:00
ochafik
ff2cce57ad
Update minja to https://github.com/google/minja/pull/25
2025-01-21 01:26:19 +00:00
ochafik
56aa93c266
fix std imports for gcc build
2025-01-21 00:08:22 +00:00
ochafik
7ea6a06cde
Merge branch 'jinja' into tool-call
2025-01-20 23:59:24 +00:00
ochafik
8347da907d
Update minja to b8437df626
2025-01-20 23:59:15 +00:00
ochafik
b110374714
apply renames from jinja branch
2025-01-20 23:59:01 +00:00
ochafik
9bab6939cd
Merge branch 'jinja' into tool-call
2025-01-20 23:55:12 +00:00
ochafik
8a7c89e60c
reinstate assert on chat_templates.template_default
2025-01-20 23:44:42 +00:00
ochafik
ee475d2f51
rename: common_chat_template[s]
2025-01-20 23:42:07 +00:00
ochafik
8348c605ac
Warn against missing eos / bos tokens when jinja template references them
2025-01-20 23:00:47 +00:00
ochafik
54a669e09e
Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
2025-01-20 22:50:08 +00:00
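The guard described in the two commits above can be sketched as follows. The names here are illustrative, not the exact llama.cpp API: the point is that a vocab may report "no BOS/EOS token" as a null token id, and querying that token's attributes would throw, so the code checks first.

```cpp
#include <cstdint>
#include <string>

using token_id = int32_t;
constexpr token_id TOKEN_NULL = -1; // assumed sentinel for "no such token"

// Sketch: never query attributes/text of a missing token; return empty
// (and, per the follow-up commit, the caller can warn when the template
// references a token the vocab does not define).
inline std::string token_to_text_or_empty(token_id tok) {
    if (tok == TOKEN_NULL) {
        return ""; // guard: avoids the throw in token attribute lookup
    }
    return "<tok:" + std::to_string(tok) + ">"; // stand-in for the real lookup
}
```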
ochafik
099f983949
Merge remote-tracking branch 'origin/master' into jinja
2025-01-20 21:58:04 +00:00
ochafik
154bfaaa39
Refactor chat template validation
2025-01-20 21:54:34 +00:00
ochafik
8c84aefd4d
Update --chat-template-file w/ recent change to --chat-template
2025-01-20 21:48:31 +00:00
ochafik
c9e8fdd70e
Move chat_templates inside server_context + remove mutex
2025-01-20 21:25:18 +00:00
ochafik
db9dd0c1ac
Finish suggested renamings
2025-01-20 21:06:18 +00:00
Olivier Chafik
153e852411
Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-20 20:55:52 +00:00
Georgi Gerganov
80d0d6b4b7
common : add -hfd option for the draft model ( #11318 )
* common : add -hfd option for the draft model
* cont : fix env var
* cont : more fixes
2025-01-20 22:29:43 +02:00
Jeff Bolz
aea8ddd516
vulkan: fix coopmat2 validation failures ( #11284 )
mul mat and flash attention shaders were loading f32 types directly into
A/B matrices, which happens to work but is technically invalid usage.
For FA, we can load it as an Accumulator matrix and convert; this is
not in the inner loop and is cheap enough. For mul mat, it's more
efficient to do this conversion in a separate pass and have the input(s)
be f16.
coopmat2 requires SPIR-V 1.6 (related to the use of LocalSizeId). LocalSizeId
requires maintenance4 be enabled, and SPIR-V 1.6 requires Vulkan 1.3.
2025-01-20 10:38:32 -06:00
Georgi Gerganov
9f7add1cde
examples : fix add_special conditions ( #11311 )
2025-01-20 16:36:08 +02:00
Christopher Nielsen
90d987b105
mmap: add include for cerrno ( #11296 )
ggml-ci
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-20 16:02:43 +02:00
Michael Podvitskiy
a4251edd6f
cmake: fix shell command quoting in build-info script ( #11309 )
2025-01-20 16:02:15 +02:00
Xuan Son Nguyen
ec7f3ac9ab
llama : add support for Deepseek-R1-Qwen distill model ( #11310 )
* llama : add support for Deepseek-R1-Qwen distill model
* coding style
2025-01-20 14:35:07 +01:00
Georgi Gerganov
ef6dada60c
cont : fix whitespaces ( #11305 )
2025-01-20 09:29:32 +02:00
Kyle Bruene
ae3c1db2f9
llama : re-add LLM_ARCH_PHIMOE ( #11305 )
Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.
2025-01-20 09:21:01 +02:00
Georgi Gerganov
92bc493917
tests : increase timeout when sanitizers are enabled ( #11300 )
* tests : increase timeout when sanitizers are enabled
* tests : add DEFAULT_HTTP_TIMEOUT
2025-01-19 20:22:30 +02:00
Georgi Gerganov
b9daaffe02
simple-chat : fix BOS being added to each message ( #11278 )
2025-01-19 18:12:09 +02:00
Nicolò Scipione
99487b57d4
SYCL: Introducing memory host pool ( #11251 )
* Implement host pool for matrix_info
Creating a new memory pool on the host to store the memory locations for
matrix_info needed to launch gemm_batch from oneMKL/oneMath.
Removing complex support in gemm_batch since it is not used in llama.cpp
* Remove unnecessary headers and cast
* Reorder member variable to avoid warning on initialization
* Formatting
* Remove unused variable
* Address PR review feedback - remove warning
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-01-19 21:33:34 +08:00
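The host-pool idea above can be sketched generically; this is not the actual SYCL implementation, and `host_pool` is a hypothetical name. The point is to stop allocating a fresh host buffer for matrix_info on every gemm_batch launch and instead reuse one growing allocation across calls.

```cpp
#include <cstddef>
#include <vector>

// Sketch: one reusable host buffer; a real allocation happens only when a
// request exceeds the current capacity.
struct host_pool {
    std::vector<unsigned char> buf;
    std::size_t alloc_calls = 0; // counts actual (re)allocations, for clarity

    void * get(std::size_t bytes) {
        if (bytes > buf.size()) {
            buf.resize(bytes); // grow only when a larger request arrives
            alloc_calls++;
        }
        return buf.data();     // otherwise reuse the existing allocation
    }
};
```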
ochafik
0401a83b9b
agent: add --greedy, --top-p, --top-k options
2025-01-19 02:07:06 +00:00
ochafik
c207fdcde6
Merge branch 'jinja' into tool-call
2025-01-18 18:05:11 +00:00
ochafik
cc50356470
minja: fix vigogne ( https://github.com/google/minja/pull/22 )
2025-01-18 17:55:04 +00:00
ochafik
e3c475cd12
Disable jinja test that has a cryptic windows failure
2025-01-18 14:55:27 +00:00
ochafik
d6f058da8c
Merge branch 'jinja' into tool-call
2025-01-18 14:54:57 +00:00
Eric Curtin
a1649cc13f
Adding linenoise.cpp to llama-run ( #11252 )
This is a fork of linenoise that is C++17 compatible. I intend to add
it to llama-run so we can do things like traverse prompt
history via the up and down arrows:
https://github.com/ericcurtin/linenoise.cpp
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-18 14:42:31 +00:00
Georgi Gerganov
4dd34ff831
cmake : add sanitizer flags for llama.cpp ( #11279 )
* cmake : add sanitizer flags for llama.cpp
ggml-ci
* tests : fix compile warnings
ggml-ci
* cmake : move sanitizer flags to llama_add_compile_flags
ggml-ci
* cmake : move llama.cpp compile flags to top level lists
ggml-ci
* cmake : apply only sanitizer flags at top level
ggml-ci
* tests : fix gguf context use in same_tensor_data
* gguf-test: tensor data comparison
* dummy : trigger ggml-ci
* unicode : silence gcc warnings
ggml-ci
* ci : use sanitizer builds only in Debug mode
ggml-ci
* cmake : add status messages [no ci]
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-01-18 16:18:15 +02:00
Xuan Son Nguyen
f30f099228
server : implement cancellable request ( #11285 )
* server : implement cancellable request
* fix typo
* httplib 0.18.5
* fix i underflow
2025-01-18 14:12:05 +01:00
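A cancellable request loop in the spirit of the commit above can be sketched like this; it is illustrative, not the server's actual code. The assumed shape: between generation steps the server polls whether the HTTP client is still connected and stops early instead of finishing the whole completion.

```cpp
#include <functional>

// Sketch: `is_alive` stands in for the connection-liveness check; the loop
// cancels as soon as the client disconnects.
inline int generate(int max_steps, const std::function<bool()> & is_alive) {
    int produced = 0;
    for (int i = 0; i < max_steps; i++) {
        if (!is_alive()) {
            break;      // client went away: cancel instead of wasting compute
        }
        produced++;     // stand-in for decoding one token
    }
    return produced;
}
```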
ochafik
0e74c9dabe
Add missing optional include to server.cpp
2025-01-18 11:58:00 +00:00
ochafik
fc60802b6e
Rm unused optional include
2025-01-18 11:35:54 +00:00
ochafik
76893f5880
Merge branch 'jinja' into tool-call
2025-01-18 11:26:56 +00:00
Georgi Gerganov
f26c874179
scripts : restore hf.sh ( #11288 )
ggml-ci
2025-01-18 13:18:32 +02:00
ochafik
5074e6fecd
Fix copy elision warning
2025-01-18 10:48:03 +00:00
ochafik
33322e823e
Flush stdout in chat template before potential crash
2025-01-18 10:38:21 +00:00
ochafik
e63520f37a
Forward decl minja::chat_template to avoid eager json dep
2025-01-18 10:37:56 +00:00
LostRuins Concedo
6390a998bf
tts : add guide tokens support ( #11186 )
* Added the ability to use guide tokens for OuteTTS, greatly improving TTS recitation accuracy over long input sequences.
* applied linting suggestions, updated to latest llama_vocab changes, added a safety check, added newline to guide token start
2025-01-18 12:20:57 +02:00
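The guide-token idea above can be sketched as follows; this is an illustrative stand-in, not the OuteTTS implementation, and all names here are hypothetical. The assumed mechanism: a pre-tokenized transcript supplies "guide" tokens, and at each word boundary the sampler's choice is overridden with the next guide token, keeping long recitations on script.

```cpp
#include <cstddef>
#include <vector>

// Sketch: wherever the sampled stream hits a word-boundary token, substitute
// the next scripted guide token; elsewhere keep the freely sampled token.
inline std::vector<int> sample_with_guides(const std::vector<int> & sampled,
                                           const std::vector<int> & guides,
                                           int word_boundary_token) {
    std::vector<int> out;
    std::size_t g = 0;
    for (int tok : sampled) {
        if (tok == word_boundary_token && g < guides.size()) {
            out.push_back(guides[g++]); // force the scripted word token
        } else {
            out.push_back(tok);         // keep the freely sampled token
        }
    }
    return out;
}
```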