Commit graph

4772 commits

Author SHA1 Message Date
Jeff Bolz
1b598b3058
vulkan: use smaller combined allocations to avoid fragmentation (#11551) 2025-02-06 07:02:18 +01:00
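The idea behind capping combined allocations is general: packing buffer requests into chunks of a bounded size means freed memory comes back in uniform, reusable units instead of leaving holes in one huge region. A host-side sketch of the concept (illustrative only, with a hypothetical chunk size; not the Vulkan backend's actual code):

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch: pack buffer requests into chunks capped at a fixed
// size instead of one arbitrarily large combined allocation, so the heap
// fragments less as buffers come and go.
struct chunk_pool {
    static constexpr size_t CHUNK_SIZE = 64u * 1024 * 1024;  // hypothetical cap
    std::vector<std::vector<std::byte>> chunks;  // stand-in for device memory
    size_t used = CHUNK_SIZE;                    // forces a chunk on first use

    std::byte * alloc(size_t size) {
        if (size > CHUNK_SIZE) {       // oversize requests get dedicated memory
            chunks.emplace_back(size);
            used = CHUNK_SIZE;         // next small request opens a fresh chunk
            return chunks.back().data();
        }
        if (used + size > CHUNK_SIZE) {
            chunks.emplace_back(CHUNK_SIZE);
            used = 0;
        }
        std::byte * p = chunks.back().data() + used;
        used += size;
        return p;
    }
};
```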
Charles Duffy
902368a06b
metal : avoid breaking build when metal API predates TARGET_OS_VISION (#11690)
Avoids breakage in the nix flake build introduced by b0569130c5
2025-02-06 09:52:31 +08:00
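The usual fix for this class of breakage is to test that the macro is defined before testing its value, since SDKs that predate visionOS do not define `TARGET_OS_VISION` at all. A sketch of the pattern (not necessarily the exact guard this commit uses):

```cpp
#include <TargetConditionals.h>

// On SDKs that predate visionOS, TARGET_OS_VISION is not defined, so check
// for definition before checking its value to stay buildable everywhere.
#if defined(TARGET_OS_VISION) && TARGET_OS_VISION
// visionOS-specific Metal code path
#else
// fallback for older SDKs and non-vision platforms
#endif
```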
Matvey Soloviev
c3db0480bb
readme : add link to Autopen under UIs (#11684)
Autopen (https://github.com/blackhole89/autopen) is a graphical text editor that uses llama.cpp to tokenize the buffer on the fly, score the buffer, visualise token logits, and allow you to switch back and forth between different possible completions at any point. It hopefully meets the criteria for inclusion, as the dependency on llama.cpp is stated prominently.
2025-02-06 01:55:25 +01:00
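The tokenize-then-score loop Autopen describes maps onto llama.cpp's C API roughly as below. The exact signatures have shifted between releases, so treat this as a sketch of the early-2025 API rather than a drop-in program:

```cpp
#include "llama.h"
#include <string>
#include <vector>

// Sketch: tokenize a text buffer and fetch logits with llama.cpp.
// Signatures follow the early-2025 C API and may differ in other releases.
std::vector<llama_token> tokenize_buffer(const llama_vocab * vocab, const std::string & text) {
    // A call with a null buffer returns the negated required token count.
    int n = -llama_tokenize(vocab, text.c_str(), (int32_t) text.size(),
                            nullptr, 0, /*add_special=*/true, /*parse_special=*/false);
    std::vector<llama_token> tokens(n);
    llama_tokenize(vocab, text.c_str(), (int32_t) text.size(),
                   tokens.data(), n, true, false);
    return tokens;
}

float * score_buffer(llama_context * ctx, std::vector<llama_token> & tokens) {
    // Evaluate the buffer in one batch; the logits at the last position give
    // the distribution over next tokens (what Autopen visualises).
    llama_batch batch = llama_batch_get_one(tokens.data(), (int32_t) tokens.size());
    if (llama_decode(ctx, batch) != 0) {
        return nullptr;
    }
    return llama_get_logits(ctx);
}
```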
Olivier Chafik
d1a064070f revert tool example backfill change - command r7b just needs the right template 2025-02-05 16:33:37 +00:00
Olivier Chafik
994301da12 use existing string_strip 2025-02-05 16:33:16 +00:00
Olivier Chafik
33efcb3c59 Update README.md 2025-02-05 16:20:11 +00:00
Olivier Chafik
098629df15 disable some failing chatml tests 2025-02-05 16:15:19 +00:00
Olivier Chafik
0917e0a80d fix --think arg env 2025-02-05 16:15:09 +00:00
Olivier Chafik
39b50c37dc Update README.md 2025-02-05 15:53:48 +00:00
Olivier Chafik
e6d9b52480 align Command R7B w/ --think / reasoning_content behaviour 2025-02-05 15:47:37 +00:00
Olivier Chafik
3841a163ef fix compiler warning about parens 2025-02-05 13:05:27 +00:00
ochafik
f3e9f8b62a fix test_thoughts 2025-02-05 12:34:27 +00:00
ochafik
d20c2ce4e7 Merge branch 'r1-toolcall' of github.com:ochafik/llama.cpp into r1-toolcall 2025-02-05 12:16:42 +00:00
ochafik
9d7c3cc51b --think to force any model to return reasoning_content (or just parse <think> for deepseek r1) 2025-02-05 12:16:37 +00:00
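What `--think` amounts to for DeepSeek R1-style output is splitting a leading `<think>…</think>` span out of the generated text and returning it as `reasoning_content` rather than `content`. A minimal sketch of that parse (illustrative, not the server's actual implementation):

```cpp
#include <string>
#include <utility>

// Split a leading <think>...</think> block out of model output: the inside
// becomes reasoning_content, the remainder becomes the visible content.
static std::pair<std::string, std::string> split_thoughts(const std::string & out) {
    const std::string open  = "<think>";
    const std::string close = "</think>";
    const size_t b = out.find(open);
    const size_t e = out.find(close);
    if (b == std::string::npos || e == std::string::npos || e < b) {
        return {"", out};  // no thoughts: everything is normal content
    }
    std::string reasoning = out.substr(b + open.size(), e - b - open.size());
    std::string content   = out.substr(e + close.size());
    return {reasoning, content};
}
```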
Georgi Gerganov
d774ab3acc
metal : adjust support conditions for norm operators (#11671)
cont #11659

ggml-ci
2025-02-05 10:57:42 +02:00
Johannes Gäßler
fa62da9b2d
CUDA: support for mat. mul. with ne03 != ne13 (#11656) 2025-02-05 08:58:31 +01:00
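In ggml's shape convention, `ne03` and `ne13` are the outermost batch dimensions of the two matmul operands, so supporting `ne03 != ne13` means broadcasting the smaller batch across the larger. The CPU path expresses this with a ratio, roughly as in this sketch:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of ggml-style batch broadcasting for mul_mat: src1's batch dims must
// be integer multiples of src0's, and each src1 batch index maps down by the
// ratio. (Index names follow ggml; the inner kernel itself is elided.)
void mul_mat_batched(int64_t ne02, int64_t ne03,   // src0 batch dims
                     int64_t ne12, int64_t ne13) { // src1 batch dims
    assert(ne12 % ne02 == 0 && ne13 % ne03 == 0);
    const int64_t r2 = ne12 / ne02;
    const int64_t r3 = ne13 / ne03;
    for (int64_t i13 = 0; i13 < ne13; ++i13) {
        for (int64_t i12 = 0; i12 < ne12; ++i12) {
            const int64_t i03 = i13 / r3;  // src0 batch index being broadcast
            const int64_t i02 = i12 / r2;
            // ... multiply src0[i02, i03] by src1[i12, i13] here ...
            (void) i02; (void) i03;
        }
    }
}
```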
SAMI
1ec208083c
llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644)
* Added quantization for the visual projector
* Added README
* Fixed the clip quantize implementation in the file
* Fixed minor gcc lint warnings
* Removed trailing whitespace
2025-02-05 10:45:40 +03:00
Olivier Chafik
1f1f06aa26
Merge branch 'master' into r1-toolcall 2025-02-05 01:10:45 +00:00
Olivier Chafik
9f4cc8f8d3
sync: minja (#11641)
* `sync`: minja

182de30cda

https://github.com/google/minja/pull/46

https://github.com/google/minja/pull/45
2025-02-05 01:00:12 +00:00
Johannes Gäßler
fd08255d0d
CUDA: non-contiguous (RMS) norm support (#11659)
* CUDA: non-contiguous (RMS) norm support

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-04 22:21:42 +01:00
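RMS norm scales each row by the reciprocal root-mean-square of its elements, y_i = x_i / sqrt(mean(x^2) + eps); the new part here is walking the input through byte strides instead of assuming contiguous rows. A CPU-side sketch of the strided computation (the actual change is a CUDA kernel):

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Strided RMS norm over one row: read elements via a byte stride so the
// input does not have to be contiguous. Sketch of what the kernel computes.
void rms_norm_row(const char * x_row, size_t nb0,  // element byte stride
                  float * y, int64_t ne0, float eps) {
    float sum2 = 0.0f;
    for (int64_t i = 0; i < ne0; ++i) {
        const float v = *(const float *)(x_row + i*nb0);
        sum2 += v*v;
    }
    const float scale = 1.0f / std::sqrt(sum2/ne0 + eps);
    for (int64_t i = 0; i < ne0; ++i) {
        y[i] = scale * *(const float *)(x_row + i*nb0);
    }
}
```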
fxzjshm
3ec9fd4b77
HIP: force max threads per block to be 1024 (#11621)
Some old or vendor-forked versions of llvm still use 256. Explicitly set it to 1024 to align with upstream llvm.

Signed-off-by: fxzjshm <fxzjshm@163.com>
2025-02-04 19:18:38 +01:00
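The limit in question is the compiler's assumed maximum threads per block: upstream llvm assumes 1024 for AMD targets, while some older or vendor forks assume 256, which breaks kernels launched with larger blocks. One common way to pin it down in HIP C++ is a per-kernel `__launch_bounds__` attribute (hedged sketch; the commit may instead set a compiler flag):

```cpp
#include <hip/hip_runtime.h>

// Pin the kernel's max block size to 1024 regardless of the compiler's
// default assumption, matching upstream llvm behaviour.
__global__ void __launch_bounds__(1024) scale_kernel(float * x, float s, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] *= s;
    }
}

// Launch with the full 1024-thread blocks the attribute permits:
//   scale_kernel<<<(n + 1023)/1024, 1024>>>(x, s, n);
```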
Olivier Chafik
5d60cebbcc Update test_tool_call.py 2025-02-04 17:48:29 +00:00
Xuan-Son Nguyen
3962fc1a79
server : add try..catch to places not covered by set_exception_handler (#11620)
* server : add try..catch to places not covered by set_exception_handler

* log_server_request: rm try catch, add reminder
2025-02-04 18:25:42 +01:00
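`set_exception_handler` only covers exceptions thrown inside the request handlers it wraps; code paths outside it need their own try..catch so a stray exception does not kill the process. A generic sketch of the pattern (illustrative, not the server's exact code):

```cpp
#include <cstdio>
#include <exception>
#include <functional>

// Wrap a task so an escaping exception is logged and turned into a clean
// error instead of terminating the whole server process.
static bool run_guarded(const char * name, const std::function<void()> & task) {
    try {
        task();
        return true;
    } catch (const std::exception & e) {
        std::fprintf(stderr, "error in %s: %s\n", name, e.what());
    } catch (...) {
        std::fprintf(stderr, "unknown error in %s\n", name);
    }
    return false;
}
```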
Radoslav Gerganov
1bef571f6a
arg : list RPC devices first when using --list-devices (#11655)
List devices in the same order as they appear when evaluating the model
and splitting tensors across devices, i.e. RPC devices come first in the
list.

ref #11435
2025-02-04 18:16:20 +02:00
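The ordering rule is simple enough to state in code: keep the list stable but move every RPC device ahead of the rest, matching the order used when splitting tensors. A sketch with a hypothetical `is_rpc` flag:

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct device_info {
    std::string name;
    bool is_rpc;  // hypothetical flag: true for RPC backend devices
};

// List RPC devices first, preserving relative order within each group,
// so --list-devices matches the order used when splitting tensors.
void sort_for_listing(std::vector<device_info> & devs) {
    std::stable_partition(devs.begin(), devs.end(),
                          [](const device_info & d) { return d.is_rpc; });
}
```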
Olivier Chafik
933f7a186e Merge branch 'master' into r1-toolcall 2025-02-04 15:56:25 +00:00
Olivier Chafik
db288b60cb
tool-call: command r7b fix for normal responses (#11608)
* fix command r7b normal response regex + add to server test

* test multiline non-tool-call responses in test-chat
2025-02-04 15:48:53 +00:00
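The multiline test matters because `.` in a regex does not match newlines by default, so a response pattern written with `.*` silently stops matching once the model emits a line break; `[\s\S]*` (or an explicit dotall mode) is the usual fix. A small illustration of the failure mode (generic, not the commit's actual pattern):

```cpp
#include <cassert>
#include <regex>
#include <string>

int main() {
    const std::string multiline = "line one\nline two";

    // '.' stops at '\n', so this fails to cover a multiline response...
    assert(!std::regex_match(multiline, std::regex("(.*)")));

    // ...while [\s\S] matches any character, including newlines.
    assert(std::regex_match(multiline, std::regex(R"(([\s\S]*))")));
    return 0;
}
```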
Olivier Chafik
b2d17287aa update readme section about common model tool call formats
./build/bin/test-chat ../minja/build/tests/*.jinja 2>/dev/null
2025-02-04 14:27:38 +00:00
Olivier Chafik
39c1d8163b return thoughts in reasoning_content field 2025-02-04 11:37:09 +00:00
Shelby Jenkins
106045e7bb
readme : add llm_client Rust crate to readme bindings (#11628)
[This crate](https://github.com/ShelbyJenkins/llm_client) has been in a usable state for quite a while, so I figured it is fair to add it now.

It installs from crates.io, and automatically downloads the llama.cpp repo and builds it for the target platform - with the goal being the easiest user experience possible.

It also integrates model presets and chooses the largest quant given the target's available VRAM. So a user just has to specify one of the presets (I manually add the most popular models), and it will download from Hugging Face.

So, it's like a Rust Ollama, but it's not really for chatting. It makes heavy use of llama.cpp's grammar system to do structured output for decision making and control flow tasks.
2025-02-04 13:20:55 +02:00
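The quant-selection idea, "the largest quant that fits the available VRAM", is a one-pass maximisation over known file sizes. A hedged sketch of the shape of that logic (names and fields are made up for illustration):

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct quant_preset {
    std::string name;
    uint64_t size_bytes;  // hypothetical known size of this quant
};

// Pick the largest preset that still fits the available VRAM; returns -1 when
// even the smallest does not fit. (A real selector would also reserve
// headroom for the KV cache.)
int pick_quant(const std::vector<quant_preset> & presets, uint64_t vram_free) {
    int best = -1;
    for (int i = 0; i < (int) presets.size(); ++i) {
        if (presets[i].size_bytes <= vram_free &&
            (best < 0 || presets[i].size_bytes > presets[best].size_bytes)) {
            best = i;
        }
    }
    return best;
}
```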
Jhen-Jie Hong
f117d84b48
swift : fix llama-vocab api usage (#11645)
* swiftui : fix vocab api usage

* batched.swift : fix vocab api usage
2025-02-04 13:15:24 +02:00
Jhen-Jie Hong
534c46b53c
metal : use residency set for other platforms (#11648) 2025-02-04 13:07:18 +02:00
Georgi Gerganov
387a1598ca
authors : update 2025-02-04 13:04:10 +02:00
Georgi Gerganov
7c9e0ca520
sync : ggml 2025-02-04 12:59:21 +02:00
Christian Kastner
8f8290ada9
cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)
This makes git an optional dependency, and is useful when ggml is built
not from git but from a tarball or a distribution source package.

This conditional also affects GGML_BUILD_COMMIT. Nothing seems to be
using it, though, so there doesn't seem to be much value in factoring it
out, or even requiring it.
2025-02-04 12:59:15 +02:00
ochafik
d1b66910c5 r1: revert making <|tool▁calls▁begin|> optional as somehow sampling triggers us on "<|tool▁call▁begin|><", which is already invalid per the grammar 2025-02-04 10:38:03 +00:00
ochafik
0db9881285 Fix r1 grammar since we made <|tool▁calls▁begin|> optional (triggering on just <|tool▁call▁begin|> for 7B's sake) 2025-02-04 10:30:10 +00:00
ochafik
b5b117fa1c Merge branch 'sync-minja-4' into r1-toolcall 2025-02-04 09:45:27 +00:00
Georgi Gerganov
b34aedd558
ci : do not stale-close roadmap issues 2025-02-04 09:31:01 +02:00
ochafik
21f207156f Update chat.cpp 2025-02-04 05:16:23 +00:00
ochafik
438ce0b8a1 fix test-chat 2025-02-04 04:51:36 +00:00
ochafik
1f5ec59809 ensure deepseek r1 thoughts parsed even w/o tool calls 2025-02-04 04:48:08 +00:00
ochafik
b6e14a4101 fix mistral expectation 2025-02-04 04:26:49 +00:00
ochafik
d44eb95c67 tool-call: ensure we don't return content when there are tool calls / warn 2025-02-04 04:18:49 +00:00
ochafik
812544ab8b server: check that content is null when we get tool_calls 2025-02-04 04:14:15 +00:00
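These two commits enforce the OpenAI-style invariant that an assistant message carries either `content` or `tool_calls`, not both. With the nlohmann JSON library the server uses, a check along these lines would do it (sketch, not the actual server code):

```cpp
#include <nlohmann/json.hpp>
#include <stdexcept>

using json = nlohmann::json;

// Sketch: enforce that an assistant message has either tool_calls or content.
// When tool calls are present, content must be null (or absent).
void validate_message(const json & msg) {
    const bool has_tool_calls =
        msg.contains("tool_calls") && !msg["tool_calls"].empty();
    const bool has_content =
        msg.contains("content") && !msg["content"].is_null();
    if (has_tool_calls && has_content) {
        throw std::runtime_error("expected content to be null when tool_calls are present");
    }
}
```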
ochafik
d43e4f6c22 Merge branch 'sync-minja-4' into r1-toolcall 2025-02-04 04:05:02 +00:00
ochafik
f12e3507f7 Update chat.cpp 2025-02-04 04:02:18 +00:00
ochafik
56a14ddc83 fix mistral chat test: need empty tokens 2025-02-04 04:01:35 +00:00
ochafik
b1527292b6 Update test-chat.cpp 2025-02-04 03:56:03 +00:00
ochafik
09caa63451 sync: minja
182de30cda
2025-02-04 03:52:59 +00:00
ochafik
86994db697 fix spaces 2025-02-04 03:47:52 +00:00