Jeff Bolz
1b598b3058
vulkan: use smaller combined allocations to avoid fragmentation (#11551)
2025-02-06 07:02:18 +01:00
Charles Duffy
902368a06b
metal : avoid breaking build when metal API predates TARGET_OS_VISION (#11690)
Avoids breakage in nix flake build introduced by b0569130c5
2025-02-06 09:52:31 +08:00
Matvey Soloviev
c3db0480bb
readme : add link to Autopen under UIs (#11684)
Autopen (https://github.com/blackhole89/autopen) is a graphical text editor that uses llama.cpp to tokenize the buffer on the fly, score the buffer, visualise token logits and allow you to switch back and forth between different possible completions at any point. It hopefully meets the criteria for inclusion, as the dependency on llama.cpp is stated prominently.
2025-02-06 01:55:25 +01:00
Olivier Chafik
d1a064070f
revert tool example backfill change - command r7b just needs the right template
2025-02-05 16:33:37 +00:00
Olivier Chafik
994301da12
use existing string_strip
2025-02-05 16:33:16 +00:00
Olivier Chafik
33efcb3c59
Update README.md
2025-02-05 16:20:11 +00:00
Olivier Chafik
098629df15
disable some failing chatml tests
2025-02-05 16:15:19 +00:00
Olivier Chafik
0917e0a80d
fix --think arg env
2025-02-05 16:15:09 +00:00
Olivier Chafik
39b50c37dc
Update README.md
2025-02-05 15:53:48 +00:00
Olivier Chafik
e6d9b52480
align Command R7B w/ --think / reasoning_content behaviour
2025-02-05 15:47:37 +00:00
Olivier Chafik
3841a163ef
fix compiler warning about parens
2025-02-05 13:05:27 +00:00
ochafik
f3e9f8b62a
fix test_thoughts
2025-02-05 12:34:27 +00:00
ochafik
d20c2ce4e7
Merge branch 'r1-toolcall' of github.com:ochafik/llama.cpp into r1-toolcall
2025-02-05 12:16:42 +00:00
ochafik
9d7c3cc51b
--think to force any model to return reasoning_content (or just parse <think> for deepseek r1)
2025-02-05 12:16:37 +00:00
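As an aside on the parsing half of this change, a minimal sketch of the idea, with hypothetical names rather than the actual chat.cpp code: split a completion into reasoning_content and content at the <think>...</think> tags.

#include <string>
#include <utility>

// Hypothetical helper illustrating the --think idea (not the actual chat.cpp
// code): if the output opens with <think>, everything up to </think> becomes
// reasoning_content and the remainder stays as content.
static std::pair<std::string, std::string> split_thoughts(const std::string & out) {
    const std::string open  = "<think>";
    const std::string close = "</think>";
    if (out.rfind(open, 0) != 0) {
        return {"", out};                       // no leading <think> tag
    }
    const size_t end = out.find(close, open.size());
    if (end == std::string::npos) {
        return {out.substr(open.size()), ""};   // unterminated: all reasoning
    }
    return {out.substr(open.size(), end - open.size()),   // reasoning_content
            out.substr(end + close.size())};              // content
}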
Georgi Gerganov
d774ab3acc
metal : adjust support conditions for norm operators (#11671)
cont #11659
ggml-ci
2025-02-05 10:57:42 +02:00
Johannes Gäßler
fa62da9b2d
CUDA: support for mat. mul. with ne03 != ne13 (#11656)
2025-02-05 08:58:31 +01:00
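For context on the ne03 != ne13 case: ggml broadcasts src0 across src1's batch dimension when ne13 is a multiple of ne03. A minimal sketch of the index mapping, as an illustration rather than the kernel code:

#include <cstdint>

// Sketch of dim-3 broadcasting (assumes ne13 % ne03 == 0, per ggml's rule):
// returns the src0 batch index that pairs with src1 batch i13. With ne03 = 2
// and ne13 = 6, src1 batches 0..2 reuse src0 batch 0 and batches 3..5 batch 1.
static int64_t src0_batch_for(int64_t i13, int64_t ne03, int64_t ne13) {
    return i13 / (ne13 / ne03);
}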
SAMI
1ec208083c
llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644)
* Added quantization for visual projector
* Added README
* Fixed the clip quantize implementation in the file
* Fixed the gcc warning regarding minor linting
* Removed trailing whitespace
2025-02-05 10:45:40 +03:00
Olivier Chafik
1f1f06aa26
Merge branch 'master' into r1-toolcall
2025-02-05 01:10:45 +00:00
Olivier Chafik
9f4cc8f8d3
sync: minja (#11641)
* `sync`: minja
182de30cda
https://github.com/google/minja/pull/46
https://github.com/google/minja/pull/45
2025-02-05 01:00:12 +00:00
Johannes Gäßler
fd08255d0d
CUDA: non-contiguous (RMS) norm support (#11659)
* CUDA: non-contiguous (RMS) norm support
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-04 22:21:42 +01:00
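For reference, RMS norm divides each row by the root mean square of its elements, so supporting non-contiguous inputs mostly comes down to honoring a stride. A CPU-side sketch, not the actual CUDA kernel:

#include <cmath>
#include <cstddef>

// Sketch: RMS-normalize one row of n elements laid out with element stride
// `stride` (stride > 1 models a non-contiguous row). Not the real kernel.
static void rms_norm_row(float * x, size_t n, size_t stride, float eps) {
    float sum_sq = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        const float v = x[i * stride];
        sum_sq += v * v;
    }
    const float scale = 1.0f / std::sqrt(sum_sq / n + eps);
    for (size_t i = 0; i < n; ++i) {
        x[i * stride] *= scale;
    }
}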
fxzjshm
3ec9fd4b77
HIP: force max threads per block to be 1024 (#11621)
Some old or vendor-forked versions of LLVM still use 256. Explicitly set it to 1024 to align with upstream LLVM.
Signed-off-by: fxzjshm <fxzjshm@163.com>
2025-02-04 19:18:38 +01:00
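The limit in question is the compiler's assumed maximum threads per block; in HIP it can be pinned per kernel with __launch_bounds__. A hypothetical kernel for illustration, not the ggml source:

#include <hip/hip_runtime.h>

// __launch_bounds__(1024) tells the compiler to assume at most 1024 threads
// per block, matching upstream LLVM rather than the 256 some vendor forks use.
__global__ void __launch_bounds__(1024) scale_kernel(float * x, float s, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] *= s;
    }
}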
Olivier Chafik
5d60cebbcc
Update test_tool_call.py
2025-02-04 17:48:29 +00:00
Xuan-Son Nguyen
3962fc1a79
server : add try..catch to places not covered by set_exception_handler (#11620)
* server : add try..catch to places not covered by set_exception_handler
* log_server_request: rm try catch, add reminder
2025-02-04 18:25:42 +01:00
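The pattern being added, sketched with a hypothetical handler type rather than the actual server code: exceptions thrown inside a request handler are caught locally and turned into a 500 response instead of escaping the thread.

#include <exception>
#include <functional>
#include <string>

// Hypothetical response type and wrapper illustrating the pattern: any
// exception the handler throws becomes a 500 instead of propagating.
struct response { int status; std::string body; };

static response safe_handle(const std::function<response()> & handler) {
    try {
        return handler();
    } catch (const std::exception & e) {
        return {500, std::string("server error: ") + e.what()};
    } catch (...) {
        return {500, "unknown server error"};
    }
}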
Radoslav Gerganov
1bef571f6a
arg : list RPC devices first when using --list-devices (#11655)
List devices in the same order as they appear when evaluating the model
and splitting tensors across devices, i.e. RPC devices come first in the
list.
ref #11435
2025-02-04 18:16:20 +02:00
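The reordering itself is a stable partition; a sketch with a hypothetical device record (the real code operates on ggml backend devices):

#include <algorithm>
#include <string>
#include <vector>

// Hypothetical device record; the point is the ordering, which keeps
// --list-devices consistent with how tensors are split across devices.
struct device_info { std::string name; bool is_rpc; };

static void list_rpc_first(std::vector<device_info> & devs) {
    // RPC devices move to the front; relative order is otherwise preserved.
    std::stable_partition(devs.begin(), devs.end(),
                          [](const device_info & d) { return d.is_rpc; });
}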
Olivier Chafik
933f7a186e
Merge branch 'master' into r1-toolcall
2025-02-04 15:56:25 +00:00
Olivier Chafik
db288b60cb
tool-call: command r7b fix for normal responses (#11608)
* fix command r7b normal response regex + add to server test
* test multiline non-tool-call responses in test-chat
2025-02-04 15:48:53 +00:00
Olivier Chafik
b2d17287aa
update readme section about common model tool call formats
./build/bin/test-chat ../minja/build/tests/*.jinja 2>/dev/null
2025-02-04 14:27:38 +00:00
Olivier Chafik
39c1d8163b
return thoughts in reasoning_content field
2025-02-04 11:37:09 +00:00
Shelby Jenkins
106045e7bb
readme : add llm_client Rust crate to readme bindings (#11628)
[This crate](https://github.com/ShelbyJenkins/llm_client) has been in a usable state for quite a while, so I figured it is fair to add it now.
It installs from crates.io, and automatically downloads the llama.cpp repo and builds it for the target platform, with the goal being the easiest possible user experience.
It also integrates model presets and chooses the largest quant that fits the target's available VRAM. So a user just has to specify one of the presets (I manually add the most popular models), and it will download from Hugging Face.
So, it's like a Rust Ollama, but it's not really for chatting. It makes heavy use of llama.cpp's grammar system to do structured output for decision making and control flow tasks.
2025-02-04 13:20:55 +02:00
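The quant-selection idea, sketched with hypothetical types rather than llm_client's actual API: walk the quants from largest to smallest and take the first whose file fits the free VRAM, with headroom for the KV cache.

#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical preset entry; not llm_client's real types.
struct quant_file { std::string name; uint64_t size_bytes; };

// Pick the largest quant that fits in free VRAM. `quants` is assumed sorted
// largest-first; `headroom` reserves space for the KV cache and activations.
static std::optional<quant_file> pick_quant(const std::vector<quant_file> & quants,
                                            uint64_t free_vram, double headroom = 0.9) {
    for (const auto & q : quants) {
        if (q.size_bytes <= static_cast<uint64_t>(free_vram * headroom)) {
            return q;
        }
    }
    return std::nullopt;
}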
Jhen-Jie Hong
f117d84b48
swift : fix llama-vocab api usage (#11645)
* swiftui : fix vocab api usage
* batched.swift : fix vocab api usage
2025-02-04 13:15:24 +02:00
Jhen-Jie Hong
534c46b53c
metal : use residency set for other platforms (#11648)
2025-02-04 13:07:18 +02:00
Georgi Gerganov
387a1598ca
authors : update
2025-02-04 13:04:10 +02:00
Georgi Gerganov
7c9e0ca520
sync : ggml
2025-02-04 12:59:21 +02:00
Christian Kastner
8f8290ada9
cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)
This makes git an optional dependency, which is useful when ggml is built not from git but from a tarball or a distribution source package.
This conditional also affects GGML_BUILD_COMMIT. Nothing seems to be using it, though, so there doesn't seem to be much value in factoring it out, or even requiring it.
2025-02-04 12:59:15 +02:00
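For example (invocation assumed; standard CMake cache-variable syntax, with a placeholder number), a tarball build could pass the value explicitly:

cmake -B build -DGGML_BUILD_NUMBER=1234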
ochafik
d1b66910c5
r1: revert making <|tool▁calls▁begin|> optional, as sampling somehow triggers on "<|tool▁call▁begin|><", which is already invalid per the grammar
2025-02-04 10:38:03 +00:00
ochafik
0db9881285
Fix r1 grammar since we made <|tool▁calls▁begin|> optional (triggering on just <|tool▁call▁begin|> for 7B's sake)
2025-02-04 10:30:10 +00:00
ochafik
b5b117fa1c
Merge branch 'sync-minja-4' into r1-toolcall
2025-02-04 09:45:27 +00:00
Georgi Gerganov
b34aedd558
ci : do not stale-close roadmap issues
2025-02-04 09:31:01 +02:00
ochafik
21f207156f
Update chat.cpp
2025-02-04 05:16:23 +00:00
ochafik
438ce0b8a1
fix test-chat
2025-02-04 04:51:36 +00:00
ochafik
1f5ec59809
ensure deepseek r1 thoughts parsed even w/o tool calls
2025-02-04 04:48:08 +00:00
ochafik
b6e14a4101
fix mistral expectation
2025-02-04 04:26:49 +00:00
ochafik
d44eb95c67
tool-call: ensure we don't return content when there are tool calls / warn
2025-02-04 04:18:49 +00:00
ochafik
812544ab8b
server: check that content is null when we get tool_calls
2025-02-04 04:14:15 +00:00
ochafik
d43e4f6c22
Merge branch 'sync-minja-4' into r1-toolcall
2025-02-04 04:05:02 +00:00
ochafik
f12e3507f7
Update chat.cpp
2025-02-04 04:02:18 +00:00
ochafik
56a14ddc83
fix mistral chat test: need empty tokens
2025-02-04 04:01:35 +00:00
ochafik
b1527292b6
Update test-chat.cpp
2025-02-04 03:56:03 +00:00
ochafik
09caa63451
sync: minja
182de30cda
2025-02-04 03:52:59 +00:00
ochafik
86994db697
fix spaces
2025-02-04 03:47:52 +00:00