ochafik
f3e9f8b62a
fix test_thoughts
2025-02-05 12:34:27 +00:00
ochafik
d20c2ce4e7
Merge branch 'r1-toolcall' of github.com:ochafik/llama.cpp into r1-toolcall
2025-02-05 12:16:42 +00:00
ochafik
9d7c3cc51b
--think to force any model to return reasoning_content (or just parse <think> for deepseek r1)
2025-02-05 12:16:37 +00:00
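For illustration only, a minimal invocation sketch of the new flag (assuming it is exposed on llama-server; the model path is a placeholder):
./build/bin/llama-server -m models/DeepSeek-R1-Distill-Qwen-7B.gguf --think
# per the commit message, --think forces reasoning to be returned in reasoning_content;
# for DeepSeek R1 the <think> tags can be parsed even without it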
Olivier Chafik
1f1f06aa26
Merge branch 'master' into r1-toolcall
2025-02-05 01:10:45 +00:00
Olivier Chafik
9f4cc8f8d3
sync : minja (#11641)
...
* `sync`: minja
182de30cda
https://github.com/google/minja/pull/46
https://github.com/google/minja/pull/45
2025-02-05 01:00:12 +00:00
Johannes Gäßler
fd08255d0d
CUDA: non-contiguous (RMS) norm support (#11659)
...
* CUDA: non-contiguous (RMS) norm support
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-04 22:21:42 +01:00
fxzjshm
3ec9fd4b77
HIP: force max threads per block to be 1024 (#11621)
...
Some old or vendor-forked versions of llvm still use 256. Explicitly set it to 1024 to align with upstream llvm.
Signed-off-by: fxzjshm <fxzjshm@163.com>
2025-02-04 19:18:38 +01:00
Olivier Chafik
5d60cebbcc
Update test_tool_call.py
2025-02-04 17:48:29 +00:00
Xuan-Son Nguyen
3962fc1a79
server : add try..catch to places not covered by set_exception_handler (#11620)
...
* server : add try..catch to places not covered by set_exception_handler
* log_server_request: rm try catch, add reminder
2025-02-04 18:25:42 +01:00
Radoslav Gerganov
1bef571f6a
arg : list RPC devices first when using --list-devices (#11655)
...
List devices in the same order as they appear when evaluating the model
and splitting tensors across devices, i.e. RPC devices come first in the
list.
ref #11435
2025-02-04 18:16:20 +02:00
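For context, a hedged example of the affected listing (the RPC endpoints and binary path are placeholders):
./build/bin/llama-server --rpc 192.168.1.10:50052,192.168.1.11:50052 --list-devices
# RPC devices should now appear before local devices, matching the order used when splitting tensors across devices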
Olivier Chafik
933f7a186e
Merge branch 'master' into r1-toolcall
2025-02-04 15:56:25 +00:00
Olivier Chafik
db288b60cb
tool-call : command r7b fix for normal responses (#11608)
...
* fix command r7b normal response regex + add to server test
* test multiline non-tool-call responses in test-chat
2025-02-04 15:48:53 +00:00
Olivier Chafik
b2d17287aa
update readme section about common model tool call formats
...
./build/bin/test-chat ../minja/build/tests/*.jinja 2>/dev/null
2025-02-04 14:27:38 +00:00
Olivier Chafik
39c1d8163b
return thoughts in reasoning_content field
2025-02-04 11:37:09 +00:00
Shelby Jenkins
106045e7bb
readme : add llm_client Rust crate to readme bindings (#11628)
...
[This crate](https://github.com/ShelbyJenkins/llm_client) has been in a usable state for quite a while, so I figured it's fair to add it now.
It installs from crates.io, and automatically downloads the llama.cpp repo and builds it for the target platform, with the goal being the easiest possible user experience.
It also integrates model presets and chooses the largest quant that fits in the target's available VRAM. So a user just has to specify one of the presets (I manually add the most popular models), and it will download from Hugging Face.
So it's like a Rust Ollama, but it's not really for chatting. It makes heavy use of llama.cpp's grammar system to do structured output for decision making and control flow tasks.
2025-02-04 13:20:55 +02:00
Jhen-Jie Hong
f117d84b48
swift : fix llama-vocab api usage (#11645)
...
* swiftui : fix vocab api usage
* batched.swift : fix vocab api usage
2025-02-04 13:15:24 +02:00
Jhen-Jie Hong
534c46b53c
metal : use residency set for other platforms (#11648)
2025-02-04 13:07:18 +02:00
Georgi Gerganov
387a1598ca
authors : update
2025-02-04 13:04:10 +02:00
Georgi Gerganov
7c9e0ca520
sync : ggml
2025-02-04 12:59:21 +02:00
Christian Kastner
8f8290ada9
cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)
...
This makes git an optional dependency, which is useful when ggml is built not from git but from a tarball or a distribution source package.
This conditional also affects GGML_BUILD_COMMIT. Nothing seems to be using it, though, so there doesn't seem to be much value in factoring it out, or even requiring it.
2025-02-04 12:59:15 +02:00
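As a sketch of the intended use (the number shown is arbitrary), a packager building ggml from a tarball could pass the build number on the command line instead of relying on git:
cmake -B build -DGGML_BUILD_NUMBER=1234
cmake --build build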
ochafik
d1b66910c5
r1: revert making <|tool▁calls▁begin|> optional, as sampling somehow triggers us on "<|tool▁call▁begin|><", which is already invalid per the grammar
2025-02-04 10:38:03 +00:00
ochafik
0db9881285
Fix r1 grammar since we made <|tool▁calls▁begin|> optional (triggering on just <|tool▁call▁begin|> for 7B's sake)
2025-02-04 10:30:10 +00:00
ochafik
b5b117fa1c
Merge branch 'sync-minja-4' into r1-toolcall
2025-02-04 09:45:27 +00:00
Georgi Gerganov
b34aedd558
ci : do not stale-close roadmap issues
2025-02-04 09:31:01 +02:00
ochafik
21f207156f
Update chat.cpp
2025-02-04 05:16:23 +00:00
ochafik
438ce0b8a1
fix test-chat
2025-02-04 04:51:36 +00:00
ochafik
1f5ec59809
ensure deepseek r1 thoughts parsed even w/o tool calls
2025-02-04 04:48:08 +00:00
ochafik
b6e14a4101
fix mistral expectation
2025-02-04 04:26:49 +00:00
ochafik
d44eb95c67
tool-call: ensure we don't return content when there are tool calls / warn
2025-02-04 04:18:49 +00:00
ochafik
812544ab8b
server: check that content is null when we get tool_calls
2025-02-04 04:14:15 +00:00
ochafik
d43e4f6c22
Merge branch 'sync-minja-4' into r1-toolcall
2025-02-04 04:05:02 +00:00
ochafik
f12e3507f7
Update chat.cpp
2025-02-04 04:02:18 +00:00
ochafik
56a14ddc83
fix mistral chat test: need empty tokens
2025-02-04 04:01:35 +00:00
ochafik
b1527292b6
Update test-chat.cpp
2025-02-04 03:56:03 +00:00
ochafik
09caa63451
sync : minja
...
182de30cda
2025-02-04 03:52:59 +00:00
ochafik
86994db697
fix spaces
2025-02-04 03:47:52 +00:00
ochafik
78b47bb0e9
fix test_calc_result
2025-02-04 03:46:26 +00:00
ochafik
326e7002b3
update test_calc_result
2025-02-04 03:13:13 +00:00
ochafik
f0154a6479
Fix / test models/templates/llama-cpp-deepseek-r1.jinja
2025-02-04 03:09:15 +00:00
ochafik
a682d1216d
fix / test parsing of r1 parser
2025-02-04 02:23:31 +00:00
ochafik
9a6847c857
move trigger_words init inside non-llguidance branch
2025-02-04 01:13:01 +00:00
ochafik
18a11f43f0
tool-call: r1: fix grammar
2025-02-04 01:12:44 +00:00
ochafik
e84ee88f50
r1: fix inadvertent newline in grammar before <|tool▁call▁end|>
2025-02-04 00:36:38 +00:00
Olivier Chafik
ce28224de8
tool-call: r1: add one more approximate trigger, "<|tool calls begin|>"
2025-02-04 00:28:40 +00:00
Olivier Chafik
bff549deb6
simplify hack to fix original template's backfill from minja
2025-02-04 00:14:48 +00:00
Olivier Chafik
bbd45bf6a2
sync: minja
2025-02-04 00:14:15 +00:00
Olivier Chafik
30ea3591c9
update to minja's new api
2025-02-03 23:53:27 +00:00
Olivier Chafik
11c1f0c7d4
actually we want eos_token in the template to infer tool call examples; it's explicitly skipped in the new template options
2025-02-03 23:52:28 +00:00
Olivier Chafik
bc6d910f6d
Merge branch 'master' into r1-toolcall
2025-02-03 23:51:31 +00:00
Olivier Chafik
cde3833239
tool-call : allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616)
...
* tool-call: allow `--jinja --chat-template chatml`
* fix double bos issue (drop bos/eos tokens from jinja template)
* add missing try catch around jinja parsing to default to chatml
* Simplify default chatml logic
2025-02-03 23:49:27 +00:00
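A hedged usage sketch of the flags this change targets (binary and model path are placeholders):
./build/bin/llama-server -m models/model.gguf --jinja --chat-template chatml
# per the PR notes, a chat template that fails to parse now falls back to chatml instead of erroring,
# and bos/eos tokens are dropped from the jinja template to avoid a double bos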