Commit graph

441 commits

Author SHA1 Message Date
ochafik
cc2c712cf9 Merge remote-tracking branch 'origin/master' into r1-toolcall 2025-02-08 14:35:10 +00:00
Christian Fillion
7ee953a64a
llama : add llama_sampler_init for safe usage of llama_sampler_free (#11727)
The C API in llama.h claims users can implement `llama_sampler_i` to
create custom `llama_sampler`. The sampler chain takes ownership and
calls `llama_sampler_free` on them. However, `llama_sampler_free` is
hard-coded to use `delete`. This is undefined behavior if the object
wasn't also allocated via `new` from libllama's C++ runtime. Callers
in C and C-compatible languages do not use C++'s `new` operator. C++
callers may not be sharing the same heap as libllama.
2025-02-07 11:33:27 +02:00
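The ownership problem described in the commit above can be illustrated with a minimal sketch. Every name here (`sampler`, `sampler_iface`, `sampler_init`, `sampler_free`, `make_counter_sampler`) is a hypothetical stand-in and not the real `llama.h` declaration. The assumption is that the fix works by having the library both allocate and delete the sampler object, while the caller supplies only an interface table and a context it manages itself.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-ins for the library's types; NOT the real llama.h declarations.
struct sampler;

struct sampler_iface {
    const char * (*name)(const sampler * s);
    void         (*free)(sampler * s);  // releases the caller's ctx only, never the sampler itself
};

struct sampler {
    const sampler_iface * iface;
    void *                ctx;
};

// "Library" side: the sampler object is created with `new` here...
sampler * sampler_init(const sampler_iface * iface, void * ctx) {
    assert(iface != nullptr);
    return new sampler{iface, ctx};
}

// ...and destroyed with the matching `delete` here, inside the same C++ runtime.
void sampler_free(sampler * s) {
    if (s == nullptr) {
        return;
    }
    if (s->iface->free != nullptr) {
        s->iface->free(s);  // let the caller release its own context first
    }
    delete s;
}

// "Caller" side: the context comes from malloc, as a C caller would allocate it.
static int g_ctx_frees = 0;  // counts context releases, for illustration only

static const char * counter_name(const sampler *) { return "counter"; }

static void counter_free(sampler * s) {
    std::free(s->ctx);  // matches the malloc below
    ++g_ctx_frees;
}

static const sampler_iface counter_iface = { counter_name, counter_free };

sampler * make_counter_sampler() {
    int * ctx = static_cast<int *>(std::malloc(sizeof(int)));
    if (ctx != nullptr) {
        *ctx = 0;
    }
    return sampler_init(&counter_iface, ctx);
}
```

In this sketch the caller never runs `new` or `delete` on the sampler object, so each allocation is released by the allocator that produced it.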
Daniel Bevenius
b7552cfcbc
common : add default embeddings presets (#11677)
* common : add default embeddings presets

This commit adds default embeddings presets for the following models:
- bge-small-en-v1.5
- e5-small-v2
- gte-small

These can be used with llama-embedding and llama-server.

For example, with llama-embedding:
```console
./build/bin/llama-embedding --embd-gte-small-default -p "Hello, how are you?"
```

And with llama-server:
```console
./build/bin/llama-server --embd-gte-small-default
```
And the embeddings endpoint can then be called with a POST request:
```console
curl --request POST \
    --url http://localhost:8080/embeddings \
    --header "Content-Type: application/json" \
    --data '{"input": "Hello, how are you?"}'
```

I'm not sure if these are the most common embedding models but hopefully
this can be a good starting point for discussion and further
improvements.

Refs: https://github.com/ggerganov/llama.cpp/issues/10932
2025-02-07 09:15:22 +01:00
Olivier Chafik
d1a064070f revert tool example backfill change - command 7rb just needs the right template 2025-02-05 16:33:37 +00:00
Olivier Chafik
994301da12 use existing string_strip 2025-02-05 16:33:16 +00:00
Olivier Chafik
0917e0a80d fix --think arg env 2025-02-05 16:15:09 +00:00
Olivier Chafik
e6d9b52480 align Command R7B w/ --think / reasoning_content behaviour 2025-02-05 15:47:37 +00:00
Olivier Chafik
3841a163ef fix compiler warning about parens 2025-02-05 13:05:27 +00:00
ochafik
f3e9f8b62a fix test_thoughts 2025-02-05 12:34:27 +00:00
ochafik
d20c2ce4e7 Merge branch 'r1-toolcall' of github.com:ochafik/llama.cpp into r1-toolcall 2025-02-05 12:16:42 +00:00
ochafik
9d7c3cc51b --think to force any model to return reasoning_content (or just parse <think> for deepseek r1) 2025-02-05 12:16:37 +00:00
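As a rough illustration of what parsing `<think>` into `reasoning_content` could look like, here is a minimal sketch. It assumes a single `<think>...</think>` block and is not the code from this branch. `parsed_message`, `split_thoughts`, and `trim` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type: thoughts and visible answer kept apart.
struct parsed_message {
    std::string reasoning_content;
    std::string content;
};

// Strip leading and trailing whitespace.
static std::string trim(const std::string & s) {
    const char * ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) {
        return "";
    }
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Split the first <think>...</think> block out of the model output.
// If either tag is missing, the whole text is returned as content.
parsed_message split_thoughts(const std::string & text) {
    static const std::string open_tag  = "<think>";
    static const std::string close_tag = "</think>";

    parsed_message out;
    const std::string::size_type b = text.find(open_tag);
    if (b == std::string::npos) {
        out.content = text;
        return out;
    }
    const std::string::size_type body = b + open_tag.size();
    const std::string::size_type e = text.find(close_tag, body);
    if (e == std::string::npos) {
        out.content = text;
        return out;
    }
    assert(e >= body);
    out.reasoning_content = trim(text.substr(body, e - body));
    out.content = trim(text.substr(0, b) + text.substr(e + close_tag.size()));
    return out;
}
```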
Olivier Chafik
1f1f06aa26
Merge branch 'master' into r1-toolcall 2025-02-05 01:10:45 +00:00
Olivier Chafik
9f4cc8f8d3
sync: minja (#11641)
* `sync`: minja

182de30cda

https://github.com/google/minja/pull/46

https://github.com/google/minja/pull/45
2025-02-05 01:00:12 +00:00
Radoslav Gerganov
1bef571f6a
arg : list RPC devices first when using --list-devices (#11655)
List devices in the same order as they appear when evaluating the model
and splitting tensors across devices, i.e. RPC devices come first in the
list.

ref #11435
2025-02-04 18:16:20 +02:00
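The ordering described above (RPC devices first, everything else in its existing relative order) can be sketched with a stable partition. This is an illustration under assumptions, not the code from the commit. `device_info`, its `is_rpc` flag, and `rpc_devices_first` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical device record; the real code works on backend device handles.
struct device_info {
    std::string name;
    bool        is_rpc;
};

// Move RPC devices to the front while preserving relative order within
// both groups (RPC and non-RPC).
std::vector<device_info> rpc_devices_first(std::vector<device_info> devices) {
    const auto is_rpc = [](const device_info & d) { return d.is_rpc; };
    std::stable_partition(devices.begin(), devices.end(), is_rpc);
    assert(std::is_partitioned(devices.begin(), devices.end(), is_rpc));
    return devices;
}
```

A stable partition, rather than a plain one, keeps the listing consistent with the order in which the devices were registered within each group.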
Olivier Chafik
933f7a186e Merge branch 'master' into r1-toolcall 2025-02-04 15:56:25 +00:00
Olivier Chafik
db288b60cb
tool-call: command r7b fix for normal responses (#11608)
* fix command r7b normal response regex + add to server test

* test multiline non-tool-call responses in test-chat
2025-02-04 15:48:53 +00:00
Olivier Chafik
39c1d8163b return thoughts in reasoning_content field 2025-02-04 11:37:09 +00:00
ochafik
d1b66910c5 r1: revert making <|tool▁calls▁begin|> optional as somehow sampling triggers us on "<|tool▁call▁begin|><", which is already invalid per the grammar 2025-02-04 10:38:03 +00:00
ochafik
0db9881285 Fix r1 grammar since we made <|tool▁calls▁begin|> optional (triggering on just <|tool▁call▁begin|> for 7B's sake) 2025-02-04 10:30:10 +00:00
ochafik
b5b117fa1c Merge branch 'sync-minja-4' into r1-toolcall 2025-02-04 09:45:27 +00:00
ochafik
21f207156f Update chat.cpp 2025-02-04 05:16:23 +00:00
ochafik
438ce0b8a1 fix test-chat 2025-02-04 04:51:36 +00:00
ochafik
1f5ec59809 ensure deepseek r1 thoughts parsed even w/o tool calls 2025-02-04 04:48:08 +00:00
ochafik
d44eb95c67 tool-call: ensure we don't return content when there are tool calls / warn 2025-02-04 04:18:49 +00:00
ochafik
d43e4f6c22 Merge branch 'sync-minja-4' into r1-toolcall 2025-02-04 04:05:02 +00:00
ochafik
f12e3507f7 Update chat.cpp 2025-02-04 04:02:18 +00:00
ochafik
56a14ddc83 fix mistral chat test: need empty tokens 2025-02-04 04:01:35 +00:00
ochafik
09caa63451 sync: minja
182de30cda
2025-02-04 03:52:59 +00:00
ochafik
f0154a6479 Fix / test models/templates/llama-cpp-deepseek-r1.jinja 2025-02-04 03:09:15 +00:00
ochafik
a682d1216d fix / test parsing of r1 parser 2025-02-04 02:23:31 +00:00
ochafik
9a6847c857 move trigger_words init inside non-llguidance branch 2025-02-04 01:13:01 +00:00
ochafik
18a11f43f0 tool-call: r1: fix grammar 2025-02-04 01:12:44 +00:00
ochafik
e84ee88f50 r1: fix inadvertent newline in grammar before <|tool▁call▁end|> 2025-02-04 00:36:38 +00:00
Olivier Chafik
ce28224de8 tool-call: r1: add one more trigger approx "<|tool calls begin|>" 2025-02-04 00:28:40 +00:00
Olivier Chafik
bff549deb6 simplify hack to fix original template's backfill from minja 2025-02-04 00:14:48 +00:00
Olivier Chafik
bbd45bf6a2 sync: minja 2025-02-04 00:14:15 +00:00
Olivier Chafik
30ea3591c9 update to minja's new api 2025-02-03 23:53:27 +00:00
Olivier Chafik
11c1f0c7d4 actually we want eos_token in the template to infer tool call examples, explicitly skipped in new template options 2025-02-03 23:52:28 +00:00
Olivier Chafik
cde3833239
tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616)
* tool-call: allow `--jinja --chat-template chatml`

* fix double bos issue (drop bos/eos tokens from jinja template)

* add missing try catch around jinja parsing to default to chatml

* Simplify default chatml logic
2025-02-03 23:49:27 +00:00
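The "default to chatml upon parsing issue" behaviour can be sketched as a try/catch around the template parser. This is a toy illustration, not the code from the commit. `toy_parse` stands in for the real Jinja parser and only checks that `{{` and `}}` are balanced. `chat_template`, `load_template`, and the `"chatml"` placeholder string are likewise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical result type: which template source ended up being used.
struct chat_template {
    std::string source;
    bool        is_fallback;
};

// Count non-overlapping occurrences of `needle` in `s`.
static std::size_t count_of(const std::string & s, const std::string & needle) {
    assert(!needle.empty());
    std::size_t n = 0;
    for (std::size_t pos = s.find(needle); pos != std::string::npos;
         pos = s.find(needle, pos + needle.size())) {
        ++n;
    }
    return n;
}

// Toy stand-in for a Jinja parser: throws on unbalanced "{{" / "}}".
static void toy_parse(const std::string & src) {
    if (count_of(src, "{{") != count_of(src, "}}")) {
        throw std::runtime_error("unbalanced expression delimiters");
    }
}

// Placeholder for the built-in chatml template text.
static const char * const CHATML_FALLBACK = "chatml";

// Try the model's template; on any parse error fall back to chatml
// instead of propagating the exception.
chat_template load_template(const std::string & src) {
    try {
        toy_parse(src);
        return {src, false};
    } catch (const std::exception &) {
        return {CHATML_FALLBACK, true};
    }
}
```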
Olivier Chafik
108da907f0 sync: minja https://github.com/google/minja/pull/46 2025-02-03 23:31:49 +00:00
Olivier Chafik
1c302e18ba simpler hacky fixes for original broken template (+ fix minja example syntax polyfill) 2025-02-03 20:34:44 +00:00
Olivier Chafik
c6214ee9d6 rm unneeded vocab 2025-02-03 19:59:50 +00:00
Olivier Chafik
7dc271fb37 tool-calls: add deepseek r1 template + accommodate broken official template slightly better 2025-02-03 19:59:33 +00:00
Olivier Chafik
0be7f652e9 Merge branch 'jinja-chatml' into r1-toolcall 2025-02-03 19:35:54 +00:00
Olivier Chafik
d73448de1c Simplify default chatml logic 2025-02-03 19:22:53 +00:00
Olivier Chafik
569610ee77 tool-calls: accommodate variety of wrong tool call opening tags both Qwen 32B and 7B distills like to spit out 2025-02-03 18:57:55 +00:00
Olivier Chafik
c397bd1f5f tweak delta logic 2025-02-03 17:57:38 +00:00
Olivier Chafik
df3474e2c2 tool-calls: r1: add missing <|tool▁calls▁end|> to grammar! 2025-02-03 17:33:14 +00:00
Olivier Chafik
08271b5505 Merge branch 'jinja-chatml' into r1-toolcall 2025-02-03 17:32:38 +00:00
Olivier Chafik
b2dd490926 add missing try catch around jinja parsing to default to chatml 2025-02-03 17:32:12 +00:00