Olivier Chafik
bbd45bf6a2
sync: minja
2025-02-04 00:14:15 +00:00
Olivier Chafik
30ea3591c9
update to minja's new api
2025-02-03 23:53:27 +00:00
Olivier Chafik
11c1f0c7d4
actually we want eos_token in the template to infer tool call examples, explicitly skipped in new template options
2025-02-03 23:52:28 +00:00
Olivier Chafik
bc6d910f6d
Merge branch 'master' into r1-toolcall
2025-02-03 23:51:31 +00:00
Olivier Chafik
cde3833239
`tool-call`: allow `--chat-template chatml` w/ `--jinja`, default to chatml upon parsing issue, avoid double bos (#11616)
...
* tool-call: allow `--jinja --chat-template chatml`
* fix double bos issue (drop bos/eos tokens from jinja template)
* add missing try catch around jinja parsing to default to chatml
* Simplify default chatml logic
2025-02-03 23:49:27 +00:00
Olivier Chafik
108da907f0
sync: minja https://github.com/google/minja/pull/46
2025-02-03 23:31:49 +00:00
Xuan-Son Nguyen
b3451785ac
server : (webui) revert hacky solution from #11626 (#11634)
2025-02-04 00:10:52 +01:00
Woof Dog
1d1e6a90bc
server : (webui) allow typing and submitting during llm response (#11626)
2025-02-03 23:16:27 +01:00
Olivier Chafik
1c302e18ba
simpler hacky fixes for original broken template (+ fix minja example syntax polyfill)
2025-02-03 20:34:44 +00:00
Olivier Chafik
c6214ee9d6
rm unneeded vocab
2025-02-03 19:59:50 +00:00
Olivier Chafik
7dc271fb37
tool-calls: add deepseek r1 template + accommodate broken official template slightly better
2025-02-03 19:59:33 +00:00
Olivier Chafik
0be7f652e9
Merge branch 'jinja-chatml' into r1-toolcall
2025-02-03 19:35:54 +00:00
Olivier Chafik
d73448de1c
Simplify default chatml logic
2025-02-03 19:22:53 +00:00
Olivier Chafik
569610ee77
tool-calls: accommodate variety of wrong tool call opening tags both Qwen 32B and 7B distills like to spit out
2025-02-03 18:57:55 +00:00
Olivier Chafik
c397bd1f5f
tweak delta logic
2025-02-03 17:57:38 +00:00
Olivier Chafik
df3474e2c2
tool-calls: r1: add missing <|tool▁calls▁end|> to grammar!
2025-02-03 17:33:14 +00:00
Olivier Chafik
08271b5505
Merge branch 'jinja-chatml' into r1-toolcall
2025-02-03 17:32:38 +00:00
Olivier Chafik
b2dd490926
add missing try catch around jinja parsing to default to chatml
2025-02-03 17:32:12 +00:00
Olivier Chafik
4cb0e1d873
Merge branch 'jinja-chatml' into r1-toolcall
2025-02-03 17:15:14 +00:00
Olivier Chafik
2b3c4829a3
fix build / rm diff
2025-02-03 16:34:43 +00:00
Daniel Bevenius
5598f475be
server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622)
...
This commit removes the CPPHTTPLIB_NO_EXCEPTIONS define from the server
code.
The motivation for this is that when using a debug build the server
would crash when an exception was thrown and terminate the server
process, as it was unhandled. When CPPHTTPLIB_NO_EXCEPTIONS is set
cpp_httplib will not call the exception handler, which would normally
return a 500 error to the client. This caused tests to fail when using
a debug build.
Fixes: https://github.com/ggerganov/llama.cpp/issues/11613
2025-02-03 16:45:38 +01:00
Olivier Chafik
aa98e59038
fix bad merge
2025-02-03 14:01:49 +00:00
Olivier Chafik
5d18d76b69
fix double bos issue (drop bos/eos tokens from jinja template)
2025-02-03 13:59:16 +00:00
Olivier Chafik
cf83623a47
fix typo
2025-02-03 13:58:46 +00:00
Georgi Gerganov
8ec05832fa
sync : ggml
2025-02-03 14:57:08 +02:00
Johannes Gäßler
21c84b5d2d
CUDA: fix Volta FlashAttention logic (#11615)
2025-02-03 14:25:56 +02:00
ochafik
a76073cf88
minimize diffs
2025-02-03 10:58:52 +00:00
ochafik
77ae97e7d6
Update test_tool_call.py
2025-02-03 10:28:30 +00:00
mashdragon
d92cb67e37
server : (webui) Fix Shift+Enter handling (#11609)
...
* Fix Shift+Enter handling
`exact` on the Enter handler means the message is not sent when Shift+Enter is pressed anyway
* build index.html.gz
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-02-03 10:42:55 +01:00
ochafik
1e9acd2d31
tool-call: allow --jinja --chat-template chatml
2025-02-03 04:07:11 +00:00
ochafik
5e6f2a21ae
add deepseek models to server tool call section in readme
2025-02-03 02:44:42 +00:00
ochafik
19bea4ecc3
tell DS R1 not to overthink (weather test)
2025-02-03 02:24:30 +00:00
ochafik
ae9d5812a7
tool-calls: add DeepSeek R1 Qwen 7B to server test_hello_world
2025-02-03 02:24:30 +00:00
ochafik
04be723b33
tool-call: fix command-r7b parsing when response is multiline
2025-02-03 02:24:30 +00:00
ochafik
73d08d49cf
tool-call: allow --jinja --chat-template chatml
2025-02-03 02:24:30 +00:00
ochafik
08716281f2
rename tests
2025-02-03 02:24:30 +00:00
ochafik
c80cb30938
update logs
2025-02-03 02:24:30 +00:00
ochafik
28345877e4
server/oai: ensure content is null when there are tool calls
2025-02-03 02:24:30 +00:00
ochafik
04d511b5b5
Avoid double bos w/ jinja
2025-02-03 02:24:30 +00:00
ochafik
130ca222c9
DeepSeek R1: parse thoughts / return in separate field in API (non streamed mode)
2025-02-03 02:24:30 +00:00
ochafik
87de852b7f
pass vocab to common_chat_params_init
2025-02-03 02:24:30 +00:00
ochafik
d3b60b8ad8
minja: enhance backfill of templates w/o tools description (use example tool call delta!)
2025-02-03 01:03:04 +00:00
Johannes Gäßler
6eecde3cc8
HIP: fix flash_attn_stream_k_fixup warning (#11604)
2025-02-02 23:48:29 +01:00
uvos
396856b400
CUDA/HIP: add support for selectable warp size to mmv (#11519)
...
CUDA/HIP: add support for selectable warp size to mmv
2025-02-02 22:40:09 +01:00
uvos
4d0598e144
HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (#11601)
...
This fixes a bug where RDNA1 GPUs other than gfx1010 were not handled correctly
2025-02-02 22:08:05 +01:00
Olivier Chafik
90f9b88afb
nit: more informative crash when grammar sampler fails (#11593)
2025-02-02 19:58:34 +00:00
Johannes Gäßler
864a0b67a6
CUDA: use mma PTX instructions for FlashAttention (#11583)
...
* CUDA: use mma PTX instructions for FlashAttention
* __shfl_sync workaround for movmatrix
* add __shfl_sync to HIP
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-02 19:31:09 +01:00
Eric Curtin
84ec8a58f7
Name colors (#11573)
...
It's more descriptive, use #define's so we can use compile-time
concatenations.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-02-02 15:14:48 +00:00
Olivier Chafik
bfcce4d693
`tool-call`: support Command R7B (+ return tool_plan "thoughts" in API) (#11585)
...
* `tool-call`: support Command R7B (w/ tool_plan return)
* `tool-call`: cleaner preservation of tokens + warn when likely bad chat template override
* `tool-call`: test cleanup / handle lazy grammar triggers
2025-02-02 09:25:38 +00:00
Olivier Chafik
69804487e0
Fix exotic ci env that lacks ostringstream::str (#11581)
2025-02-02 09:10:15 +00:00