Commit graph

417 commits

Author SHA1 Message Date
ochafik
d43e4f6c22 Merge branch 'sync-minja-4' into r1-toolcall 2025-02-04 04:05:02 +00:00
ochafik
f12e3507f7 Update chat.cpp 2025-02-04 04:02:18 +00:00
ochafik
56a14ddc83 fix mistral chat test: need empty tokens 2025-02-04 04:01:35 +00:00
ochafik
09caa63451 sync: minja 182de30cda 2025-02-04 03:52:59 +00:00
ochafik
f0154a6479 Fix / test models/templates/llama-cpp-deepseek-r1.jinja 2025-02-04 03:09:15 +00:00
ochafik
a682d1216d fix / test parsing of r1 parser 2025-02-04 02:23:31 +00:00
ochafik
9a6847c857 move trigger_words init inside non-llguidance branch 2025-02-04 01:13:01 +00:00
ochafik
18a11f43f0 tool-call: r1: fix grammar 2025-02-04 01:12:44 +00:00
ochafik
e84ee88f50 r1: fix inadvertent newline in grammar before <|tool▁call▁end|> 2025-02-04 00:36:38 +00:00
Olivier Chafik
ce28224de8 tool-call: r1: add one more trigger approx "<|tool calls begin|>" 2025-02-04 00:28:40 +00:00
Olivier Chafik
bff549deb6 simplify hack to fix original template's backfill from minja 2025-02-04 00:14:48 +00:00
Olivier Chafik
bbd45bf6a2 sync: minja 2025-02-04 00:14:15 +00:00
Olivier Chafik
30ea3591c9 update to minja's new api 2025-02-03 23:53:27 +00:00
Olivier Chafik
11c1f0c7d4 actually we want eos_token in the template to infer tool call examples, explicitly skipped in new template options 2025-02-03 23:52:28 +00:00
Olivier Chafik
cde3833239
tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616)
* tool-call: allow `--jinja --chat-template chatml`

* fix double bos issue (drop bos/eos tokens from jinja template)

* add missing try catch around jinja parsing to default to chatml

* Simplify default chatml logic
2025-02-03 23:49:27 +00:00
Olivier Chafik
108da907f0 sync: minja https://github.com/google/minja/pull/46 2025-02-03 23:31:49 +00:00
Olivier Chafik
1c302e18ba simpler hacky fixes for original broken template (+ fix minja example syntax polyfill) 2025-02-03 20:34:44 +00:00
Olivier Chafik
c6214ee9d6 rm unneeded vocab 2025-02-03 19:59:50 +00:00
Olivier Chafik
7dc271fb37 tool-calls: add deepseek r1 template + accommodate broken official template slightly better 2025-02-03 19:59:33 +00:00
Olivier Chafik
0be7f652e9 Merge branch 'jinja-chatml' into r1-toolcall 2025-02-03 19:35:54 +00:00
Olivier Chafik
d73448de1c Simplify default chatml logic 2025-02-03 19:22:53 +00:00
Olivier Chafik
569610ee77 tool-calls: accommodate variety of wrong tool call opening tags both Qwen 32B and 7B distills like to spit out 2025-02-03 18:57:55 +00:00
Olivier Chafik
c397bd1f5f tweak delta logic 2025-02-03 17:57:38 +00:00
Olivier Chafik
df3474e2c2 tool-calls: r1: add missing <|tool▁calls▁end|> to grammar! 2025-02-03 17:33:14 +00:00
Olivier Chafik
08271b5505 Merge branch 'jinja-chatml' into r1-toolcall 2025-02-03 17:32:38 +00:00
Olivier Chafik
b2dd490926 add missing try catch around jinja parsing to default to chatml 2025-02-03 17:32:12 +00:00
Olivier Chafik
4cb0e1d873 Merge branch 'jinja-chatml' into r1-toolcall 2025-02-03 17:15:14 +00:00
Olivier Chafik
2b3c4829a3 fix build / rm diff 2025-02-03 16:34:43 +00:00
Olivier Chafik
aa98e59038 fix bad merge 2025-02-03 14:01:49 +00:00
Olivier Chafik
5d18d76b69 fix double bos issue (drop bos/eos tokens from jinja template) 2025-02-03 13:59:16 +00:00
Olivier Chafik
cf83623a47 fix typo 2025-02-03 13:58:46 +00:00
ochafik
a76073cf88 minimize diffs 2025-02-03 10:58:52 +00:00
ochafik
1e9acd2d31 tool-call: allow --jinja --chat-template chatml 2025-02-03 04:07:11 +00:00
ochafik
04be723b33 tool-call: fix command-r7b parsing when response is multiline 2025-02-03 02:24:30 +00:00
ochafik
73d08d49cf tool-call: allow --jinja --chat-template chatml 2025-02-03 02:24:30 +00:00
ochafik
c80cb30938 update logs 2025-02-03 02:24:30 +00:00
ochafik
04d511b5b5 Avoid double bos w/ jinja 2025-02-03 02:24:30 +00:00
ochafik
130ca222c9 DeepSeek R1: parse thoughts / return in separate field in API (non streamed mode) 2025-02-03 02:24:30 +00:00
ochafik
87de852b7f pass vocab to common_chat_params_init 2025-02-03 02:24:30 +00:00
ochafik
d3b60b8ad8 minja: enhance backfill of templates w/o tools description (use example tool call delta!) 2025-02-03 01:03:04 +00:00
Eric Curtin
84ec8a58f7
Name colors (#11573)
It's more descriptive, and using #define's allows compile-time
concatenation of the color strings.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-02-02 15:14:48 +00:00
Olivier Chafik
bfcce4d693
tool-call: support Command R7B (+ return tool_plan "thoughts" in API) (#11585)
* `tool-call`: support Command R7B (w/ tool_plan return)

* `tool-call`: cleaner preservation of tokens + warn when likely bad chat template override

* `tool-call`: test cleanup / handle lazy grammar triggers
2025-02-02 09:25:38 +00:00
Olivier Chafik
69804487e0
Fix exotic ci env that lacks ostringstream::str (#11581) 2025-02-02 09:10:15 +00:00
Michał Moskal
ff227703d6
sampling : support for llguidance grammars (#10224)
* initial porting of previous LLG patch

* update for new APIs

* build: integrate llguidance as an external project

* use '%llguidance' as marker to enable llg lark syntax

* add some docs

* clarify docs

* code style fixes

* remove llguidance.h from .gitignore

* fix tests when llg is enabled

* pass vocab not model to llama_sampler_init_llg()

* copy test-grammar-integration.cpp to test-llguidance.cpp

* clang fmt

* fix ref-count bug

* build and run test

* gbnf -> lark syntax

* conditionally include llguidance test based on LLAMA_LLGUIDANCE flag

* rename llguidance test file to test-grammar-llguidance.cpp

* add gh action for llg test

* align tests with LLG grammar syntax and JSON Schema spec

* llama_tokenizer() in fact requires valid utf8

* update llg

* format file

* add $LLGUIDANCE_LOG_LEVEL support

* fix whitespace

* fix warning

* include <cmath> for INFINITY

* add final newline

* fail llama_sampler_init_llg() at runtime

* Link gbnf_to_lark.py script; fix links; refer to llg docs for lexemes

* simplify #includes

* improve doc string for LLAMA_LLGUIDANCE

* typo in merge

* bump llguidance to 0.6.12
2025-02-02 09:55:32 +02:00
Olivier Chafik
cfd74c86db
sync: minja (418a2364b5) (#11574) 2025-02-01 12:24:51 +00:00
Olivier Chafik
a83f528688
tool-call: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539)
* An empty tool_call_id is better than none!

* sync: minja (tool call name optional https://github.com/google/minja/pull/36)

* Force-disable parallel_tool_calls if template doesn't support it

* More debug logs

* Llama 3.x tools: accept / trigger on more varied spaced outputs

* Fix empty content for functionary v3.2 tool call

* Add proper tool call docs to server README

* readme: function calling *is* supported now

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-31 14:15:25 +00:00
Steve Grubb
1bd3047a93
common: Add missing va_end (#11529)
The va_copy man page states that va_end must be called to revert
whatever the copy did. For some implementations, not calling va_end
has no consequences. For others it could leak memory.
2025-01-31 07:58:55 +02:00
Olivier Chafik
8b576b6c55
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-30 19:13:58 +00:00
Olivier Chafik
3d804dec76
sync: minja (#11499) 2025-01-30 10:30:27 +00:00
Daniel Bevenius
b636228c0a
embedding : enable --no-warmup option (#11475)
This commit enables the `--no-warmup` option for the llama-embeddings
program.

The motivation for this change is to allow the user to disable the
warmup when running the program.
2025-01-29 10:38:54 +02:00