Xuan Son Nguyen
5add261ae8
test: leave model_hf_file blank
2025-01-30 15:35:38 +01:00
Olivier Chafik
82052466d6
log prompt + nits
2025-01-30 14:29:16 +00:00
Olivier Chafik
f223df0271
Format test-chat.cpp
2025-01-30 14:10:09 +00:00
Olivier Chafik
5a64af6c70
add llama_sampler_init_grammar_lazy instead of renaming the non-lazy
2025-01-30 14:10:09 +00:00
Olivier Chafik
7d59bf44ed
deprecate llama_sampler_init_grammar -> llama_sampler_grammar_init
2025-01-30 12:49:56 +00:00
Olivier Chafik
2bb3fed337
nit: fix py import
2025-01-30 12:42:34 +00:00
Olivier Chafik
9685043274
Update scripts/fetch_server_test_models.py to new compact hf_repo syntax + switch Hermes models
2025-01-30 12:05:07 +00:00
Olivier Chafik
0c171f5463
Update test_chat_completion.py
2025-01-30 11:56:10 +00:00
Olivier Chafik
06c4ca56c7
Update test_chat_completion.py
2025-01-30 11:49:16 +00:00
Olivier Chafik
3dcde9ea83
Fix debug + verbose
2025-01-30 11:49:13 +00:00
Xuan Son Nguyen
c88f4a798d
simplify handle_apply_template
2025-01-30 12:00:54 +01:00
Xuan Son Nguyen
2d51c459c6
code style changes on test
2025-01-30 11:52:31 +01:00
Olivier Chafik
8ef37a3c07
Merge remote-tracking branch 'origin/master' into tool-call
2025-01-30 10:50:02 +00:00
Olivier Chafik
3d804dec76
sync: minja (#11499)
2025-01-30 10:30:27 +00:00
mgroeber9110
ffd0821c57
vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496)
2025-01-30 12:10:59 +02:00
Daniel Bevenius
4314e56c4f
server : use lambda instead of std::bind (#11507)
This commit replaces the two usages of `std::bind` with lambdas for the
`callback_new_task` and `callback_update_slots` callback functions.
The motivation for this change is consistency with the rest of the code
in server.cpp (lambdas are used for all other callbacks/handlers). Lambdas
are also more readable (perhaps subjectively) and are recommended over
`std::bind` in modern C++.
Ref: https://github.com/LithoCoders/dailycpp/blob/master/EffectiveModernC%2B%2B/chapter6/Item34_Prefer_lambdas_to_std::bind.md
2025-01-30 11:05:00 +01:00
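For the `std::bind` → lambda change described above, a minimal sketch of the substitution pattern; the `task` and `server_context` definitions here are illustrative placeholders, not the actual types from server.cpp:

```cpp
#include <functional>
#include <iostream>

// Illustrative placeholders; the real types and members in server.cpp differ.
struct task { int id; };

struct server_context {
    void process_single_task(const task & t) { std::cout << "task " << t.id << "\n"; }
};

int main() {
    server_context ctx;

    // Before: member function bound with std::bind (the style the commit removes).
    std::function<void(task)> cb_bind =
        std::bind(&server_context::process_single_task, &ctx, std::placeholders::_1);

    // After: equivalent lambda, consistent with the other callbacks/handlers.
    std::function<void(task)> cb_lambda =
        [&ctx](const task & t) { ctx.process_single_task(t); };

    cb_bind({1});
    cb_lambda({2});
}
```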
Isaac McFadyen
496e5bf46b
server : (docs) added response format for /apply-template [no ci] (#11503)
2025-01-30 10:11:53 +01:00
Guspan Tanadi
7919256c57
readme : reference examples relative links (#11505)
2025-01-30 06:58:02 +01:00
ochafik
9591af1fc5
increase http timeout to 12
2025-01-30 04:50:59 +00:00
ochafik
7635912f73
llama 3.2 1b now fails the weather tool call?
2025-01-30 04:49:52 +00:00
ochafik
b831a6e0d3
rm unused llama_param
2025-01-30 04:49:02 +00:00
Daniel Bevenius
e0449763a4
server : update json snippets in README.md [no ci] (#11492)
This commit updates some of the JSON snippets in the README.md file and
removes the `json` language tag from the code blocks.
The motivation for this change is that invalid JSON in a code snippet is
highlighted in red, which can make it somewhat difficult to read and can
be a little distracting.
2025-01-30 05:48:14 +01:00
ochafik
18450e690f
debug logs are back
2025-01-30 04:34:14 +00:00
ochafik
81547e6f9b
nits
2025-01-30 04:20:06 +00:00
ochafik
f8e14bffc3
split chat handler vs. parser around enum again
2025-01-30 04:11:05 +00:00
ochafik
590c97931a
Update tests readme + add raw output to verbose log
2025-01-30 00:43:30 +00:00
ochafik
774557cfb4
llama 3.1: allow {name: & {function: syntax even w/ builtin tools (70B model just likes that!)
2025-01-30 00:43:06 +00:00
ochafik
d86a1ae80d
Unify content + message in server_task_result_cmpl_final (+ avoid string copy)
2025-01-30 00:13:12 +00:00
ochafik
77c60e662e
Avoid passing tools twice in generic handler (now that minja passes them automatically when needed)
2025-01-30 00:09:56 +00:00
ochafik
a810c37c76
Partial revert of LLAMA_CACHE=tmp (unless set explicitly in env)
2025-01-29 23:16:18 +00:00
ochafik
cbecb35619
Add tool call to hot topics
2025-01-29 22:44:46 +00:00
ochafik
64545ac9d5
Somehow /* bad inside block comments, ok fine.
2025-01-29 22:38:52 +00:00
ochafik
2b2456978a
Add cli mode to test-chat to generate template summaries markdown
2025-01-29 22:33:16 +00:00
ochafik
84bc083faf
Remove server tests LLAMA_CACHE override (tests are serial, and the cache is easier to prefill w/ scripts/fetch_server_test_models.py)
2025-01-29 21:43:14 +00:00
ochafik
bc8a61138f
nits
2025-01-29 21:42:12 +00:00
ochafik
36c776f329
Finish renaming of chat inputs vs. params [skip ci]
2025-01-29 21:29:45 +00:00
ochafik
ed7c622d78
Rename: common/chat.*, common_chat_{inputs -> params}
2025-01-29 21:18:49 +00:00
ochafik
6e676c8030
sync: minja
2025-01-29 20:31:28 +00:00
ochafik
ba27e98582
Unify llama 3.x chat handling again (allow {"type": "function", "name": ... prefix)
2025-01-29 19:47:28 +00:00
Nigel Bosch
eb7cf15a80
server : add /apply-template endpoint for additional use cases of Minja functionality (#11489)
* add /apply-template endpoint to server
* remove unnecessary line
* add /apply-template documentation
* return only "prompt" field in /apply-template
* use suggested idea instead of my overly verbose way
2025-01-29 19:45:44 +01:00
ochafik
7b5e0803c8
Move templates/ under models/
2025-01-29 18:16:35 +00:00
ochafik
682026f84b
Create meta-llama-Llama-3.1-8B-Instruct.jinja
2025-01-29 18:09:59 +00:00
ochafik
babdefc4dd
Merge remote-tracking branch 'origin/master' into tool-call
2025-01-29 17:54:57 +00:00
ochafik
0f8af536c9
nits
2025-01-29 17:50:44 +00:00
ochafik
77dd67c28c
tool-calls: disable crashing tests
2025-01-29 17:36:18 +00:00
Rémy Oudompheng
66ee4f297c
vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)
* vulkan: initial support for IQ3_S
* vulkan: initial support for IQ3_XXS
* vulkan: initial support for IQ2_XXS
* vulkan: initial support for IQ2_XS
* vulkan: optimize Q3_K by removing branches
* vulkan: implement dequantize variants for coopmat2
* vulkan: initial support for IQ2_S
* vulkan: vertically realign code
* port failing dequant callbacks from mul_mm
* Fix array length mismatches
* vulkan: avoid using workgroup size before it is referenced
* tests: increase timeout for Vulkan llvmpipe backend
---------
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-01-29 18:29:39 +01:00
ochafik
76f6ab19ad
Update test_tool_call.py
2025-01-29 17:04:30 +00:00
ochafik
41eec4622b
rm unused templates, rename one
2025-01-29 16:50:54 +00:00
ochafik
40cc3f2fde
Merge branch 'tool-call' of github.com:ochafik/llama.cpp into tool-call
2025-01-29 16:45:59 +00:00
Olivier Chafik
384f54a135
Split bulk of tool call tests to slow lane
2025-01-29 16:13:45 +00:00