Commit graph

4962 commits

Author SHA1 Message Date
Olivier Chafik
7d59bf44ed deprecate llama_sampler_init_grammar -> llama_sampler_grammar_init 2025-01-30 12:49:56 +00:00
Olivier Chafik
2bb3fed337 nit: fix py import 2025-01-30 12:42:34 +00:00
Olivier Chafik
9685043274 Update scripts/fetch_server_test_models.py to new compact hf_repo syntax + switch Hermes models 2025-01-30 12:05:07 +00:00
Olivier Chafik
0c171f5463 Update test_chat_completion.py 2025-01-30 11:56:10 +00:00
Olivier Chafik
06c4ca56c7 Update test_chat_completion.py 2025-01-30 11:49:16 +00:00
Olivier Chafik
3dcde9ea83 Fix debug + verbose 2025-01-30 11:49:13 +00:00
Xuan Son Nguyen
c88f4a798d simplify handle_apply_template 2025-01-30 12:00:54 +01:00
Xuan Son Nguyen
2d51c459c6 code style changes on test 2025-01-30 11:52:31 +01:00
Olivier Chafik
8ef37a3c07 Merge remote-tracking branch 'origin/master' into tool-call 2025-01-30 10:50:02 +00:00
Olivier Chafik
3d804dec76 sync: minja (#11499) 2025-01-30 10:30:27 +00:00
mgroeber9110
ffd0821c57 vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496) 2025-01-30 12:10:59 +02:00
Daniel Bevenius
4314e56c4f server : use lambda instead of std::bind (#11507)
This commit replaces the two usages of `std::bind` with lambdas for the
`callback_new_task` and `callback_update_slots` callback functions.

The motivation for this change is consistency with the rest of the code in
server.cpp (lambdas are used for all other callbacks/handlers). Lambdas are
also arguably more readable, and they are recommended over `std::bind` in
modern C++.

Ref: https://github.com/LithoCoders/dailycpp/blob/master/EffectiveModernC%2B%2B/chapter6/Item34_Prefer_lambdas_to_std::bind.md
2025-01-30 11:05:00 +01:00
Isaac McFadyen
496e5bf46b server : (docs) added response format for /apply-template [no ci] (#11503) 2025-01-30 10:11:53 +01:00
Guspan Tanadi
7919256c57 readme : reference examples relative links (#11505) 2025-01-30 06:58:02 +01:00
ochafik
9591af1fc5 increase http timeout to 12 2025-01-30 04:50:59 +00:00
ochafik
7635912f73 llama 3.2 1b now fails the weather tool call? 2025-01-30 04:49:52 +00:00
ochafik
b831a6e0d3 rm unused llama_param 2025-01-30 04:49:02 +00:00
Daniel Bevenius
e0449763a4 server : update json snippets in README.md [no ci] (#11492)
This commit updates some of the JSON snippets in the README.md file and
removes the `json` language tag from the code blocks.

The motivation for this change is that invalid JSON in a code snippet gets
highlighted in red, which can make it somewhat difficult to read and a
little distracting.
2025-01-30 05:48:14 +01:00
ochafik
18450e690f debug logs are back 2025-01-30 04:34:14 +00:00
ochafik
81547e6f9b nits 2025-01-30 04:20:06 +00:00
ochafik
f8e14bffc3 split chat handler vs. parser around enum again 2025-01-30 04:11:05 +00:00
ochafik
590c97931a Update tests readme + add raw output to verbose log 2025-01-30 00:43:30 +00:00
ochafik
774557cfb4 llama 3.1: allow {name: & {function: syntax even w/ builtin tools (70B model just likes that!) 2025-01-30 00:43:06 +00:00
ochafik
d86a1ae80d Unify content + message in server_task_result_cmpl_final (+ avoid string copy) 2025-01-30 00:13:12 +00:00
ochafik
77c60e662e Avoid passing tools twice in generic handler (now that minja passes them automatically when needed) 2025-01-30 00:09:56 +00:00
ochafik
a810c37c76 Partial revert of LLAMA_CACHE=tmp (unless set explicitly in env) 2025-01-29 23:16:18 +00:00
ochafik
cbecb35619 Add tool call to hot topics 2025-01-29 22:44:46 +00:00
ochafik
64545ac9d5 Somehow /* bad inside block comments, ok fine. 2025-01-29 22:38:52 +00:00
ochafik
2b2456978a Add cli mode to test-chat to generate template summaries markdown 2025-01-29 22:33:16 +00:00
ochafik
84bc083faf Remove server tests LLAMA_CACHE override (tests are serial, and the cache is easier to prefill w/ scripts/fetch_server_test_models.py) 2025-01-29 21:43:14 +00:00
ochafik
bc8a61138f nits 2025-01-29 21:42:12 +00:00
ochafik
36c776f329 Finish renaming of chat inputs vs. params [skip ci] 2025-01-29 21:29:45 +00:00
ochafik
ed7c622d78 Rename: common/chat.*, common_chat_{inputs -> params} 2025-01-29 21:18:49 +00:00
ochafik
6e676c8030 sync: minja 2025-01-29 20:31:28 +00:00
ochafik
ba27e98582 Unify llama 3.x chat handling again (allow {"type": "function", "name": ... prefix) 2025-01-29 19:47:28 +00:00
Nigel Bosch
eb7cf15a80 server : add /apply-template endpoint for additional use cases of Minja functionality (#11489)
* add /apply-template endpoint to server

* remove unnecessary line

* add /apply-template documentation

* return only "prompt" field in /apply-template

* use suggested idea instead of my overly verbose way
2025-01-29 19:45:44 +01:00
ochafik
7b5e0803c8 Move templates/ under models/ 2025-01-29 18:16:35 +00:00
ochafik
682026f84b Create meta-llama-Llama-3.1-8B-Instruct.jinja 2025-01-29 18:09:59 +00:00
ochafik
babdefc4dd Merge remote-tracking branch 'origin/master' into tool-call 2025-01-29 17:54:57 +00:00
ochafik
0f8af536c9 nits 2025-01-29 17:50:44 +00:00
ochafik
77dd67c28c tool-calls: disable crashing tests 2025-01-29 17:36:18 +00:00
Rémy Oudompheng
66ee4f297c vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)
* vulkan: initial support for IQ3_S

* vulkan: initial support for IQ3_XXS

* vulkan: initial support for IQ2_XXS

* vulkan: initial support for IQ2_XS

* vulkan: optimize Q3_K by removing branches

* vulkan: implement dequantize variants for coopmat2

* vulkan: initial support for IQ2_S

* vulkan: vertically realign code

* port failing dequant callbacks from mul_mm

* Fix array length mismatches

* vulkan: avoid using workgroup size before it is referenced

* tests: increase timeout for Vulkan llvmpipe backend

---------

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-01-29 18:29:39 +01:00
ochafik
76f6ab19ad Update test_tool_call.py 2025-01-29 17:04:30 +00:00
ochafik
41eec4622b rm unused templates, rename one 2025-01-29 16:50:54 +00:00
ochafik
40cc3f2fde Merge branch 'tool-call' of github.com:ochafik/llama.cpp into tool-call 2025-01-29 16:45:59 +00:00
Olivier Chafik
384f54a135 Split bulk of tool call tests to slow lane 2025-01-29 16:13:45 +00:00
Olivier Chafik
923c805d04 rm dead code + nits 2025-01-29 15:57:58 +00:00
Daniel Bevenius
e51c47b401 server : update auto gen files comments [no ci] (#11484)
* server : update auto gen files comments

This commit updates the 'auto generated files' comments in server.cpp
and removes `deps.sh` from the comment.

The motivation for this change is that `deps.sh` was removed in
Commit 91c36c269b ("server : (web ui)
Various improvements, now use vite as bundler (#10599)").

* squash! server : update auto gen files comments [no ci]

Move comments about file generation to README.md.

* squash! server : update auto gen files comments [no ci]

Remove the comments in server.cpp that mention that information
can be found in the README.md file.
2025-01-29 16:34:18 +01:00
Jeff Bolz
2711d0215f vulkan: Catch pipeline creation failure and print an error message (#11436)
* vulkan: Catch pipeline creation failure and print an error message

Also, fix some warnings from my on-demand compile change.

* vulkan: fix pipeline creation logging
2025-01-29 09:26:50 -06:00
Eric Curtin
f0d4b29edf Parse https://ollama.com/library/ syntax (#11480)
People search for Ollama models using the web UI; this change allows one to
copy the URL from the browser and have it work with llama-run.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-29 11:23:10 +00:00