Commit graph

4935 commits

Author SHA1 Message Date
ochafik
cbecb35619 Add tool call to hot topics 2025-01-29 22:44:46 +00:00
ochafik
64545ac9d5 Somehow /* bad inside block comments, ok fine. 2025-01-29 22:38:52 +00:00
ochafik
2b2456978a Add cli mode to test-chat to generate template summaries markdown 2025-01-29 22:33:16 +00:00
ochafik
84bc083faf Remove server tests LLAMA_CACHE override (tests are serial, and the cache is easier to prefill w/ scripts/fetch_server_test_models.py) 2025-01-29 21:43:14 +00:00
ochafik
bc8a61138f nits 2025-01-29 21:42:12 +00:00
ochafik
36c776f329 Finish renaming of chat inputs vs. params [skip ci] 2025-01-29 21:29:45 +00:00
ochafik
ed7c622d78 Rename: common/chat.*, common_chat_{inputs -> params} 2025-01-29 21:18:49 +00:00
ochafik
6e676c8030 sync: minja 2025-01-29 20:31:28 +00:00
ochafik
ba27e98582 Unify llama 3.x chat handling again (allow {"type": "function", "name": ... prefix) 2025-01-29 19:47:28 +00:00
ochafik
7b5e0803c8 Move templates/ under models/ 2025-01-29 18:16:35 +00:00
ochafik
682026f84b Create meta-llama-Llama-3.1-8B-Instruct.jinja 2025-01-29 18:09:59 +00:00
ochafik
babdefc4dd Merge remote-tracking branch 'origin/master' into tool-call 2025-01-29 17:54:57 +00:00
ochafik
0f8af536c9 nits 2025-01-29 17:50:44 +00:00
ochafik
77dd67c28c tool-calls: disable crashing tests 2025-01-29 17:36:18 +00:00
Rémy Oudompheng
66ee4f297c
vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)
* vulkan: initial support for IQ3_S

* vulkan: initial support for IQ3_XXS

* vulkan: initial support for IQ2_XXS

* vulkan: initial support for IQ2_XS

* vulkan: optimize Q3_K by removing branches

* vulkan: implement dequantize variants for coopmat2

* vulkan: initial support for IQ2_S

* vulkan: vertically realign code

* port failing dequant callbacks from mul_mm

* Fix array length mismatches

* vulkan: avoid using workgroup size before it is referenced

* tests: increase timeout for Vulkan llvmpipe backend

---------

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-01-29 18:29:39 +01:00
ochafik
76f6ab19ad Update test_tool_call.py 2025-01-29 17:04:30 +00:00
ochafik
41eec4622b rm unused templates, rename one 2025-01-29 16:50:54 +00:00
ochafik
40cc3f2fde Merge branch 'tool-call' of github.com:ochafik/llama.cpp into tool-call 2025-01-29 16:45:59 +00:00
Olivier Chafik
384f54a135 Split bulk of tool call tests to slow lane 2025-01-29 16:13:45 +00:00
Olivier Chafik
923c805d04 rm dead code + nits 2025-01-29 15:57:58 +00:00
Daniel Bevenius
e51c47b401
server : update auto gen files comments [no ci] (#11484)
* server : update auto gen files comments

This commit updates the 'auto generated files' comments in server.cpp
and removes `deps.sh` from the comment.

The motivation for this change is that `deps.sh` was removed in
Commit 91c36c269b ("server : (web ui)
Various improvements, now use vite as bundler (#10599)").

* squash! server : update auto gen files comments [no ci]

Move comments about file generation to README.md.

* squash! server : update auto gen files comments [no ci]

Remove the comments in server.cpp that mention that information
can be found in the README.md file.
2025-01-29 16:34:18 +01:00
Jeff Bolz
2711d0215f
vulkan: Catch pipeline creation failure and print an error message (#11436)
* vulkan: Catch pipeline creation failure and print an error message

Also, fix some warnings from my on-demand compile change.

* vulkan: fix pipeline creation logging
2025-01-29 09:26:50 -06:00
Eric Curtin
f0d4b29edf
Parse https://ollama.com/library/ syntax (#11480)
People search for ollama models using the web UI; this change
allows one to copy the URL from the browser and have it work
with llama-run (see the sketch after this entry).

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-29 11:23:10 +00:00
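The URL handling described in the entry above lends itself to a small illustration. The following is a hypothetical C++ sketch (the function name and exact behaviour are assumptions, not the actual llama-run code) of stripping the https://ollama.com/library/ prefix to recover a model name:

```cpp
// Hypothetical sketch, not the actual llama-run implementation: map a browser
// URL such as https://ollama.com/library/granite-code to the bare model name
// that a puller would resolve (e.g. "granite-code" or "llama3.2:3b").
#include <optional>
#include <string>

static std::optional<std::string> parse_ollama_library_url(const std::string & url) {
    const std::string prefix = "https://ollama.com/library/";
    if (url.rfind(prefix, 0) != 0) {
        return std::nullopt;                // not an ollama.com library URL
    }
    std::string name = url.substr(prefix.size());
    if (name.empty()) {
        return std::nullopt;                // URL had no model name after the prefix
    }
    return name;
}
```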
Georgi Gerganov
815857791d
sync : ggml 2025-01-29 11:25:29 +02:00
William Tambellini
1a0e87d291
ggml : add option to not print stack on abort (ggml/1081)
* Add option to not print stack on abort

Add option/envvar to disable stack printing on abort.
Also link some unittests with Threads to fix link errors on
ubuntu/g++11.

* Update ggml/src/ggml.c

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-29 11:24:53 +02:00
issixx
d2e518e9b4
ggml-cpu : fix ggml_graph_compute_thread not terminating on abort (ggml/1065)
Some threads kept looping and failed to terminate properly after an abort during CPU execution.

Co-authored-by: issi <issi@gmail.com>
2025-01-29 11:24:51 +02:00
Daniel Bevenius
b636228c0a
embedding : enable --no-warmup option (#11475)
This commit enables the `--no-warmup` option for llama-embeddings.

The motivation for this change is to allow the user to disable the
warmup when running the program.
2025-01-29 10:38:54 +02:00
Molly Sophia
325afb370a
llama: fix missing k_cache store for rwkv6qwen2 (#11445)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-01-29 12:07:21 +08:00
ochafik
4a1e8e9f91 refactor test-chat-handler 2025-01-29 04:00:01 +00:00
ochafik
18d5a1b2ca nits 2025-01-29 02:15:34 +00:00
ochafik
47be437356 Test fireworks v2 template 2025-01-29 01:51:07 +00:00
ochafik
4cdbb8c53f Revert breaking minja change 2025-01-29 01:50:49 +00:00
ochafik
64263910d8 Fix firefunction w/ jinja: requires two variables, use the chat handlers everywhere templates are used 2025-01-29 01:15:44 +00:00
ochafik
d603d067d5 sync: minja 2025-01-28 23:49:04 +00:00
ochafik
4f257550a2 minja: sync on https://github.com/google/minja/pull/33 2025-01-28 23:46:51 +00:00
Emreerdog
794fe23f29
cmake: add hints for locating ggml on Windows using Llama find-package (#11466) 2025-01-28 19:22:06 -04:00
peidaqi
cf8cc856d7
server : Fixed wrong function name in llamacpp server unit test (#11473)
The test_completion_stream_with_openai_library() function actually uses stream=False by default, and test_completion_with_openai_library() uses stream=True
2025-01-29 00:03:42 +01:00
Xuan-Son Nguyen
d0c08040b6
ci : fix build CPU arm64 (#11472)
* ci : fix build CPU arm64

* failed, trying ubuntu 22

* vulkan: ubuntu 24

* vulkan : jammy --> noble
2025-01-29 00:02:56 +01:00
uvos
be5ef7963f
HIP: Suppress transformation warning in softmax.cu
Loops with bounds not known at compile time cannot be unrolled.
When ncols_template == 0, the bounds of the loop are not constexpr, so LLVM cannot unroll the loops here (see the sketch after this entry).
2025-01-28 23:06:32 +01:00
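To make the unrolling constraint from the entry above concrete, here is a minimal C++ sketch of the pattern (an assumed illustration only, not the actual softmax.cu kernel):

```cpp
// Minimal sketch of the pattern described above (not the actual softmax.cu code):
// ncols_template == 0 is used to mean "column count only known at runtime", so the
// loop bound is a compile-time constant only in the ncols_template > 0
// instantiations; only those can be fully unrolled, which is why an unconditional
// unroll request triggers a transformation warning for the 0 case.
template <int ncols_template>
void scale_row(float * row, float scale, int ncols_runtime) {
    const int ncols = ncols_template == 0 ? ncols_runtime : ncols_template;
    // A `#pragma unroll` here is only honored when ncols is a compile-time
    // constant, i.e. when ncols_template != 0.
    for (int i = 0; i < ncols; ++i) {
        row[i] *= scale;
    }
}

// scale_row<32>(row, s, 32);  // bound known at compile time: unrollable
// scale_row<0>(row, s, n);    // bound known only at runtime: not unrollable
```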
Nikita Sarychev
cae9fb4361
HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (#11080)
This disables the workaround on fixed rocblas versions (>= 4.0.0) to eliminate the runtime cost and unnecessary VRAM allocation of loading all Tensile objects.
2025-01-28 16:42:20 +01:00
ochafik
cad1448ac7 Disable test-chat-handler on win32 like the other grammar-related tests 2025-01-28 14:46:37 +00:00
Eric Curtin
7fee2889e6
Add github protocol pulling and http:// (#11465)
These are added as pulling protocols to llama-run.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-28 14:45:41 +00:00
ochafik
cd63ba435e beef up test-chat-handler w/ delta expectations 2025-01-28 14:40:23 +00:00
Nuno
d7d1eccacc
docker: allow installing pip packages system-wide (#11437)
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2025-01-28 14:17:25 +00:00
someone13574
4bf3119d61
cmake : don't fail on GGML_CPU=OFF (#11457) 2025-01-28 15:15:34 +01:00
ochafik
ba10b47ae5 Add missing link dep for windows build 2025-01-28 10:52:14 +00:00
ochafik
b5a74d1a24 Simplify parser defs (incremental parsing for streaming will need more thinking) 2025-01-28 10:48:11 +00:00
Nuno
f643120bad
docker: add perplexity and bench commands to full image (#11438)
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2025-01-28 10:42:32 +00:00
ochafik
ec4aeaf18a Revert "Allow tool use + streaming"
This reverts commit 62717145f7.
2025-01-28 10:29:17 +00:00
Akarshan Biswas
6e84b0ab8e
SYCL : SOFTMAX F16 mask support and other fixes (#11261)
Implemented ggml_sycl_op_soft_max() F16 src1 (mask) support, for which a pragma deprecation warning was added in #5021.
To do this, it had to be decoupled from ggml_sycl_op_flatten, which always considered src1 to be of fp32 type (many OP functions depend on it).

* SYCL: SOFTMAX F16 mask support and other fixes

* test-backend-ops: Add F16 mask test cases
2025-01-28 09:56:58 +00:00