William Tambellini
1a0e87d291
ggml : add option to not print stack on abort (ggml/1081)
...
* Add option to not print stack on abort
Add option/envvar to disable stack printing on abort.
Also link some unit tests with Threads to fix link errors on
Ubuntu/g++11.
* Update ggml/src/ggml.c
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-29 11:24:53 +02:00
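A minimal sketch of the kind of option described above; the env var name GGML_NO_PRINT_STACK is an assumption for illustration, not necessarily the actual one added:

```cpp
// Hypothetical sketch: gate stack-trace printing behind an env var.
// GGML_NO_PRINT_STACK is an assumed name, not the real one.
#include <cstdio>
#include <cstdlib>

static void ggml_abort_example(const char * msg) {
    std::fprintf(stderr, "fatal: %s\n", msg);
    // Print the stack trace only when not explicitly disabled.
    const char * no_stack = std::getenv("GGML_NO_PRINT_STACK");
    if (no_stack == nullptr || no_stack[0] == '\0') {
        // print_backtrace();  // platform-specific, omitted in this sketch
    }
    std::abort();
}
```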
issixx
d2e518e9b4
ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. (ggml/1065)
...
Some threads kept looping and failed to terminate properly after an abort during CPU execution.
Co-authored-by: issi <issi@gmail.com>
2025-01-29 11:24:51 +02:00
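A minimal sketch of the looping problem this fixes, assuming a shared atomic abort flag (names are illustrative, not ggml's actual internals):

```cpp
#include <atomic>

std::atomic<bool> g_abort{false};  // set by whichever thread aborts

void compute_thread_loop() {
    while (true) {
        // Without a check like this, a worker waiting for its next chunk
        // of work never observes that another thread aborted, and spins.
        if (g_abort.load(std::memory_order_relaxed)) {
            return;  // terminate cleanly instead of looping forever
        }
        // ... pick up and execute the next graph node ...
    }
}
```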
Daniel Bevenius
b636228c0a
embedding : enable --no-warmup option ( #11475 )
...
This commit enables the `--no-warmup` option for llama-embedding.
The motivation for this change is to allow the user to disable the
warmup when running the program.
2025-01-29 10:38:54 +02:00
Molly Sophia
325afb370a
llama: fix missing k_cache store for rwkv6qwen2 ( #11445 )
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-01-29 12:07:21 +08:00
ochafik
4a1e8e9f91
refactor test-chat-handler
2025-01-29 04:00:01 +00:00
ochafik
18d5a1b2ca
nits
2025-01-29 02:15:34 +00:00
ochafik
47be437356
Test fireworks v2 template
2025-01-29 01:51:07 +00:00
ochafik
4cdbb8c53f
Revert breaking minja change
2025-01-29 01:50:49 +00:00
ochafik
64263910d8
Fix firefunction w/ jinja: requires two variables, use the chat handlers everywhere templates are used
2025-01-29 01:15:44 +00:00
ochafik
d603d067d5
sync: minja
2025-01-28 23:49:04 +00:00
ochafik
4f257550a2
minja: sync on https://github.com/google/minja/pull/33
2025-01-28 23:46:51 +00:00
Emreerdog
794fe23f29
cmake: add hints for locating ggml on Windows using Llama find-package ( #11466 )
2025-01-28 19:22:06 -04:00
peidaqi
cf8cc856d7
server : Fixed wrong function name in llama.cpp server unit test ( #11473 )
...
The test_completion_stream_with_openai_library() function actually ran with stream=False by default, and test_completion_with_openai_library() with stream=True; the two names were swapped.
2025-01-29 00:03:42 +01:00
Xuan-Son Nguyen
d0c08040b6
ci : fix build CPU arm64 ( #11472 )
...
* ci : fix build CPU arm64
* failed, trying ubuntu 22
* vulkan: ubuntu 24
* vulkan : jammy --> noble
2025-01-29 00:02:56 +01:00
uvos
be5ef7963f
HIP: Suppress transformation warning in softmax.cu
...
Loops with bounds not known at compile time cannot be unrolled.
When ncols_template == 0, the bounds of the loop are not constexpr, so LLVM cannot unroll the loops here.
2025-01-28 23:06:32 +01:00
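An illustration of the pattern involved (plain C++ rather than the actual HIP kernel): when the column count comes from the template parameter the bound is a compile-time constant and the loop can unroll; when the parameter is 0 the runtime value is used and the unroll request cannot be honored, hence the warning being suppressed.

```cpp
template <int ncols_template>
void softmax_row(float * row, int ncols_runtime) {
    // With ncols_template > 0 the bound is a compile-time constant and the
    // loop can unroll; with ncols_template == 0 it is only known at run
    // time, so the compiler cannot fully unroll and emits a warning.
    const int ncols = ncols_template == 0 ? ncols_runtime : ncols_template;
#pragma unroll
    for (int i = 0; i < ncols; ++i) {
        row[i] *= 1.0f;  // stand-in for the real softmax arithmetic
    }
}
```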
Nikita Sarychev
cae9fb4361
HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug ( #11080 )
...
This disables the workaround on fixed rocblas versions (>= 4.0.0), eliminating the runtime cost and unnecessary VRAM allocation of loading all Tensile objects.
2025-01-28 16:42:20 +01:00
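A hedged sketch of version-gating a workaround like this; the version helper below is hypothetical, and the real check would query the rocBLAS runtime:

```cpp
// Hypothetical helper: in reality the version would come from the rocBLAS
// runtime; hardcoded here purely for illustration.
static int get_rocblas_major_version() { return 4; }

static void maybe_init_rocblas() {
    // Only apply the eager-initialization workaround on versions that
    // still have the multiple-instantiation bug (< 4.0.0).
    if (get_rocblas_major_version() < 4) {
        // rocblas_initialize();  // loads all Tensile objects up front
    }
    // On >= 4.0.0 the bug is fixed, so skip the extra time and VRAM cost.
}
```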
ochafik
cad1448ac7
Disable test-chat-handler on win32 like the other grammar-related tests
2025-01-28 14:46:37 +00:00
Eric Curtin
7fee2889e6
Add github protocol pulling and http:// ( #11465 )
...
These are added as pulling protocols to llama-run.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-28 14:45:41 +00:00
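A rough sketch of prefix-based protocol handling of this kind; the exact schemes and rewrite rules in llama-run may differ:

```cpp
#include <string>

// Map a model reference to a fetchable URL based on its prefix.
// The rewrite rule for github: is an assumption for illustration.
std::string resolve_pull_url(const std::string & ref) {
    auto starts_with = [&](const char * p) { return ref.rfind(p, 0) == 0; };
    if (starts_with("http://") || starts_with("https://")) {
        return ref;  // plain HTTP(S) references are used as-is
    }
    if (starts_with("github:")) {
        return "https://github.com/" + ref.substr(7);  // hypothetical mapping
    }
    return ref;  // fall through to the default protocol
}
```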
ochafik
cd63ba435e
beef up test-chat-handler w/ delta expectations
2025-01-28 14:40:23 +00:00
Nuno
d7d1eccacc
docker: allow installing pip packages system-wide ( #11437 )
...
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2025-01-28 14:17:25 +00:00
someone13574
4bf3119d61
cmake : don't fail on GGML_CPU=OFF ( #11457 )
2025-01-28 15:15:34 +01:00
ochafik
ba10b47ae5
Add missing link dep for windows build
2025-01-28 10:52:14 +00:00
ochafik
b5a74d1a24
Simplify parser defs (incremental parsing for streaming will need more thinking)
2025-01-28 10:48:11 +00:00
Nuno
f643120bad
docker: add perplexity and bench commands to full image ( #11438 )
...
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2025-01-28 10:42:32 +00:00
ochafik
ec4aeaf18a
Revert "Allow tool use + streaming"
...
This reverts commit 62717145f7.
2025-01-28 10:29:17 +00:00
Akarshan Biswas
6e84b0ab8e
SYCL : SOFTMAX F16 mask support and other fixes ( #11261 )
...
Implemented ggml_sycl_op_soft_max() F16 src1 (mask) support, for which a pragma deprecation warning was added in #5021.
To do this, it had to be decoupled from ggml_sycl_op_flatten, which always assumed src1 to be of fp32 type (many OP functions depend on this).
* SYCL: SOFTMAX F16 mask support and other fixes
* test-backend-ops: Add F16 mask test cases
2025-01-28 09:56:58 +00:00
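A simplified sketch of the decoupling described above: dispatch on the mask's actual type instead of funneling it through an fp32-only helper. Types and names are simplified stand-ins for ggml's real ones:

```cpp
enum class DType { F32, F16 };

struct Tensor {
    DType  type;
    void * data;
};

void soft_max_with_mask(const Tensor & logits, const Tensor * mask) {
    if (mask == nullptr) {
        // ... no-mask path ...
    } else if (mask->type == DType::F32) {
        // ... read the mask as float ...
    } else if (mask->type == DType::F16) {
        // ... read the mask as half; previously this case was forced
        // through a flatten helper that assumed fp32 src1 ...
    }
    (void) logits;  // real code applies softmax over logits here
}
```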
ochafik
62d45a552f
Disable slow tests where appropriate, + nits
2025-01-28 09:47:41 +00:00
ochafik
d274ffcc95
build: Add missing optional include for gcc
2025-01-28 09:29:31 +00:00
ochafik
0a51e514f6
Update test-chat-handler.cpp
2025-01-28 09:24:35 +00:00
Olivier Chafik
2f99236f77
Tool-call: do last partial parse upon limit stop
2025-01-28 09:23:19 +00:00
Olivier Chafik
6d5682909f
Cleanup dead code in llama_3_1 tool call code
2025-01-28 09:22:26 +00:00
Olivier Chafik
62717145f7
Allow tool use + streaming
2025-01-28 09:22:03 +00:00
Michael Engel
2b8525d5c8
Handle missing model in CLI parameters for llama-run ( #11399 )
...
The HTTP client in llama-run only prints an error if the download of
a resource fails. If the model name is missing from the CLI parameter list,
the application crashes.
To prevent this, a check for the required model parameter has been
added, and errors from resource downloads are now propagated to the caller.
Signed-off-by: Michael Engel <mengel@redhat.com>
2025-01-28 08:32:40 +00:00
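A minimal sketch of the two fixes described above, with illustrative function names rather than llama-run's actual ones: validate the required model argument up front, and return download errors to the caller instead of only printing them.

```cpp
#include <cstdio>
#include <string>

// Stub standing in for the real HTTP download; returns non-zero on failure.
static int download_resource(const std::string & /*url*/) { return 0; }

static int run(const std::string & model) {
    if (model.empty()) {
        std::fprintf(stderr, "error: no model specified\n");
        return 1;  // fail early instead of crashing later
    }
    if (int err = download_resource(model); err != 0) {
        return err;  // propagate the error instead of just printing it
    }
    // ... load and run the model ...
    return 0;
}
```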
ochafik
ef9efc9ed3
Fix Llama 3.1 (incl. constrained builtin tools, e.g. <|python_tag|>foo.call(arg=value))
2025-01-28 01:04:06 +00:00
ochafik
2d607f1a68
Update test-chat-handler.cpp
2025-01-27 23:29:28 +00:00
ochafik
b565ab2ab1
comment out broken tests in test_tool_call.py
2025-01-27 23:02:15 +00:00
ochafik
cafea60922
Split e2e test_tool_call from test_chat_completion
2025-01-27 22:46:33 +00:00
ochafik
90effb845f
Pass grammar laziness all the way down to sampler (need to print special trigger tokens e.g. for Nemo even w/ tool_choice=required)
2025-01-27 22:46:17 +00:00
ochafik
ad229783c5
updated tool call example to be less ambiguous (deepseek likes to rant about hello world)
2025-01-27 22:44:44 +00:00
ochafik
fa065eb095
Rehabilitate test_format_detection
2025-01-27 20:46:03 +00:00
ochafik
add9124115
fix test-chat-handler grammar tests
2025-01-27 20:13:09 +00:00
Eric Curtin
a4417ddda9
Add new hf protocol for ollama ( #11449 )
...
https://huggingface.co/docs/hub/en/ollama
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-27 19:36:10 +01:00
ochafik
118f799ae4
DeepSeek-R1: implement grammar constraints
2025-01-27 17:52:46 +00:00
ochafik
92ac336dfa
Prepare DeepSeek-R1-Distill-Llama-8B support
2025-01-27 17:26:43 +00:00
ochafik
09971e626c
Update test_chat_completion.py
2025-01-27 15:43:03 +00:00
ochafik
67709552ad
tool-call: compact json output to cap # tokens generated
2025-01-27 15:42:27 +00:00
ochafik
57f40e366b
tool-call: fix lazy grammar & mixed content + tool calls parsing
2025-01-27 15:41:54 +00:00
ochafik
2efa0c27bf
tool-call: add weather tool e2e tests
2025-01-27 15:02:09 +00:00
ochafik
15ec01e896
jinja: only add special tokens if template doesn't seem to handle them
2025-01-27 14:28:11 +00:00
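A sketch of the heuristic named in the commit above, assuming the check is a simple substring search for the special token in the template source (an assumed simplification on my part):

```cpp
#include <string>

// Heuristic sketch: if the chat template's source already mentions the
// special token (e.g. BOS), assume the template handles special tokens
// itself and skip adding them. The substring test is an assumption.
bool template_handles_token(const std::string & tmpl_src, const std::string & tok) {
    return !tok.empty() && tmpl_src.find(tok) != std::string::npos;
}
```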
ochafik
da606d8d41
tool-call: remove nonsensical code_interpreter code
2025-01-27 14:19:20 +00:00