Commit graph

4930 commits

Author SHA1 Message Date
Olivier Chafik
62717145f7 Allow tool use + streaming 2025-01-28 09:22:03 +00:00
Michael Engel
2b8525d5c8
Handle missing model in CLI parameters for llama-run (#11399)
The HTTP client in llama-run only prints an error in case the download of
a resource failed. If the model name in the CLI parameter list is missing,
this causes the application to crash.
In order to prevent this, a check for the required model parameter has been
added and errors for resource downloads get propagated to the caller.

Signed-off-by: Michael Engel <mengel@redhat.com>
2025-01-28 08:32:40 +00:00
ochafik
ef9efc9ed3 Fix Llama 3.1 (incl. constrained builtin tools e.g. <|python_tag|>foo.call(arg=value)) 2025-01-28 01:04:06 +00:00
ochafik
2d607f1a68 Update test-chat-handler.cpp 2025-01-27 23:29:28 +00:00
ochafik
b565ab2ab1 comment out broken tests in test_tool_call.py 2025-01-27 23:02:15 +00:00
ochafik
cafea60922 Split e2e test_tool_call from test_chat_completion 2025-01-27 22:46:33 +00:00
ochafik
90effb845f Pass grammar laziness all the way down to sampler (need to print special trigger tokens e.g. for Nemo even w/ tool_choice=required) 2025-01-27 22:46:17 +00:00
ochafik
ad229783c5 updated tool call example to be less ambiguous (deepseek likes to rant about hello world) 2025-01-27 22:44:44 +00:00
ochafik
fa065eb095 Rehabilitate test_format_detection 2025-01-27 20:46:03 +00:00
ochafik
add9124115 fix test-chat-handler grammar tests 2025-01-27 20:13:09 +00:00
Eric Curtin
a4417ddda9
Add new hf protocol for ollama (#11449)
https://huggingface.co/docs/hub/en/ollama

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-27 19:36:10 +01:00
ochafik
118f799ae4 DeepSeek-R1: implement grammar constraints 2025-01-27 17:52:46 +00:00
ochafik
92ac336dfa Prepare DeepSeek-R1-Distill-Llama-8B support 2025-01-27 17:26:43 +00:00
ochafik
09971e626c Update test_chat_completion.py 2025-01-27 15:43:03 +00:00
ochafik
67709552ad tool-call: compact json output to cap # tokens generated 2025-01-27 15:42:27 +00:00
ochafik
57f40e366b tool-call: fix lazy grammar & mixed content + tool calls parsing 2025-01-27 15:41:54 +00:00
ochafik
2efa0c27bf tool-call: add weather tool e2e tests 2025-01-27 15:02:09 +00:00
ochafik
15ec01e896 jinja: only add special tokens if template doesn't seem to handle them 2025-01-27 14:28:11 +00:00
ochafik
da606d8d41 tool-call: remove nonsensical code_interpreter code 2025-01-27 14:19:20 +00:00
Haus1
d6d24cd9ed
AMD: parse the architecture as supplied by gcnArchName (#11244)
The value provided by minor doesn't include stepping for AMD; parse the value returned by gcnArchName instead to retrieve an accurate ID.
2025-01-27 14:58:17 +01:00
lexasub
a5203b4465
llama : minor fixes to speed up model loading (#11448)
* impl::load: change bpe_ranks from a map to an unordered map to reduce the time of impl::load by 30%

* llama_model_loader::init_mapping: replace new llama_mmap with std::make_unique<llama_mmap> for cleaner code and to halve the running time of init_mappings

* Update src/llama-vocab.cpp

---------

Co-authored-by: lexasub <empty@empty.ru>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-27 14:42:09 +01:00
ochafik
bddc1bebcc tool-call: fix special handling of special trigger tokens (Nemo) 2025-01-27 11:37:41 +00:00
Johannes Gäßler
df984e0147
llama: refactor llama_decode_impl (#11381) 2025-01-27 12:07:12 +01:00
Ihar Hrachyshka
acd38efee3
metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441)
This fixes a segmentation fault when running tests when no Metal
devices are available (for example, when not linked with the Core Graphics
framework or otherwise).
2025-01-27 09:41:59 +02:00
ochafik
ca0c837b6a nits 2025-01-27 01:08:29 +00:00
ochafik
f7078cab36 tool-call: fix functionary v3.1 required test 2025-01-26 23:23:09 +00:00
Xuan Son Nguyen
caf773f249
docker : fix ARM build and Vulkan build (#11434)
* ci : do not fail-fast for docker

* build arm64/amd64 separately

* fix pip

* no fast fail

* vulkan: try jammy
2025-01-26 22:45:32 +01:00
ochafik
5ec4c5e4d3 reshuffle chat handlers 2025-01-26 21:38:07 +00:00
ochafik
43385b2ff2 sync: minja 2025-01-26 21:36:25 +00:00
Georgi Gerganov
178a7eb952
metal : use residency sets (#11427)
* metal : use residency sets

ggml-ci

* metal : restore commandBufferWithUnretainedReferences calls [no ci]

* metal : release descriptors

ggml-ci

* metal : check env GGML_METAL_NO_RESIDENCY

ggml-ci

* metal : fix build + clean-up

ggml-ci
2025-01-26 20:06:16 +02:00
Nuno
6f53d8a6b4
docker: add missing vulkan library to base layer and update to 24.04 (#11422)
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2025-01-26 18:22:43 +01:00
bandoti
19f65187cb
cmake: add ggml find package (#11369)
* Add initial ggml cmake package

* Add build numbers to ggml find-package

* Expand variables with GGML_ prefix

* Guard against adding to cache variable twice

* Add git to msys2 workflow

* Handle ggml-cpu-* variants

* Link ggml/ggml-base libraries to their targets

* Replace main-cmake-pkg with simple-cmake-pkg

* Interface features require c_std_90

* Fix typo

* Removed unnecessary bracket from status message

* Update examples/simple-cmake-pkg/README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update examples/simple-cmake-pkg/README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-26 12:07:48 -04:00
ochafik
11594557e3 Merge branch 'tool-call' into tool-call-handler 2025-01-26 15:32:53 +00:00
ochafik
3f3fc03983 nit: trailing spaces 2025-01-26 15:32:13 +00:00
Frank Mai
1d8ee06000
rpc: fix register position (#11424)
Signed-off-by: thxCode <thxcode0824@gmail.com>
2025-01-26 16:20:34 +01:00
Georgi Gerganov
2cc9b8c32c
readme : update hot topics 2025-01-26 14:30:15 +02:00
Jeff Bolz
f35726c2fb
build: apply MSVC /bigobj option to c/cpp files only (#11423) 2025-01-26 03:10:03 +01:00
Jeff Bolz
4a75d19376
vulkan: compile shaders on-demand (#11406)
Reduce first-run startup time and memory consumption.

Should fix #11339.
2025-01-25 22:29:57 +01:00
uvos
26771a1491
Hip: disable VMM on hip as it seems that it doesn't work in some configurations (#11420) 2025-01-25 21:01:12 +01:00
Jeff Bolz
ca6baf76c1
build: add /bigobj to MSVC build (#11407) 2025-01-25 11:26:37 -06:00
Diego Devesa
6e264a905b
docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for (#11419) 2025-01-25 17:22:41 +01:00
Xuan Son Nguyen
49b0e3cec4
server : fix cleaning up stream task (#11418)
* server : fix cleaning up stream task

* one more spot
2025-01-25 16:36:44 +01:00
Diego Devesa
20a758155b
docker : fix CPU ARM build (#11403)
* docker : fix CPU ARM build

* add CURL to other builds
2025-01-25 15:22:29 +01:00
Georgi Gerganov
00c24acb2a
ci : fix line breaks on windows builds (#11409)
* ci : fix line breaks on windows builds

* cont : another try

* ci : fix powershell line breaks
2025-01-25 13:36:48 +02:00
Olivier Chafik
51b7aab841 Update test_chat_completion.py 2025-01-25 04:57:40 +00:00
Olivier Chafik
a6463c1e35 jinja: don't add bos when jinja enabled 2025-01-25 04:52:42 +00:00
Olivier Chafik
0208b20767 Update test_chat_completion.py 2025-01-25 04:52:03 +00:00
Olivier Chafik
c479d39abd tool-call: allow special tokens that are grammar triggers 2025-01-25 04:51:53 +00:00
jiahao su
466ea66f33
CANN: Add Ascend CANN build ci (#10217)
* CANN: Add Ascend CANN build ci

* Update build.yml

* Modify cann image version

* Update build.yml

* Change to run on x86 system

* Update build.yml

* Update build.yml

* Modify format error

* Update build.yml

* Add 'Ascend NPU' label restrictions

* Exclude non PR event

Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>

* Update build.yml

---------

Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>
2025-01-25 00:26:01 +01:00
uvos
5f0db9522f
hip : Add hipGraph and VMM support to ROCM (#11362)
* Add hipGraph support

* Enable VMM on rocm
2025-01-25 00:02:23 +01:00