ochafik
57f40e366b
tool-call: fix lazy grammar & mixed content + tool calls parsing
2025-01-27 15:41:54 +00:00
ochafik
2efa0c27bf
tool-call: add weather tool e2e tests
2025-01-27 15:02:09 +00:00
ochafik
15ec01e896
jinja: only add special tokens if template doesn't seem to handle them
2025-01-27 14:28:11 +00:00
ochafik
da606d8d41
tool-call: remove nonsensical code_interpreter code
2025-01-27 14:19:20 +00:00
Haus1
d6d24cd9ed
AMD: parse the architecture as supplied by gcnArchName ( #11244 )
...
The value provided by minor doesn't include stepping for AMD, parse the value returned by gcnArchName instead to retrieve an accurate ID.
2025-01-27 14:58:17 +01:00
lexasub
a5203b4465
llama : minor fixes to speed up llama model loading ( #11448 )
...
* impl::load: change bpe_ranks from map to unordered map to reduce impl::load time by 30%
* llama_model_loader::init_mapping: replace new llama_mmap with std::make_unique<llama_mmap> for cleaner code and to halve the running time of init_mappings
* Update src/llama-vocab.cpp
---------
Co-authored-by: lexasub <empty@empty.ru>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-27 14:42:09 +01:00
ochafik
bddc1bebcc
tool-call: fix special handling of special trigger tokens (Nemo)
2025-01-27 11:37:41 +00:00
Johannes Gäßler
df984e0147
llama: refactor llama_decode_impl ( #11381 )
2025-01-27 12:07:12 +01:00
Ihar Hrachyshka
acd38efee3
metal: Handle null returned from MTLCreateSystemDefaultDevice() ( #11441 )
...
This fixes a segmentation fault when running tests while no Metal
devices are available (for example, when not linked with the Core Graphics
framework or otherwise).
2025-01-27 09:41:59 +02:00
ochafik
ca0c837b6a
nits
2025-01-27 01:08:29 +00:00
ochafik
f7078cab36
tool-call: fix functionary v3.1 required test
2025-01-26 23:23:09 +00:00
Xuan Son Nguyen
caf773f249
docker : fix ARM build and Vulkan build ( #11434 )
...
* ci : do not fail-fast for docker
* build arm64/amd64 separately
* fix pip
* no fast fail
* vulkan: try jammy
2025-01-26 22:45:32 +01:00
ochafik
5ec4c5e4d3
reshuffle chat handlers
2025-01-26 21:38:07 +00:00
ochafik
43385b2ff2
sync: minja
2025-01-26 21:36:25 +00:00
Georgi Gerganov
178a7eb952
metal : use residency sets ( #11427 )
...
* metal : use residency sets
ggml-ci
* metal : restore commandBufferWithUnretainedReferences calls [no ci]
* metal : release descriptors
ggml-ci
* metal : check env GGML_METAL_NO_RESIDENCY
ggml-ci
* metal : fix build + clean-up
ggml-ci
2025-01-26 20:06:16 +02:00
Nuno
6f53d8a6b4
docker: add missing vulkan library to base layer and update to 24.04 ( #11422 )
...
Signed-off-by: rare-magma <rare-magma@posteo.eu>
2025-01-26 18:22:43 +01:00
bandoti
19f65187cb
cmake: add ggml find package ( #11369 )
...
* Add initial ggml cmake package
* Add build numbers to ggml find-package
* Expand variables with GGML_ prefix
* Guard against adding to cache variable twice
* Add git to msys2 workflow
* Handle ggml-cpu-* variants
* Link ggml/ggml-base libraries to their targets
* Replace main-cmake-pkg with simple-cmake-pkg
* Interface features require c_std_90
* Fix typo
* Removed unnecessary bracket from status message
* Update examples/simple-cmake-pkg/README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/simple-cmake-pkg/README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-26 12:07:48 -04:00
ochafik
11594557e3
Merge branch 'tool-call' into tool-call-handler
2025-01-26 15:32:53 +00:00
ochafik
3f3fc03983
nit: trailing spaces
2025-01-26 15:32:13 +00:00
Frank Mai
1d8ee06000
rpc: fix register position ( #11424 )
...
Signed-off-by: thxCode <thxcode0824@gmail.com>
2025-01-26 16:20:34 +01:00
Georgi Gerganov
2cc9b8c32c
readme : update hot topics
2025-01-26 14:30:15 +02:00
Jeff Bolz
f35726c2fb
build: apply MSVC /bigobj option to c/cpp files only ( #11423 )
2025-01-26 03:10:03 +01:00
Jeff Bolz
4a75d19376
vulkan: compile shaders on-demand ( #11406 )
...
Reduce first-run startup time and memory consumption.
Should fix #11339 .
2025-01-25 22:29:57 +01:00
uvos
26771a1491
Hip: disable VMM on hip as it seems that it doesn't work in some configurations ( #11420 )
2025-01-25 21:01:12 +01:00
Jeff Bolz
ca6baf76c1
build: add /bigobj to MSVC build ( #11407 )
2025-01-25 11:26:37 -06:00
Diego Devesa
6e264a905b
docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for ( #11419 )
2025-01-25 17:22:41 +01:00
Xuan Son Nguyen
49b0e3cec4
server : fix cleaning up stream task ( #11418 )
...
* server : fix cleaning up stream task
* one more spot
2025-01-25 16:36:44 +01:00
Diego Devesa
20a758155b
docker : fix CPU ARM build ( #11403 )
...
* docker : fix CPU ARM build
* add CURL to other builds
2025-01-25 15:22:29 +01:00
Georgi Gerganov
00c24acb2a
ci : fix line breaks on windows builds ( #11409 )
...
* ci : fix line breaks on windows builds
* cont : another try
* ci : fix powershell line breaks
2025-01-25 13:36:48 +02:00
Olivier Chafik
51b7aab841
Update test_chat_completion.py
2025-01-25 04:57:40 +00:00
Olivier Chafik
a6463c1e35
jinja: don't add bos when jinja enabled
2025-01-25 04:52:42 +00:00
Olivier Chafik
0208b20767
Update test_chat_completion.py
2025-01-25 04:52:03 +00:00
Olivier Chafik
c479d39abd
tool-call: allow special tokens that are grammar triggers
2025-01-25 04:51:53 +00:00
jiahao su
466ea66f33
CANN: Add Ascend CANN build ci ( #10217 )
...
* CANN: Add Ascend CANN build ci
* Update build.yml
* Modify cann image version
* Update build.yml
* Change to run on x86 system
* Update build.yml
* Update build.yml
* Modify format error
* Update build.yml
* Add 'Ascend NPU' label restrictions
* Exclude non PR event
Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>
* Update build.yml
---------
Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>
2025-01-25 00:26:01 +01:00
uvos
5f0db9522f
hip : Add hipGraph and VMM support to ROCM ( #11362 )
...
* Add hipGraph support
* Enable VMM on rocm
2025-01-25 00:02:23 +01:00
Johannes Gäßler
c5d9effb49
CUDA: fix FP16 cuBLAS GEMM ( #11396 )
2025-01-24 21:02:43 +01:00
uvos
9fbadaef4f
rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna ( #11356 )
2025-01-24 17:50:49 +01:00
Georgi Gerganov
9755129c27
release : pack /lib in the packages ( #11392 )
...
* release : pack /lib and /include in the packages
* cmake : put libs in /bin
* TMP : push artifacts
* Revert "TMP : push artifacts"
This reverts commit 4decf2c4df.
* ci : fix HIP cmake compiler options to be on first line
* ci : restore the original HIP commands
* ci : change ubuntu build from latest to 20.04
* ci : try to fix macos build rpaths
* ci : remove obsolete MacOS build
* TMP : push artifacts
* ci : change back to ubuntu latest
* ci : macos set build rpath to "@loader_path"
* ci : fix typo
* ci : change ubuntu package to 22.04
* Revert "TMP : push artifacts"
This reverts commit 537b09e70f.
2025-01-24 18:41:30 +02:00
Jafar Uruç
a07c2c8a52
docs : Update readme to build targets for local docker build ( #11368 )
2025-01-24 14:30:13 +01:00
Johannes Gäßler
8137b4bb2b
CPU/CUDA: fix (GQA) mul mat back, add CUDA support ( #11380 )
2025-01-24 12:38:31 +01:00
Bernhard M. Wiedemann
1af6945eb0
cmake : avoid -march=native when reproducible build is wanted ( #11366 )
...
See https://reproducible-builds.org/ for why this is good
and https://reproducible-builds.org/specs/source-date-epoch/
for the definition of this variable.
Without this patch, compiling on different machines produced different binaries, which made verification of results difficult.
Fixes : #11317
This patch was done while working on reproducible builds for openSUSE.
2025-01-24 13:21:35 +02:00
Eric Curtin
01f37edf1a
Update llama-run README.md ( #11386 )
...
For consistency
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-24 09:39:24 +00:00
stduhpf
c07e87f38b
server : (webui) put DeepSeek R1 CoT in a collapsible <details> element ( #11364 )
...
* webui : put DeepSeek R1 CoT in a collapsible <details> element
* webui: refactor split
* webui: don't use regex to split cot and response
* webui: format+qol
* webui: no loading icon if the model isn't generating
* ui fix, add configs
* add jsdoc types
* only filter </think> for assistant msg
* build
* update build
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-24 09:02:38 +01:00
Olivier Chafik
36ed106f84
WIP chat handlers
2025-01-24 02:31:37 +00:00
Jeff Bolz
564804b79b
tests: fix some mul_mat test gaps ( #11375 )
...
Now that we have batched mat-vec mul Vulkan shaders for up to n==8,
these tests weren't actually exercising the mat-mat mul path. Test
n==9 as well. Also, change to use all_types.
2025-01-23 14:51:24 -06:00
Eric Curtin
05f63cc9ee
Update documentation ( #11373 )
...
To show that -n, -ngl, and --ngl are all acceptable.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 20:04:31 +00:00
Eric Curtin
f7fb43cd0b
Add -ngl ( #11372 )
...
Most other llama.cpp cli tools accept -ngl with a single dash.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 16:16:18 +00:00
Xuan Son Nguyen
5845661640
server : add more clean up when cancel_tasks is called ( #11340 )
...
* server : add more clean up when cancel_tasks is called
* fix recv_with_timeout
* std::remove_if
* fix std::remove_if
2025-01-23 13:56:05 +01:00
Eric Curtin
f211d1dc10
Treat hf.co/ prefix the same as hf:// ( #11350 )
...
ollama uses hf.co/ as the Hugging Face prefix, just as RamaLama
uses hf://
Treat the two prefixes the same way.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 10:38:20 +00:00
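Illustration for the entry above (not the code from #11350): a minimal Python sketch of treating the two prefixes the same way. The helper name `strip_hf_prefix` and the `HF_PREFIXES` tuple are hypothetical, chosen only for this example; the actual change lives in the project's C++ sources and may differ.

```python
# Hypothetical sketch: names below are assumptions, not llama.cpp identifiers.
from typing import Optional

# Both spellings are treated as the same Hugging Face prefix.
HF_PREFIXES = ("hf://", "hf.co/")


def strip_hf_prefix(model: str) -> Optional[str]:
    """Return the part after a known Hugging Face prefix, or None if absent."""
    for prefix in HF_PREFIXES:
        if model.startswith(prefix):
            return model[len(prefix):]
    return None


if __name__ == "__main__":
    # Both spellings resolve to the same repository path.
    print(strip_hf_prefix("hf://org/model"))
    print(strip_hf_prefix("hf.co/org/model"))
```

Keeping the prefixes in one tuple means a further alias could be added in a single place without touching the lookup logic.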
amd-dwang
955a6c2d91
Vulkan-run-test: fix mmq_wg_denoms ( #11343 )
...
There appears to be a copy-and-paste error here.
*mmq_wg_denoms should be used together with *warptile_mmq, instead of
wg_denoms.
2025-01-23 08:14:28 +01:00