Commit graph

4582 commits

Author SHA1 Message Date
Michal Moskal
00fcd984d5 include <cmath> for INFINITY 2025-01-26 12:36:06 -08:00
Michal Moskal
1afc53a338 fix warning 2025-01-26 12:33:11 -08:00
Michal Moskal
08fefd1d7c fix whitespace 2025-01-26 12:30:02 -08:00
Michal Moskal
efc36c9acf add $LLGUIDANCE_LOG_LEVEL support 2025-01-26 10:15:22 -08:00
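The `$LLGUIDANCE_LOG_LEVEL` change above is an environment-variable-driven log level. A minimal sketch of that pattern (hypothetical helper, not the actual llguidance code):

```cpp
#include <cstdlib>

// Hypothetical sketch: read a numeric log level from the environment,
// falling back to a default when the variable is unset or malformed.
static int get_llg_log_level(int fallback) {
    const char * s = std::getenv("LLGUIDANCE_LOG_LEVEL");
    if (s == nullptr || *s == '\0') {
        return fallback;
    }
    char * end = nullptr;
    const long v = std::strtol(s, &end, 10);
    if (end == s || *end != '\0') {
        return fallback; // not a clean integer
    }
    return (int) v;
}
```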
Michal Moskal
c9e9853e6c format file 2025-01-26 10:11:39 -08:00
Michal Moskal
44e1973af0 update llg 2025-01-26 10:09:57 -08:00
Michal Moskal
ca88ce7b77 llama_tokenizer() in fact requires valid utf8 2025-01-26 10:09:51 -08:00
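Since the commit above notes that `llama_tokenizer()` requires valid UTF-8, a caller may want to validate input first. A simplified well-formedness check (illustrative only, not the actual llama.cpp code; it skips some overlong and surrogate edge cases):

```cpp
#include <cstdint>
#include <string>

// Sketch: verify that a byte string is structurally well-formed UTF-8
// (correct lead bytes followed by the right number of continuation bytes).
static bool is_valid_utf8(const std::string & s) {
    size_t i = 0;
    while (i < s.size()) {
        const uint8_t c = (uint8_t) s[i];
        size_t n; // number of continuation bytes expected
        if      (c <= 0x7F)                       n = 0;
        else if ((c & 0xE0) == 0xC0 && c >= 0xC2) n = 1; // reject overlong C0/C1
        else if ((c & 0xF0) == 0xE0)              n = 2;
        else if ((c & 0xF8) == 0xF0 && c <= 0xF4) n = 3; // cap at U+10FFFF range
        else return false;
        if (i + n >= s.size()) return false;      // truncated sequence
        for (size_t k = 1; k <= n; k++) {
            if (((uint8_t) s[i + k] & 0xC0) != 0x80) return false;
        }
        i += n + 1;
    }
    return true;
}
```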
Michal Moskal
8e027f8dcd align tests with LLG grammar syntax and JSON Schema spec 2025-01-26 09:59:31 -08:00
Michal Moskal
0a211fcb9d add gh action for llg test 2025-01-26 09:06:38 -08:00
Michal Moskal
c7ebf57822 rename llguidance test file to test-grammar-llguidance.cpp 2025-01-26 08:54:56 -08:00
Michal Moskal
29375376fe conditionally include llguidance test based on LLAMA_LLGUIDANCE flag 2025-01-26 08:53:49 -08:00
Michal Moskal
16a5484048 gbnf -> lark syntax 2025-01-26 08:50:59 -08:00
Michal Moskal
f245ca26f5 build and run test 2025-01-26 08:49:05 -08:00
Michal Moskal
036b91fbc3 fix ref-count bug 2025-01-26 08:48:53 -08:00
Michal Moskal
58006ddb13 clang fmt 2025-01-26 08:20:26 -08:00
Michal Moskal
3675050804 copy test-grammar-integration.cpp to test-llguidance.cpp 2025-01-26 08:18:10 -08:00
Michal Moskal
a7be6669b1 pass vocab not model to llama_sampler_init_llg() 2025-01-26 08:16:56 -08:00
Michal Moskal
de269a1833 fix tests when llg is enabled 2025-01-26 08:02:37 -08:00
Michal Moskal
8cb12d43d6 remove llguidance.h from .gitignore 2025-01-25 20:45:59 -08:00
Michal Moskal
2a92bfbe06 code style fixes 2025-01-25 20:43:33 -08:00
Michal Moskal
adc4aed0af clarify docs 2025-01-25 20:35:41 -08:00
Michal Moskal
b5399d44c2 add some docs 2025-01-25 20:27:07 -08:00
Michal Moskal
afb6cac5ab use '%llguidance' as marker to enable llg lark syntax 2025-01-25 16:57:28 -08:00
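The `%llguidance` marker above lets a grammar string self-identify as LLGuidance Lark syntax rather than GBNF. The dispatch check could look like this (hypothetical helper; only the marker string comes from the commit):

```cpp
#include <string>

// Sketch: a grammar that starts with "%llguidance" is treated as
// LLGuidance Lark syntax; anything else falls back to GBNF.
static bool is_llg_grammar(const std::string & grammar) {
    const std::string marker = "%llguidance";
    return grammar.compare(0, marker.size(), marker) == 0;
}
```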
Michal Moskal
f4dc4b89fa build: integrate llguidance as an external project 2025-01-25 15:49:23 -08:00
Michal Moskal
f19655c4c0 update for new APIs 2025-01-25 15:49:07 -08:00
Michal Moskal
76290d9ea0 initial porting of previous LLG patch 2025-01-25 14:43:57 -08:00
Jeff Bolz
4a75d19376
vulkan: compile shaders on-demand (#11406)
Reduce first-run startup time and memory consumption.

Should fix #11339.
2025-01-25 22:29:57 +01:00
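The on-demand compilation above trades eager startup work for lazy first-use cost. The general shape is a cache keyed by pipeline, populated on first request (names invented for illustration, not the Vulkan backend's API):

```cpp
#include <map>
#include <string>

// Sketch: instead of compiling every shader pipeline at startup, keep a
// cache and build each pipeline the first time it is requested, reducing
// first-run startup time and memory consumption.
struct Pipeline {
    std::string name; // stand-in for a real compiled pipeline handle
};

class PipelineCache {
  public:
    const Pipeline & get(const std::string & name) {
        auto it = cache.find(name);
        if (it == cache.end()) {
            // first use: compile now rather than at startup
            it = cache.emplace(name, Pipeline{name}).first;
            compiles++;
        }
        return it->second;
    }
    int compiles = 0; // how many pipelines were actually built
  private:
    std::map<std::string, Pipeline> cache;
};
```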
uvos
26771a1491
Hip: disable VMM on hip as it seems that it doesn't work in some configurations (#11420) 2025-01-25 21:01:12 +01:00
Jeff Bolz
ca6baf76c1
build: add /bigobj to MSVC build (#11407) 2025-01-25 11:26:37 -06:00
Diego Devesa
6e264a905b
docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for (#11419) 2025-01-25 17:22:41 +01:00
Xuan Son Nguyen
49b0e3cec4
server : fix cleaning up stream task (#11418)
* server : fix cleaning up stream task

* one more spot
2025-01-25 16:36:44 +01:00
Diego Devesa
20a758155b
docker : fix CPU ARM build (#11403)
* docker : fix CPU ARM build

* add CURL to other builds
2025-01-25 15:22:29 +01:00
Georgi Gerganov
00c24acb2a
ci : fix line breaks on windows builds (#11409)
* ci : fix line breaks on windows builds

* cont : another try

* ci : fix powershell line breaks
2025-01-25 13:36:48 +02:00
jiahao su
466ea66f33
CANN: Add Ascend CANN build ci (#10217)
* CANN: Add Ascend CANN build ci

* Update build.yml

* Modify cann image version

* Update build.yml

* Change to run on x86 system

* Update build.yml

* Update build.yml

* Modify format error

* Update build.yml

* Add 'Ascend NPU' label restrictions

* Exclude non PR event

Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>

* Update build.yml

---------

Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>
2025-01-25 00:26:01 +01:00
uvos
5f0db9522f
hip : Add hipGraph and VMM support to ROCM (#11362)
* Add hipGraph support

* Enable VMM on rocm
2025-01-25 00:02:23 +01:00
Johannes Gäßler
c5d9effb49
CUDA: fix FP16 cuBLAS GEMM (#11396) 2025-01-24 21:02:43 +01:00
uvos
9fbadaef4f
rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (#11356) 2025-01-24 17:50:49 +01:00
Georgi Gerganov
9755129c27
release : pack /lib in the packages (#11392)
* release : pack /lib and /include in the packages

* cmake : put libs in /bin

* TMP : push artifacts

* Revert "TMP : push artifacts"

This reverts commit 4decf2c4df.

* ci : fix HIP cmake compiler options to be on first line

* ci : restore the original HIP commands

* ci : change ubuntu build from latest to 20.04

* ci : try to fix macos build rpaths

* ci : remove obsolete MacOS build

* TMP : push artifacts

* ci : change back to ubuntu latest

* ci : macos set build rpath to "@loader_path"

* ci : fix typo

* ci : change ubuntu package to 22.04

* Revert "TMP : push artifacts"

This reverts commit 537b09e70f.
2025-01-24 18:41:30 +02:00
Jafar Uruç
a07c2c8a52
docs : Update readme to build targets for local docker build (#11368) 2025-01-24 14:30:13 +01:00
Johannes Gäßler
8137b4bb2b
CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380) 2025-01-24 12:38:31 +01:00
Bernhard M. Wiedemann
1af6945eb0
cmake : avoid -march=native when reproducible build is wanted (#11366)
See https://reproducible-builds.org/ for why this is good
and https://reproducible-builds.org/specs/source-date-epoch/
for the definition of this variable.

Without this patch, compiling on different machines produced different binaries, which made verification of results difficult.

Fixes: #11317

This patch was done while working on reproducible builds for openSUSE.
2025-01-24 13:21:35 +02:00
Eric Curtin
01f37edf1a
Update llama-run README.md (#11386)
For consistency

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-24 09:39:24 +00:00
stduhpf
c07e87f38b
server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364)
* webui : put DeepSeek R1 CoT in a collapsible <details> element

* webui: refactor split

* webui: don't use regex to split cot and response

* webui: format+qol

* webui: no loading icon if the model isn't generating

* ui fix, add configs

* add jsdoc types

* only filter </think> for assistant msg

* build

* update build

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-24 09:02:38 +01:00
Jeff Bolz
564804b79b
tests: fix some mul_mat test gaps (#11375)
Now that we have batched mat-vec mul Vulkan shaders for up to n==8,
these tests weren't actually exercising the mat-mat mul path. Test
n==9 as well. Also, change to use all_types.
2025-01-23 14:51:24 -06:00
Eric Curtin
05f63cc9ee
Update documentation (#11373)
To show that -n, -ngl, and --ngl are all acceptable.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 20:04:31 +00:00
Eric Curtin
f7fb43cd0b
Add -ngl (#11372)
Most other llama.cpp cli tools accept -ngl with a single dash.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 16:16:18 +00:00
Xuan Son Nguyen
5845661640
server : add more clean up when cancel_tasks is called (#11340)
* server : add more clean up when cancel_tasks is called

* fix recv_with_timeout

* std::remove_if

* fix std::remove_if
2025-01-23 13:56:05 +01:00
Eric Curtin
f211d1dc10
Treat hf.co/ prefix the same as hf:// (#11350)
ollama uses hf.co/ to specify the huggingface prefix, just as RamaLama
uses hf://

Treat them the same.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 10:38:20 +00:00
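Treating the two prefixes the same amounts to normalizing one spelling to the other before further URL parsing. A minimal sketch (hypothetical helper, not the actual llama.cpp code):

```cpp
#include <string>

// Sketch: rewrite a leading "hf.co/" to "hf://" so downstream parsing
// only ever sees one canonical Hugging Face prefix.
static std::string normalize_hf_prefix(const std::string & url) {
    const std::string alias = "hf.co/";
    if (url.compare(0, alias.size(), alias) == 0) {
        return "hf://" + url.substr(alias.size());
    }
    return url; // already canonical, or not a Hugging Face reference
}
```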
amd-dwang
955a6c2d91
Vulkan-run-test: fix mmq_wg_denoms (#11343)
There appears to be a copy-and-paste error here.

*mmq_wg_denoms should be used together with *warptile_mmq, instead of
wg_denoms.
2025-01-23 08:14:28 +01:00
Jeff Bolz
1971adf55e
vulkan: sort shaders for more deterministic binary (#11315)
Fixes #11306.
2025-01-23 08:07:50 +01:00