Michal Moskal
1afc53a338
fix warning
2025-01-26 12:33:11 -08:00
Michal Moskal
08fefd1d7c
fix whitespace
2025-01-26 12:30:02 -08:00
Michal Moskal
efc36c9acf
add $LLGUIDANCE_LOG_LEVEL support
2025-01-26 10:15:22 -08:00
Michal Moskal
c9e9853e6c
format file
2025-01-26 10:11:39 -08:00
Michal Moskal
44e1973af0
update llg
2025-01-26 10:09:57 -08:00
Michal Moskal
ca88ce7b77
llama_tokenizer() in fact requires valid utf8
2025-01-26 10:09:51 -08:00
Michal Moskal
8e027f8dcd
align tests with LLG grammar syntax and JSON Schema spec
2025-01-26 09:59:31 -08:00
Michal Moskal
0a211fcb9d
add gh action for llg test
2025-01-26 09:06:38 -08:00
Michal Moskal
c7ebf57822
rename llguidance test file to test-grammar-llguidance.cpp
2025-01-26 08:54:56 -08:00
Michal Moskal
29375376fe
conditionally include llguidance test based on LLAMA_LLGUIDANCE flag
2025-01-26 08:53:49 -08:00
Michal Moskal
16a5484048
gbnf -> lark syntax
2025-01-26 08:50:59 -08:00
Michal Moskal
f245ca26f5
build and run test
2025-01-26 08:49:05 -08:00
Michal Moskal
036b91fbc3
fix ref-count bug
2025-01-26 08:48:53 -08:00
Michal Moskal
58006ddb13
clang fmt
2025-01-26 08:20:26 -08:00
Michal Moskal
3675050804
copy test-grammar-integration.cpp to test-llguidance.cpp
2025-01-26 08:18:10 -08:00
Michal Moskal
a7be6669b1
pass vocab not model to llama_sampler_init_llg()
2025-01-26 08:16:56 -08:00
Michal Moskal
de269a1833
fix tests when llg is enabled
2025-01-26 08:02:37 -08:00
Michal Moskal
8cb12d43d6
remove llguidance.h from .gitignore
2025-01-25 20:45:59 -08:00
Michal Moskal
2a92bfbe06
code style fixes
2025-01-25 20:43:33 -08:00
Michal Moskal
adc4aed0af
clarify docs
2025-01-25 20:35:41 -08:00
Michal Moskal
b5399d44c2
add some docs
2025-01-25 20:27:07 -08:00
Michal Moskal
afb6cac5ab
use '%llguidance' as marker to enable llg lark syntax
2025-01-25 16:57:28 -08:00
Michal Moskal
f4dc4b89fa
build: integrate llguidance as an external project
2025-01-25 15:49:23 -08:00
Michal Moskal
f19655c4c0
update for new APIs
2025-01-25 15:49:07 -08:00
Michal Moskal
76290d9ea0
initial porting of previous LLG patch
2025-01-25 14:43:57 -08:00
Jeff Bolz
4a75d19376
vulkan: compile shaders on-demand (#11406)
Reduce first-run startup time and memory consumption.
Should fix #11339.
2025-01-25 22:29:57 +01:00
uvos
26771a1491
Hip: disable VMM on HIP, as it seems that it doesn't work in some configurations (#11420)
2025-01-25 21:01:12 +01:00
Jeff Bolz
ca6baf76c1
build: add /bigobj to MSVC build (#11407)
2025-01-25 11:26:37 -06:00
Diego Devesa
6e264a905b
docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for (#11419)
2025-01-25 17:22:41 +01:00
Xuan Son Nguyen
49b0e3cec4
server : fix cleaning up stream task (#11418)
* server : fix cleaning up stream task
* one more spot
2025-01-25 16:36:44 +01:00
Diego Devesa
20a758155b
docker : fix CPU ARM build (#11403)
* docker : fix CPU ARM build
* add CURL to other builds
2025-01-25 15:22:29 +01:00
Georgi Gerganov
00c24acb2a
ci : fix line breaks on windows builds (#11409)
* ci : fix line breaks on windows builds
* cont : another try
* ci : fix powershell line breaks
2025-01-25 13:36:48 +02:00
jiahao su
466ea66f33
CANN: Add Ascend CANN build ci (#10217)
* CANN: Add Ascend CANN build ci
* Update build.yml
* Modify cann image version
* Update build.yml
* Change to run on x86 system
* Update build.yml
* Update build.yml
* Modify format error
* Update build.yml
* Add 'Ascend NPU' label restrictions
* Exclude non PR event
Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>
* Update build.yml
---------
Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>
2025-01-25 00:26:01 +01:00
uvos
5f0db9522f
hip : Add hipGraph and VMM support to ROCM (#11362)
* Add hipGraph support
* Enable VMM on rocm
2025-01-25 00:02:23 +01:00
Johannes Gäßler
c5d9effb49
CUDA: fix FP16 cuBLAS GEMM (#11396)
2025-01-24 21:02:43 +01:00
uvos
9fbadaef4f
rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (#11356)
2025-01-24 17:50:49 +01:00
Georgi Gerganov
9755129c27
release : pack /lib in the packages (#11392)
* release : pack /lib and /include in the packages
* cmake : put libs in /bin
* TMP : push artifacts
* Revert "TMP : push artifacts"
This reverts commit 4decf2c4df.
* ci : fix HIP cmake compiler options to be on first line
* ci : restore the original HIP commands
* ci : change ubuntu build from latest to 20.04
* ci : try to fix macos build rpaths
* ci : remove obsolete MacOS build
* TMP : push artifacts
* ci : change back to ubuntu latest
* ci : macos set build rpath to "@loader_path"
* ci : fix typo
* ci : change ubuntu package to 22.04
* Revert "TMP : push artifacts"
This reverts commit 537b09e70f.
2025-01-24 18:41:30 +02:00
Jafar Uruç
a07c2c8a52
docs : Update readme to build targets for local docker build (#11368)
2025-01-24 14:30:13 +01:00
Johannes Gäßler
8137b4bb2b
CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380)
2025-01-24 12:38:31 +01:00
Bernhard M. Wiedemann
1af6945eb0
cmake : avoid -march=native when reproducible build is wanted (#11366)
See https://reproducible-builds.org/ for why this is good
and https://reproducible-builds.org/specs/source-date-epoch/
for the definition of this variable.
Without this patch, compiling on different machines produced different binaries, which made verification of results difficult.
Fixes: #11317
This patch was done while working on reproducible builds for openSUSE.
2025-01-24 13:21:35 +02:00
Eric Curtin
01f37edf1a
Update llama-run README.md (#11386)
For consistency
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-24 09:39:24 +00:00
stduhpf
c07e87f38b
server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364)
* webui : put DeepSeek R1 CoT in a collapsible <details> element
* webui: refactor split
* webui: don't use regex to split cot and response
* webui: format+qol
* webui: no loading icon if the model isn't generating
* ui fix, add configs
* add jsdoc types
* only filter </think> for assistant msg
* build
* update build
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-24 09:02:38 +01:00
Jeff Bolz
564804b79b
tests: fix some mul_mat test gaps (#11375)
Now that we have batched mat-vec mul Vulkan shaders for up to n==8,
these tests weren't actually exercising the mat-mat mul path. Test
n==9 as well. Also, change to use all_types.
2025-01-23 14:51:24 -06:00
Eric Curtin
05f63cc9ee
Update documentation (#11373)
To show that -n, -ngl, and --ngl are all acceptable.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 20:04:31 +00:00
Eric Curtin
f7fb43cd0b
Add -ngl (#11372)
Most other llama.cpp cli tools accept -ngl with a single dash.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 16:16:18 +00:00
Xuan Son Nguyen
5845661640
server : add more clean up when cancel_tasks is called (#11340)
* server : add more clean up when cancel_tasks is called
* fix recv_with_timeout
* std::remove_if
* fix std::remove_if
2025-01-23 13:56:05 +01:00
Eric Curtin
f211d1dc10
Treat hf.co/ prefix the same as hf:// (#11350)
ollama uses hf.co/ to specify the Hugging Face prefix, just as RamaLama
uses hf://.
Treat them the same way.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 10:38:20 +00:00
amd-dwang
955a6c2d91
Vulkan-run-test: fix mmq_wg_denoms (#11343)
This appears to be a copy-and-paste error:
*mmq_wg_denoms should be used together with *warptile_mmq, instead of
wg_denoms.
2025-01-23 08:14:28 +01:00
Jeff Bolz
1971adf55e
vulkan: sort shaders for more deterministic binary (#11315)
Fixes #11306.
2025-01-23 08:07:50 +01:00
Jeff Bolz
5245729e33
vulkan: fix diag_mask_inf (#11323)
With robustbufferaccess disabled, this shader was showing OOB stores. There
is a bounds check in the code, but the workgroup dimensions were reversed vs
CUDA, so it was running the wrong number of threads. Fix the workgroup
dimensions and disable robustness for this pipeline.
2025-01-23 08:01:17 +01:00