Commit graph

4582 commits

Author SHA1 Message Date
Michal Moskal
00fcd984d5 include <cmath> for INFINITY 2025-01-26 12:36:06 -08:00
Michal Moskal
1afc53a338 fix warning 2025-01-26 12:33:11 -08:00
Michal Moskal
08fefd1d7c fix whitespace 2025-01-26 12:30:02 -08:00
Michal Moskal
efc36c9acf add $LLGUIDANCE_LOG_LEVEL support 2025-01-26 10:15:22 -08:00
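The `$LLGUIDANCE_LOG_LEVEL` change above is an environment-variable-driven log level. A minimal sketch of that pattern (hypothetical helper, not the actual llguidance code):

```cpp
#include <cstdlib>

// Hypothetical sketch: read a numeric log level from the environment,
// falling back to a default when the variable is unset or malformed.
static int get_llg_log_level(int fallback) {
    const char * s = std::getenv("LLGUIDANCE_LOG_LEVEL");
    if (s == nullptr || *s == '\0') {
        return fallback;
    }
    char * end = nullptr;
    const long v = std::strtol(s, &end, 10);
    if (end == s || *end != '\0') {
        return fallback; // not a clean integer
    }
    return (int) v;
}
```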
Michal Moskal
c9e9853e6c format file 2025-01-26 10:11:39 -08:00
Michal Moskal
44e1973af0 update llg 2025-01-26 10:09:57 -08:00
Michal Moskal
ca88ce7b77 llama_tokenizer() in fact requires valid utf8 2025-01-26 10:09:51 -08:00
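Since the commit above notes that `llama_tokenizer()` requires valid UTF-8, a caller may want to validate input first. A simplified well-formedness check (illustrative only, not the actual llama.cpp code; it skips some overlong and surrogate edge cases):

```cpp
#include <cstdint>
#include <string>

// Sketch: verify that a byte string is structurally well-formed UTF-8
// (correct lead bytes followed by the right number of continuation bytes).
static bool is_valid_utf8(const std::string & s) {
    size_t i = 0;
    while (i < s.size()) {
        const uint8_t c = (uint8_t) s[i];
        size_t n; // number of continuation bytes expected
        if      (c <= 0x7F)                       n = 0;
        else if ((c & 0xE0) == 0xC0 && c >= 0xC2) n = 1; // reject overlong C0/C1
        else if ((c & 0xF0) == 0xE0)              n = 2;
        else if ((c & 0xF8) == 0xF0 && c <= 0xF4) n = 3; // cap at U+10FFFF range
        else return false;
        if (i + n >= s.size()) return false;      // truncated sequence
        for (size_t k = 1; k <= n; k++) {
            if (((uint8_t) s[i + k] & 0xC0) != 0x80) return false;
        }
        i += n + 1;
    }
    return true;
}
```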
Michal Moskal
8e027f8dcd align tests with LLG grammar syntax and JSON Schema spec 2025-01-26 09:59:31 -08:00
Michal Moskal
0a211fcb9d add gh action for llg test 2025-01-26 09:06:38 -08:00
Michal Moskal
c7ebf57822 rename llguidance test file to test-grammar-llguidance.cpp 2025-01-26 08:54:56 -08:00
Michal Moskal
29375376fe conditionally include llguidance test based on LLAMA_LLGUIDANCE flag 2025-01-26 08:53:49 -08:00
Michal Moskal
16a5484048 gbnf -> lark syntax 2025-01-26 08:50:59 -08:00
Michal Moskal
f245ca26f5 build and run test 2025-01-26 08:49:05 -08:00
Michal Moskal
036b91fbc3 fix ref-count bug 2025-01-26 08:48:53 -08:00
Michal Moskal
58006ddb13 clang fmt 2025-01-26 08:20:26 -08:00
Michal Moskal
3675050804 copy test-grammar-integration.cpp to test-llguidance.cpp 2025-01-26 08:18:10 -08:00
Michal Moskal
a7be6669b1 pass vocab not model to llama_sampler_init_llg() 2025-01-26 08:16:56 -08:00
Michal Moskal
de269a1833 fix tests when llg is enabled 2025-01-26 08:02:37 -08:00
Michal Moskal
8cb12d43d6 remove llguidance.h from .gitignore 2025-01-25 20:45:59 -08:00
Michal Moskal
2a92bfbe06 code style fixes 2025-01-25 20:43:33 -08:00
Michal Moskal
adc4aed0af clarify docs 2025-01-25 20:35:41 -08:00
Michal Moskal
b5399d44c2 add some docs 2025-01-25 20:27:07 -08:00
Michal Moskal
afb6cac5ab use '%llguidance' as marker to enable llg lark syntax 2025-01-25 16:57:28 -08:00
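The `%llguidance` marker above lets a grammar string self-identify as LLGuidance Lark syntax rather than GBNF. The dispatch check could look like this (hypothetical helper; only the marker string comes from the commit):

```cpp
#include <string>

// Sketch: a grammar that starts with "%llguidance" is treated as
// LLGuidance Lark syntax; anything else falls back to GBNF.
static bool is_llg_grammar(const std::string & grammar) {
    const std::string marker = "%llguidance";
    return grammar.compare(0, marker.size(), marker) == 0;
}
```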
Michal Moskal
f4dc4b89fa build: integrate llguidance as an external project 2025-01-25 15:49:23 -08:00
Michal Moskal
f19655c4c0 update for new APIs 2025-01-25 15:49:07 -08:00
Michal Moskal
76290d9ea0 initial porting of previous LLG patch 2025-01-25 14:43:57 -08:00
Jeff Bolz
4a75d19376
vulkan: compile shaders on-demand (#11406)
Reduce first-run startup time and memory consumption.

Should fix #11339.
2025-01-25 22:29:57 +01:00
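The on-demand compilation above trades eager startup work for lazy first-use cost. The general shape is a cache keyed by pipeline, populated on first request (names invented for illustration, not the Vulkan backend's API):

```cpp
#include <map>
#include <string>

// Sketch: instead of compiling every shader pipeline at startup, keep a
// cache and build each pipeline the first time it is requested, reducing
// first-run startup time and memory consumption.
struct Pipeline {
    std::string name; // stand-in for a real compiled pipeline handle
};

class PipelineCache {
  public:
    const Pipeline & get(const std::string & name) {
        auto it = cache.find(name);
        if (it == cache.end()) {
            // first use: compile now rather than at startup
            it = cache.emplace(name, Pipeline{name}).first;
            compiles++;
        }
        return it->second;
    }
    int compiles = 0; // how many pipelines were actually built
  private:
    std::map<std::string, Pipeline> cache;
};
```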
uvos
26771a1491
Hip: disable VMM on hip as it seems that it doesn't work in some configurations (#11420) 2025-01-25 21:01:12 +01:00
Jeff Bolz
ca6baf76c1
build: add /bigobj to MSVC build (#11407) 2025-01-25 11:26:37 -06:00
Diego Devesa
6e264a905b
docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for (#11419) 2025-01-25 17:22:41 +01:00
Xuan Son Nguyen
49b0e3cec4
server : fix cleaning up stream task (#11418)
* server : fix cleaning up stream task

* one more spot
2025-01-25 16:36:44 +01:00
Diego Devesa
20a758155b
docker : fix CPU ARM build (#11403)
* docker : fix CPU ARM build

* add CURL to other builds
2025-01-25 15:22:29 +01:00
Georgi Gerganov
00c24acb2a
ci : fix line breaks on windows builds (#11409)
* ci : fix line breaks on windows builds

* cont : another try

* ci : fix powershell line breaks
2025-01-25 13:36:48 +02:00
jiahao su
466ea66f33
CANN: Add Ascend CANN build ci (#10217)
* CANN: Add Ascend CANN build ci

* Update build.yml

* Modify cann image version

* Update build.yml

* Change to run on x86 system

* Update build.yml

* Update build.yml

* Modify format error

* Update build.yml

* Add 'Ascend NPU' label restrictions

* Exclude non PR event

Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>

* Update build.yml

---------

Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>
2025-01-25 00:26:01 +01:00
uvos
5f0db9522f
hip : Add hipGraph and VMM support to ROCM (#11362)
* Add hipGraph support

* Enable VMM on rocm
2025-01-25 00:02:23 +01:00
Johannes Gäßler
c5d9effb49
CUDA: fix FP16 cuBLAS GEMM (#11396) 2025-01-24 21:02:43 +01:00
uvos
9fbadaef4f
rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (#11356) 2025-01-24 17:50:49 +01:00
Georgi Gerganov
9755129c27
release : pack /lib in the packages (#11392)
* release : pack /lib and /include in the packages

* cmake : put libs in /bin

* TMP : push artifacts

* Revert "TMP : push artifacts"

This reverts commit 4decf2c4df.

* ci : fix HIP cmake compiler options to be on first line

* ci : restore the original HIP commands

* ci : change ubuntu build from latest to 20.04

* ci : try to fix macos build rpaths

* ci : remove obsolete MacOS build

* TMP : push artifacts

* ci : change back to ubuntu latest

* ci : macos set build rpath to "@loader_path"

* ci : fix typo

* ci : change ubuntu package to 22.04

* Revert "TMP : push artifacts"

This reverts commit 537b09e70f.
2025-01-24 18:41:30 +02:00
Jafar Uruç
a07c2c8a52
docs : Update readme to build targets for local docker build (#11368) 2025-01-24 14:30:13 +01:00
Johannes Gäßler
8137b4bb2b
CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380) 2025-01-24 12:38:31 +01:00
Bernhard M. Wiedemann
1af6945eb0
cmake : avoid -march=native when reproducible build is wanted (#11366)
See https://reproducible-builds.org/ for why this is good
and https://reproducible-builds.org/specs/source-date-epoch/
for the definition of this variable.

Without this patch, compiling on different machines produced different binaries, which made verification of results difficult.

Fixes: #11317

This patch was done while working on reproducible builds for openSUSE.
2025-01-24 13:21:35 +02:00
Eric Curtin
01f37edf1a
Update llama-run README.md (#11386)
For consistency

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-24 09:39:24 +00:00
stduhpf
c07e87f38b
server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364)
* webui : put DeepSeek R1 CoT in a collapsible <details> element

* webui: refactor split

* webui: don't use regex to split cot and response

* webui: format+qol

* webui: no loading icon if the model isn't generating

* ui fix, add configs

* add jsdoc types

* only filter </think> for assistant msg

* build

* update build

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-24 09:02:38 +01:00
Jeff Bolz
564804b79b
tests: fix some mul_mat test gaps (#11375)
Now that we have batched mat-vec mul Vulkan shaders for up to n==8,
these tests weren't actually exercising the mat-mat mul path. Test
n==9 as well. Also, change to use all_types.
2025-01-23 14:51:24 -06:00
Eric Curtin
05f63cc9ee
Update documentation (#11373)
To show that -n, -ngl, and --ngl are all acceptable.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 20:04:31 +00:00
Eric Curtin
f7fb43cd0b
Add -ngl (#11372)
Most other llama.cpp cli tools accept -ngl with a single dash.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 16:16:18 +00:00
Xuan Son Nguyen
5845661640
server : add more clean up when cancel_tasks is called (#11340)
* server : add more clean up when cancel_tasks is called

* fix recv_with_timeout

* std::remove_if

* fix std::remove_if
2025-01-23 13:56:05 +01:00
Eric Curtin
f211d1dc10
Treat hf.co/ prefix the same as hf:// (#11350)
ollama uses hf.co/ to specify the huggingface prefix, just as RamaLama
uses hf://

Treat them the same.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-23 10:38:20 +00:00
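Treating the two prefixes the same amounts to normalizing one spelling to the other before further URL parsing. A minimal sketch (hypothetical helper, not the actual llama.cpp code):

```cpp
#include <string>

// Sketch: rewrite a leading "hf.co/" to "hf://" so downstream parsing
// only ever sees one canonical Hugging Face prefix.
static std::string normalize_hf_prefix(const std::string & url) {
    const std::string alias = "hf.co/";
    if (url.compare(0, alias.size(), alias) == 0) {
        return "hf://" + url.substr(alias.size());
    }
    return url; // already canonical, or not a Hugging Face reference
}
```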
amd-dwang
955a6c2d91
Vulkan-run-test: fix mmq_wg_denoms (#11343)
There appears to be a copy-and-paste error here.

*mmq_wg_denoms should be used together with *warptile_mmq, instead of
wg_denoms.
2025-01-23 08:14:28 +01:00
Jeff Bolz
1971adf55e
vulkan: sort shaders for more deterministic binary (#11315)
Fixes #11306.
2025-01-23 08:07:50 +01:00