llama.cpp

Author	SHA1	Message	Date
Diego Devesa	6e264a905b	docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for (#11419 )	2025-01-25 17:22:41 +01:00
Xuan Son Nguyen	49b0e3cec4	server : fix cleaning up stream task (#11418 ) * server : fix cleaning up stream task * one more spot	2025-01-25 16:36:44 +01:00
Diego Devesa	20a758155b	docker : fix CPU ARM build (#11403 ) * docker : fix CPU ARM build * add CURL to other builds	2025-01-25 15:22:29 +01:00
Georgi Gerganov	00c24acb2a	ci : fix line breaks on windows builds (#11409 ) * ci : fix line breaks on windows builds * cont : another try * ci : fix powershell line breaks	2025-01-25 13:36:48 +02:00
Olivier Chafik	51b7aab841	Update test_chat_completion.py	2025-01-25 04:57:40 +00:00
Olivier Chafik	a6463c1e35	jinja: don't add bos when jinja enabled	2025-01-25 04:52:42 +00:00
Olivier Chafik	0208b20767	Update test_chat_completion.py	2025-01-25 04:52:03 +00:00
Olivier Chafik	c479d39abd	tool-call: allow special tokens that are grammar triggers	2025-01-25 04:51:53 +00:00
jiahao su	466ea66f33	CANN: Add Ascend CANN build ci (#10217 ) * CANN: Add Ascend CANN build ci * Update build.yml * Modify cann image version * Update build.yml * Change to run on x86 system * Update build.yml * Update build.yml * Modify format error * Update build.yml * Add 'Ascend NPU' label restrictions * Exclude non PR event Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org> * Update build.yml --------- Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>	2025-01-25 00:26:01 +01:00
uvos	5f0db9522f	hip : Add hipGraph and VMM support to ROCM (#11362 ) * Add hipGraph support * Enable VMM on rocm	2025-01-25 00:02:23 +01:00
Johannes Gäßler	c5d9effb49	CUDA: fix FP16 cuBLAS GEMM (#11396 )	2025-01-24 21:02:43 +01:00
uvos	9fbadaef4f	rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (#11356 )	2025-01-24 17:50:49 +01:00
Georgi Gerganov	9755129c27	release : pack /lib in the packages (#11392 ) * release : pack /lib and /include in the packages * cmake : put libs in /bin * TMP : push artifacts * Revert "TMP : push artifacts" This reverts commit `4decf2c4df`. * ci : fix HIP cmake compiler options to be on first line * ci : restore the original HIP commands * ci : change ubuntu build from latest to 20.04 * ci : try to fix macos build rpaths * ci : remove obsolete MacOS build * TMP : push artifacts * ci : change back to ubuntu latest * ci : macos set build rpath to "@loader_path" * ci : fix typo * ci : change ubuntu package to 22.04 * Revert "TMP : push artifacts" This reverts commit `537b09e70f`.	2025-01-24 18:41:30 +02:00
Jafar Uruç	a07c2c8a52	docs : Update readme to build targets for local docker build (#11368 )	2025-01-24 14:30:13 +01:00
Johannes Gäßler	8137b4bb2b	CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380 )	2025-01-24 12:38:31 +01:00
Bernhard M. Wiedemann	1af6945eb0	cmake : avoid -march=native when reproducible build is wanted (#11366 ) See https://reproducible-builds.org/ for why this is good and https://reproducible-builds.org/specs/source-date-epoch/ for the definition of this variable. Without this patch, compiling on different machines produced different binaries, which made verification of results difficult. Fixes: #11317 This patch was done while working on reproducible builds for openSUSE.	2025-01-24 13:21:35 +02:00
Eric Curtin	01f37edf1a	Update llama-run README.md (#11386 ) For consistency Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-24 09:39:24 +00:00
stduhpf	c07e87f38b	server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364 ) * webui : put DeepSeek R1 CoT in a collapsible <details> element * webui: refactor split * webui: don't use regex to split cot and response * webui: format+qol * webui: no loading icon if the model isn't generating * ui fix, add configs * add jsdoc types * only filter </think> for assistant msg * build * update build --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-01-24 09:02:38 +01:00
Olivier Chafik	36ed106f84	WIP chat handlers	2025-01-24 02:31:37 +00:00
Jeff Bolz	564804b79b	tests: fix some mul_mat test gaps (#11375 ) Now that we have batched mat-vec mul Vulkan shaders for up to n==8, these tests weren't actually exercising the mat-mat mul path. Test n==9 as well. Also, change to use all_types.	2025-01-23 14:51:24 -06:00
Eric Curtin	05f63cc9ee	Update documentation (#11373 ) To show -n, -ngl, --ngl is acceptable. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-23 20:04:31 +00:00
Eric Curtin	f7fb43cd0b	Add -ngl (#11372 ) Most other llama.cpp cli tools accept -ngl with a single dash. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-23 16:16:18 +00:00
Xuan Son Nguyen	5845661640	server : add more clean up when cancel_tasks is called (#11340 ) * server : add more clean up when cancel_tasks is called * fix recv_with_timeout * std::remove_if * fix std::remove_if	2025-01-23 13:56:05 +01:00
Eric Curtin	f211d1dc10	Treat hf.co/ prefix the same as hf:// (#11350 ) ollama uses hf.co/ to specify huggingface prefix, like RamaLama uses hf:// Treat them similarly. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-23 10:38:20 +00:00
amd-dwang	955a6c2d91	Vulkan-run-test: fix mmq_wg_denoms (#11343 ) There should be a copy-and-paste error here. mmq_wg_denoms should be used together with warptile_mmq, instead of wg_denoms.	2025-01-23 08:14:28 +01:00
Jeff Bolz	1971adf55e	vulkan: sort shaders for more deterministic binary (#11315 ) Fixes #11306.	2025-01-23 08:07:50 +01:00
Jeff Bolz	5245729e33	vulkan: fix diag_mask_inf (#11323 ) With robustbufferaccess disabled, this shader was showing OOB stores. There is a bounds check in the code, but the workgroup dimensions were reversed vs CUDA and it was running the wrong number of threads. So fix the workgroup dimensions and disable robustness for this pipeline.	2025-01-23 08:01:17 +01:00
Olivier Chafik	46415d7a51	Fix lazy trigger handling	2025-01-22 19:08:19 +00:00
Olivier Chafik	c2d836f9d0	Update real tool call tests (use less models)	2025-01-22 18:47:32 +00:00
Olivier Chafik	a46de6a03a	Add grammar options + rename builder to common_grammar_builder	2025-01-22 18:36:04 +00:00
Olivier Chafik	cdfa8b9d4f	Update chat-template.hpp	2025-01-22 18:35:24 +00:00
Olivier Chafik	5e358ade59	fix msg init warning	2025-01-22 18:35:20 +00:00
Diego Devesa	6152129d05	main : update README documentation for batch size (#11353 ) * main : update README documentation for batch size * fix formatting * minor	2025-01-22 19:22:20 +01:00
Georgi Gerganov	16d3df7ab0	readme : add plugin links (#11355 )	2025-01-22 19:44:26 +02:00
Diego Devesa	12c2bdf2de	server : fix draft context not being released (#11354 )	2025-01-22 17:44:40 +01:00
Olivier Chafik	f0231a586e	fix common_chat_msg invocations	2025-01-22 16:25:51 +00:00
Olivier Chafik	d186721e41	Merge remote-tracking branch 'origin/master' into tool-call	2025-01-22 16:22:16 +00:00
Olivier Chafik	c64d2becb1	`minja`: sync at `0f5f7f2b37` (#11352 )	2025-01-22 16:16:27 +00:00
Olivier Chafik	9ccc62b3c9	Sync minja after https://github.com/google/minja/pull/29	2025-01-22 14:32:18 +00:00
Jiří Podivín	96f4053934	Adding logprobs to /v1/completions (#11344 ) Signed-off-by: Jiri Podivin <jpodivin@redhat.com>	2025-01-22 12:51:32 +01:00
Olivier Chafik	30d33d9f68	Update test_chat_completion.py	2025-01-22 11:42:36 +00:00
Olivier Chafik	c6a22edc57	Greedy sampling in tool call tests	2025-01-22 11:41:43 +00:00
Olivier Chafik	cce1166b37	Update tool-call.cpp	2025-01-22 11:25:26 +00:00
Olivier Chafik	a4226365bf	nits	2025-01-22 11:23:37 +00:00
Olivier Chafik	63387c6dca	smaller diff	2025-01-22 11:14:25 +00:00
Olivier Chafik	82b6e9a5c3	merge common_tool_calls into common_chat_msg	2025-01-22 11:05:05 +00:00
Olivier Chafik	01b345be0f	Merge remote-tracking branch 'origin/master' into tool-call	2025-01-22 10:02:23 +00:00
Olivier Chafik	a94f3b2727	`common`: utils to split / join / repeat strings (from json converter) (#11342 ) * Factor string_join, string_split, string_repeat into common * json: refactor to surface a versatile builder * Update common.cpp	2025-01-22 09:51:44 +00:00
tc-mb	3e3357fd77	llava : support Minicpm-omni (#11289 ) * init * add readme * update readme * no use make * update readme * update fix code * fix editorconfig-checker * no change convert py * use clip_image_u8_free	2025-01-22 09:35:48 +02:00
Olivier Chafik	2dd09c792f	more cleanups	2025-01-22 03:20:47 +00:00

4940 commits