Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f35726c2fb 
								
							 
						 
						
							
							
								
								build: apply MSVC /bigobj option to c/cpp files only ( #11423 )  
							
							
							
						 
						
							2025-01-26 03:10:03 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4a75d19376 
								
							 
						 
						
							
							
								
								vulkan: compile shaders on-demand ( #11406 )  
							
							
							
							
							Reduce first-run startup time and memory consumption.
Should fix  #11339 . 
							
						 
						
							2025-01-25 22:29:57 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									uvos 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								26771a1491 
								
							 
						 
						
							
							
								
								Hip: disable VMM on hip as it seems that it doesn't work in some configurations ( #11420 )  
							
							
							
						 
						
							2025-01-25 21:01:12 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ca6baf76c1 
								
							 
						 
						
							
							
								
								build: add /bigobj to MSVC build ( #11407 )  
							
							
							
						 
						
							2025-01-25 11:26:37 -06:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6e264a905b 
								
							 
						 
						
							
							
								
								docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for ( #11419 )  
							
							
							
						 
						
							2025-01-25 17:22:41 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								49b0e3cec4 
								
							 
						 
						
							
							
								
								server : fix cleaning up stream task ( #11418 )  
							
							
							
							
							* server : fix cleaning up stream task
* one more spot 
							
						 
						
							2025-01-25 16:36:44 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								20a758155b 
								
							 
						 
						
							
							
								
								docker : fix CPU ARM build ( #11403 )  
							
							
							
							
							* docker : fix CPU ARM build
* add CURL to other builds 
							
						 
						
							2025-01-25 15:22:29 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								00c24acb2a 
								
							 
						 
						
							
							
								
								ci : fix line breaks on windows builds ( #11409 )  
							
							
							
							
							* ci : fix line breaks on windows builds
* cont : another try
* ci : fix powershell line breaks 
							
						 
						
							2025-01-25 13:36:48 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									jiahao su 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								466ea66f33 
								
							 
						 
						
							
							
								
								CANN: Add Ascend CANN build ci ( #10217 )  
							
							
							
							
							* CANN: Add Ascend CANN build ci
* Update build.yml
* Modify cann image version
* Update build.yml
* Change to run on x86 system
* Update build.yml
* Update build.yml
* Modify format error
* Update build.yml
* Add 'Ascend NPU' label restrictions
* Exclude non PR event
Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org>
* Update build.yml
---------
Co-authored-by: Yuanhao Ji <jiyuanhao@apache.org> 
							
						 
						
							2025-01-25 00:26:01 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									uvos 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5f0db9522f 
								
							 
						 
						
							
							
								
								hip : Add hipGraph and VMM support to ROCM ( #11362 )  
							
							
							
							
							* Add hipGraph support
* Enable VMM on rocm 
							
						 
						
							2025-01-25 00:02:23 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c5d9effb49 
								
							 
						 
						
							
							
								
								CUDA: fix FP16 cuBLAS GEMM ( #11396 )  
							
							
							
						 
						
							2025-01-24 21:02:43 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									uvos 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9fbadaef4f 
								
							 
						 
						
							
							
								
								rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna ( #11356 )  
							
							
							
						 
						
							2025-01-24 17:50:49 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9755129c27 
								
							 
						 
						
							
							
								
								release : pack /lib in the packages ( #11392 )  
							
							
							
							
							* release : pack /lib and /include in the packages
* cmake : put libs in /bin
* TMP : push artifacts
* Revert "TMP : push artifacts"
This reverts commit 4decf2c4df537b09e70f 
							
						 
						
							2025-01-24 18:41:30 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jafar Uruç 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a07c2c8a52 
								
							 
						 
						
							
							
								
								docs : Update readme to build targets for local docker build ( #11368 )  
							
							
							
						 
						
							2025-01-24 14:30:13 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8137b4bb2b 
								
							 
						 
						
							
							
								
								CPU/CUDA: fix (GQA) mul mat back, add CUDA support ( #11380 )  
							
							
							
						 
						
							2025-01-24 12:38:31 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Bernhard M. Wiedemann 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1af6945eb0 
								
							 
						 
						
							
							
								
								cmake : avoid -march=native when reproducible build is wanted ( #11366 )  
							
							
							
							
							See https://reproducible-builds.org/  for why this is good
and https://reproducible-builds.org/specs/source-date-epoch/ 
for the definition of this variable.
Without this patch, compiling on different machines produced different binaries, which made verification of results difficult.
Fixes : #11317 
This patch was done while working on reproducible builds for openSUSE. 
							
						 
						
							2025-01-24 13:21:35 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Eric Curtin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								01f37edf1a 
								
							 
						 
						
							
							
								
								Update llama-run README.md ( #11386 )  
							
							
							
							
							For consistency
Signed-off-by: Eric Curtin <ecurtin@redhat.com> 
							
						 
						
							2025-01-24 09:39:24 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									stduhpf 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c07e87f38b 
								
							 
						 
						
							
							
								
								server : (webui) put DeepSeek R1 CoT in a collapsible <details> element ( #11364 )  
							
							
							
							
							* webui : put DeepSeek R1 CoT in a collapsible <details> element
* webui: refactor split
* webui: don't use regex to split cot and response
* webui: format+qol
* webui: no loading icon if the model isn't generating
* ui fix, add configs
* add jsdoc types
* only filter </think> for assistant msg
* build
* update build
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co> 
							
						 
						
							2025-01-24 09:02:38 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								564804b79b 
								
							 
						 
						
							
							
								
								tests: fix some mul_mat test gaps ( #11375 )  
							
							
							
							
							Now that we have batched mat-vec mul Vulkan shaders for up to n==8,
these tests weren't actually exercising the mat-mat mul path. Test
n==9 as well. Also, change to use all_types. 
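
The coverage gap described above can be sketched as a dispatch on the batch dimension `n`. This is a hypothetical illustration, not the Vulkan backend's code: the names `mul_mat_path`, `select_mul_mat_path` and `covers_mat_mat` are invented here, and only the threshold of 8 comes from the commit message.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical illustration only: these names are not taken from the Vulkan backend.
enum class mul_mat_path { mat_vec, mat_mat };

// The commit message says batched mat-vec shaders cover up to n == 8.
constexpr int64_t max_mat_vec_n = 8;

// Pick the path a multiplication with n columns would take under that assumption.
inline mul_mat_path select_mul_mat_path(int64_t n) {
    return n <= max_mat_vec_n ? mul_mat_path::mat_vec : mul_mat_path::mat_mat;
}

// Report whether a list of tested n values reaches the mat-mat path at all.
inline bool covers_mat_mat(const std::vector<int64_t> & tested_n) {
    for (int64_t n : tested_n) {
        if (select_mul_mat_path(n) == mul_mat_path::mat_mat) {
            return true;
        }
    }
    return false;
}
```

Under this assumption, a test list that stops at n == 8 never reaches the mat-mat path, which is why adding n == 9 matters.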
							
						 
						
							2025-01-23 14:51:24 -06:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Eric Curtin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								05f63cc9ee 
								
							 
						 
						
							
							
								
								Update documentation ( #11373 )  
							
							
							
							
							To show that -n, -ngl and --ngl are all acceptable.
Signed-off-by: Eric Curtin <ecurtin@redhat.com> 
							
						 
						
							2025-01-23 20:04:31 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Eric Curtin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f7fb43cd0b 
								
							 
						 
						
							
							
								
								Add -ngl ( #11372 )  
							
							
							
							
							Most other llama.cpp cli tools accept -ngl with a single dash.
Signed-off-by: Eric Curtin <ecurtin@redhat.com> 
							
						 
						
							2025-01-23 16:16:18 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5845661640 
								
							 
						 
						
							
							
								
								server : add more clean up when cancel_tasks is called ( #11340 )  
							
							
							
							
							* server : add more clean up when cancel_tasks is called
* fix recv_with_timeout
* std::remove_if
* fix std::remove_if 
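
The `std::remove_if` items above refer to a standard-library algorithm that is usually paired with `erase`. The following is a minimal sketch of that erase-remove idiom applied to a task queue. The `task` struct and `drop_cancelled` function are hypothetical and do not reproduce the server's code, and the log does not say what the follow-up fix actually changed.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Hypothetical task record; the server's real task type is not shown in the log.
struct task {
    int id;
};

// Remove every queued task whose id is in `cancelled`, using the erase-remove idiom.
// std::remove_if only moves the kept elements to the front and returns the new
// logical end; the erase call is what actually shrinks the container.
inline void drop_cancelled(std::vector<task> & queue, const std::unordered_set<int> & cancelled) {
    queue.erase(
        std::remove_if(queue.begin(), queue.end(),
                       [&](const task & t) { return cancelled.count(t.id) > 0; }),
        queue.end());
}
```

Calling `std::remove_if` without the surrounding `erase` leaves the container at its old size, with unspecified values in the tail.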
							
						 
						
							2025-01-23 13:56:05 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Eric Curtin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f211d1dc10 
								
							 
						 
						
							
							
								
								Treat hf.co/ prefix the same as hf:// ( #11350 )  
							
							
							
							
							ollama uses hf.co/ to specify the Hugging Face prefix, just as RamaLama
uses hf://.
Treat them the same way.
Signed-off-by: Eric Curtin <ecurtin@redhat.com> 
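
As a rough sketch of the behavior described above, both prefixes can be stripped by one helper so that the rest of the code sees the same repository path. The function names `strip_prefix` and `parse_hf_model` are hypothetical and are not taken from llama-run.

```cpp
#include <string>

// Hypothetical helper: returns true and strips the prefix when `s` starts with it.
inline bool strip_prefix(std::string & s, const std::string & prefix) {
    if (s.compare(0, prefix.size(), prefix) == 0) {
        s.erase(0, prefix.size());
        return true;
    }
    return false;
}

// Treat "hf://" and "hf.co/" as the same marker for a Hugging Face model.
// On success, `repo_path` receives the remainder after the prefix.
inline bool parse_hf_model(std::string model, std::string & repo_path) {
    if (strip_prefix(model, "hf://") || strip_prefix(model, "hf.co/")) {
        repo_path = model;
        return true;
    }
    return false;
}
```

With this sketch, `hf://org/model` and `hf.co/org/model` both yield `org/model`, while a plain file name is left alone.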
							
						 
						
							2025-01-23 10:38:20 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									amd-dwang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								955a6c2d91 
								
							 
						 
						
							
							
								
								Vulkan-run-test: fix mmq_wg_denoms ( #11343 )  
							
							
							
							
							There appears to be a copy-and-paste error here.
*mmq_wg_denoms should be used together with *warptile_mmq, instead of
wg_denoms. 
							
						 
						
							2025-01-23 08:14:28 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1971adf55e 
								
							 
						 
						
							
							
								
								vulkan: sort shaders for more deterministic binary ( #11315 )  
							
							
							
							
							Fixes  #11306 . 
						
							2025-01-23 08:07:50 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5245729e33 
								
							 
						 
						
							
							
								
								vulkan: fix diag_mask_inf ( #11323 )  
							
							
							
							
							With robustbufferaccess disabled, this shader was showing OOB stores. There
is a bounds check in the code, but the workgroup dimensions were reversed vs
CUDA and it was running the wrong number of threads. So fix the workgroup
dimensions and disable robustness for this pipeline. 
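
The interaction between a bounds check and the dispatch dimensions can be illustrated with a CPU emulation of a 2D dispatch. This is a simplified sketch under stated assumptions, not the actual shader: the function signature, the row-major layout and the mask rule `col > n_past + row` are assumptions made for illustration.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// CPU emulation of a 2D compute dispatch over a row-major (rows x cols) matrix.
// Hypothetical sketch: it does not reproduce the actual Vulkan shader.
// Elements with col > n_past + row are set to -infinity (causal mask).
inline void diag_mask_inf(std::vector<float> & data, size_t rows, size_t cols, size_t n_past,
                          size_t threads_x, size_t threads_y) {
    for (size_t y = 0; y < threads_y; ++y) {       // emulated thread index along rows
        for (size_t x = 0; x < threads_x; ++x) {   // emulated thread index along columns
            // Bounds check: extra threads from rounding up must not store out of range.
            if (x >= cols || y >= rows) {
                continue;
            }
            if (x > n_past + y) {
                data[y * cols + x] = -std::numeric_limits<float>::infinity();
            }
        }
    }
}
```

In this emulation, launching with the two thread counts swapped stays in range because of the check, but leaves part of the matrix unprocessed, so the dispatch dimensions have to match the data layout as well.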
							
						 
						
							2025-01-23 08:01:17 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6152129d05 
								
							 
						 
						
							
							
								
								main : update README documentation for batch size ( #11353 )  
							
							
							
							
							* main : update README documentation for batch size
* fix formatting
* minor 
							
						 
						
							2025-01-22 19:22:20 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								16d3df7ab0 
								
							 
						 
						
							
							
								
								readme : add plugin links ( #11355 )  
							
							
							
						 
						
							2025-01-22 19:44:26 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								12c2bdf2de 
								
							 
						 
						
							
							
								
								server : fix draft context not being released ( #11354 )  
							
							
							
						 
						
							2025-01-22 17:44:40 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c64d2becb1 
								
							 
						 
						
							
							
								
								minja: sync at 0f5f7f2b37 ( #11352 )  
							
							
							
						 
						
							2025-01-22 16:16:27 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jiří Podivín 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								96f4053934 
								
							 
						 
						
							
							
								
								Adding logprobs to /v1/completions ( #11344 )  
							
							
							
							
							Signed-off-by: Jiri Podivin <jpodivin@redhat.com> 
							
						 
						
							2025-01-22 12:51:32 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a94f3b2727 
								
							 
						 
						
							
							
								
								common: utils to split / join / repeat strings (from json converter) ( #11342 )  
							
							
							
							
							* Factor string_join, string_split, string_repeat into common
* json: refactor to surface a versatile builder
* Update common.cpp 
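
For readers unfamiliar with these helpers, here is a minimal sketch of what join, split and repeat utilities typically look like. The function names follow the commit message, but the signatures and edge-case behavior shown here are assumptions and may differ from the code in common.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Join values with the separator placed between consecutive elements.
inline std::string string_join(const std::vector<std::string> & values, const std::string & separator) {
    std::string result;
    for (size_t i = 0; i < values.size(); ++i) {
        if (i > 0) {
            result += separator;
        }
        result += values[i];
    }
    return result;
}

// Split on every occurrence of the delimiter; empty pieces are kept.
// Assumption: an empty delimiter returns the input as a single piece.
inline std::vector<std::string> string_split(const std::string & str, const std::string & delimiter) {
    std::vector<std::string> parts;
    if (delimiter.empty()) {
        parts.push_back(str);
        return parts;
    }
    size_t start = 0;
    size_t end   = 0;
    while ((end = str.find(delimiter, start)) != std::string::npos) {
        parts.push_back(str.substr(start, end - start));
        start = end + delimiter.size();
    }
    parts.push_back(str.substr(start));
    return parts;
}

// Concatenate n copies of str.
inline std::string string_repeat(const std::string & str, size_t n) {
    std::string result;
    result.reserve(str.size() * n);
    for (size_t i = 0; i < n; ++i) {
        result += str;
    }
    return result;
}
```

In this sketch, splitting and then joining with the same delimiter returns the original string, which is a convenient property to test.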
							
						 
						
							2025-01-22 09:51:44 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									tc-mb 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3e3357fd77 
								
							 
						 
						
							
							
								
								llava : support Minicpm-omni ( #11289 )  
							
							
							
							
							* init
* add readme
* update readme
* no use make
* update readme
* update fix code
* fix editorconfig-checker
* no change convert py
* use clip_image_u8_free 
							
						 
						
							2025-01-22 09:35:48 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6171c9d258 
								
							 
						 
						
							
							
								
								Add Jinja template support ( #11016 )  
							
							
							
							
							* Copy minja from 58f0ca6dd7 ( https://github.com/google/minja/pull/22 )
* Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626 ( https://github.com/google/minja/pull/25 )
* Update minja from https://github.com/google/minja/pull/27 
* rm unused optional header
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2025-01-21 13:18:51 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e28245f35f 
								
							 
						 
						
							
							
								
								export-lora : fix tok_embd tensor ( #11330 )  
							
							
							
						 
						
							2025-01-21 14:07:12 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Radoslav Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6da5bec81c 
								
							 
						 
						
							
							
								
								rpc : better caching of the base buffer pointer ( #11331 )  
							
							
							
							
							There is no need to use map, just store the base pointer in the buffer
context. 
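
The idea above can be sketched as a buffer context that remembers its base pointer after the first lookup, so no separate map is needed. The struct and function names below are hypothetical, and the `std::function` member is only a stand-in for the remote round trip. None of this is taken from the RPC backend.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical buffer context; field names are illustrative, not the RPC backend's.
struct buffer_context {
    std::function<uint64_t()> fetch_remote_base; // stand-in for the remote round trip
    uint64_t base_ptr    = 0;                    // cached result of the first lookup
    bool     base_cached = false;                // whether base_ptr is valid
};

// Return the base pointer, asking the remote side only on the first call.
inline uint64_t get_base(buffer_context & ctx) {
    if (!ctx.base_cached) {
        ctx.base_ptr    = ctx.fetch_remote_base();
        ctx.base_cached = true;
    }
    return ctx.base_ptr;
}
```

Storing the cached value in the context ties its lifetime to the buffer, so there is no separate lookup table to keep in sync when a buffer is freed.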
							
						 
						
							2025-01-21 15:06:41 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Eric Curtin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2e2f8f093c 
								
							 
						 
						
							
							
								
								linenoise.cpp refactoring ( #11301 )  
							
							
							
							
							More RAII mainly
Signed-off-by: Eric Curtin <ecurtin@redhat.com> 
							
						 
						
							2025-01-21 09:32:35 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2139667ec4 
								
							 
						 
						
							
							
								
								metal : fix out-of-bounds write ( #11314 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2025-01-21 08:48:13 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								80d0d6b4b7 
								
							 
						 
						
							
							
								
								common : add -hfd option for the draft model ( #11318 )  
							
							
							
							
							* common : add -hfd option for the draft model
* cont : fix env var
* cont : more fixes 
							
						 
						
							2025-01-20 22:29:43 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								aea8ddd516 
								
							 
						 
						
							
							
								
								vulkan: fix coopmat2 validation failures ( #11284 )  
							
							
							
							
							The mul mat and flash attention shaders were loading f32 types directly into
A/B matrices, which happens to work but is technically invalid usage.
For FA, we can load it as an Accumulator matrix and convert; this
is not in the inner loop and is cheap enough. For mul mat, it's more
efficient to do this conversion in a separate pass and have the input(s)
be f16.
coopmat2 requires SPIR-V 1.6 (related to using LocalSizeId). LocalSizeId
requires maintenance4 to be enabled, and SPIR-V 1.6 requires Vulkan 1.3. 
							
						 
						
							2025-01-20 10:38:32 -06:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9f7add1cde 
								
							 
						 
						
							
							
								
								examples : fix add_special conditions ( #11311 )  
							
							
							
						 
						
							2025-01-20 16:36:08 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Christopher Nielsen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								90d987b105 
								
							 
						 
						
							
							
								
								mmap: add include for cerrno ( #11296 )  
							
							
							
							
							ggml-ci
Co-authored-by: Xuan Son Nguyen <son@huggingface.co> 
							
						 
						
							2025-01-20 16:02:43 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Michael Podvitskiy 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a4251edd6f 
								
							 
						 
						
							
							
								
								cmake: fix shell command quoting in build-info script ( #11309 )  
							
							
							
						 
						
							2025-01-20 16:02:15 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ec7f3ac9ab 
								
							 
						 
						
							
							
								
								llama : add support for Deepseek-R1-Qwen distill model ( #11310 )  
							
							
							
							
							* llama : add support for Deepseek-R1-Qwen distill model
* coding style 
							
						 
						
							2025-01-20 14:35:07 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ef6dada60c 
								
							 
						 
						
							
							
								
								cont : fix whitespaces ( #11305 )  
							
							
							
						 
						
							2025-01-20 09:29:32 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Kyle Bruene 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ae3c1db2f9 
								
							 
						 
						
							
							
								
								llama : re-add LLM_ARCH_PHIMOE ( #11305 )  
							
							
							
							
							Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor. 
							
						 
						
							2025-01-20 09:21:01 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								92bc493917 
								
							 
						 
						
							
							
								
								tests : increase timeout when sanitizers are enabled ( #11300 )  
							
							
							
							
							* tests : increase timeout when sanitizers are enabled
* tests : add DEFAULT_HTTP_TIMEOUT 
							
						 
						
							2025-01-19 20:22:30 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b9daaffe02 
								
							 
						 
						
							
							
								
								simple-chat : fix BOS being added to each message ( #11278 )  
							
							
							
						 
						
							2025-01-19 18:12:09 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Nicolò Scipione 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								99487b57d4 
								
							 
						 
						
							
							
								
								SYCL: Introducing memory host pool ( #11251 )  
							
							
							
							
							* Implement host pool for matrix_info
Creating a new memory pool on the host to store memory location for
matrix_info needed to launch gemm_batch from oneMKL/oneMath.
Removing complex support in gemm_batch since it is not used in llama.cpp
* Remove unnecessary headers and cast
* Reorder member variable to avoid warning on initialization
* Formatting
* Remove unused variable
* Address PR review feedback - remove warning
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com> 
							
						 
						
							2025-01-19 21:33:34 +08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Eric Curtin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a1649cc13f 
								
							 
						 
						
							
							
								
								Adding linenoise.cpp to llama-run ( #11252 )  
							
							
							
							
							This is a fork of linenoise that is C++17 compatible. I intend to
add it to llama-run so we can do things like traverse prompt
history via the up and down arrows:
https://github.com/ericcurtin/linenoise.cpp 
Signed-off-by: Eric Curtin <ecurtin@redhat.com> 
							
						 
						
							2025-01-18 14:42:31 +00:00