Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								56e26a7f30 
								
							 
						 
						
							
							
								
								ci : change ubuntu build from latest to 20.04  
							
							
							
						 
						
							2025-01-24 15:59:09 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								194358e3b7 
								
							 
						 
						
							
							
								
								ci : restore the original HIP commands  
							
							
							
						 
						
							2025-01-24 15:41:52 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								50455ded31 
								
							 
						 
						
							
							
								
								ci : fix HIP cmake compiler options to be on first line  
							
							
							
						 
						
							2025-01-24 15:23:44 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								564353c9a3 
								
							 
						 
						
							
							
								
								Revert "TMP : push artifacts"  
							
							
							
							
							This reverts commit 4decf2c4df 
							
						 
						
							2025-01-24 15:22:36 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4decf2c4df 
								
							 
						 
						
							
							
								
								TMP : push artifacts  
							
							
							
						 
						
							2025-01-24 14:54:24 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3a35bfe1f7 
								
							 
						 
						
							
							
								
								cmake : put libs in /bin  
							
							
							
						 
						
							2025-01-24 14:42:46 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ff4cb6ef4c 
								
							 
						 
						
							
							
								
								release : pack /lib and /include in the packages  
							
							
							
						 
						
							2025-01-24 13:28:37 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Eric Curtin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								01f37edf1a 
								
							 
						 
						
							
							
								
								Update llama-run README.md ( #11386 )  
							
							
							
							
							For consistency
Signed-off-by: Eric Curtin <ecurtin@redhat.com> 
							
						 
						
							2025-01-24 09:39:24 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									stduhpf 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c07e87f38b 
								
							 
						 
						
							
							
								
								server : (webui) put DeepSeek R1 CoT in a collapsible <details> element ( #11364 )  
							
							
							
							
							* webui : put DeepSeek R1 CoT in a collapsible <details> element
* webui: refactor split
* webui: don't use regex to split cot and response
* webui: format+qol
* webui: no loading icon if the model isn't generating
* ui fix, add configs
* add jsdoc types
* only filter </think> for assistant msg
* build
* update build
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co> 
							
						 
						
							2025-01-24 09:02:38 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								564804b79b 
								
							 
						 
						
							
							
								
								tests: fix some mul_mat test gaps ( #11375 )  
							
							
							
							
							Now that we have batched mat-vec mul Vulkan shaders for up to n==8,
these tests weren't actually exercising the mat-mat mul path. Test
n==9 as well. Also, change to use all_types. 
							
						 
						
							2025-01-23 14:51:24 -06:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Eric Curtin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								05f63cc9ee 
								
							 
						 
						
							
							
								
								Update documentation ( #11373 )  
							
							
							
							
							To show -n, -ngl, --ngl is acceptable.
Signed-off-by: Eric Curtin <ecurtin@redhat.com> 
							
						 
						
							2025-01-23 20:04:31 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Eric Curtin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f7fb43cd0b 
								
							 
						 
						
							
							
								
								Add -ngl ( #11372 )  
							
							
							
							
							Most other llama.cpp cli tools accept -ngl with a single dash.
Signed-off-by: Eric Curtin <ecurtin@redhat.com> 
							
						 
						
							2025-01-23 16:16:18 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5845661640 
								
							 
						 
						
							
							
								
								server : add more clean up when cancel_tasks is called ( #11340 )  
							
							
							
							
							* server : add more clean up when cancel_tasks is called
* fix recv_with_timeout
* std::remove_if
* fix std::remove_if 
							
						 
						
							2025-01-23 13:56:05 +01:00 
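The `std::remove_if` follow-ups in the commit above touch on an idiom that is easy to get wrong: `std::remove_if` only moves the kept elements forward and returns the new logical end, so the container still has to be shrunk with `erase`. A minimal sketch of that idiom follows. The `task` struct and `drop_cancelled` function are hypothetical stand-ins, not the server's actual types, and whether the original fix concerned a missing `erase` is an assumption.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a queued server task (not the real server type).
struct task {
    int id;
};

// Drops every queued task whose id appears in `cancelled`.
// std::remove_if shifts the kept elements to the front and returns the new
// logical end; erase() is what actually shrinks the vector.
void drop_cancelled(std::vector<task> & queue, const std::unordered_set<int> & cancelled) {
    queue.erase(
        std::remove_if(queue.begin(), queue.end(),
                       [&](const task & t) { return cancelled.count(t.id) > 0; }),
        queue.end());
}
```

Calling `drop_cancelled(queue, {2, 4})` on a queue holding ids 1 to 4 leaves ids 1 and 3, in their original order.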
							
								 
							
						 
					 
				
					
						
							
								
								
									Eric Curtin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f211d1dc10 
								
							 
						 
						
							
							
								
								Treat hf.co/ prefix the same as hf:// ( #11350 )  
							
							
							
							
							ollama uses hf.co/ to specify huggingface prefix, like RamaLama
uses hf://
Treat them similarly.
Signed-off-by: Eric Curtin <ecurtin@redhat.com> 
							
						 
						
							2025-01-23 10:38:20 +00:00 
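Treating `hf.co/` the same as `hf://` amounts to stripping whichever prefix is present before resolving the model reference. The sketch below illustrates the idea only; `strip_hf_prefix` is a hypothetical helper and does not reflect how llama-run actually parses its arguments.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not taken from llama-run): strips either Hugging Face
// prefix so both spellings resolve to the same remainder.
// Returns true and writes the remainder to `rest` when a prefix matched.
bool strip_hf_prefix(std::string_view model, std::string & rest) {
    constexpr std::string_view prefixes[] = { "hf://", "hf.co/" };
    for (std::string_view p : prefixes) {
        // substr clamps the length, so short inputs are safe here.
        if (model.substr(0, p.size()) == p) {
            rest = std::string(model.substr(p.size()));
            return true;
        }
    }
    return false;
}
```

With this helper, `hf://org/model` and `hf.co/org/model` both yield `org/model`, while an unrelated prefix leaves `rest` untouched and returns false.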
							
								 
							
						 
					 
				
					
						
							
								
								
									amd-dwang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								955a6c2d91 
								
							 
						 
						
							
							
								
								Vulkan-run-test: fix mmq_wg_denoms ( #11343 )  
							
							
							
							
							This looks like a copy-and-paste error.
*mmq_wg_denoms should be used together with *warptile_mmq, instead of
wg_denoms. 
							
						 
						
							2025-01-23 08:14:28 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1971adf55e 
								
							 
						 
						
							
							
								
								vulkan: sort shaders for more deterministic binary ( #11315 )  
							
							
							
							
							Fixes  #11306 . 
						
							2025-01-23 08:07:50 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5245729e33 
								
							 
						 
						
							
							
								
								vulkan: fix diag_mask_inf ( #11323 )  
							
							
							
							
							With robustBufferAccess disabled, this shader was showing OOB stores. There
is a bounds check in the code, but the workgroup dimensions were reversed vs
CUDA and it was running the wrong number of threads. So fix the workgroup
dimensions and disable robustness for this pipeline. 
							
						 
						
							2025-01-23 08:01:17 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6152129d05 
								
							 
						 
						
							
							
								
								main : update README documentation for batch size ( #11353 )  
							
							
							
							
							* main : update README documentation for batch size
* fix formatting
* minor 
							
						 
						
							2025-01-22 19:22:20 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								16d3df7ab0 
								
							 
						 
						
							
							
								
								readme : add plugin links ( #11355 )  
							
							
							
						 
						
							2025-01-22 19:44:26 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								12c2bdf2de 
								
							 
						 
						
							
							
								
								server : fix draft context not being released ( #11354 )  
							
							
							
						 
						
							2025-01-22 17:44:40 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c64d2becb1 
								
							 
						 
						
							
							
								
								minja: sync at 0f5f7f2b37 ( #11352 )  
							
							
							
						 
						
							2025-01-22 16:16:27 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jiří Podivín 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								96f4053934 
								
							 
						 
						
							
							
								
								Adding logprobs to /v1/completions ( #11344 )  
							
							
							
							
							Signed-off-by: Jiri Podivin <jpodivin@redhat.com> 
							
						 
						
							2025-01-22 12:51:32 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a94f3b2727 
								
							 
						 
						
							
							
								
								common: utils to split / join / repeat strings (from json converter) ( #11342 )  
							
							
							
							
							* Factor string_join, string_split, string_repeat into common
* json: refactor to surface a versatile builder
* Update common.cpp 
							
						 
						
							2025-01-22 09:51:44 +00:00 
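For readers unfamiliar with the three helpers named in the commit above, here is a minimal sketch of what join, split, and repeat utilities of this kind typically look like. The signatures and the `sketch` namespace are assumptions made for illustration; they are not copied from `common.cpp`, and edge-case behavior there may differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {  // illustrative namespace, not part of llama.cpp

// Concatenates `parts`, placing `sep` between consecutive elements.
std::string string_join(const std::vector<std::string> & parts, const std::string & sep) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) {
            out += sep;
        }
        out += parts[i];
    }
    return out;
}

// Splits `s` on every occurrence of `delim`, keeping empty fields.
// An empty delimiter yields the input as a single field.
std::vector<std::string> string_split(const std::string & s, const std::string & delim) {
    std::vector<std::string> parts;
    if (delim.empty()) {
        parts.push_back(s);
        return parts;
    }
    std::size_t start = 0;
    std::size_t end;
    while ((end = s.find(delim, start)) != std::string::npos) {
        parts.push_back(s.substr(start, end - start));
        start = end + delim.size();
    }
    parts.push_back(s.substr(start));  // trailing field, possibly empty
    return parts;
}

// Returns `s` repeated `n` times.
std::string string_repeat(const std::string & s, std::size_t n) {
    std::string out;
    out.reserve(s.size() * n);
    for (std::size_t i = 0; i < n; ++i) {
        out += s;
    }
    return out;
}

}  // namespace sketch
```

In this sketch, splitting and then joining with the same separator round-trips the input, because empty fields are preserved rather than dropped.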
							
								 
							
						 
					 
				
					
						
							
								
								
									tc-mb 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3e3357fd77 
								
							 
						 
						
							
							
								
								llava : support Minicpm-omni ( #11289 )  
							
							
							
							
							* init
* add readme
* update readme
* no use make
* update readme
* update fix code
* fix editorconfig-checker
* no change convert py
* use clip_image_u8_free 
							
						 
						
							2025-01-22 09:35:48 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Olivier Chafik 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6171c9d258 
								
							 
						 
						
							
							
								
								Add Jinja template support ( #11016 )  
							
							
							
							
							* Copy minja from 58f0ca6dd7 (https://github.com/google/minja/pull/22)
* Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626 (https://github.com/google/minja/pull/25)
* Update minja from https://github.com/google/minja/pull/27 
* rm unused optional header
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2025-01-21 13:18:51 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e28245f35f 
								
							 
						 
						
							
							
								
								export-lora : fix tok_embd tensor ( #11330 )  
							
							
							
						 
						
							2025-01-21 14:07:12 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Radoslav Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6da5bec81c 
								
							 
						 
						
							
							
								
								rpc : better caching of the base buffer pointer ( #11331 )  
							
							
							
							
							There is no need to use map, just store the base pointer in the buffer
context. 
							
						 
						
							2025-01-21 15:06:41 +02:00 
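The change described above replaces a lookup map with a value cached directly in the buffer context. A minimal sketch of that pattern follows. The `buffer_context` struct, its fields, and `get_base` are hypothetical names chosen for illustration, and the `fetch_base` callback merely stands in for whatever remote call the RPC backend makes.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical buffer context; names are illustrative, not the RPC backend's.
struct buffer_context {
    std::function<std::uint64_t()> fetch_base;  // stands in for a remote round trip
    std::uint64_t base = 0;                     // cached base pointer value
    bool has_base = false;                      // true once `base` is valid
};

// Returns the base pointer, performing the expensive fetch at most once
// per buffer instead of consulting a separate map keyed by buffer.
std::uint64_t get_base(buffer_context & ctx) {
    if (!ctx.has_base) {
        ctx.base = ctx.fetch_base();
        ctx.has_base = true;
    }
    return ctx.base;
}
```

Because the cached value lives and dies with the context, there is no separate table to keep in sync when a buffer is freed.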
							
								 
							
						 
					 
				
					
						
							
								
								
									Eric Curtin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2e2f8f093c 
								
							 
						 
						
							
							
								
								linenoise.cpp refactoring ( #11301 )  
							
							
							
							
							More RAII mainly
Signed-off-by: Eric Curtin <ecurtin@redhat.com> 
							
						 
						
							2025-01-21 09:32:35 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2139667ec4 
								
							 
						 
						
							
							
								
								metal : fix out-of-bounds write ( #11314 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2025-01-21 08:48:13 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								80d0d6b4b7 
								
							 
						 
						
							
							
								
								common : add -hfd option for the draft model ( #11318 )  
							
							
							
							
							* common : add -hfd option for the draft model
* cont : fix env var
* cont : more fixes 
							
						 
						
							2025-01-20 22:29:43 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								aea8ddd516 
								
							 
						 
						
							
							
								
								vulkan: fix coopmat2 validation failures ( #11284 )  
							
							
							
							
							mul mat and flash attention shaders were loading f32 types directly into
A/B matrices, which happens to work but is technically invalid usage.
For FA, we can load it as an Accumulator matrix and convert and this
is not in the inner loop and is cheap enough. For mul mat, it's more
efficient to do this conversion in a separate pass and have the input(s)
be f16.
coopmat2 requires SPIR-V 1.6 (related to using LocalSizeId). LocalSizeId
requires maintenance4 be enabled, and SPIR-V 1.6 requires Vulkan 1.3. 
							
						 
						
							2025-01-20 10:38:32 -06:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9f7add1cde 
								
							 
						 
						
							
							
								
								examples : fix add_special conditions ( #11311 )  
							
							
							
						 
						
							2025-01-20 16:36:08 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Christopher Nielsen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								90d987b105 
								
							 
						 
						
							
							
								
								mmap: add include for cerrno ( #11296 )  
							
							
							
							
							ggml-ci
Co-authored-by: Xuan Son Nguyen <son@huggingface.co> 
							
						 
						
							2025-01-20 16:02:43 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Michael Podvitskiy 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a4251edd6f 
								
							 
						 
						
							
							
								
								cmake: fix shell command quoting in build-info script ( #11309 )  
							
							
							
						 
						
							2025-01-20 16:02:15 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ec7f3ac9ab 
								
							 
						 
						
							
							
								
								llama : add support for Deepseek-R1-Qwen distill model ( #11310 )  
							
							
							
							
							* llama : add support for Deepseek-R1-Qwen distill model
* coding style 
							
						 
						
							2025-01-20 14:35:07 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ef6dada60c 
								
							 
						 
						
							
							
								
								cont : fix whitespaces ( #11305 )  
							
							
							
						 
						
							2025-01-20 09:29:32 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Kyle Bruene 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ae3c1db2f9 
								
							 
						 
						
							
							
								
								llama : re-add LLM_ARCH_PHIMOE ( #11305 )  
							
							
							
							
							Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor. 
							
						 
						
							2025-01-20 09:21:01 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								92bc493917 
								
							 
						 
						
							
							
								
								tests : increase timeout when sanitizers are enabled ( #11300 )  
							
							
							
							
							* tests : increase timeout when sanitizers are enabled
* tests : add DEFAULT_HTTP_TIMEOUT 
							
						 
						
							2025-01-19 20:22:30 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b9daaffe02 
								
							 
						 
						
							
							
								
								simple-chat : fix BOS being added to each message ( #11278 )  
							
							
							
						 
						
							2025-01-19 18:12:09 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Nicolò Scipione 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								99487b57d4 
								
							 
						 
						
							
							
								
								SYCL: Introducing memory host pool ( #11251 )  
							
							
							
							
							* Implement host pool for matrix_info
Creating a new memory pool on the host to store memory location for
matrix_info needed to launch gemm_batch from oneMKL/oneMath.
Removing complex support in gemm_batch since it is not used in llama.cpp
* Remove unnecessary headers and cast
* Reorder member variable to avoid warning on initialization
* Formatting
* Remove unused variable
* Address PR review feedback - remove warning
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com> 
							
						 
						
							2025-01-19 21:33:34 +08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Eric Curtin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a1649cc13f 
								
							 
						 
						
							
							
								
								Adding linenoise.cpp to llama-run ( #11252 )  
							
							
							
							
							This is a fork of linenoise that is C++17 compatible. I intend on
adding it to llama-run so we can do things like traverse prompt
history via the up and down arrows:
https://github.com/ericcurtin/linenoise.cpp 
Signed-off-by: Eric Curtin <ecurtin@redhat.com> 
							
						 
						
							2025-01-18 14:42:31 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4dd34ff831 
								
							 
						 
						
							
							
								
								cmake : add sanitizer flags for llama.cpp ( #11279 )  
							
							
							
							
							* cmake : add sanitizer flags for llama.cpp
ggml-ci
* tests : fix compile warnings
ggml-ci
* cmake : move sanitizer flags to llama_add_compile_flags
ggml-ci
* cmake : move llama.cpp compile flags to top level lists
ggml-ci
* cmake : apply only sanitizer flags at top level
ggml-ci
* tests : fix gguf context use in same_tensor_data
* gguf-test: tensor data comparison
* dummy : trigger ggml-ci
* unicode : silence gcc warnings
ggml-ci
* ci : use sanitizer builds only in Debug mode
ggml-ci
* cmake : add status messages [no ci]
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de> 
							
						 
						
							2025-01-18 16:18:15 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f30f099228 
								
							 
						 
						
							
							
								
								server : implement cancellable request ( #11285 )  
							
							
							
							
							* server : implement cancellable request
* fix typo
* httplib 0.18.5
* fix i underflow 
							
						 
						
							2025-01-18 14:12:05 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f26c874179 
								
							 
						 
						
							
							
								
								scripts : restore hf.sh ( #11288 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2025-01-18 13:18:32 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									LostRuins Concedo 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6390a998bf 
								
							 
						 
						
							
							
								
								tts : add guide tokens support ( #11186 )  
							
							
							
							
							* Added the ability to use guide tokens for OuteTTS, greatly improving TTS recitation accuracy over long input sequences.
* applied linting suggestions, updated to latest llama_vocab changes, added a safety check, added newline to guide token start 
							
						 
						
							2025-01-18 12:20:57 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								44e18ef939 
								
							 
						 
						
							
							
								
								vulkan: fix coopmat2 flash attention for non-contiguous inputs ( #11281 )  
							
							
							
							
							Add code similar to mul_mm_cm2 to force alignment of strides, to avoid
a performance regression.
Add noncontiguous FA tests in test-backend-ops.
Fixes  #11268 . 
							
						 
						
							2025-01-18 09:26:50 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									codezjx 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3edfa7d375 
								
							 
						 
						
							
							
								
								llama.android: add field formatChat to control whether to parse special tokens when sending a message ( #11270 )  
							
							
							
						 
						
							2025-01-17 14:57:56 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Radoslav Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								667d72846c 
								
							 
						 
						
							
							
								
								rpc : early register backend devices ( #11262 )  
							
							
							
							
							Early register RPC devices and do not propagate RPC specifics in the
llama model structures.
ref: #10609  
							
						 
						
							2025-01-17 10:57:09 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a133566d34 
								
							 
						 
						
							
							
								
								vocab : fix double-eos check ( #11273 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2025-01-17 09:28:00 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									David Renshaw 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								960ec65273 
								
							 
						 
						
							
							
								
								llama : fix deprecation message: vocabable -> vocab ( #11269 )  
							
							
							
						 
						
							2025-01-17 08:12:01 +01:00