437e05f714 | ebraminio | 2025-01-13 14:46:39 +01:00
server : (UI) Support for RTL text as models input or output (#11208)

ca001f6656 | Georgi Gerganov | 2025-01-13 15:08:44 +02:00
contrib : add naming guidelines (cont) (#11177)

00b4c3da62 | Xuan Son Nguyen | 2025-01-13 13:56:23 +01:00
common : support tag-based --hf-repo like on ollama (#11195)
* common : support tag-based hf_repo like on ollama
* fix build
* various fixes
* small fixes
* fix style
* fix windows build?
* move common_get_hf_file to common.cpp
* fix complain with noreturn

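Tag-based selection works like ollama's model tags: a tag appended to the repo name picks a specific quantization. A hypothetical invocation, with the repo name and tag purely illustrative and not taken from the commit, would be `llama-cli -hf user/model-GGUF:Q4_K_M`, where the `:Q4_K_M` tag selects the matching GGUF file from the repository.
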
7426a26b24 | Georgi Gerganov | 2025-01-13 14:46:36 +02:00
contrib : add naming guidelines (#11177)
* contrib : add naming guidelines
* contrib : expand naming guidelines [no ci]
* contrib : add `_t` suffix guideline [no ci]
* contrib : move coding guidelines to correct section [no ci]
* contrib : minor reword coding guidelines [no ci]
* contrib : add TODO for preprocessor directives [no ci]
* contrib : clarify `_context` suffix usage [no ci]
* contrib : filename guidelines [no ci]
* contrib : fix notes [no ci]

8f70fc3d1b | Daniel Bevenius | 2025-01-13 13:38:20 +01:00
llama : remove 'd' from bad special token log (#11212)
This commit removes the 'd' from the log message in llama-vocab.cpp when logging a bad special token. The motivation for this is that currently the output can look something like the following:
```console
load: bad special token:
    'tokenizer.ggml.image_token_id' = 128256d, using default id -1
```

1244cdcf14 | Radoslav Gerganov | 2025-01-13 13:31:41 +02:00
ggml : do not define GGML_USE_CUDA when building with GGML_BACKEND_DL (#11211)
Build fails when using HIP and GGML_BACKEND_DL:
```
/usr/bin/ld: ../ggml/src/libggml.so: undefined reference to `ggml_backend_cuda_reg'
collect2: error: ld returned 1 exit status
```
This patch fixes this.

924518e2e5 | Eric Curtin | 2025-01-12 18:23:10 +00:00
Reset color before we exit (#11205)
We don't want colors to leak post termination of llama-run.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>

9a483999a6 | Xuan Son Nguyen | 2025-01-12 13:45:14 +01:00
llama : fix chat template gguf key (#11201)

08f10f69c3 | Georgi Gerganov | 2025-01-12 12:15:53 +02:00
llama : remove notion of CLS token (#11064)

afa8a9ec9b | Georgi Gerganov | 2025-01-12 11:32:42 +02:00
llama : add llama_vocab, functions -> methods, naming (#11110)
* llama : functions -> methods (#11110)
* llama : add struct llama_vocab to the API (#11156)
* hparams : move vocab params to llama_vocab (#11159)
* vocab : more pimpl (#11165)
* vocab : minor tokenization optimizations (#11160)
* lora : update API names (#11167)
* llama : update API names to use correct prefix (#11174)
* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174)
* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174)
Co-authored-by: Diego Devesa <slarengh@gmail.com>

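A minimal sketch of how the refactored API reads after this commit, assuming `llama_model_get_vocab` is the accessor for the new `llama_vocab` struct; the renamed functions are taken from the bullets above:

```cpp
#include <cstdio>
#include "llama.h"

// Sketch only: demonstrates the post-refactor vocab accessors.
static void print_vocab_info(const llama_model * model) {
    // assumed accessor for the llama_vocab struct added in #11156
    const llama_vocab * vocab = llama_model_get_vocab(model);

    // formerly llama_vocab_n_vocab (#11174)
    const int32_t n_tokens = llama_vocab_n_tokens(vocab);

    // formerly llama_vocab_add_bos (#11174)
    const bool add_bos = llama_vocab_get_add_bos(vocab);

    printf("vocab size: %d, add BOS: %s\n", (int) n_tokens, add_bos ? "yes" : "no");
}
```
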
c05e8c9934 | Vinesh Janarthanan | 2025-01-11 11:42:31 +02:00
gguf-py: fixed local detection of gguf package (#11180)
* updated path to gguf package for non-installed setups
* added reader.py to readme
* bumped gguf version to 0.15.0

2739a71e4b | Daniel Bevenius | 2025-01-11 05:50:33 +01:00
convert : sort print supported models [no ci] (#11179)
This commit sorts the list of supported models when printing them out. The motivation for this change is to make it easier to find a specific model in the list of supported models. For example:
```console
$ ./convert_hf_to_gguf.py --print-supported-models
Supported models:
- ArcticForCausalLM
- BaiChuanForCausalLM
- BaichuanForCausalLM
- BertForMaskedLM
- BertModel
- BitnetForCausalLM
- BloomForCausalLM
- BloomModel
- CamembertModel
- ChameleonForCausalLM
- ChameleonForConditionalGeneration
- ChatGLMForConditionalGeneration
- ChatGLMModel
- CodeShellForCausalLM
- Cohere2ForCausalLM
- CohereForCausalLM
- DbrxForCausalLM
- DeciLMForCausalLM
- DeepseekForCausalLM
- DeepseekV2ForCausalLM
- DeepseekV3ForCausalLM
- ExaoneForCausalLM
- FalconForCausalLM
- FalconMambaForCausalLM
- GPT2LMHeadModel
- GPTBigCodeForCausalLM
- GPTNeoXForCausalLM
- GPTRefactForCausalLM
- Gemma2ForCausalLM
- GemmaForCausalLM
- GraniteForCausalLM
- GraniteMoeForCausalLM
- GrokForCausalLM
- InternLM2ForCausalLM
- JAISLMHeadModel
- JinaBertForMaskedLM
- JinaBertModel
- LLaMAForCausalLM
- LlamaForCausalLM
- LlavaStableLMEpochForCausalLM
- MPTForCausalLM
- MT5ForConditionalGeneration
- MambaForCausalLM
- MambaLMHeadModel
- MiniCPM3ForCausalLM
- MiniCPMForCausalLM
- MistralForCausalLM
- MixtralForCausalLM
- NemotronForCausalLM
- NomicBertModel
- OLMoForCausalLM
- Olmo2ForCausalLM
- OlmoForCausalLM
- OlmoeForCausalLM
- OpenELMForCausalLM
- OrionForCausalLM
- Phi3ForCausalLM
- PhiForCausalLM
- PhiMoEForCausalLM
- PlamoForCausalLM
- QWenLMHeadModel
- Qwen2ForCausalLM
- Qwen2MoeForCausalLM
- Qwen2VLForConditionalGeneration
- RWForCausalLM
- RWKV6Qwen2ForCausalLM
- RobertaModel
- Rwkv6ForCausalLM
- StableLMEpochForCausalLM
- StableLmForCausalLM
- Starcoder2ForCausalLM
- T5EncoderModel
- T5ForConditionalGeneration
- T5WithLMHeadModel
- UMT5ForConditionalGeneration
- WavTokenizerDec
- XLMRobertaForSequenceClassification
- XLMRobertaModel
- XverseForCausalLM
```

ba8a1f9c5b | Daniel Bevenius | 2025-01-10 13:16:16 +01:00
examples : add README.md to tts example [no ci] (#11155)
* examples : add README.md to tts example [no ci]
* squash! examples : add README.md to tts example [no ci]
  Fix heading to be consistent with other examples, and add a quickstart section to README.md.
* squash! examples : add README.md to tts example [no ci]
  Fix spelling mistake.

ff3fcabc72 | Daniel Bevenius | 2025-01-10 11:30:53 +01:00
convert : add --print-supported-models option (#11172)
* convert : add --print-supported-models option
  This commit adds a new option to the convert_hf_to_gguf.py script to print the supported models. The motivation for this is that it can be useful to know which models are supported by the script without having to look at the code. Example usage:
```console
$ ./convert_hf_to_gguf.py --print-supported-models
Supported models:
- GPTNeoXForCausalLM
- BloomForCausalLM
- BloomModel
- MPTForCausalLM
- OrionForCausalLM
- BaichuanForCausalLM
- BaiChuanForCausalLM
- XverseForCausalLM
- FalconForCausalLM
- RWForCausalLM
- GPTBigCodeForCausalLM
- GPTRefactForCausalLM
- StableLmForCausalLM
- StableLMEpochForCausalLM
- LlavaStableLMEpochForCausalLM
- LLaMAForCausalLM
- LlamaForCausalLM
- MistralForCausalLM
- MixtralForCausalLM
- DeciLMForCausalLM
- BitnetForCausalLM
- GrokForCausalLM
- DbrxForCausalLM
- MiniCPMForCausalLM
- MiniCPM3ForCausalLM
- QWenLMHeadModel
- Qwen2ForCausalLM
- Qwen2VLForConditionalGeneration
- WavTokenizerDec
- Qwen2MoeForCausalLM
- GPT2LMHeadModel
- PhiForCausalLM
- Phi3ForCausalLM
- PhiMoEForCausalLM
- PlamoForCausalLM
- CodeShellForCausalLM
- InternLM2ForCausalLM
- BertModel
- BertForMaskedLM
- CamembertModel
- RobertaModel
- NomicBertModel
- XLMRobertaModel
- XLMRobertaForSequenceClassification
- GemmaForCausalLM
- Gemma2ForCausalLM
- Starcoder2ForCausalLM
- Rwkv6ForCausalLM
- RWKV6Qwen2ForCausalLM
- MambaForCausalLM
- MambaLMHeadModel
- FalconMambaForCausalLM
- CohereForCausalLM
- Cohere2ForCausalLM
- OLMoForCausalLM
- OlmoForCausalLM
- Olmo2ForCausalLM
- OlmoeForCausalLM
- JinaBertModel
- JinaBertForMaskedLM
- OpenELMForCausalLM
- ArcticForCausalLM
- DeepseekForCausalLM
- DeepseekV3ForCausalLM
- DeepseekV2ForCausalLM
- UMT5ForConditionalGeneration
- MT5ForConditionalGeneration
- T5ForConditionalGeneration
- T5WithLMHeadModel
- T5EncoderModel
- JAISLMHeadModel
- ChatGLMModel
- ChatGLMForConditionalGeneration
- NemotronForCausalLM
- ExaoneForCausalLM
- GraniteForCausalLM
- GraniteMoeForCausalLM
- ChameleonForCausalLM
- ChameleonForConditionalGeneration
```
* squash! convert : add --print-supported-models option
  Fix flake8 error.

c3f9d25706 | 0cc4m | 2025-01-10 06:39:33 +01:00
Vulkan: Fix float16 use on devices without float16 support + fix subgroup_size_control validation error (#11161)
* Vulkan: Remove float16 use in shaders
* Fix validation error about subgroup_size_control extension

ee7136c6d1 | Molly Sophia | 2025-01-10 09:58:08 +08:00
llama: add support for QRWKV6 model architecture (#11001)
* WIP: Add support for RWKV6Qwen2
* RWKV: Some graph simplification
* Add support for RWKV6Qwen2 with cpu and cuda GLA
* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead
* Fix some typos
* code format changes
* Fix wkv test & add gla test
* Fix cuda warning
* Update README.md
* Update ggml/src/ggml-cuda/gla.cu
* Fix fused lerp weights loading with RWKV6
* better sanity check skipping for QRWKV6 in llama-quant (thanks @compilade)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>

c6860cc734 | Akarshan Biswas | 2025-01-10 08:13:03 +08:00
SYCL: Refactor ggml_sycl_compute_forward (#11121)
* SYCL: refactor ggml_sycl_compute_forward
* SYCL: add back GGML_USED(dst) to ggml_sycl_cpy
* SYCL: add function name to noop debug
* SYCL: some device info print refactoring and add details of XMX availability

1204f97270 | Tei Home | 2025-01-09 11:32:06 +00:00
doc: add cuda guide for fedora (#11135)
Since NVIDIA does not release CUDA for in-maintenance versions of Fedora, the process of setting up the CUDA toolkit on Fedora has become quite involved. This guide should help mere mortals install CUDA for development in a Fedora 39 toolbox environment, without affecting the host system.

8eceb888d7 | Daniel Bevenius | 2025-01-09 11:28:29 +01:00
server : add tooltips to settings and themes btn (#11154)
* server : add tooltips to settings and themes btn
  This commit adds tooltips to the settings and themes buttons in the webui. The tooltip will be displayed below the actual buttons when hovered over. The motivation for this change is to clarify the purpose of the themes button.
* squash! server : add tooltips to settings and themes btn
  This commit adds a tooltip to the '...' button when a chat has been started. The tooltip is "Chat options", which seems a good description as the dropdown contains options to delete or download the current chat.
* rm tooltip for 3 dots button
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

f8feb4b01a | Pierrick Hymbert | 2025-01-09 11:21:41 +01:00
model: Add support for PhiMoE arch (#11003)
* model: support phimoe
* python linter
* doc: minor
* doc: add phimoe as supported model
Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>

be0e950c91 | Georgi Gerganov | 2025-01-09 11:15:15 +02:00
media : remove old img [no ci]

d9feae1c06 | Xuan Son Nguyen | 2025-01-09 10:07:33 +01:00
llama-chat : add phi 4 template (#11148)

8d59d91171 | hydai | 2025-01-08 20:03:28 +00:00
fix: add missing msg in static_assert (#11143)
Signed-off-by: hydai <z54981220@gmail.com>

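For context on the fix: before C++17, `static_assert` requires a message argument, so a message-less assertion breaks builds targeting older language standards. A generic illustration, not the actual changed line:

```cpp
#include <cstdint>

// Condition plus message: valid in C++11 and later.
static_assert(sizeof(std::uint32_t) == 4, "uint32_t must be 4 bytes");

// The message-less form below is only valid from C++17 onward,
// which is why a missing message is worth adding:
// static_assert(sizeof(std::uint32_t) == 4);

int main() { return 0; }
```
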
8a1d9c25fa | Vinesh Janarthanan | 2025-01-08 20:54:58 +02:00
gguf-py : move scripts directory (#11116)
* Moved scripts dir and fixed pyproject.toml
* updated readme
* fixed README urls
* bump pypi gguf to v0.14.0
* retrigger ci
* empty commit - trigger ci

1bf839b1e8 | Eric Curtin | 2025-01-08 18:47:05 +00:00
Enhance user input handling for llama-run (#11138)
The main motivation for this change is that it was not handling ctrl-c/ctrl-d correctly. Modify `read_user_input` to handle EOF, the "/bye" command, and empty input cases. Introduce a `get_user_input` function to manage the user input loop and handle the different return cases.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>

f7cd13301c | Xuan Son Nguyen | 2025-01-08 16:09:20 +01:00
ci : use actions from ggml-org (#11140)

4d2b3d8804 | Xuan Son Nguyen | 2025-01-08 15:59:53 +01:00
lora : improve compat with mergekit-extract-lora (#11131)
* (wip) support mergekit-extracted lora
* support mergekit-extract-lora
* use lora->get_scale
* correct comment
* correct norm name & condition
* add some hints

c07d437bbd | Georgi Gerganov | 2025-01-08 16:19:36 +02:00
llama : avoid hardcoded QK_K (#11061)

99a3755a3c | Georgi Gerganov | 2025-01-08 13:40:30 +02:00
sync : ggml

c792dcf488 | Radoslav Gerganov | 2025-01-08 13:40:18 +02:00
ggml : allow loading backend with env variable (ggml/1059)
ref: #1058

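For context, with `GGML_BACKEND_DL` each backend is a shared library loaded at runtime, and the environment variable added here complements explicit, path-based loading. A sketch of the explicit path, assuming the `ggml_backend_load` and `ggml_backend_reg_name` entry points from `ggml-backend.h`; the library path is illustrative:

```cpp
#include <cstdio>
#include "ggml-backend.h"

int main() {
    // Load a backend shared library by explicit path; the commit adds
    // an environment-variable driven alternative to hardcoding this.
    ggml_backend_reg_t reg = ggml_backend_load("./libggml-cuda.so");
    if (reg == NULL) {
        fprintf(stderr, "failed to load backend\n");
        return 1;
    }
    printf("loaded backend: %s\n", ggml_backend_reg_name(reg));
    return 0;
}
```
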
80ccf5d725 | Xuan Son Nguyen | 2025-01-08 12:07:20 +01:00
ci : pin dependency to specific version (#11137)
* ci : pin dependency to specific version
* will this fix ec?

a3c1232c3f | Georgi Gerganov | 2025-01-08 12:55:36 +02:00
arg : option to exclude arguments from specific examples (#11136)
* arg : option to exclude arguments from specific examples
* readme : remove old args [no ci]

8cef75c743 | amritahs-ibm | 2025-01-08 12:54:19 +02:00
llamafile : ppc64le MMA INT8 implementation (#10912)
This change upstreams llamafile's CPU matrix multiplication kernels for ppc64le, using MMA builtins for the quantised int8 datatype. It results in a 10% to 70% improvement in total speed (i.e. all tokens/total time) across various batch sizes. The patch was tested with Meta-Llama-3-8B, Mistral-7B, and Llama-2-7B-chat-hf models on an IBM POWER10 machine.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>

0d52a69e4b | Georgi Gerganov | 2025-01-08 11:29:34 +02:00
ci : fix cmake option (#11125)

02f0430141 | Mathieu Baudier | 2025-01-08 09:18:13 +01:00
Disable GL_KHR_cooperative_matrix Vulkan extension if not available. (#11117)
* Disable GL_KHR_cooperative_matrix Vulkan extension if not available.
* Perform Vulkan extensions checks in a more sensible order
* Remove unnecessary #ifdef directive

bec2183f2c | ag2s20150909 | 2025-01-08 09:17:29 +01:00
fix: Vulkan shader gen binary path when Cross-compiling (#11096)
* fix: Vulkan shader gen binary path when cross compiling

53ff6b9b9f | Johannes Gäßler | 2025-01-07 18:01:58 +01:00
GGUF: C++ refactor, backend support, misc fixes (#11030)
* GGUF: C++ refactor, backend support, misc fixes
  - remove ggml_tensor.backend
  - update CODEOWNERS [no ci]
  - remove gguf_get_data from API
  - revise GGUF API data types

017cc5f446 | Diego Devesa | 2025-01-07 16:11:57 +01:00
ggml-backend : only offload from host buffers (fix) (#11124)

a3d50bc022 | Diego Devesa | 2025-01-07 12:38:05 +01:00
ggml-backend : only offload from host buffers (#11120)

a4dd490069 | Radoslav Gerganov | 2025-01-07 08:37:02 +02:00
rpc : code cleanup (#11107)
Remove duplicated macros, use GGML_LOG_ERROR for errors.

c0d6f790d0 | Akarshan Biswas | 2025-01-07 14:26:07 +08:00
SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#11087)
* SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6
* Revert "SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6"
  This reverts commit f62dc45f31.

dc7cef9f37 | Eric Curtin | 2025-01-06 23:45:28 +01:00
llama-run : fix context size (#11094)
Set `n_ctx` equal to `n_batch` in the `Opt` class. Now the context size is a more reasonable 2048.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>

ecebbd292d | Georgi Gerganov | 2025-01-06 17:52:35 +02:00
llama : remove unused headers (#11109)

96be8c3264 | Xuan Son Nguyen | 2025-01-06 16:34:49 +01:00
github : add cmd line field to bug report (#11090)
* github : cmd line to bug report
* codeowners : (@ngxson) only watch dockerfile
* Apply suggestions from code review [no ci]
* rm cmd in log output [no ci]
* rm 2 [no ci]
* no need backticks [no ci]
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

e6e7c75d94 | Georgi Gerganov | 2025-01-06 15:36:08 +02:00
server : fix extra BOS in infill endpoint (#11106)
* server : fix extra BOS in infill endpoint
* server : update infill tests

09186fabbe | Xuan Son Nguyen | 2025-01-06 13:41:12 +01:00
llama : remove check flash_attn with lora (#11104)

96a1dc27c3 | Asghar Ghorbani | 2025-01-06 13:21:46 +02:00
llama : prevent system info string accumulation across calls (#11101)

6369f867a4 | Daniel Bevenius | 2025-01-06 11:28:17 +02:00
llama : rename missed batch params/vars to ubatch (#10059)
This commit renames the `batch` parameter to `ubatch` in the `llama_kv_cache_find_slot`, `llm_build_inp_embd`, and `llm_build_mamba` functions. The motivation for this is that this should have been done as part of commit 19d900a756 (#9950), but for some reason these functions were missed in that commit and only noticed now (sorry).

47182dd03f | Georgi Gerganov | 2025-01-06 10:55:18 +02:00
llama : update llama_model API names (#11063)
* llama : deprecate llama_free_model, add llama_model_free
* llama : change `llama_load_model_from_file` -> `llama_model_load_from_file`

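A minimal sketch of the renamed lifecycle calls from this commit; the model path is a placeholder:

```cpp
#include "llama.h"

int main() {
    llama_model_params params = llama_model_default_params();

    // formerly llama_load_model_from_file
    llama_model * model = llama_model_load_from_file("model.gguf", params);
    if (model == NULL) {
        return 1;
    }

    // ... use the model ...

    llama_model_free(model); // replaces the deprecated llama_free_model
    return 0;
}
```
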
3e6e7a6bc2 | Georgi Gerganov | 2025-01-06 10:54:25 +02:00
tokenize : escape the prompt (#11058)
* tokenize : escape the prompt
* tokenize : update help