Olivier Chafik
6171c9d258
Add Jinja template support (#11016)

* Copy minja from 58f0ca6dd7 (https://github.com/google/minja/pull/22)
* Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626 (https://github.com/google/minja/pull/25)
* Update minja from https://github.com/google/minja/pull/27
* rm unused optional header
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2025-01-21 13:18:51 +00:00
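The guard described in this commit (do not touch a missing BOS/EOS token, and warn when a Jinja template references one) can be sketched as follows. This is a minimal illustration, not llama.cpp or minja code: it assumes a null token id of -1 (the value `LLAMA_TOKEN_NULL` has in `llama.h`), `token_text_or_empty` and `missing_token_warnings` are hypothetical helpers, and the substring search is a crude stand-in for inspecting the parsed template.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using token_id = int32_t;

// Local stand-in for a null token id; assumed to match LLAMA_TOKEN_NULL (-1).
constexpr token_id TOKEN_NULL = -1;

// Hypothetical lookup: return "" for a null or out-of-range id instead of
// indexing past the vocabulary (or throwing).
std::string token_text_or_empty(const std::vector<std::string> & vocab, token_id id) {
    if (id == TOKEN_NULL || id < 0 || static_cast<std::size_t>(id) >= vocab.size()) {
        return "";
    }
    return vocab[static_cast<std::size_t>(id)];
}

// Hypothetical check: warn when the template source mentions bos_token or
// eos_token but the corresponding id is null. A plain substring search is a
// crude stand-in for looking at the parsed template.
std::vector<std::string> missing_token_warnings(const std::string & tmpl,
                                                token_id bos, token_id eos) {
    std::vector<std::string> warnings;
    if (bos == TOKEN_NULL && tmpl.find("bos_token") != std::string::npos) {
        warnings.push_back("template references bos_token, but the model has no BOS token");
    }
    if (eos == TOKEN_NULL && tmpl.find("eos_token") != std::string::npos) {
        warnings.push_back("template references eos_token, but the model has no EOS token");
    }
    return warnings;
}
```

Returning an empty string keeps rendering going for templates that never use the token, while the warning surfaces the problem for templates that do.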
								
Christopher Nielsen
90d987b105
mmap: add include for cerrno (#11296)

ggml-ci
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

2025-01-20 16:02:43 +02:00
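`errno` and the `E*` constants are declared in `<cerrno>`. Code that relies on some other header pulling them in transitively can compile with one standard library and fail with another, which is presumably what an explicit include guards against. A small illustration using a hypothetical `open_error` helper (not llama.cpp's mmap code), assuming a POSIX-like platform where `fopen` sets `errno` on failure (the C standard itself does not require this):

```cpp
#include <cassert>
#include <cerrno>   // errno, ENOENT: included explicitly, not transitively
#include <cstdio>   // std::fopen, std::fclose, std::FILE
#include <cstring>  // std::strerror
#include <string>

// Hypothetical helper: try to open `path`; on failure return a message built
// from errno and store the errno value in *err (0 on success).
std::string open_error(const char * path, int * err) {
    assert(path != nullptr && err != nullptr);
    errno = 0;
    std::FILE * fp = std::fopen(path, "rb");
    if (fp != nullptr) {
        std::fclose(fp);
        *err = 0;
        return "";
    }
    *err = errno;  // capture before any other call can overwrite it
    return std::string("failed to open ") + path + ": " + std::strerror(*err);
}
```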
								
Xuan Son Nguyen
ec7f3ac9ab
llama : add support for Deepseek-R1-Qwen distill model (#11310)

* llama : add support for Deepseek-R1-Qwen distill model
* coding style

2025-01-20 14:35:07 +01:00
								
Georgi Gerganov
ef6dada60c
cont : fix whitespaces (#11305)

2025-01-20 09:29:32 +02:00
								
Kyle Bruene
ae3c1db2f9
llama : re-add LLM_ARCH_PHIMOE (#11305)

Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.

2025-01-20 09:21:01 +02:00
								
Georgi Gerganov
4dd34ff831
cmake : add sanitizer flags for llama.cpp (#11279)

* cmake : add sanitizer flags for llama.cpp
ggml-ci
* tests : fix compile warnings
ggml-ci
* cmake : move sanitizer flags to llama_add_compile_flags
ggml-ci
* cmake : move llama.cpp compile flags to top level lists
ggml-ci
* cmake : apply only sanitizer flags at top level
ggml-ci
* tests : fix gguf context use in same_tensor_data
* gguf-test: tensor data comparison
* dummy : trigger ggml-ci
* unicode : silence gcc warnings
ggml-ci
* ci : use sanitizer builds only in Debug mode
ggml-ci
* cmake : add status messages [no ci]
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

2025-01-18 16:18:15 +02:00
								
Radoslav Gerganov
667d72846c
rpc : early register backend devices (#11262)

Early register RPC devices and do not propagate RPC specifics in the
llama model structures.
ref: #10609

2025-01-17 10:57:09 +02:00
								
Georgi Gerganov
a133566d34
vocab : fix double-eos check (#11273)

ggml-ci

2025-01-17 09:28:00 +02:00
								
Xuan Son Nguyen
681149ced2
llama : add llama_model_load_from_splits (#11255)

* llama : add `llama_model_load_from_splits`
* update

2025-01-16 13:54:08 +01:00
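`llama_model_load_from_splits` loads a model whose weights are spread across several GGUF files. The sketch below builds the ordered list of shard paths a caller would need, assuming the `<prefix>-NNNNN-of-MMMMM.gguf` naming (1-based, zero-padded) used by llama.cpp's gguf-split tool. `make_split_paths` is a hypothetical helper, and the exact way the list is handed to the loader should be checked against `llama.h`.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not part of the llama.cpp API): build the shard paths,
// in order, for a model split into n_splits files named
// "<prefix>-00001-of-0000N.gguf", ..., "<prefix>-0000N-of-0000N.gguf".
std::vector<std::string> make_split_paths(const std::string & prefix, int n_splits) {
    assert(n_splits > 0);
    std::vector<std::string> paths;
    paths.reserve(static_cast<std::size_t>(n_splits));
    for (int i = 0; i < n_splits; ++i) {
        char suffix[64];
        std::snprintf(suffix, sizeof(suffix), "-%05d-of-%05d.gguf", i + 1, n_splits);
        paths.push_back(prefix + suffix);
    }
    return paths;
}
```

To pass these to a C API, one would typically collect `p.c_str()` for each entry into a `std::vector<const char *>` that stays alive for the duration of the call.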
								
Johannes Gäßler
432df2d5f9
RoPE: fix back, CUDA support for back + noncont. (#11240)

* RoPE: fix back, CUDA support for back + noncont.
* fix comments reg. non-cont. RoPE support [no-ci]

2025-01-15 12:51:37 +01:00
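Why a RoPE backward pass is essentially another RoPE: each feature pair is rotated by a position-dependent angle, and a 2x2 rotation is orthogonal, so the gradient with respect to the input is the incoming gradient rotated by the negative angle. A minimal sketch for one contiguous row in the "normal" adjacent-pair layout, assuming theta = pos * base^(-2k/d) for pair k; this is an illustration, not the ggml CPU or CUDA kernel, and it ignores the other RoPE modes and frequency-scaling options ggml supports.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rotate adjacent pairs (x[i], x[i+1]) of one row in place for position `pos`.
// For element index i (even), theta = pos * base^(-i/d), i.e. base^(-2k/d)
// for pair k. With backward = true the angle is negated, which applies the
// transpose (= inverse) rotation, as needed for the gradient.
void rope_inplace(std::vector<float> & x, int pos, float base, bool backward) {
    const std::size_t d = x.size();
    assert(d % 2 == 0);
    for (std::size_t i = 0; i < d; i += 2) {
        const float theta = static_cast<float>(pos) *
            std::pow(base, -static_cast<float>(i) / static_cast<float>(d));
        const float angle = backward ? -theta : theta;
        const float c = std::cos(angle);
        const float s = std::sin(angle);
        const float x0 = x[i];
        const float x1 = x[i + 1];
        x[i]     = x0 * c - x1 * s;
        x[i + 1] = x0 * s + x1 * c;
    }
}
```

Applying the forward rotation and then the backward one returns the original row up to rounding error, which is a convenient sanity check for a backward implementation.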
								
Georgi Gerganov
bbf3e55e35
vocab : add dummy tokens for "no_vocab" type (#11231)

* vocab : add dummy tokens for "no_vocab" type
ggml-ci
* vocab : minor [no ci]

2025-01-14 11:54:58 +01:00
								
Daniel Bevenius
8f70fc3d1b
llama : remove 'd' from bad special token log (#11212)

This commit removes the 'd' from the log message in llama-vocab.cpp
when logging a bad special token.
The motivation for this is that currently the output can look something
like the following:
```console
load: bad special token:
    'tokenizer.ggml.image_token_id' = 128256d, using default id -1
```

2025-01-13 13:38:20 +01:00
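A stray `d` like the one above is what printf-style formatting produces for a format string containing `%ud`: `%u` is the conversion and the following `d` is printed literally. The sketch reproduces the log line from the commit message with `std::snprintf`; the exact format string in `llama-vocab.cpp` before the fix is an assumption, and `format_bad_token` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: format the warning either with the stray 'd'
// ("%ud": conversion "%u" followed by a literal 'd') or with plain "%u".
std::string format_bad_token(const char * key, unsigned id, bool fixed) {
    char buf[128];
    if (fixed) {
        std::snprintf(buf, sizeof(buf), "'%s' = %u, using default id -1", key, id);
    } else {
        std::snprintf(buf, sizeof(buf), "'%s' = %ud, using default id -1", key, id);
    }
    return buf;
}
```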
								
Xuan Son Nguyen
9a483999a6
llama : fix chat template gguf key (#11201)

2025-01-12 13:45:14 +01:00
								
Georgi Gerganov
08f10f69c3
llama : remove notion of CLS token (#11064)

ggml-ci

2025-01-12 12:15:53 +02:00
								
Georgi Gerganov
afa8a9ec9b
llama : add llama_vocab, functions -> methods, naming (#11110)

* llama : functions -> methods (#11110)
* llama : add struct llama_vocab to the API (#11156)
ggml-ci
* hparams : move vocab params to llama_vocab (#11159)
ggml-ci
* vocab : more pimpl (#11165)
ggml-ci
* vocab : minor tokenization optimizations (#11160)
ggml-ci
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* lora : update API names (#11167)
ggml-ci
* llama : update API names to use correct prefix (#11174)
* llama : update API names to use correct prefix
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* minor [no ci]
* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174)
ggml-ci
* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174)
ggml-ci
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>

2025-01-12 11:32:42 +02:00
								
Molly Sophia
ee7136c6d1
llama: add support for QRWKV6 model architecture (#11001)

llama: add support for QRWKV6 model architecture (#11001)
* WIP: Add support for RWKV6Qwen2
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV: Some graph simplification
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Add support for RWKV6Qwen2 with cpu and cuda GLA
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix some typos
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* code format changes
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix wkv test & add gla test
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix cuda warning
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update README.md
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update ggml/src/ggml-cuda/gla.cu
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Fix fused lerp weights loading with RWKV6
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* better sanity check skipping for QRWKV6 in llama-quant
thanks @compilade
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: compilade <git@compilade.net>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>

2025-01-10 09:58:08 +08:00
								
Pierrick Hymbert
f8feb4b01a
model: Add support for PhiMoE arch (#11003)

* model: support phimoe
* python linter
* doc: minor
Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
* doc: minor
Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
* doc: add phimoe as supported model
ggml-ci
---------
Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>

2025-01-09 11:21:41 +01:00
								
Xuan Son Nguyen
d9feae1c06
llama-chat : add phi 4 template (#11148)

2025-01-09 10:07:33 +01:00
								
Xuan Son Nguyen
4d2b3d8804
lora : improve compat with mergekit-extract-lora (#11131)

* (wip) support mergekit-extracted lora
* support mergekit-extract-lora
* use lora->get_scale
* correct comment
* correct norm name & condition
* add some hints

2025-01-08 15:59:53 +01:00
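`lora->get_scale` refers to LoRA scaling: the low-rank update `B * A` is multiplied by `alpha / rank` (times any user-supplied adapter scale), and when the adapter stores no alpha the user scale is used as-is. This follows the common LoRA convention and is our reading, not a copy of llama.cpp's code; `lora_scale` and `lora_forward` are hypothetical helpers on small row-major matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Common LoRA scaling convention: user_scale * alpha / rank, or just
// user_scale when the adapter does not specify alpha (alpha == 0).
float lora_scale(float alpha, int rank, float user_scale) {
    assert(rank > 0);
    return alpha != 0.0f ? user_scale * alpha / static_cast<float>(rank) : user_scale;
}

// y = W x + scale * B (A x), all matrices row-major:
// W is n_out x n_in, A is r x n_in, B is n_out x r.
std::vector<float> lora_forward(const std::vector<float> & W, const std::vector<float> & A,
                                const std::vector<float> & B, const std::vector<float> & x,
                                int n_in, int n_out, int r, float scale) {
    assert(W.size() == static_cast<std::size_t>(n_out * n_in));
    assert(A.size() == static_cast<std::size_t>(r * n_in));
    assert(B.size() == static_cast<std::size_t>(n_out * r));
    assert(x.size() == static_cast<std::size_t>(n_in));
    std::vector<float> ax(r, 0.0f);  // low-rank projection A x
    for (int i = 0; i < r; ++i)
        for (int j = 0; j < n_in; ++j)
            ax[i] += A[i * n_in + j] * x[j];
    std::vector<float> y(n_out, 0.0f);
    for (int o = 0; o < n_out; ++o) {
        for (int j = 0; j < n_in; ++j) y[o] += W[o * n_in + j] * x[j];        // base W x
        for (int i = 0; i < r; ++i)    y[o] += scale * B[o * r + i] * ax[i];  // scaled update
    }
    return y;
}
```

Computing `A x` first keeps the update at O(r * (n_in + n_out)) work instead of materializing the full `B * A` matrix.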
								
Georgi Gerganov
c07d437bbd
llama : avoid hardcoded QK_K (#11061)

ggml-ci

2025-01-08 16:19:36 +02:00
								
Johannes Gäßler
53ff6b9b9f
GGUF: C++ refactor, backend support, misc fixes (#11030)

* GGUF: C++ refactor, backend support, misc fixes
remove ggml_tensor.backend
update CODEOWNERS [no ci]
remove gguf_get_data from API
revise GGUF API data types

2025-01-07 18:01:58 +01:00
								
Georgi Gerganov
ecebbd292d
llama : remove unused headers (#11109)

ggml-ci

2025-01-06 17:52:35 +02:00
								
Xuan Son Nguyen
09186fabbe
llama : remove check flash_attn with lora (#11104)

2025-01-06 13:41:12 +01:00
								
Asghar Ghorbani
96a1dc27c3
llama : prevent system info string accumulation across calls (#11101)

2025-01-06 13:21:46 +02:00
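String accumulation across calls is the classic pattern of a function that builds its result in a `static std::string` and returns `c_str()`: if the string is only ever appended to, each call repeats all previous contents. A hypothetical reproduction and fix follows; the feature text is a placeholder, not real output of llama.cpp's system-info function.

```cpp
#include <cassert>
#include <string>

// Buggy pattern: the static buffer is never reset, so it grows on every call.
const char * system_info_buggy() {
    static std::string s;
    s += "AVX = 1 | ";  // placeholder feature text, appended again each call
    return s.c_str();
}

// Fixed pattern: reset the static buffer before rebuilding it.
const char * system_info_fixed() {
    static std::string s;
    s.clear();
    s += "AVX = 1 | ";
    return s.c_str();
}
```

Because the pointer refers to a static buffer, it is only valid until the next call; callers that need to keep the text should copy it into their own `std::string`.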
								
Daniel Bevenius
6369f867a4
llama : rename missed batch params/vars to ubatch (#10059)

This commit renames the `batch` parameter to `ubatch` in the
`llama_kv_cache_find_slot`, `llm_build_inp_embd`, and
`llm_build_mamba` functions.
The motivation for this is that this should have been done as part of
commit 19d900a756 (#9950), but for some reason I missed these functions in
that commit and only noticed them now (sorry).

2025-01-06 11:28:17 +02:00
								
Georgi Gerganov
47182dd03f
llama : update llama_model API names (#11063)

* llama : deprecate llama_free_model, add llama_model_free
ggml-ci
* llama : change `llama_load_model_from_file` -> `llama_model_load_from_file`
ggml-ci

2025-01-06 10:55:18 +02:00
								
Georgi Gerganov
ae2f606bb5
mmap : fix fileno macro clash (#11076)

* mmap : fix fileno macro clash
ggml-ci
* cont
ggml-ci

2025-01-06 10:52:38 +02:00
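Some C libraries implement `fileno` as a function-like macro, which silently rewrites any member function that happens to be called `fileno(...)`. The sketch simulates such a macro locally and shows two common workarounds: avoiding the name, or parenthesizing it so the function-like macro is not expanded. Which approach this commit took is not stated in the message; `file_handle`, `file_id` and `platform_fileno` are hypothetical names.

```cpp
#include <cassert>

// Simulate a platform header that defines fileno as a function-like macro.
// platform_fileno is never declared, so any accidental expansion inside
// file_handle would fail to compile.
#undef fileno
#define fileno(fp) platform_fileno(fp)

struct file_handle {
    int fd;

    // Workaround 1: avoid the clashing name altogether.
    int file_id() const { return fd; }

    // Workaround 2: a function-like macro only expands when its name is
    // directly followed by '(', so "(fileno)" is left untouched.
    int (fileno)() const { return fd; }
};

// Drop the simulated macro so it cannot leak into code that follows.
#undef fileno
```

Callers of the second form must also write `(h.fileno)()` wherever the macro may be active, which is why renaming is usually the less fragile option.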
								
Georgi Gerganov
727368c60f
llama : use LLAMA_TOKEN_NULL (#11062)

ggml-ci

2025-01-06 10:52:15 +02:00
								
Georgi Gerganov
5047dd3546
llama : use _impl suffix instead of _internal (#11060)

ggml-ci

2025-01-06 10:52:01 +02:00
								
fairydreaming
9394bbd484
llama : Add support for DeepSeek V3 (#11049)

* convert : extend DEEPSEEK2 model architecture to support DeepseekV3ForCausalLM by adding EXPERT_WEIGHTS_NORM and EXPERT_GATING_FUNC model parameters and FFN_EXP_PROBS_B tensor type
* vocab : add DeepSeek V3 pre-tokenizer regexes
* unicode : handle ACCENT_MARK and SYMBOL categories in regex
* llama : add DeepSeek V3 chat template, handle new model parameters and tensor types
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

2025-01-04 21:06:11 +01:00
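The new EXPERT_GATING_FUNC, EXPERT_WEIGHTS_NORM and FFN_EXP_PROBS_B items line up with DeepSeek V3's published routing design: sigmoid gate scores, a per-expert bias added only when choosing the top-k experts, and mixing weights taken from the unbiased scores and optionally normalized to sum to 1. The sketch follows that reading for a single token. It is an assumption about the semantics, not llama.cpp's graph code, and it omits DeepSeek V3's group-limited expert selection.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

struct expert_route {
    std::vector<int>   experts;  // chosen expert indices, best first
    std::vector<float> weights;  // mixing weight for each chosen expert
};

// Sigmoid-gated top-k routing with a selection-only bias.
expert_route route_sigmoid_topk(const std::vector<float> & logits,
                                const std::vector<float> & bias,
                                int k, bool normalize) {
    const int n = static_cast<int>(logits.size());
    assert(static_cast<int>(bias.size()) == n && k > 0 && k <= n);

    std::vector<float> probs(n), sel_score(n);
    for (int i = 0; i < n; ++i) {
        probs[i]     = 1.0f / (1.0f + std::exp(-logits[i]));  // sigmoid gate
        sel_score[i] = probs[i] + bias[i];                    // bias only affects selection
    }

    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b) { return sel_score[a] > sel_score[b]; });

    expert_route r;
    r.experts.assign(idx.begin(), idx.begin() + k);
    float sum = 0.0f;
    for (int e : r.experts) {
        r.weights.push_back(probs[e]);  // weights come from the unbiased scores
        sum += probs[e];
    }
    if (normalize) {
        for (float & w : r.weights) w /= sum;
    }
    return r;
}
```

Keeping the bias out of the mixing weights lets it steer load balancing across experts without directly scaling any expert's output.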
								
DAN™
46be942214
llama : add support for the cohere2 model architecture (#10900)

2025-01-04 16:33:31 +02:00
								
Georgi Gerganov
f66f582927
llama : refactor src/llama.cpp (#10902)

* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci]

2025-01-03 10:18:53 +02:00
								
Georgi Gerganov
30caac3a68
llama : the WPM vocabs use the CLS token as BOS (#10930)

* llama : the WPM vocabs use the CLS token as BOS
ggml-ci
* llama : add comment

2024-12-24 09:44:20 +02:00
								
Yun Dou
b92a14a841
llama : support InfiniAI Megrez 3b (#10893)

* Support InfiniAI Megrez 3b
* Fix tokenizer_clean_spaces for megrez

2024-12-23 01:35:44 +01:00
								
ymcki
6f0c9e034b
llama : support for Llama-3_1-Nemotron-51B (#10669)

* conflict resolution
* move comments after bracket to its own line

2024-12-23 01:22:33 +01:00
								
Billel Mokeddem
7ae33a616f
llama : add Falcon3 support (#10883)

* Add Falcon3 model support
* Add fix for adding bos to added special tokens
* Add comment explaining the logic behind the if statement
* Add a log message to better track when the following line of code is triggered
* Update log to only print when input and output characters are different
* Fix handling pre-normalized tokens
* Refactoring

2024-12-23 00:09:58 +02:00
								
Georgi Gerganov
5cab3e4aaa
llama : minor grammar refactor (#10897)

ggml-ci

2024-12-19 17:42:13 +02:00
								
Sukriti Sharma
2fffc52b50
llama : fix Roberta embeddings (#10856)

* fix: Use gpt2 tokenizer for roberta and add eos/bos tokens
Branch: RobertaTokenizer
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fixes to position embeddings
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* map roberta-bpe to gpt-2
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix linting
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>

2024-12-19 15:04:51 +02:00
								
fairydreaming
7585edbdeb
convert : Add support for Microsoft Phi-4 model (#10817)

* convert : use GPT2 vocab for Phi-4 model
* convert : use null value of sliding_window to distinguish Phi-4 from other PHI3-based models
* llama : do not use sliding window attention mask for Phi-4 model
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

2024-12-19 10:37:12 +01:00
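The Phi-4 change hinges on the difference between a plain causal mask and a sliding-window one. A minimal sketch, assuming that a missing `sliding_window` value (null in the model config) means "no window", and that with a window of size W a query at position i may attend to key positions j with i - W < j <= i. The exact boundary convention used in llama.cpp is not taken from the source, and `may_attend` is a hypothetical helper.

```cpp
#include <cassert>
#include <optional>

// Returns true if a query at position i may attend to a key at position j.
// Causal rule: never attend to the future (j > i).
// Sliding window (if present): additionally restrict to the last `window` positions.
bool may_attend(int i, int j, std::optional<int> sliding_window) {
    if (j > i) {
        return false;
    }
    if (sliding_window.has_value()) {
        assert(*sliding_window > 0);
        return i - j < *sliding_window;
    }
    return true;  // no window configured: plain causal attention
}
```

With the window absent, distant tokens stay visible, which is the behavior the commit wants for Phi-4 while other PHI3-based models keep their window.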
								
Georgi Gerganov
0bf2d10c55
tts : add OuteTTS support (#10784)

* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : be explicit about the pooling type in the tests
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* llama : add OuteTTS support (wip)
* wip
* extract features
* first conv
* group norm
* resnet conv
* resnet
* attn
* pos net
* layer norm
* convnext
* head
* hann window
* fix n_embd + remove llama.cpp hacks
* compute hann window
* fft
* spectrum processing
* clean-up
* tts : receive input text and generate codes
* clip : fix new conv name
* tts : minor fix
* tts : add header + minor fixes
ggml-ci
* tts : add mathematical constant
ggml-ci
* tts : fix sampling + cut initial noise
* tts : fixes
* tts : update default samplers
ggml-ci
* tts : text pre-processing
* tts : outetts-voc -> wavtokenizer-dec
* tts : remove hardcoded constants
ggml-ci
* tts : fix tensor shapes
* llama : refactor wavtokenizer tensors
ggml-ci
* cont
ggml-ci
* cont [no ci]
* llama : update WavTokenizer to non-causal attn
* llama : handle no-vocab detokenization
* tts : add Python example for OuteTTS (wip)
* tts : extend python example to generate spectrogram
ggml-ci
* server : fix rebase artifacts
* tts : enable "return_tokens" in Python example
ggml-ci
* tts : minor fixes
* common : support HF download for vocoder

2024-12-18 19:27:21 +02:00
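Several of the OuteTTS steps ("hann window", "compute hann window", "fft") presumably belong to the vocoder's inverse-STFT stage. For reference, the Hann window is w[n] = 0.5 - 0.5 cos(2 pi n / N) in its periodic form, which is the usual choice for STFT frames, and uses N - 1 in the denominator in its symmetric form. Which variant the TTS example uses is not stated here; `hann_window` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hann window of length n.
// periodic = true : w[i] = 0.5 - 0.5 * cos(2*pi*i / n)        (typical for STFT frames)
// periodic = false: w[i] = 0.5 - 0.5 * cos(2*pi*i / (n - 1))  (symmetric, ends at 0)
std::vector<float> hann_window(int n, bool periodic) {
    assert(n > 1);
    const double pi = std::acos(-1.0);
    const double denom = periodic ? n : n - 1;
    std::vector<float> w(n);
    for (int i = 0; i < n; ++i) {
        w[i] = static_cast<float>(0.5 - 0.5 * std::cos(2.0 * pi * i / denom));
    }
    return w;
}
```

For example, `hann_window(4, true)` yields approximately {0, 0.5, 1, 0.5}, while `hann_window(5, false)` yields approximately {0, 0.5, 1, 0.5, 0}.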
								
Diego Devesa
4da69d1abd
Revert "llama : add Falcon3 support (#10864)" (#10876)

This reverts commit 382bc7f2e8.

2024-12-18 01:36:46 +01:00
								
DAN™
d62b532c52
Use model->gguf_kv for loading the template instead of using the C API. (#10868)

* Bump model_template to 16384 bytes to support larger chat templates.
* Use `model->gguf_kv` for efficiency.

2024-12-17 23:24:22 +01:00
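Why the buffer size matters: copying a metadata string into a fixed-size `char` buffer silently truncates anything longer, and a truncated Jinja template no longer parses. Reading the value from a string map avoids the fixed limit entirely. The sketch illustrates both points with hypothetical helpers; `tokenizer.chat_template` is the standard GGUF chat-template key, but the `std::map` is only a stand-in for `model->gguf_kv`, whose actual type is not shown in the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Copy `src` into a fixed-size buffer; returns false if the text was truncated.
// std::snprintf returns the length the full output would have needed.
bool copy_into_buffer(const std::string & src, char * buf, std::size_t buf_size) {
    const int needed = std::snprintf(buf, buf_size, "%s", src.c_str());
    return needed >= 0 && static_cast<std::size_t>(needed) < buf_size;
}

// Look up a metadata value in a string map: no fixed size limit applies.
// Returns an empty string when the key is absent.
std::string get_meta(const std::map<std::string, std::string> & kv, const std::string & key) {
    const auto it = kv.find(key);
    return it == kv.end() ? std::string() : it->second;
}
```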
								
Billel Mokeddem
382bc7f2e8
llama : add Falcon3 support (#10864)

2024-12-17 17:24:56 +02:00
								
Georgi Gerganov
08ea539df2
unicode : improve naming style (#10838)

* unicode : improve naming style
ggml-ci
* cont [no ci]

2024-12-16 12:31:45 +02:00
								
Georgi Gerganov
644fd71b44
sampling : refactor + optimize penalties sampler (#10803)

* sampling : refactor + optimize penalties sampler
ggml-ci
* common : apply ignore_eos as logit bias
ggml-ci
* batched : remove penalties sampler
* params : allow penalty_last_n == -1 to be equal to context size
ggml-ci
* common : by default, move the penalties at the end of the sampling chain
ggml-ci
* common : ignore all EOG tokens
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* common : move back the penalties at the front of the sampling chain
ggml-ci
* readme : restore hint about --ignore-eos flag [no ci]
* llama : minor
ggml-ci
* webui : update
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>

2024-12-16 12:31:14 +02:00
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
Valentin Mamedov
a0974156f3
llama : add Deepseek MoE v1 & GigaChat models (#10827)

* Add deepseek v1 arch & gigachat template
* improve template code
* add readme
* delete comments
* remove comment
* fix format
* lint llama.cpp
* fix order of deepseek and deepseek2, move gigachat template to the end of func
* fix order of deepseek and deepseek2 in constants; mark shared exp as deepseek arch need
* remove comments
* move deepseek above deepseek2
* change placement of gigachat chat template

2024-12-15 19:02:46 +02:00
								
HimariO
ba1cb19cdd
llama : add Qwen2VL support + multimodal RoPE (#10361)

* Barebone Qwen2VL LLM convertor
* Add Qwen2VL cli entrypoint
* [WIP] add qwen2vl arch
* Verify m-rope output
* Add vl-rope/2d-rope support for qwen2vl ViT
* update qwen2vl cli tool
* update 5D tensor op workaround
* [WIP] qwen2vl vision model
* make batch and clip utils compatible with qwen2vl
* [WIP] create inference workflow, gguf convert script but fix
* correcting vision-rope behavior, add the missing last layer back to ViT
* add arg parser to qwen2vl_surgery
* replace variable size array with vector
* cuda-gdb cmake preset
* add fp32 mrope, vision rope kernel
* add fp16 support for qwen2vl and m-rope
* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`
* fix rope op mode switching, outdated func args
* update `llama_hparams`
* update to keep up with upstream changes
* resolve linter, test errors
* add makefile entry, update special image padding token
* add mrope unit test, fix few compiler warnings
* rename `mrope` related function, params
* minor updates on debug util, bug fixes
* add `m-rope` testcase to `test-backend-ops`
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix trailing whitespace
* store `llama_hparams.rope_sections` with fixed size array
* update position id tensor size check in GGML_OP_ROPE
* minor updates
* update `ggml_backend_*_supports_op` of unsupported backends
* remove old `rope_section` compare operator
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2024-12-14 14:43:46 +02:00
								
Diego Devesa
cb13ef85a4
remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797)

other windows build fixes

2024-12-12 19:02:49 +01:00
								
Djip007
19d8762ab6
ggml : refactor online repacking (#10446)

* rename ggml-cpu-aarch64.c to .cpp
* reformat extra cpu backend.
- clean Q4_0_N_M and IQ4_0_N_M
  - remove from "file" tensor type
  - allow only with dynamic repack
- extract cpu extra bufts and convert to C++
  - hbm
  - "aarch64"
- more generic use of extra buffer
  - generalise extra_supports_op
  - new API for "cpu-accel":
     - amx
     - aarch64
* clang-format
* Clean Q4_0_N_M ref
Enable restrict on C++
* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack
* added/corrected control on tensor size for Q4 repacking.
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add debug logs on repacks.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2024-12-07 14:37:50 +02:00
								
Riccardo Orlando
6fe6247831
llama : add Minerva 7B model support (#10673)

* Support for Minerva 7B
* Update convert_hf_to_gguf_update.py

2024-12-05 20:30:59 +02:00