YiYing He 
								
							 
						 
						
							
							
							
							
								
							
							
								8bb33d3285 
								
							 
						 
						
							
							
								
								ggml: apply the unpad operator patch  
							
							
							
							
							Signed-off-by: YiYing He <yiying@secondstate.io> 
							
						 
						
							2025-02-04 09:38:06 +08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									issixx 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d2e518e9b4 
								
							 
						 
						
							
							
								
								ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. (ggml/1065)  
							
							
							
							
							Some threads kept looping and failed to terminate properly after an abort during CPU execution.
Co-authored-by: issi <issi@gmail.com> 
							
						 
						
							2025-01-29 11:24:51 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8137b4bb2b 
								
							 
						 
						
							
							
								
								CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380)
							
							
							
						 
						
							2025-01-24 12:38:31 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								bd38ddea01 
								
							 
						 
						
							
							
								
								vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11166)
							
							
							
							* vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl
Shaders are based on cpy.cu.
* vulkan: support copy from q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl to f32
* ggml: copy q->f32 assumes some contiguity in the destination 
							
						 
						
							2025-01-16 22:47:10 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9c8dcefe17 
								
							 
						 
						
							
							
								
								CUDA: backwards pass for misc. ops, add tests (#11257)
							
							
							
							* CUDA: backwards pass for misc. ops, add tests
* remove restrict from pointers 
							
						 
						
							2025-01-16 16:43:38 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								432df2d5f9 
								
							 
						 
						
							
							
								
								RoPE: fix back, CUDA support for back + noncont. (#11240)
							
							
							
							* RoPE: fix back, CUDA support for back + noncont.
* fix comments reg. non-cont. RoPE support [no-ci] 
							
						 
						
							2025-01-15 12:51:37 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Molly Sophia 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ee7136c6d1 
								
							 
						 
						
							
							
								
								llama: add support for QRWKV6 model architecture (#11001)
							
							
							
							llama: add support for QRWKV6 model architecture (#11001)
* WIP: Add support for RWKV6Qwen2
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV: Some graph simplification
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Add support for RWKV6Qwen2 with cpu and cuda GLA
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix some typos
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* code format changes
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix wkv test & add gla test
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix cuda warning
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update README.md
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update ggml/src/ggml-cuda/gla.cu
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Fix fused lerp weights loading with RWKV6
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* better sanity check skipping for QRWKV6 in llama-quant
thanks @compilade
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: compilade <git@compilade.net>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net> 
							
						 
						
							2025-01-10 09:58:08 +08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Djip007 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2cd43f4900 
								
							 
						 
						
							
							
								
								ggml : more perfo with llamafile tinyblas on x86_64 (#10714)
							
							
							
							* more perfo with llamafile tinyblas on x86_64.
- add bf16 support
- change dispatch strategy (thanks:
https://github.com/ikawrakow/ik_llama.cpp/pull/71)
- reduce memory bandwidth
simple tinyblas dispatch and more cache friendly
* tinyblas dynamic dispatching
* sgemm: add M blocks.
* - git 2.47 uses short ids of length 9.
- show-progress is not part of GNU Wget2
* remove unstable test
							
						 
						
							2024-12-24 18:54:49 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								32d6ee6385 
								
							 
						 
						
							
							
								
								ggml : fix const usage in SSE path (#10962)
							
							
							
						 
						
							2024-12-23 20:25:52 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									HimariO 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ba1cb19cdd 
								
							 
						 
						
							
							
								
								llama : add Qwen2VL support + multimodal RoPE (#10361)
							
							
							
							* Barebone Qwen2VL LLM convertor
* Add Qwen2VL cli entrypoint
* [WIP] add qwen2vl arch
* Verify m-rope output
* Add vl-rope/2d-rope support for qwen2vl ViT
* update qwen2vl cli tool
* update 5D tensor op workaround
* [WIP] qwen2vl vision model
* make batch and clip utils compatible with qwen2vl
* [WIP] create inference workflow, gguf convert script but fix
* correcting vision-rope behavior, add the missing last layer back to ViT
* add arg parser to qwen2vl_surgery
* replace variable size array with vector
* cuda-gdb cmake preset
* add fp32 mrope, vision rope kernel
* add fp16 support for qwen2vl and m-rope
* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`
* fix rope op mode switching, outdated func args
* update `llama_hparams`
* update to keep up with upstream changes
* resolve linter, test errors
* add makefile entry, update special image padding token
* add mrope unit test, fix a few compiler warnings
* rename `mrope` related function, params
* minor updates on debug util, bug fixes
* add `m-rope` testcase to `test-backend-ops`
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix trailing whitespace
* store `llama_hparams.rope_sections` with fixed size array
* update position id tensor size check in GGML_OP_ROPE
* minor updates
* update `ggml_backend_*_supports_op` of unsupported backends
* remove old `rope_section` compare operator
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2024-12-14 14:43:46 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Karol Kontny 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d583cd03f6 
								
							 
						 
						
							
							
								
								ggml : Fix compilation issues on ARM platform when building without fp16 (#10811)
							
							
							
						 
						
							2024-12-13 01:04:19 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								cb13ef85a4 
								
							 
						 
						
							
							
								
								remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797)
							
							
							
							other windows build fixes 
							
						 
						
							2024-12-12 19:02:49 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Djip007 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								19d8762ab6 
								
							 
						 
						
							
							
								
								ggml : refactor online repacking (#10446)
							
							
							
							* rename ggml-cpu-aarch64.c to .cpp
* reformat extra cpu backend.
- clean Q4_0_N_M and IQ4_0_N_M
  - remove from "file" tensor type
  - allow only with dynamic repack
- extract cpu extra bufts and convert to C++
  - hbm
  - "aarch64"
- more generic use of extra buffer
  - generalise extra_supports_op
  - new API for "cpu-accel":
     - amx
     - aarch64
* clang-format
* Clean Q4_0_N_M ref
Enable restrict on C++
* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack
* added/corrected control on tensor size for Q4 repacking.
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add debug logs on repacks.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2024-12-07 14:37:50 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									PAB 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a8cbab201d 
								
							 
						 
						
							
							
								
								ggml: add GGML_SET Metal kernel + i32 CPU kernel (ggml/1037)  
							
							
							
							
							* implemented cpu kernel
* add i32 test cases in test-backend-ops
* typedef `ggml_metal_kargs_set`
* implemented `kernel_set`
* memcpy 
							
						 
						
							2024-12-05 13:27:33 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									PAB 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c2082d93a8 
								
							 
						 
						
							
							
								
								ggml : add GGML_PAD_REFLECT_1D operation (ggml/1034)  
							
							
							
							
							* ggml_pad_reflect_1d defined in header
* implemented on CPU
* called the forward pass
* impl Metal kernel
* added Metal kernel
* added OP_PAD_REFLECT_1D in test-backend-ops.cpp
* add test-pad-reflect-1d test case
* test case support multiple backend 
							
						 
						
							2024-12-05 13:27:31 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								59f4db1088 
								
							 
						 
						
							
							
								
								ggml : add predefined list of CPU backend variants to build (#10626)
							
							
							
							* ggml : add predefined list of CPU backend variants to build
* update CPU dockerfiles 
							
						 
						
							2024-12-04 14:45:40 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2803540814 
								
							 
						 
						
							
							
								
								ggml-cpu : fix HWCAP2_I8MM value (#10646)
							
							
							
						 
						
							2024-12-04 14:40:44 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7cc2d2c889 
								
							 
						 
						
							
							
								
								ggml : move AMX to the CPU backend (#10570)
							
							
							
							* ggml : move AMX to the CPU backend
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2024-11-29 21:54:58 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f0678c5ff4 
								
							 
						 
						
							
							
								
								ggml : fix I8MM Q4_1 scaling factor conversion (#10562)
							
							
							
							ggml-ci 
							
						 
						
							2024-11-29 16:25:39 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								76b27d29c2 
								
							 
						 
						
							
							
								
								ggml : fix row condition for i8mm kernels (#10561)
							
							
							
							ggml-ci 
							
						 
						
							2024-11-28 14:56:37 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Shupei Fan 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c202cef168 
								
							 
						 
						
							
							
								
								ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541)
							
							
							
							* ggml-cpu: support IQ4_NL_4_4 by runtime repack
* ggml-cpu: add __ARM_FEATURE_DOTPROD guard 
							
						 
						
							2024-11-28 13:52:03 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5931c1f233 
								
							 
						 
						
							
							
								
								ggml : add support for dynamic loading of backends (#10469)
							
							
							
							* ggml : add support for dynamic loading of backends
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2024-11-25 15:13:39 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								55ed008b2d 
								
							 
						 
						
							
							
								
								ggml : do not use ARM features not included in the build (#10457)
							
							
							
						 
						
							2024-11-23 14:41:12 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									FirstTimeEZ 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a43178299c 
								
							 
						 
						
							
							
								
								ggml : fix undefined reference to 'getcpu' (#10354)
							
							
							
							https://github.com/ggerganov/llama.cpp/issues/10352  
						
							2024-11-17 10:39:22 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
							
							
								
							
							
								8a43e940ab 
								
							 
						 
						
							
							
								
								ggml: new optimization interface (ggml/988)  
							
							
							
						 
						
							2024-11-17 08:30:29 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Eve 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								18429220bd 
								
							 
						 
						
							
							
								
								AVX BF16 and single scale quant optimizations (#10212)
							
							
							
							* use 128 bit loads (I've tried 256->128 to death and it's slower)
* double accumulator
* avx bf16 vec dot
* +3% q4_0 inference
* +7% tg +5% pp compared to master
* slower f16c version, kept for reference
* 256b version, also slow. I tried :)
* revert f16
* faster with madd
* split to functions
* Q8_0 and IQ4_NL, 5-7% faster
* fix potential overflow (performance reduced)
* 16 bit add for q4_0 only
* merge 
							
						 
						
							2024-11-15 12:47:58 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Charles Xu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1607a5e5b0 
								
							 
						 
						
							
							
								
								backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921)
							
							
							
							* backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com> 
							
						 
						
							2024-11-15 01:28:50 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ae8de6d50a 
								
							 
						 
						
							
							
								
								ggml : build backends as libraries (#10256)
							
							
							
							* ggml : build backends as libraries
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com> 
							
						 
						
							2024-11-14 18:04:35 +01:00