Wang Qin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5c7a5aa0c3 
								
							 
						 
						
							
							
								
								ci: add error handling for Python venv creation in run.sh ( #10608 )  
							
							
							
						 
						
							2024-12-01 20:11:42 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3420909dff 
								
							 
						 
						
							
							
								
								ggml : automatic selection of best CPU backend ( #10606 )  
							
							
							
							
							* ggml : automatic selection of best CPU backend
* amx : minor opt
* add GGML_AVX_VNNI to enable avx-vnni, fix checks 
							
						 
						
							2024-12-01 16:12:41 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									alek3y 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								86dc11c5bc 
								
							 
						 
						
							
							
								
								server : bind to any port when specified ( #10590 )  
							
							
							
						 
						
							2024-12-01 13:33:12 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6acce39710 
								
							 
						 
						
							
							
								
								readme : update the usage section with examples ( #10596 )  
							
							
							
							
							* readme : update the usage section with examples
* readme : more examples 
							
						 
						
							2024-12-01 11:25:17 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Wang Qin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								43957ef203 
								
							 
						 
						
							
							
								
								build: update Makefile comments for C++ version change ( #10598 )  
							
							
							
						 
						
							2024-12-01 04:19:44 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Adrien Gallouët 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0c39f44d70 
								
							 
						 
						
							
							
								
								ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_q4_0_4x4_q8_0() ( #10567 )  
							
							
							
							
							Signed-off-by: Adrien Gallouët <angt@huggingface.co> 
							
						 
						
							2024-11-30 09:13:18 -08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3e0ba0e604 
								
							 
						 
						
							
							
								
								readme : remove old badge  
							
							
							
						 
						
							2024-11-30 10:09:21 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								abadba05be 
								
							 
						 
						
							
							
								
								readme : refresh ( #10587 )  
							
							
							
							
							* readme : refresh
* readme : move section [no ci]
* readme : clarify [no ci]
* readme : fixes [no ci]
* readme : more fixes [no ci]
* readme : simplify [no ci]
* readme : clarify GGUF 
							
						 
						
							2024-11-30 09:47:07 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Eve 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0533e7fb38 
								
							 
						 
						
							
							
								
								vulkan: Dynamic subgroup size support for Q6_K mat_vec ( #10536 )  
							
							
							
							
							* subgroup 64 version with subgroup add. 15% faster
scalable version
tested for subgroup sizes 16-128
* check for subgroup multiple of 16 and greater than 16
* subgroup sizes are always a power of 2 (https://github.com/KhronosGroup/GLSL/issues/45 )
* force 16 sequential threads per block
* make 16 subgroup size a constant 
							
						 
						
							2024-11-30 08:00:02 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7cc2d2c889 
								
							 
						 
						
							
							
								
								ggml : move AMX to the CPU backend ( #10570 )  
							
							
							
							
							* ggml : move AMX to the CPU backend
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2024-11-29 21:54:58 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b782e5c7d4 
								
							 
						 
						
							
							
								
								server : add more test cases ( #10569 )  
							
							
							
							
							* server : add split model test
* add test speculative
* add invalid cases 
							
						 
						
							2024-11-29 21:48:56 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Collins 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3a8e9af402 
								
							 
						 
						
							
							
								
								imatrix : support combine-only ( #10492 )  
							
							
							
							
							* imatrix-combine-only idea
* ensured that behavior consistent with log 
							
						 
						
							2024-11-29 19:21:37 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a3a3048e7a 
								
							 
						 
						
							
							
								
								cleanup UI link list ( #10577 )  
							
							
							
							
							* cleanup UI link list
* sort list alphabetically
* add missing licenses 
							
						 
						
							2024-11-29 17:45:08 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f0678c5ff4 
								
							 
						 
						
							
							
								
								ggml : fix I8MM Q4_1 scaling factor conversion ( #10562 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-11-29 16:25:39 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Shupei Fan 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4b3242bbea 
								
							 
						 
						
							
							
								
								ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 ( #10580 )  
							
							
							
						 
						
							2024-11-29 14:49:02 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Alberto Cabrera Pérez 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0f77aae560 
								
							 
						 
						
							
							
								
								sycl : offload of get_rows set to 0 ( #10432 )  
							
							
							
						 
						
							2024-11-29 20:38:45 +08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Alberto Cabrera Pérez 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								266b8519ee 
								
							 
						 
						
							
							
								
								sycl : Reroute permuted mul_mats through oneMKL ( #10408 )  
							
							
							
							
							This PR fixes the failing MUL_MAT tests for the sycl backend. 
							
						 
						
							2024-11-29 09:49:43 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Chenguang Li 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								938f608742 
								
							 
						 
						
							
							
								
								CANN: RoPE operator optimization ( #10563 )  
							
							
							
							
							* [cann] RoPE operator optimization
* [CANN]Code Formatting
---------
Co-authored-by: noemotiovon <noemotiovon@gmail.com> 
							
						 
						
							2024-11-29 14:46:55 +08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f095a649ec 
								
							 
						 
						
							
							
								
								vulkan: get the first command buffer submitted sooner ( #10499 )  
							
							
							
							
							This is an incremental improvement over #9118 to get work to the GPU a bit
sooner. The first part is to start with a smaller number of nodes before
the first submit, and ramp it up to the current 100 nodes/submit. The
second part is to reduce the dryrun overhead for all the nodes that just
need to request descriptor space.
With these changes I get around 1-2% speedup on RTX 4070 combined with my
old Haswell-era CPU. 
							
						 
						
							2024-11-29 07:18:02 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Ting Lou 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								678d7994f4 
								
							 
						 
						
							
							
								
								llava: return false instead of exit ( #10546 )  
							
							
							
						 
						
							2024-11-29 01:09:46 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								dc22344088 
								
							 
						 
						
							
							
								
								ggml : remove redundant copyright notice + update authors  
							
							
							
						 
						
							2024-11-28 20:46:40 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4c0a95b107 
								
							 
						 
						
							
							
								
								llama : add missing model types  
							
							
							
						 
						
							2024-11-28 20:45:07 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6c59567689 
								
							 
						 
						
							
							
								
								server : (tests) don't use thread for capturing stdout/stderr, bump openai client library ( #10568 )  
							
							
							
							
							* server : (tests) don't use thread for capturing stdout/stderr
* test: bump openai to 1.55.2
* bump openai to 1.55.3 
							
						 
						
							2024-11-28 19:17:49 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								890719311b 
								
							 
						 
						
							
							
								
								common: fix warning message when no GPU found ( #10564 )  
							
							
							
						 
						
							2024-11-28 18:15:25 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Random Fly 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7281cf13ad 
								
							 
						 
						
							
							
								
								docs: fix outdated usage of llama-simple ( #10565 )  
							
							
							
						 
						
							2024-11-28 16:03:11 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e90688edd0 
								
							 
						 
						
							
							
								
								ci : fix tag name in cuda and hip releases ( #10566 )  
							
							
							
						 
						
							2024-11-28 15:58:54 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								76b27d29c2 
								
							 
						 
						
							
							
								
								ggml : fix row condition for i8mm kernels ( #10561 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-11-28 14:56:37 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								eea986f215 
								
							 
						 
						
							
							
								
								cmake : fix ARM feature detection ( #10543 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-11-28 14:56:23 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Shupei Fan 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c202cef168 
								
							 
						 
						
							
							
								
								ggml-cpu: support IQ4_NL_4_4 by runtime repack ( #10541 )  
							
							
							
							
							* ggml-cpu: support IQ4_NL_4_4 by runtime repack
* ggml-cpu: add __ARM_FEATURE_DOTPROD guard 
							
						 
						
							2024-11-28 13:52:03 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Sergio López 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2025fa67e9 
								
							 
						 
						
							
							
								
								kompute : improve backend to pass test_backend_ops ( #10542 )  
							
							
							
							
							* kompute: op_unary: reject unsupported parameters
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: softmax: implement ALiBi support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: rope: implement neox and phi3 support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_q4_k permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_[q4_0|q4_1|q8_0] permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_f16 permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_q6_k permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
---------
Signed-off-by: Sergio Lopez <slp@redhat.com> 
							
						 
						
							2024-11-28 12:51:38 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Ruixin Huang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c6bc73951e 
								
							 
						 
						
							
							
								
								CANN: Update cann.md to display correctly in CLion ( #10538 )  
							
							
							
						 
						
							2024-11-28 15:27:11 +08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									leo-pony 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								605fa66c50 
								
							 
						 
						
							
							
								
								CANN: Fix SOC_TYPE compile bug ( #10519 )  
							
							
							
							
							* CANN: Fix the build failure on Ascend310P in two cases:
1) SOC_TYPE is specified manually
2) Some unusual compile environments
* Update the CANN backend News content: support F16 and F32 data type models for Ascend 310P NPU.
* Fix CANN compile failure: the assert in Ascend kernel functions is not supported on some CANN versions 
							
						 
						
							2024-11-28 15:25:24 +08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Chenguang Li 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b7420131bf 
								
							 
						 
						
							
							
								
								CANN: ROPE operator optimization ( #10540 )  
							
							
							
							
							* [cann] ROPE operator optimization
Co-authored-by: noemotiovon <noemotiovon@gmail.com> 
							
						 
						
							2024-11-28 14:24:46 +08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9f912511bc 
								
							 
						 
						
							
							
								
								common : fix duplicated file name with hf_repo and hf_file ( #10550 )  
							
							
							
						 
						
							2024-11-27 22:30:52 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									uvos 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3ad5451f3b 
								
							 
						 
						
							
							
								
								Add some minimal optimizations for CDNA ( #10498 )  
							
							
							
							
							* Add some minimal optimizations for CDNA
* ggml_cuda: set launch bounds also for GCN as it helps there too 
							
						 
						
							2024-11-27 17:10:08 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								46c69e0e75 
								
							 
						 
						
							
							
								
								ci : faster CUDA toolkit installation method and use ccache ( #10537 )  
							
							
							
							
							* ci : faster CUDA toolkit installation method and use ccache
* remove fetch-depth
* only pack CUDA runtime on master 
							
						 
						
							2024-11-27 11:03:25 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9e2301f4a4 
								
							 
						 
						
							
							
								
								metal : fix group_norm support condition ( #0 )  
							
							
							
						 
						
							2024-11-27 11:22:14 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								fee824a1a1 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-11-27 11:10:42 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Frankie Robertson 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9150f8fef9 
								
							 
						 
						
							
							
								
								Do not include arm_neon.h when compiling CUDA code (ggml/1028)  
							
							
							
						 
						
							2024-11-27 11:10:27 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c31ed2abfc 
								
							 
						 
						
							
							
								
								vulkan: define all quant data structures in types.comp ( #10440 )  
							
							
							
						 
						
							2024-11-27 08:32:54 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5b3466bedf 
								
							 
						 
						
							
							
								
								vulkan: Handle GPUs with less shared memory ( #10468 )  
							
							
							
							
							There have been reports of failure to compile on systems with <= 32KB
of shared memory (e.g. #10037 ). This change makes the large tile size
fall back to a smaller size if necessary, and makes mul_mat_id fall
back to CPU if there's only 16KB of shared memory. 
							
						 
						
							2024-11-27 08:30:27 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								249a7902ec 
								
							 
						 
						
							
							
								
								vulkan: further optimize q5_k mul_mat_vec ( #10479 )  
							
							
							
						 
						
							2024-11-27 08:21:59 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								71a64989a5 
								
							 
						 
						
							
							
								
								vulkan: skip integer div/mod in get_offsets for batch_idx==0 ( #10506 )  
							
							
							
						 
						
							2024-11-27 08:08:54 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4a57d362e1 
								
							 
						 
						
							
							
								
								vulkan: optimize Q2_K and Q3_K mul_mat_vec ( #10459 )  
							
							
							
						 
						
							2024-11-27 08:00:50 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c9b00a70b0 
								
							 
						 
						
							
							
								
								ci : fix cuda releases ( #10532 )  
							
							
							
						 
						
							2024-11-26 22:12:10 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Shane A 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								de5097351c 
								
							 
						 
						
							
							
								
								Add OLMo 2 model in docs ( #10530 )  
							
							
							
							
							* Add link to OLMo 2 model in docs
* Change link to landing page 
							
						 
						
							2024-11-26 21:55:29 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5a349f2809 
								
							 
						 
						
							
							
								
								ci : remove nix workflows ( #10526 )  
							
							
							
						 
						
							2024-11-26 21:13:54 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								30ec398321 
								
							 
						 
						
							
							
								
								llama : disable warnings for 3rd party sha1 dependency ( #10527 )  
							
							
							
						 
						
							2024-11-26 21:01:47 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Tristan Druyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								be0e350c8b 
								
							 
						 
						
							
							
								
								Fix HIP flag inconsistency & build docs ( #10524 )  
							
							
							
							
							* Fix inconsistency of HIP flags in cmake & make
* Fix docs regarding GGML_HIP 
							
						 
						
							2024-11-26 19:27:28 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									R0CKSTAR 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								249cd93da3 
								
							 
						 
						
							
							
								
								mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make ( #10516 )  
							
							
							
							
							Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> 
							
						 
						
							2024-11-26 17:00:41 +01:00