Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3e0ba0e604 
								
							 
						 
						
							
							
								
								readme : remove old badge  
							
							
							
						 
						
							2024-11-30 10:09:21 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								abadba05be 
								
							 
						 
						
							
							
								
								readme : refresh ( #10587 )  
							
							
							
							
							* readme : refresh
* readme : move section [no ci]
* readme : clarify [no ci]
* readme : fixes [no ci]
* readme : more fixes [no ci]
* readme : simplify [no ci]
* readme : clarify GGUF 
							
						 
						
							2024-11-30 09:47:07 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Eve 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0533e7fb38 
								
							 
						 
						
							
							
								
								vulkan: Dynamic subgroup size support for Q6_K mat_vec ( #10536 )  
							
							
							
							
							* subgroup 64 version with subgroup add. 15% faster
scalable version
tested for subgroup sizes 16-128
* check for subgroup multiple of 16 and greater than 16
* subgroup sizes are always a power of 2 (https://github.com/KhronosGroup/GLSL/issues/45)
* force 16 sequential threads per block
* make 16 subgroup size a constant 
							
						 
						
							2024-11-30 08:00:02 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7cc2d2c889 
								
							 
						 
						
							
							
								
								ggml : move AMX to the CPU backend ( #10570 )  
							
							
							
							
							* ggml : move AMX to the CPU backend
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> 
							
						 
						
							2024-11-29 21:54:58 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b782e5c7d4 
								
							 
						 
						
							
							
								
								server : add more test cases ( #10569 )  
							
							
							
							
							* server : add split model test
* add test speculative
* add invalid cases 
							
						 
						
							2024-11-29 21:48:56 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Robert Collins 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3a8e9af402 
								
							 
						 
						
							
							
								
								imatrix : support combine-only ( #10492 )  
							
							
							
							
							* imatrix-combine-only idea
* ensured that behavior is consistent with log 
							
						 
						
							2024-11-29 19:21:37 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a3a3048e7a 
								
							 
						 
						
							
							
								
								cleanup UI link list ( #10577 )  
							
							
							
							
							* cleanup UI link list
* sort list alphabetically
* add missing licenses 
							
						 
						
							2024-11-29 17:45:08 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f0678c5ff4 
								
							 
						 
						
							
							
								
								ggml : fix I8MM Q4_1 scaling factor conversion ( #10562 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-11-29 16:25:39 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Shupei Fan 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4b3242bbea 
								
							 
						 
						
							
							
								
								ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 ( #10580 )  
							
							
							
						 
						
							2024-11-29 14:49:02 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Alberto Cabrera Pérez 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0f77aae560 
								
							 
						 
						
							
							
								
								sycl : offload of get_rows set to 0 ( #10432 )  
							
							
							
						 
						
							2024-11-29 20:38:45 +08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Alberto Cabrera Pérez 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								266b8519ee 
								
							 
						 
						
							
							
								
								sycl : Reroute permuted mul_mats through oneMKL ( #10408 )  
							
							
							
							
							This PR fixes the failing MUL_MAT tests for the sycl backend. 
							
						 
						
							2024-11-29 09:49:43 +00:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Chenguang Li 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								938f608742 
								
							 
						 
						
							
							
								
								CANN: RoPE operator optimization ( #10563 )  
							
							
							
							
							* [cann] RoPE operator optimization
* [CANN]Code Formatting
---------
Co-authored-by: noemotiovon <noemotiovon@gmail.com> 
							
						 
						
							2024-11-29 14:46:55 +08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f095a649ec 
								
							 
						 
						
							
							
								
								vulkan: get the first command buffer submitted sooner ( #10499 )  
							
							
							
							
							This is an incremental improvement over #9118 to get work to the GPU a bit
sooner. The first part is to start with a smaller number of nodes before
the first submit, and ramp it up to the current 100 nodes/submit. The
second part is to reduce the dryrun overhead for all the nodes that just
need to request descriptor space.
With these changes I get around 1-2% speedup on RTX 4070 combined with my
old Haswell-era CPU. 
							
						 
						
							2024-11-29 07:18:02 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Ting Lou 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								678d7994f4 
								
							 
						 
						
							
							
								
								llava: return false instead of exit ( #10546 )  
							
							
							
						 
						
							2024-11-29 01:09:46 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								dc22344088 
								
							 
						 
						
							
							
								
								ggml : remove redundant copyright notice + update authors  
							
							
							
						 
						
							2024-11-28 20:46:40 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4c0a95b107 
								
							 
						 
						
							
							
								
								llama : add missing model types  
							
							
							
						 
						
							2024-11-28 20:45:07 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6c59567689 
								
							 
						 
						
							
							
								
								server : (tests) don't use thread for capturing stdout/stderr, bump openai client library ( #10568 )  
							
							
							
							
							* server : (tests) don't use thread for capturing stdout/stderr
* test: bump openai to 1.55.2
* bump openai to 1.55.3 
							
						 
						
							2024-11-28 19:17:49 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								890719311b 
								
							 
						 
						
							
							
								
								common: fix warning message when no GPU found ( #10564 )  
							
							
							
						 
						
							2024-11-28 18:15:25 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Random Fly 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7281cf13ad 
								
							 
						 
						
							
							
								
								docs: fix outdated usage of llama-simple ( #10565 )  
							
							
							
						 
						
							2024-11-28 16:03:11 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e90688edd0 
								
							 
						 
						
							
							
								
								ci : fix tag name in cuda and hip releases ( #10566 )  
							
							
							
						 
						
							2024-11-28 15:58:54 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								76b27d29c2 
								
							 
						 
						
							
							
								
								ggml : fix row condition for i8mm kernels ( #10561 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-11-28 14:56:37 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								eea986f215 
								
							 
						 
						
							
							
								
								cmake : fix ARM feature detection ( #10543 )  
							
							
							
							
							ggml-ci 
							
						 
						
							2024-11-28 14:56:23 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Shupei Fan 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c202cef168 
								
							 
						 
						
							
							
								
								ggml-cpu: support IQ4_NL_4_4 by runtime repack ( #10541 )  
							
							
							
							
							* ggml-cpu: support IQ4_NL_4_4 by runtime repack
* ggml-cpu: add __ARM_FEATURE_DOTPROD guard 
							
						 
						
							2024-11-28 13:52:03 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Sergio López 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2025fa67e9 
								
							 
						 
						
							
							
								
								kompute : improve backend to pass test_backend_ops ( #10542 )  
							
							
							
							
							* kompute: op_unary: reject unsupported parameters
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: softmax: implement ALiBi support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: rope: implement neox and phi3 support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_q4_k permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_[q4_0|q4_1|q8_0] permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_f16 permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_q6_k permuted support
Signed-off-by: Sergio Lopez <slp@redhat.com>
---------
Signed-off-by: Sergio Lopez <slp@redhat.com> 
							
						 
						
							2024-11-28 12:51:38 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Ruixin Huang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c6bc73951e 
								
							 
						 
						
							
							
								
								CANN: Update cann.md to display correctly in CLion ( #10538 )  
							
							
							
						 
						
							2024-11-28 15:27:11 +08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									leo-pony 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								605fa66c50 
								
							 
						 
						
							
							
								
								CANN: Fix SOC_TYPE compile bug ( #10519 )  
							
							
							
							
							* CANN: Fix the build failure on Ascend310P in two cases:
1) SOC_TYPE is specified manually
2) Under some unusual compile environments
* Update the CANN backend News content: support F16 and F32 data type models for the Ascend 310P NPU.
* Fix CANN compile failure: the assert in Ascend kernel functions is not supported on some CANN versions 
							
						 
						
							2024-11-28 15:25:24 +08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Chenguang Li 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b7420131bf 
								
							 
						 
						
							
							
								
								CANN: ROPE operator optimization ( #10540 )  
							
							
							
							
							* [cann] ROPE operator optimization
Co-authored-by: noemotiovon <noemotiovon@gmail.com> 
							
						 
						
							2024-11-28 14:24:46 +08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9f912511bc 
								
							 
						 
						
							
							
								
								common : fix duplicated file name with hf_repo and hf_file ( #10550 )  
							
							
							
						 
						
							2024-11-27 22:30:52 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									uvos 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3ad5451f3b 
								
							 
						 
						
							
							
								
								Add some minimal optimizations for CDNA ( #10498 )  
							
							
							
							
							* Add some minimal optimizations for CDNA
* ggml_cuda: set launch bounds also for GCN as it helps there too 
							
						 
						
							2024-11-27 17:10:08 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								46c69e0e75 
								
							 
						 
						
							
							
								
								ci : faster CUDA toolkit installation method and use ccache ( #10537 )  
							
							
							
							
							* ci : faster CUDA toolkit installation method and use ccache
* remove fetch-depth
* only pack CUDA runtime on master 
							
						 
						
							2024-11-27 11:03:25 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9e2301f4a4 
								
							 
						 
						
							
							
								
								metal : fix group_norm support condition ( #0 )  
							
							
							
						 
						
							2024-11-27 11:22:14 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								fee824a1a1 
								
							 
						 
						
							
							
								
								sync : ggml  
							
							
							
						 
						
							2024-11-27 11:10:42 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Frankie Robertson 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9150f8fef9 
								
							 
						 
						
							
							
								
								Do not include arm_neon.h when compiling CUDA code (ggml/1028)  
							
							
							
						 
						
							2024-11-27 11:10:27 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c31ed2abfc 
								
							 
						 
						
							
							
								
								vulkan: define all quant data structures in types.comp ( #10440 )  
							
							
							
						 
						
							2024-11-27 08:32:54 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5b3466bedf 
								
							 
						 
						
							
							
								
								vulkan: Handle GPUs with less shared memory ( #10468 )  
							
							
							
							
							There have been reports of failure to compile on systems with <= 32KB
of shared memory (e.g. #10037). This change makes the large tile size
fall back to a smaller size if necessary, and makes mul_mat_id fall
back to CPU if there's only 16KB of shared memory. 
							
						 
						
							2024-11-27 08:30:27 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								249a7902ec 
								
							 
						 
						
							
							
								
								vulkan: further optimize q5_k mul_mat_vec ( #10479 )  
							
							
							
						 
						
							2024-11-27 08:21:59 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								71a64989a5 
								
							 
						 
						
							
							
								
								vulkan: skip integer div/mod in get_offsets for batch_idx==0 ( #10506 )  
							
							
							
						 
						
							2024-11-27 08:08:54 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4a57d362e1 
								
							 
						 
						
							
							
								
								vulkan: optimize Q2_K and Q3_K mul_mat_vec ( #10459 )  
							
							
							
						 
						
							2024-11-27 08:00:50 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c9b00a70b0 
								
							 
						 
						
							
							
								
								ci : fix cuda releases ( #10532 )  
							
							
							
						 
						
							2024-11-26 22:12:10 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Shane A 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								de5097351c 
								
							 
						 
						
							
							
								
								Add OLMo 2 model in docs ( #10530 )  
							
							
							
							
							* Add link to OLMo 2 model in docs
* Change link to landing page 
							
						 
						
							2024-11-26 21:55:29 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5a349f2809 
								
							 
						 
						
							
							
								
								ci : remove nix workflows ( #10526 )  
							
							
							
						 
						
							2024-11-26 21:13:54 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								30ec398321 
								
							 
						 
						
							
							
								
								llama : disable warnings for 3rd party sha1 dependency ( #10527 )  
							
							
							
						 
						
							2024-11-26 21:01:47 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Tristan Druyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								be0e350c8b 
								
							 
						 
						
							
							
								
								Fix HIP flag inconsistency & build docs ( #10524 )  
							
							
							
							
							* Fix inconsistency of HIP flags in cmake & make
* Fix docs regarding GGML_HIP 
							
						 
						
							2024-11-26 19:27:28 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									R0CKSTAR 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								249cd93da3 
								
							 
						 
						
							
							
								
								mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make ( #10516 )  
							
							
							
							
							Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> 
							
						 
						
							2024-11-26 17:00:41 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeff Bolz 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								904109ed0d 
								
							 
						 
						
							
							
								
								vulkan: fix group_norm ( #10496 )  
							
							
							
							
							Fix bad calculation of the end of the range. Add a backend test that
covers the bad case (taken from stable diffusion).
Fixes https://github.com/leejet/stable-diffusion.cpp/issues/439. 
							
						 
						
							2024-11-26 16:45:05 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Xuan Son Nguyen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								45abe0f74e 
								
							 
						 
						
							
							
								
								server : replace behave with pytest ( #10416 )  
							
							
							
							
							* server : replace behave with pytest
* fix test on windows
* misc
* add more tests
* more tests
* styling
* log less, fix embd test
* added all sequential tests
* fix coding style
* fix save slot test
* add parallel completion test
* fix parallel test
* remove feature files
* update test docs
* no cache_prompt for some tests
* add test_cache_vs_nocache_prompt 
							
						 
						
							2024-11-26 16:20:18 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Neo Zhang Jianyu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0bbd2262a3 
								
							 
						 
						
							
							
								
								restore the condition to build & update package when merge ( #10507 )  
							
							
							
							
							Co-authored-by: arthw <14088817+arthw@users.noreply.github.com> 
							
						 
						
							2024-11-26 21:43:47 +08:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ab96610b1e 
								
							 
						 
						
							
							
								
								cmake : enable warnings in llama ( #10474 )  
							
							
							
							
							* cmake : enable warnings in llama
ggml-ci
* cmake : add llama_get_flags and respect LLAMA_FATAL_WARNINGS
* cmake : get_flags -> ggml_get_flags
* speculative-simple : fix warnings
* cmake : reuse ggml_get_flags
ggml-ci
* speculative-simple : fix compile warning
ggml-ci 
							
						 
						
							2024-11-26 14:18:08 +02:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7db3846a94 
								
							 
						 
						
							
							
								
								ci : publish the docker images created during scheduled runs ( #10515 )  
							
							
							
						 
						
							2024-11-26 13:05:20 +01:00 
							
								 
							
						 
					 
				
					
						
							
								
								
									Diego Devesa 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c6807b3f28 
								
							 
						 
						
							
							
								
								ci : add ubuntu cuda build, build with one arch on windows ( #10456 )  
							
							
							
						 
						
							2024-11-26 13:05:07 +01:00