John Balis 
								
							 
						 
						
							
							
							
							
								
							
							
								fde13b3bb9 
								
							 
						 
						
							
							
								
								feat: cuda implementation for ggml_conv_transpose_1d (ggml/854)  
							
							
							
							
							* conv transpose 1d passing test for 1d input and kernel
* working for different input and output channel counts, added test for variable stride
* initial draft appears to work with stride other than 1
* working with all old and new conv1d tests
* added a test for large tensors
* removed use cuda hardcoding
* restored test-conv-transpose.c
* removed unused arguments, and fixed bug where test failure would cause subsequent tests to fail
* fixed accumulator bug
* added test to test-backend-ops
* fixed mistake
* addressed review
* fixed includes
* removed blank lines
* style and warning fixes
* return failure when test fails
* fix supports_op
---------
Co-authored-by: slaren <slarengh@gmail.com> 
							
						 
						
							2024-07-08 12:23:00 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Natsu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1d894a790e 
								
							 
						 
						
							
							
								
								cmake : add GGML_BUILD and GGML_SHARED macro definitions (#8281)
							
							
							
						 
						
							2024-07-05 17:29:35 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ouadie EL FAROUKI 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1f3e1b66e2 
								
							 
						 
						
							
							
								
								Enabled more data types for oneMKL gemm_batch (#8236)
							
							
							
						 
						
							2024-07-05 13:23:25 +01:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8e558309dc 
								
							 
						 
						
							
							
								
								CUDA: MMQ support for iq4_nl, iq4_xs (#8278)
							
							
							
						 
						
							2024-07-05 09:06:31 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Daniele 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0a423800ff 
								
							 
						 
						
							
							
								
								CUDA: revert part of the RDNA1 optimizations (#8309)
							
							
							
							The change to launch_bounds was causing a small performance drop of 25 t/s when running perplexity
							
						 
						
							2024-07-05 09:06:09 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								bcefa03bc0 
								
							 
						 
						
							
							
								
								CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (#8311)
							
							
							
						 
						
							2024-07-05 09:05:34 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									luoyu-intel 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a9554e20b6 
								
							 
						 
						
							
							
								
								[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266)
							
							
							
							* fix group_norm ut
* split softmax
* fix softmax
* add concat support condition
* revert debug code
* move QK_WARP_SIZE to presets.hpp 
							
						 
						
							2024-07-05 13:06:13 +08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Neo Zhang Jianyu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f09b7cb609 
								
							 
						 
						
							
							
								
								rm get_work_group_size() by local cache for performance (#8286)
							
							
							
							Co-authored-by: arthw <14088817+arthw@users.noreply.github.com> 
							
						 
						
							2024-07-05 10:32:29 +08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									AidanBeltonS 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f619024764 
								
							 
						 
						
							
							
								
								[SYCL] Remove unneeded semicolons (#8280)
							
							
							
						 
						
							2024-07-04 09:07:19 +08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Daniele 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d23287f122 
								
							 
						 
						
							
							
								
								Define and optimize RDNA1 (#8085)
							
							
							
						 
						
							2024-07-04 01:02:58 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Judd 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f8d6a23804 
								
							 
						 
						
							
							
								
								fix typo (#8267)
							
							
							
							Co-authored-by: Judd <foldl@boxvest.com> 
							
						 
						
							2024-07-03 14:40:16 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									AidanBeltonS 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								fadde67135 
								
							 
						 
						
							
							
								
								Dequant improvements rebase (#8255)
							
							
							
							* Single load for half2
* Store scales in local mem
* Vec load quantized values 
							
						 
						
							2024-07-03 09:55:34 +08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Clint Herron 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								07a3fc0608 
								
							 
						 
						
							
							
								
								Removes multiple newlines at the end of files that are breaking the editorconfig step of CI. (#8258)
							
							
							
						 
						
							2024-07-02 12:18:10 -04:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0e0590adab 
								
							 
						 
						
							
							
								
								cuda : update supports_op for matrix multiplication (#8245)
							
							
							
						 
						
							2024-07-02 09:39:38 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									luoyu-intel 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a9f3b10215 
								
							 
						 
						
							
							
								
								[SYCL] Fix win build conflict of math library (#8230)
							
							
							
							* fix win build conflict of math library
* fix the condition: !(win32 & SYCL)
* revert warp_size=16 
							
						 
						
							2024-07-02 12:50:07 +08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									luoyu-intel 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d08c20edde 
								
							 
						 
						
							
							
								
								[SYCL] Fix the sub group size of Intel (#8106)
							
							
							
							* use warp_size macro for all sycl kernels
* fix mask of permute_sub_group_by_xor
* fix rms_norm with correct warp number
* fix rms_norm_f32/group_norm_f32
* move norm to norm.cpp file
* fix quantize bug
* fix mmvq's batch size 
							
						 
						
							2024-07-02 10:16:00 +08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								cb5fad4c6c 
								
							 
						 
						
							
							
								
								CUDA: refactor and optimize IQ MMVQ (#8215)
							
							
							
							* CUDA: refactor and optimize IQ MMVQ
* uint -> uint32_t
* __dp4a -> ggml_cuda_dp4a
* remove MIN_CC_DP4A checks
* change default
* try CI fix 
							
						 
						
							2024-07-01 20:39:06 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									zhentaoyu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								197fe6c1d7 
								
							 
						 
						
							
							
								
								[SYCL] Update SYCL-Rope op and Refactor (#8157)
							
							
							
							* align with rope.cu and move sycl-op to a single file 
							
						 
						
							2024-07-01 19:39:06 +08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Johannes Gäßler 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								85a267daaa 
								
							 
						 
						
							
							
								
								CUDA: fix MMQ stream-k for --split-mode row (#8167)
							
							
							
						 
						
							2024-06-27 16:26:05 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									slaren 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								31ec3993f6 
								
							 
						 
						
							
							
								
								ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (#8140)
							
							
							
						 
						
							2024-06-26 21:34:14 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georgi Gerganov 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f3f65429c4 
								
							 
						 
						
							
							
								
								llama : reorganize source code + improve CMake (#8006)
							
							
							
							* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122)
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com> 
							
						 
						
							2024-06-26 18:33:02 +03:00