| Name | Last commit | Date |
| --- | --- | --- |
| template-instances | CUDA: refactor mmq, dmmv, mmvq (#7716) | 2024-06-05 16:53:00 +02:00 |
| acc.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| acc.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| arange.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| arange.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| argsort.cu | ggml : mul_mat_id use the same tensor for all the experts (#6387) | 2024-04-03 16:07:05 +03:00 |
| argsort.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| binbcast.cu | ggml : group all experts in a single ggml_mul_mat_id (#6505) | 2024-04-18 15:18:48 +02:00 |
| binbcast.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| clamp.cu | Introduction of CUDA Graphs to LLama.cpp (#6766) | 2024-05-08 22:55:49 +02:00 |
| clamp.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| common.cuh | CUDA: use tensor cores for MMQ (#7676) | 2024-06-10 11:45:13 +02:00 |
| concat.cu | cuda : non-cont concat support (#7610) | 2024-05-29 15:38:26 +03:00 |
| concat.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| convert.cu | ggml : drop support for QK_K=64 (#7473) | 2024-05-23 10:00:21 +03:00 |
| convert.cuh | llama : add Command R Plus support (#6491) | 2024-04-09 11:16:13 +03:00 |
| cpy.cu | Introduction of CUDA Graphs to LLama.cpp (#6766) | 2024-05-08 22:55:49 +02:00 |
| cpy.cuh | Introduction of CUDA Graphs to LLama.cpp (#6766) | 2024-05-08 22:55:49 +02:00 |
| dequantize.cuh | llama : add Command R Plus support (#6491) | 2024-04-09 11:16:13 +03:00 |
| diagmask.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| diagmask.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| dmmv.cu | CUDA: refactor mmq, dmmv, mmvq (#7716) | 2024-06-05 16:53:00 +02:00 |
| dmmv.cuh | sync : ggml (#6351) | 2024-03-29 17:45:46 +02:00 |
| fattn-common.cuh | CUDA: use tensor cores for MMQ (#7676) | 2024-06-10 11:45:13 +02:00 |
| fattn-tile-f16.cu | CUDA: use tensor cores for MMQ (#7676) | 2024-06-10 11:45:13 +02:00 |
| fattn-tile-f16.cuh | CUDA: faster large batch FA without tensor cores (#7314) | 2024-05-17 18:54:52 +02:00 |
| fattn-tile-f32.cu | CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681) | 2024-06-01 15:47:04 +02:00 |
| fattn-tile-f32.cuh | CUDA: faster large batch FA without tensor cores (#7314) | 2024-05-17 18:54:52 +02:00 |
| fattn-vec-f16.cuh | CUDA: use tensor cores for MMQ (#7676) | 2024-06-10 11:45:13 +02:00 |
| fattn-vec-f32.cuh | Fix FlashAttention debug test, FP32 assert (#7684) | 2024-06-01 23:26:10 +02:00 |
| fattn-wmma-f16.cuh | CUDA: use tensor cores for MMQ (#7676) | 2024-06-10 11:45:13 +02:00 |
| fattn.cu | CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681) | 2024-06-01 15:47:04 +02:00 |
| fattn.cuh | ggml : add Flash Attention (#5021) | 2024-04-30 12:16:08 +03:00 |
| getrows.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| getrows.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| im2col.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| im2col.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| mma.cuh | CUDA: use tensor cores for MMQ (#7676) | 2024-06-10 11:45:13 +02:00 |
| mmq.cu | CUDA: revise q8_1 data layout for mul_mat_q (#7824) | 2024-06-09 09:42:25 +02:00 |
| mmq.cuh | CUDA: use tensor cores for MMQ (#7676) | 2024-06-10 11:45:13 +02:00 |
| mmvq.cu | CUDA: refactor mmq, dmmv, mmvq (#7716) | 2024-06-05 16:53:00 +02:00 |
| mmvq.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| norm.cu | ggml : fix YARN + add tests + add asserts (#7617) | 2024-05-29 20:17:31 +03:00 |
| norm.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| pad.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| pad.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| pool2d.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| pool2d.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| quantize.cu | CUDA: revise q8_1 data layout for mul_mat_q (#7824) | 2024-06-09 09:42:25 +02:00 |
| quantize.cuh | CUDA: revise q8_1 data layout for mul_mat_q (#7824) | 2024-06-09 09:42:25 +02:00 |
| rope.cu | ggml : refactor rope norm/neox (#7634) | 2024-06-05 11:29:20 +03:00 |
| rope.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| scale.cu | Introduction of CUDA Graphs to LLama.cpp (#6766) | 2024-05-08 22:55:49 +02:00 |
| scale.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| softmax.cu | CUDA: deduplicate FlashAttention code (#7352) | 2024-05-18 12:36:25 +02:00 |
| softmax.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| sumrows.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| sumrows.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| tsembd.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| tsembd.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| unary.cu | feat: implemented sigmoid function (ggml/806) | 2024-05-11 15:38:34 +03:00 |
| unary.cuh | feat: implemented sigmoid function (ggml/806) | 2024-05-11 15:38:34 +03:00 |
| upscale.cu | ggml : add ggml_upscale_ext(ggml/814) | 2024-05-15 13:23:33 +03:00 |
| upscale.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| vecdotq.cuh | CUDA: refactor mmq, dmmv, mmvq (#7716) | 2024-06-05 16:53:00 +02:00 |