0cc4m
8dd585e8cb
Variable matmul kernel using specialization constants
2023-07-09 15:50:28 +02:00
0cc4m
3bc7a80ca6
Rework command buffer handling
2023-07-09 11:37:32 +02:00
0cc4m
0ef62f511a
Fix validation errors, improve compatibility with AMD GPUs
2023-07-08 20:40:19 +02:00
0cc4m
c7c761a2b7
Add split-k optimization for small matrix multiplication
Use semaphores for synchronization instead of fences or waitidle
Rework async write/read for synchronization
2023-07-08 17:27:05 +02:00
0cc4m
c3d947510b
Optimize warptile matmul shader, replace blocktile with it
2023-07-07 07:13:47 +02:00
0cc4m
6d5a0ada8c
Merge pull request #2 from SlyEcho/vulkan
add cmake commands
2023-07-07 05:53:11 +02:00
0cc4m
ea06a2c321
Disable glslc optimization for CMake
2023-07-07 05:52:33 +02:00
0cc4m
869ae76764
Disable glslc optimization
2023-07-05 22:23:07 +02:00
0cc4m
244939029d
Add WIP warp tile mat mul shaders
2023-07-05 22:18:12 +02:00
0cc4m
80b17e2f66
Fix trailing whitespace in vk_mem_alloc.h
2023-07-04 23:01:32 +02:00
0cc4m
e35d28fec3
Fix queue selection for AMD RADV
2023-07-04 22:57:08 +02:00
0cc4m
ae7325fdff
Fix 2d write
2023-07-04 22:42:07 +02:00
0cc4m
ade9555c48
Add 2d write operation, profiling code
2023-07-04 22:31:47 +02:00
Henri Vasserman
3d7d8d00a4
add cmake commands
2023-07-04 17:02:22 +03:00
0cc4m
24eeb97d13
Add bounds checking to matmul kernels, improve implementation, fix command buffers not freed properly
2023-07-02 22:11:58 +02:00
0cc4m
36cd5d85e9
Avoid requesting dedicated memory; VMA can decide that by itself
2023-06-30 21:20:19 +02:00
0cc4m
4ea9b2fd4b
Add VMA library
2023-06-30 21:15:06 +02:00
0cc4m
c8ff09bdc7
dequant_q4_0 kernel
2023-06-30 20:48:42 +02:00
0cc4m
cb5cb4d6e2
Fix f16_to_f32 kernel
2023-06-30 20:48:03 +02:00
0cc4m
df3cdbdac7
Output FP32 in fp16 matmul shader
2023-06-30 18:37:10 +02:00
0cc4m
40c8f843f2
Fix mulmat_f16
2023-06-30 18:37:10 +02:00
0cc4m
c31e14b2fd
Enable device extensions properly, restore fp16 matmul op
2023-06-30 18:37:10 +02:00
0cc4m
fc5bb53b32
Code abstraction, FP16 implementation, fix kernel, add FP16 to FP32 kernel
2023-06-30 18:37:10 +02:00
0cc4m
3adc7b1d60
First FP16 attempt, disabled for now
2023-06-30 18:37:10 +02:00
0cc4m
2c70df985a
Continue vulkan implementation and optimization
2023-06-30 18:36:42 +02:00
0cc4m
0c9cca00bd
Write coalescing
2023-06-30 18:36:42 +02:00
0cc4m
7c6860b483
2D Blocktiling
2023-06-30 18:36:42 +02:00
0cc4m
1b4863c2b9
1D Blocktiling
2023-06-30 18:36:42 +02:00
0cc4m
baf9ff536b
GEMM Kernel optimization
2023-06-30 18:36:42 +02:00
0cc4m
a42376e7ec
First matmul success
2023-06-30 18:36:42 +02:00
0cc4m
8ce84c2747
Continue implementation
2023-06-30 18:36:42 +02:00
0cc4m
2471728a9d
Add aligned malloc and free for VMA
2023-06-30 18:36:42 +02:00
0cc4m
fc4f207cfb
Matmul call
2023-06-30 18:36:41 +02:00
0cc4m
b0e65855d1
Vulkan development
2023-06-30 18:36:41 +02:00
0cc4m
a4004d4fa8
Vulkan memory management
2023-06-30 18:36:41 +02:00
0cc4m
88d4ec05a8
Continue implementation
2023-06-30 18:36:41 +02:00
0cc4m
4a96d0eb7f
Fix matmul kernel, continue implementation
2023-06-30 18:36:41 +02:00
0cc4m
061246fb07
Vulkan loader code
2023-06-30 18:36:41 +02:00
Howard Su
b8c8dda75f
Use unsigned for random seed (#2006)
* Use unsigned for random seed. Keep -1 as the value to use a time-based seed.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-06-29 06:15:15 -07:00
LostRuins
96a712ca1b
Porting the improved K-Quant CUDA kernels to OpenCL (#1966)
* Added broken new q4k quant
* xx + ib0
* Fix q2_k fast kernel
* Use preprocessor for QK_K
* Add q6_k fast matmul kernel
* ported q3k speedup successfully
* ported q2k and q5k speedups
* remove old dot kernels and template
* fixed global const struct types
* fixing address spaces
* fixed string too long CI issue
Co-authored-by: 0cc4m <picard12@live.de>
2023-06-29 05:56:43 +02:00
m3ndax
d3494bb86b
llama : replacing auto &kv with const auto &kv (#2041)
* Replacing auto &kv with const auto &kv
* Create codacy.yml
* Delete codacy.yml
2023-06-28 21:39:08 +03:00
Salvador E. Tropea
5b351e94d0
cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028)
- Not used
2023-06-28 20:27:31 +03:00
Salvador E. Tropea
6432aabb6d
cuda : fix missing const qualifier in casts (#2027)
2023-06-28 20:26:26 +03:00
Howard Su
b922bc351b
llama : remove shards weight file support (#2000)
* Remove multiple shards
* Remove multiple file loaders
* Remove llama_load_tensor_shard class
* Simplify load logic
* Remove dead code guess_n_parts function
* Remove vocab_only from constructor of llama_model_loader
* Remove alignment_prevents_mmap, which is no longer needed.
* Remove useless check
2023-06-28 20:13:02 +03:00
Johannes Gäßler
7f9753fa12
CUDA GPU acceleration for LoRAs + f16 models (#1970)
2023-06-28 18:35:54 +02:00
ningshanwutuobang
cfa0750bc9
llama : support input embeddings directly (#1910)
* add interface for float input
* fixed inpL shape and type
* add examples of input floats
* add test example for embd input
* fixed sampling
* add free for context
* fixed add end condition for generating
* add examples for llava.py
* add README for llava.py
* add README for llava.py
* add example of PandaGPT
* refactor the interface and fixed the styles
* add cmake build for embd-input
* add cmake build for embd-input
* Add MiniGPT-4 example
* change the order of the args of llama_eval_internal
* fix ci error
2023-06-28 18:53:37 +03:00
Erik Scholz
9d23589d63
fix pthreads setaffinity usage on android (#2020)
2023-06-27 19:06:33 +02:00
Howard Su
0be54f75a6
baby-llama : fix build after ggml_rope change (#2016)
2023-06-27 08:07:13 +03:00
Georgi Gerganov
181e8d9755
llama : fix rope usage after ChatGLM change
2023-06-27 00:37:33 +03:00
Georgi Gerganov
d9779021bd
ggml : add support for ChatGLM RoPE
2023-06-27 00:06:51 +03:00