llama.cpp

Author	SHA1	Message	Date
0cc4m	869ae76764	Disable glslc optimization	2023-07-05 22:23:07 +02:00
0cc4m	244939029d	Add WIP warp tile mat mul shaders	2023-07-05 22:18:12 +02:00
0cc4m	80b17e2f66	Fix trailing whitespace in vk_mem_alloc.h	2023-07-04 23:01:32 +02:00
0cc4m	e35d28fec3	Fix queue selection for AMD RADV	2023-07-04 22:57:08 +02:00
0cc4m	ae7325fdff	Fix 2d write	2023-07-04 22:42:07 +02:00
0cc4m	ade9555c48	Add 2d write operation, profiling code	2023-07-04 22:31:47 +02:00
0cc4m	24eeb97d13	Add bounds checking to matmul kernels, improve implementation, fix command buffers not freed properly	2023-07-02 22:11:58 +02:00
0cc4m	36cd5d85e9	Avoid requesting dedicated memory, VMA can decide that by itself	2023-06-30 21:20:19 +02:00
0cc4m	4ea9b2fd4b	Add VMA library	2023-06-30 21:15:06 +02:00
0cc4m	c8ff09bdc7	dequant_q4_0 kernel	2023-06-30 20:48:42 +02:00
0cc4m	cb5cb4d6e2	Fix f16_to_f32 kernel	2023-06-30 20:48:03 +02:00
0cc4m	df3cdbdac7	Output FP32 in fp16 matmul shader	2023-06-30 18:37:10 +02:00
0cc4m	40c8f843f2	Fix mulmat_f16	2023-06-30 18:37:10 +02:00
0cc4m	c31e14b2fd	Enable device extensions properly, restore fp16 matmul op	2023-06-30 18:37:10 +02:00
0cc4m	fc5bb53b32	Code abstraction, FP16 implementation, fix kernel, add FP16 to FP32 kernel	2023-06-30 18:37:10 +02:00
0cc4m	3adc7b1d60	First FP16 attempt, disabled for now	2023-06-30 18:37:10 +02:00
0cc4m	2c70df985a	Continue vulkan implementation and optimization	2023-06-30 18:36:42 +02:00
0cc4m	0c9cca00bd	Write coalescing	2023-06-30 18:36:42 +02:00
0cc4m	7c6860b483	2D Blocktiling	2023-06-30 18:36:42 +02:00
0cc4m	1b4863c2b9	1D Blocktiling	2023-06-30 18:36:42 +02:00
0cc4m	baf9ff536b	GEMM Kernel optimization	2023-06-30 18:36:42 +02:00
0cc4m	a42376e7ec	First matmul success	2023-06-30 18:36:42 +02:00
0cc4m	8ce84c2747	Continue implementation	2023-06-30 18:36:42 +02:00
0cc4m	2471728a9d	Add aligned malloc and free for VMA	2023-06-30 18:36:42 +02:00
0cc4m	fc4f207cfb	Matmul call	2023-06-30 18:36:41 +02:00
0cc4m	b0e65855d1	Vulkan development	2023-06-30 18:36:41 +02:00
0cc4m	a4004d4fa8	Vulkan memory management	2023-06-30 18:36:41 +02:00
0cc4m	88d4ec05a8	Continue implementation	2023-06-30 18:36:41 +02:00
0cc4m	4a96d0eb7f	Fix matmul kernel, continue implementation	2023-06-30 18:36:41 +02:00
0cc4m	061246fb07	Vulkan loader code	2023-06-30 18:36:41 +02:00
Howard Su	b8c8dda75f	Use unsigned for random seed (#2006) * Use unsigned for random seed. Keep -1 as the value to use a time based seed. Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-06-29 06:15:15 -07:00
LostRuins	96a712ca1b	Porting the improved K-Quant CUDA kernels to OpenCL (#1966) * Added broken new q4k quant * xx + ib0 * Fix q2_k fast kernel * Use preprocessor for QK_K * Add q6_k fast matmul kernel * ported q3k speedup successfully * ported q2k and q5k speedups * remove old dot kernels and template * fixed global const struct types * fixing address spaces * fixed string too long CI issue --------- Co-authored-by: 0cc4m <picard12@live.de>	2023-06-29 05:56:43 +02:00
m3ndax	d3494bb86b	llama : replacing auto &kv with const auto &kv (#2041) * Replacing auto &kv with const auto &kv * Create codacy.yml * Delete codacy.yml	2023-06-28 21:39:08 +03:00
Salvador E. Tropea	5b351e94d0	cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028) - Not used	2023-06-28 20:27:31 +03:00
Salvador E. Tropea	6432aabb6d	cuda : fix missing const qualifier in casts (#2027)	2023-06-28 20:26:26 +03:00
Howard Su	b922bc351b	llama : remove shards weight file support (#2000) * Remove multiple shards * Remove multiple file loaders * Remove llama_load_tensor_shard class * Simplify load logic * Remove dead code guess_n_parts function * Remove vocab_only from constructor of llama_model_loader * Remove alignment_prevents_mmap which is not more needed. * Remove useless check	2023-06-28 20:13:02 +03:00
Johannes Gäßler	7f9753fa12	CUDA GPU acceleration for LoRAs + f16 models (#1970)	2023-06-28 18:35:54 +02:00
ningshanwutuobang	cfa0750bc9	llama : support input embeddings directly (#1910) * add interface for float input * fixed inpL shape and type * add examples of input floats * add test example for embd input * fixed sampling * add free for context * fixed add end condition for generating * add examples for llava.py * add READMD for llava.py * add READMD for llava.py * add example of PandaGPT * refactor the interface and fixed the styles * add cmake build for embd-input * add cmake build for embd-input * Add MiniGPT-4 example * change the order of the args of llama_eval_internal * fix ci error	2023-06-28 18:53:37 +03:00
Erik Scholz	9d23589d63	fix pthreads setaffinity usage on android (#2020)	2023-06-27 19:06:33 +02:00
Howard Su	0be54f75a6	baby-llama : fix build after ggml_rope change (#2016)	2023-06-27 08:07:13 +03:00
Georgi Gerganov	181e8d9755	llama : fix rope usage after ChatGLM change	2023-06-27 00:37:33 +03:00
Georgi Gerganov	d9779021bd	ggml : add support for ChatGLM RoPE	2023-06-27 00:06:51 +03:00
Roman Parykin	d38e451578	readme : add Scala 3 bindings repo (#2010)	2023-06-26 22:47:59 +03:00
David Yang	eaa6ca5a61	ggml : increase max tensor name + clean up compiler warnings in train-text (#1988) * Clean up compiler warnings in train-text Some brackets to disambiguate order of operations * Increase GGML_MAX_NAME Avoiding strncpy danger in train-text-from-scratch and reducing potential future name length issues	2023-06-26 22:45:32 +03:00
Gustavo Rocha Dias	aa777abbb7	readme : LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux (#2007) * docs - Alternative way to build at Android, with CLBlast. * doc - LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux. * doc- fix typo	2023-06-26 22:34:45 +03:00
Georgi Gerganov	c824d2e368	ggml : avoid conv 2d kernel round up	2023-06-26 21:03:59 +03:00
zrm	b853d45601	ggml : add NUMA support (#1556) * detect NUMA systems and pin work threads to nodes (linux) * disable mmap prefetch/readahead for NUMA systems * avoid sending finalize op to thread pool if it does nothing * silence robot * fix args * make --numa a param * recommendation that n_nodes evenly divide n_threads did not warrant such aggressive enforcement * lower synchronization overhead * statically allocate * move numa state to g_state * add description for --numa * ggml : minor style changes * ggml : minor style + try fix sanitizer build * llama : allow to initialize backend with NUMA support * llama : avoid ggml include in llama-util.h * ggml : style / formatting * ggml : fix handling of ops with n_threads > n_tasks > 1 * server : utilize numa parameter --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-06-26 20:57:59 +03:00
Georgi Gerganov	9225baef71	k-quants : fix indentation	2023-06-26 20:10:52 +03:00
katsu560	a84ab1da8d	tests : fix quantize perf (#1990) * fix test quantize perf * avoid the global state	2023-06-26 19:47:02 +03:00
katsu560	5743ca8092	k-quants : add AVX support to dot functions (#1916) * k_quants : add AVX support * k_quants : apply review comments	2023-06-26 19:46:07 +03:00

793 commits