Commit graph

898 commits

Author SHA1 Message Date
0cc4m
24eeb97d13 Add bounds checking to matmul kernels, improve implementation, fix command buffers not freed properly 2023-07-02 22:11:58 +02:00
Georgi Gerganov
46088f7231 ggml : fix build with OpenBLAS (close #2066) 2023-07-02 09:46:46 +03:00
Johannes Gäßler
0bc2cdfc87
Better CUDA synchronization logic (#2057) 2023-07-01 21:49:44 +02:00
Johannes Gäßler
befb3a3562
Test-based VRAM scratch size + context adjustment (#2056) 2023-07-01 21:47:26 +02:00
Daniel Drake
b213227067
cmake : don't force -mcpu=native on aarch64 (#2063)
It's currently not possible to cross-compile llama.cpp for aarch64
because CMakeLists.txt forces -mcpu=native for that target.

-mcpu=native doesn't make sense if your build host is not the
target architecture, and clang rejects it for that reason, aborting the
build. This can be easily reproduced using the current Android NDK to build
for aarch64 on an x86_64 host.

If there is no specific CPU-tuning target for aarch64, then -mcpu should
be omitted completely. I think that makes sense: there is not enough
variance in the aarch64 instruction set to warrant a fixed -mcpu
optimization at this point. And if someone is building natively and wishes
to enable any possible optimizations for the host device, the LLAMA_NATIVE
option is already available.

Fixes #495.
2023-07-01 21:31:44 +03:00
Aaron Miller
2f8cd979ec
metal : release buffers when freeing metal context (#2062) 2023-07-01 21:14:59 +03:00
Judd
471aab6e4c
convert : add support of baichuan-7b (#2055)
Co-authored-by: Judd <foldl@boxvest.com>
2023-07-01 20:00:25 +03:00
Georgi Gerganov
463f2f4c4f
llama : fix return value of llama_load_session_file_internal (#2022) 2023-07-01 19:05:09 +03:00
Rand Xie
cb44dbc7de
llama : catch llama_load_session_file_internal exceptions (#2022)
* convert checks in llama_load_session_file to throw and handle them

* make llama_load_session_file_internal static

* address feedback to avoid using exceptions
2023-07-01 19:02:58 +03:00
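Taken together with the follow-up fix above (463f2f4c4f), the pattern these commits describe is an internal loader that reports failures by throwing, wrapped by the public entry point so the C API keeps returning a plain bool. A simplified sketch of that shape (the real functions take the context, a token buffer, and related arguments omitted here):

    #include <cstdio>
    #include <stdexcept>

    // Internal loader (sketch): validation failures are reported by throwing.
    static bool llama_load_session_file_internal(const char * path_session) {
        if (path_session == nullptr) {
            throw std::runtime_error("null session path");
        }
        // ... read the session file, validate magic/version, restore state ...
        return true;
    }

    // Public wrapper: convert any exception into a false return for the C API.
    bool llama_load_session_file(const char * path_session) {
        try {
            return llama_load_session_file_internal(path_session);
        } catch (const std::exception & err) {
            std::fprintf(stderr, "error loading session file: %s\n", err.what());
            return false;
        }
    }
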
Georgi Gerganov
79f634a19d
embd-input : fix returning ptr to temporary 2023-07-01 18:46:00 +03:00
Georgi Gerganov
04606a1599
train : fix compile warning 2023-07-01 18:45:44 +03:00
Qingyou Meng
b1ca8f36a9
ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995)
Will not be scheduled unless explicitly enabled.
2023-07-01 18:42:43 +03:00
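For context, ggml runs each op in up to three phases (GGML_TASK_INIT, GGML_TASK_COMPUTE, GGML_TASK_FINALIZE); the change above makes the INIT and FINALIZE phases opt-in per op. A minimal sketch of that gating idea, with made-up field names rather than ggml's actual internals:

    // Hypothetical sketch of opt-in task phases; names are illustrative only.
    struct op_phases {
        bool use_init;      // schedule an INIT pass for this op?
        bool use_finalize;  // schedule a FINALIZE pass for this op?
    };

    static void run_op(const op_phases & p) {
        if (p.use_init)     { /* e.g. zero per-thread accumulators */ }
        /* COMPUTE phase always runs */
        if (p.use_finalize) { /* e.g. reduce partial results across threads */ }
    }
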
0cc4m
36cd5d85e9 Avoid requesting dedicated memory; VMA can decide that by itself 2023-06-30 21:20:19 +02:00
0cc4m
4ea9b2fd4b Add VMA library 2023-06-30 21:15:06 +02:00
0cc4m
c8ff09bdc7 dequant_q4_0 kernel 2023-06-30 20:48:42 +02:00
0cc4m
cb5cb4d6e2 Fix f16_to_f32 kernel 2023-06-30 20:48:03 +02:00
0cc4m
df3cdbdac7 Output FP32 in fp16 matmul shader 2023-06-30 18:37:10 +02:00
0cc4m
40c8f843f2 Fix mulmat_f16 2023-06-30 18:37:10 +02:00
0cc4m
c31e14b2fd Enable device extensions properly, restore fp16 matmul op 2023-06-30 18:37:10 +02:00
0cc4m
fc5bb53b32 Code abstraction, FP16 implementation, fix kernel, add FP16 to FP32 kernel 2023-06-30 18:37:10 +02:00
0cc4m
3adc7b1d60 First FP16 attempt, disabled for now 2023-06-30 18:37:10 +02:00
0cc4m
2c70df985a Continue vulkan implementation and optimization 2023-06-30 18:36:42 +02:00
0cc4m
0c9cca00bd Write coalescing 2023-06-30 18:36:42 +02:00
0cc4m
7c6860b483 2D Blocktiling 2023-06-30 18:36:42 +02:00
0cc4m
1b4863c2b9 1D Blocktiling 2023-06-30 18:36:42 +02:00
0cc4m
baf9ff536b GEMM Kernel optimization 2023-06-30 18:36:42 +02:00
0cc4m
a42376e7ec First matmul success 2023-06-30 18:36:42 +02:00
0cc4m
8ce84c2747 Continue implementation 2023-06-30 18:36:42 +02:00
0cc4m
2471728a9d Add aligned malloc and free for VMA 2023-06-30 18:36:42 +02:00
0cc4m
fc4f207cfb Matmul call 2023-06-30 18:36:41 +02:00
0cc4m
b0e65855d1 Vulkan development 2023-06-30 18:36:41 +02:00
0cc4m
a4004d4fa8 Vulkan memory management 2023-06-30 18:36:41 +02:00
0cc4m
88d4ec05a8 Continue implementation 2023-06-30 18:36:41 +02:00
0cc4m
4a96d0eb7f Fix matmul kernel, continue implementation 2023-06-30 18:36:41 +02:00
0cc4m
061246fb07 Vulkan loader code 2023-06-30 18:36:41 +02:00
Howard Su
b8c8dda75f
Use unsigned for random seed (#2006)
* Use unsigned for random seed. Keep -1 as the value to use a time-based seed.

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-06-29 06:15:15 -07:00
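The convention the commit describes — the seed parameter is unsigned, and passing -1 (which converts to the all-ones value) means "derive the seed from the current time" — can be sketched like this (a minimal illustration, not the exact llama.cpp code):

    #include <cstdint>
    #include <ctime>

    // -1 converted to uint32_t becomes 0xFFFFFFFF; treat that as "time-based seed".
    static uint32_t resolve_seed(uint32_t requested) {
        if (requested == (uint32_t) -1) {
            return (uint32_t) std::time(nullptr);
        }
        return requested;
    }
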
LostRuins
96a712ca1b
Porting the improved K-Quant CUDA kernels to OpenCL (#1966)
* Added broken new q4k quant

* xx + ib0

* Fix q2_k fast kernel

* Use preprocessor for QK_K

* Add q6_k fast matmul kernel

* ported q3k speedup successfully

* ported q2k and q5k speedups

* remove old dot kernels and template

* fixed global const struct types

* fixing address spaces

* fixed string too long CI issue

---------

Co-authored-by: 0cc4m <picard12@live.de>
2023-06-29 05:56:43 +02:00
m3ndax
d3494bb86b
llama : replacing auto &kv with const auto &kv (#2041)
* Replacing auto &kv with const auto &kv

* Create codacy.yml

* Delete codacy.yml
2023-06-28 21:39:08 +03:00
Salvador E. Tropea
5b351e94d0
cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028)
- Not used
2023-06-28 20:27:31 +03:00
Salvador E. Tropea
6432aabb6d
cuda : fix missing const qualifier in casts (#2027) 2023-06-28 20:26:26 +03:00
Howard Su
b922bc351b
llama : remove shards weight file support (#2000)
* Remove multiple shards

* Remove multiple file loaders

* Remove llama_load_tensor_shard class

* Simplify load logic

* Remove dead code guess_n_parts function

* Remove vocab_only from constructor of llama_model_loader

* Remove alignment_prevents_mmap, which is no longer needed.

* Remove useless check
2023-06-28 20:13:02 +03:00
Johannes Gäßler
7f9753fa12
CUDA GPU acceleration for LoRAs + f16 models (#1970) 2023-06-28 18:35:54 +02:00
ningshanwutuobang
cfa0750bc9
llama : support input embeddings directly (#1910)
* add interface for float input

* fixed inpL shape and type

* add examples of input floats

* add test example for embd input

* fixed sampling

* add free for context

* added end condition for generating

* add examples for llava.py

* add README for llava.py

* add README for llava.py

* add example of PandaGPT

* refactor the interface and fixed the styles

* add cmake build for embd-input

* add cmake build for embd-input

* Add MiniGPT-4 example

* change the order of the args of llama_eval_internal

* fix ci error
2023-06-28 18:53:37 +03:00
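The interface added here feeds pre-computed float embeddings into evaluation instead of token ids, which is what the llava.py, MiniGPT-4 and PandaGPT examples build on. Below is a sketch of how such a call could be used; the declaration mirrors the token-based eval of that era and is written from the commit description, so treat the exact signature as an assumption:

    #include <vector>

    struct llama_context;

    // Assumed shape of the embedding-input interface: n_tokens rows of n_embd
    // floats are evaluated after n_past positions of existing context.
    int llama_eval_embd(struct llama_context * ctx, const float * embd,
                        int n_tokens, int n_past, int n_threads);

    // Illustrative caller: pass externally produced embeddings (e.g. projected
    // image features) into the model in place of token embeddings.
    static int eval_features(struct llama_context * ctx,
                             const std::vector<float> & feats,
                             int n_embd, int n_past, int n_threads) {
        const int n_tokens = (int) feats.size() / n_embd;
        return llama_eval_embd(ctx, feats.data(), n_tokens, n_past, n_threads);
    }
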
Erik Scholz
9d23589d63
fix pthreads setaffinity usage on android (#2020) 2023-06-27 19:06:33 +02:00
Howard Su
0be54f75a6
baby-llama : fix build after ggml_rope change (#2016) 2023-06-27 08:07:13 +03:00
Georgi Gerganov
181e8d9755
llama : fix rope usage after ChatGLM change 2023-06-27 00:37:33 +03:00
Georgi Gerganov
d9779021bd
ggml : add support for ChatGLM RoPE 2023-06-27 00:06:51 +03:00
Roman Parykin
d38e451578
readme : add Scala 3 bindings repo (#2010) 2023-06-26 22:47:59 +03:00
David Yang
eaa6ca5a61
ggml : increase max tensor name + clean up compiler warnings in train-text (#1988)
* Clean up compiler warnings in train-text

Some brackets to disambiguate order of operations

* Increase GGML_MAX_NAME

Avoiding strncpy danger in train-text-from-scratch and reducing potential future name length issues
2023-06-26 22:45:32 +03:00
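The "strncpy danger" mentioned in the commit is the usual one: when the source string is longer than the destination, strncpy fills the buffer without writing a terminating NUL. A generic defensive sketch (not the code from the commit; the size constant is illustrative):

    #include <cstring>

    constexpr size_t MAX_NAME = 48;  // illustrative limit; the commit raises the real GGML_MAX_NAME

    static void set_name(char * dst, const char * src) {
        std::strncpy(dst, src, MAX_NAME - 1);
        dst[MAX_NAME - 1] = '\0';  // strncpy does not NUL-terminate on truncation
    }
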
Gustavo Rocha Dias
aa777abbb7
readme : LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux (#2007)
* docs - Alternative way to build on Android, with CLBlast.

* doc - LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux.

* doc - fix typo
2023-06-26 22:34:45 +03:00