Commit graph

1409 commits

Author SHA1 Message Date
Concedo
3d2907d208 make gptneox and gptj work with extended context too 2023-07-02 18:28:09 +08:00
Concedo
d6b47e6a5b Merge branch 'master' into concedo_experimental 2023-07-02 17:26:39 +08:00
Concedo
e17c8497cf switched to NTK aware scaling 2023-07-02 17:25:08 +08:00
Concedo
e19483ca0f increase scratch for above 4096 2023-07-02 14:55:08 +08:00
Georgi Gerganov
46088f7231 ggml : fix build with OpenBLAS (close #2066) 2023-07-02 09:46:46 +03:00
Concedo
b85ea580d3 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
2023-07-02 14:45:25 +08:00
Johannes Gäßler
0bc2cdfc87 Better CUDA synchronization logic (#2057) 2023-07-01 21:49:44 +02:00
Johannes Gäßler
befb3a3562 Test-based VRAM scratch size + context adjustment (#2056) 2023-07-01 21:47:26 +02:00
Daniel Drake
b213227067 cmake : don't force -mcpu=native on aarch64 (#2063)
It's currently not possible to cross-compile llama.cpp for aarch64
because CMakeLists.txt forces -mcpu=native for that target.

-mcpu=native doesn't make sense if your build host is not the
target architecture, and clang rejects it for that reason, aborting the
build. This can be easily reproduced using the current Android NDK to build
for aarch64 on an x86_64 host.

If there is not a specific CPU-tuning target for aarch64 then -mcpu
should be omitted completely. I think that makes sense, there is not
enough variance in the aarch64 instruction set to warrant a fixed -mcpu
optimization at this point. And if someone is building natively and wishes
to enable any possible optimizations for the host device, then there is
already the LLAMA_NATIVE option available.

Fixes #495.
2023-07-01 21:31:44 +03:00
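The guard the commit above argues for can be sketched in CMake. This is only an illustrative sketch, not the actual CMakeLists.txt change (which simply drops the forced flag); the CMAKE_CROSSCOMPILING check is an assumption standing in for however a given build detects a non-native host.

```cmake
# Illustrative sketch only: tune for the host CPU only when the build is
# genuinely native and the user asked for it via LLAMA_NATIVE.
option(LLAMA_NATIVE "Enable host-specific CPU optimizations" OFF)

if (LLAMA_NATIVE AND NOT CMAKE_CROSSCOMPILING)
    # Build host and target match, so the compiler can probe this CPU.
    add_compile_options(-mcpu=native)
endif()
# When cross-compiling (e.g. Android NDK building aarch64 on x86_64),
# no -mcpu flag is emitted and the generic aarch64 baseline is used,
# which is the behaviour the commit describes.
```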
Aaron Miller
2f8cd979ec metal : release buffers when freeing metal context (#2062) 2023-07-01 21:14:59 +03:00
Judd
471aab6e4c convert : add support of baichuan-7b (#2055)
Co-authored-by: Judd <foldl@boxvest.com>
2023-07-01 20:00:25 +03:00
Concedo
ef3b8dc0d9 GPU accel for rwkv is slow, disable it 2023-07-02 00:41:46 +08:00
Concedo
e1a7042943 try out the new rwkv but it seems worse, may revert 2023-07-02 00:10:56 +08:00
Georgi Gerganov
463f2f4c4f llama : fix return value of llama_load_session_file_internal (#2022) 2023-07-01 19:05:09 +03:00
Rand Xie
cb44dbc7de llama : catch llama_load_session_file_internal exceptions (#2022)
* convert checks in llama_load_session_file to throw and handle them

* make llama_load_session_file_internal static

* address feedbacks to avoid using exceptions
2023-07-01 19:02:58 +03:00
Georgi Gerganov
79f634a19d embd-input : fix returning ptr to temporary 2023-07-01 18:46:00 +03:00
Georgi Gerganov
04606a1599 train : fix compile warning 2023-07-01 18:45:44 +03:00
Qingyou Meng
b1ca8f36a9 ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995)
Will not be scheduled unless explicitly enabled.
2023-07-01 18:42:43 +03:00
Concedo
632bf27b65 more granular context size selections 2023-07-01 11:02:44 +08:00
Concedo
eda663f15f update lite and up ver 2023-07-01 00:15:26 +08:00
Concedo
0cb8a9eab3 Merge remote-tracking branch 'Johannes/cuda-scratch-size-adjust' into concedo_experimental
# Conflicts:
#	llama.cpp
2023-06-30 23:29:38 +08:00
Concedo
67cb0b2760 Merge branch 'master' into concedo_experimental 2023-06-30 23:25:40 +08:00
Concedo
d16926dff4 Merge branch 'concedo' into concedo_experimental 2023-06-30 23:06:21 +08:00
Concedo
baf6325907 added flag for building kquants in tools 2023-06-30 23:06:11 +08:00
YellowRoseCx
30ea774e2c Update CMakeLists.txt with dmmv_x/y/f16 (#277) 2023-06-30 22:52:32 +08:00
bebopkim
1129d66ca9 To fix build problem on Apple Metal LLAMA_METAL=1 (#282) 2023-06-30 22:50:38 +08:00
JohannesGaessler
600bf6d929 Test-based VRAM scratch size + context adjustment 2023-06-30 11:35:30 +02:00
Concedo
86469d15c4 fix for yr-rocm, large gpu scratch 2023-06-30 12:40:08 +08:00
Concedo
1347d3acc0 another missing flag? 2023-06-30 00:02:18 +08:00
Concedo
396f857021 make platform appropriate library 2023-06-29 23:50:48 +08:00
Concedo
f50c73a0b2 readme 2023-06-29 23:45:57 +08:00
Concedo
ad945e2c41 make instructions clearer 2023-06-29 22:13:39 +08:00
Concedo
64aba0a151 update readme 2023-06-29 21:42:04 +08:00
Howard Su
b8c8dda75f Use unsigned for random seed (#2006)
* Use unsigned for random seed. Keep -1 as the value to use a time based seed.

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-06-29 06:15:15 -07:00
Concedo
f09debb1ec remove debug 2023-06-29 20:54:56 +08:00
Concedo
966d736582 revert cublasLt removal 2023-06-29 20:51:02 +08:00
Concedo
10a2bdfaf1 Merge remote-tracking branch 'upstream/ik/context_extend' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
2023-06-29 20:35:17 +08:00
Concedo
c7c6e522e7 bigger scratch buffers for bigger context 2023-06-29 19:43:23 +08:00
Concedo
86b061b98c wip on unified cublas integration, add all the small libraries but exclude the large ones 2023-06-29 18:35:31 +08:00
Concedo
c2f1ed6556 fix compile errors 2023-06-29 17:54:12 +08:00
Concedo
dff5575647 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.gitignore
#	Makefile
#	ggml-opencl.cpp
#	llama.cpp
2023-06-29 17:35:28 +08:00
Concedo
4b3a1282f0 Add flag for lowvram directly into cublas launch param
Merge remote-tracking branch 'yellowrose/pr/open/LostRuins/koboldcpp/lowvram' into concedo_experimental

# Conflicts:
#	koboldcpp.py
2023-06-29 17:07:31 +08:00
Concedo
746f5fa9e9 update lite 2023-06-29 16:44:39 +08:00
LostRuins
96a712ca1b Porting the improved K-Quant CUDA kernels to OpenCL (#1966)
* Added broken new q4k quant

* xx + ib0

* Fix q2_k fast kernel

* Use preprocessor for QK_K

* Add q6_k fast matmul kernel

* ported q3k speedup successfully

* ported q2k and q5k speedups

* remove old dot kernels and template

* fixed global const struct types

* fixing address spaces

* fixed string too long CI issue

---------

Co-authored-by: 0cc4m <picard12@live.de>
2023-06-29 05:56:43 +02:00
m3ndax
d3494bb86b llama : replacing auto &kv with const auto &kv (#2041)
* Replacing auto &kv with const auto &kv

* Create codacy.yml

* Delete codacy.yml
2023-06-28 21:39:08 +03:00
Salvador E. Tropea
5b351e94d0 cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028)
- Not used
2023-06-28 20:27:31 +03:00
Salvador E. Tropea
6432aabb6d cuda : fix missing const qualifier in casts (#2027) 2023-06-28 20:26:26 +03:00
Howard Su
b922bc351b llama : remove shards weight file support (#2000)
* Remove multiple shards

* Remove multiple file loaders

* Remove llama_load_tensor_shard class

* Simplify load logic

* Remove dead code guess_n_parts function

* Remove vocab_only from constructor of llama_model_loader

* Remove alignment_prevents_mmap which is not more needed.

* Remove useless check
2023-06-28 20:13:02 +03:00
Johannes Gäßler
7f9753fa12 CUDA GPU acceleration for LoRAs + f16 models (#1970) 2023-06-28 18:35:54 +02:00
ningshanwutuobang
cfa0750bc9 llama : support input embeddings directly (#1910)
* add interface for float input

* fixed inpL shape and type

* add examples of input floats

* add test example for embd input

* fixed sampling

* add free for context

* fixed add end condition for generating

* add examples for llava.py

* add READMD for llava.py

* add example of PandaGPT

* refactor the interface and fixed the styles

* add cmake build for embd-input

* add cmake build for embd-input

* Add MiniGPT-4 example

* change the order of the args of llama_eval_internal

* fix ci error
2023-06-28 18:53:37 +03:00