llama.cpp

Author	SHA1	Message	Date
Rand Xie	cb44dbc7de	llama : catch llama_load_session_file_internal exceptions (#2022 ) * convert checks in llama_load_session_file to throw and handle them * make llama_load_session_file_internal static * address feedbacks to avoid using exceptions	2023-07-01 19:02:58 +03:00
Georgi Gerganov	79f634a19d	embd-input : fix returning ptr to temporary	2023-07-01 18:46:00 +03:00
Georgi Gerganov	04606a1599	train : fix compile warning	2023-07-01 18:45:44 +03:00
Qingyou Meng	b1ca8f36a9	ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995 ) Will not be scheduled unless explicitly enabled.	2023-07-01 18:42:43 +03:00
Concedo	632bf27b65	more granular context size selections	2023-07-01 11:02:44 +08:00
Concedo	eda663f15f	update lite and up ver	2023-07-01 00:15:26 +08:00
Concedo	0cb8a9eab3	Merge remote-tracking branch 'Johannes/cuda-scratch-size-adjust' into concedo_experimental # Conflicts: # llama.cpp	2023-06-30 23:29:38 +08:00
Concedo	67cb0b2760	Merge branch 'master' into concedo_experimental	2023-06-30 23:25:40 +08:00
Concedo	d16926dff4	Merge branch 'concedo' into concedo_experimental	2023-06-30 23:06:21 +08:00
Concedo	baf6325907	added flag for building kquants in tools	2023-06-30 23:06:11 +08:00
YellowRoseCx	30ea774e2c	Update CMakeLists.txt with dmmv_x/y/f16 (#277 )	2023-06-30 22:52:32 +08:00
bebopkim	1129d66ca9	To fix build problem on Apple Metal LLAMA_METAL=1 (#282 )	2023-06-30 22:50:38 +08:00
JohannesGaessler	600bf6d929	Test-based VRAM scratch size + context adjustment	2023-06-30 11:35:30 +02:00
Concedo	86469d15c4	fix for yr-rocm, large gpu scratch	2023-06-30 12:40:08 +08:00
Concedo	1347d3acc0	another missing flag?	2023-06-30 00:02:18 +08:00
Concedo	396f857021	make platform appropriate library	2023-06-29 23:50:48 +08:00
Concedo	f50c73a0b2	readme	2023-06-29 23:45:57 +08:00
Concedo	ad945e2c41	make instructions clearer	2023-06-29 22:13:39 +08:00
Concedo	64aba0a151	update readme	2023-06-29 21:42:04 +08:00
Howard Su	b8c8dda75f	Use unsigned for random seed (#2006 ) * Use unsigned for random seed. Keep -1 as the value to use a time based seed. Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-06-29 06:15:15 -07:00
Concedo	f09debb1ec	remove debug	2023-06-29 20:54:56 +08:00
Concedo	966d736582	revert cublasLt removal	2023-06-29 20:51:02 +08:00
Concedo	10a2bdfaf1	Merge remote-tracking branch 'upstream/ik/context_extend' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile	2023-06-29 20:35:17 +08:00
Concedo	c7c6e522e7	bigger scratch buffers for bigger context	2023-06-29 19:43:23 +08:00
Concedo	86b061b98c	wip on unified cublas integration, add all the small libraries but exclude the large ones	2023-06-29 18:35:31 +08:00
Concedo	c2f1ed6556	fix compile errors	2023-06-29 17:54:12 +08:00
Concedo	dff5575647	Merge branch 'master' into concedo_experimental # Conflicts: # .gitignore # Makefile # ggml-opencl.cpp # llama.cpp	2023-06-29 17:35:28 +08:00
Concedo	4b3a1282f0	Add flag for lowvram directly into cublas launch param Merge remote-tracking branch 'yellowrose/pr/open/LostRuins/koboldcpp/lowvram' into concedo_experimental # Conflicts: # koboldcpp.py	2023-06-29 17:07:31 +08:00
Concedo	746f5fa9e9	update lite	2023-06-29 16:44:39 +08:00
LostRuins	96a712ca1b	Porting the improved K-Quant CUDA kernels to OpenCL (#1966 ) * Added broken new q4k quant * xx + ib0 * Fix q2_k fast kernel * Use preprocessor for QK_K * Add q6_k fast matmul kernel * ported q3k speedup successfully * ported q2k and q5k speedups * remove old dot kernels and template * fixed global const struct types * fixing address spaces * fixed string too long CI issue --------- Co-authored-by: 0cc4m <picard12@live.de>	2023-06-29 05:56:43 +02:00
m3ndax	d3494bb86b	llama : replacing auto &kv with const auto &kv (#2041 ) * Replacing auto &kv with const auto &kv * Create codacy.yml * Delete codacy.yml	2023-06-28 21:39:08 +03:00
Salvador E. Tropea	5b351e94d0	cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028 ) - Not used	2023-06-28 20:27:31 +03:00
Salvador E. Tropea	6432aabb6d	cuda : fix missing const qualifier in casts (#2027 )	2023-06-28 20:26:26 +03:00
Howard Su	b922bc351b	llama : remove shards weight file support (#2000 ) * Remove multiple shards * Remove multiple file loaders * Remove llama_load_tensor_shard class * Simplify load logic * Remove dead code guess_n_parts function * Remove vocab_only from constructor of llama_model_loader * Remove alignment_prevents_mmap which is not more needed. * Remove useless check	2023-06-28 20:13:02 +03:00
Johannes Gäßler	7f9753fa12	CUDA GPU acceleration for LoRAs + f16 models (#1970 )	2023-06-28 18:35:54 +02:00
ningshanwutuobang	cfa0750bc9	llama : support input embeddings directly (#1910 ) * add interface for float input * fixed inpL shape and type * add examples of input floats * add test example for embd input * fixed sampling * add free for context * fixed add end condition for generating * add examples for llava.py * add READMD for llava.py * add READMD for llava.py * add example of PandaGPT * refactor the interface and fixed the styles * add cmake build for embd-input * add cmake build for embd-input * Add MiniGPT-4 example * change the order of the args of llama_eval_internal * fix ci error	2023-06-28 18:53:37 +03:00
Concedo	b084f4dc46	option for cublas	2023-06-28 21:16:40 +08:00
Concedo	b4698abafc	Wip, CUDA porting malloc improvements, gpu accel for non-llama, backport old quants	2023-06-28 18:20:46 +08:00
Erik Scholz	9d23589d63	fix pthreads setaffinity usage on android (#2020 )	2023-06-27 19:06:33 +02:00
Iwan Kawrakow	333c40b94c	Fixed typo	2023-06-27 19:04:00 +03:00
Iwan Kawrakow	cda30038e4	Modified RoPE with linear scaling When the context size is greater than the maximum context size during training, scale the position given to RoPE with training context / n_ctx.	2023-06-27 15:00:22 +03:00
Concedo	9527a783ea	fix rope inplace	2023-06-27 19:44:33 +08:00
Concedo	282376c85a	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # README.md # tests/test-quantize-perf.cpp	2023-06-27 19:15:27 +08:00
Howard Su	0be54f75a6	baby-llama : fix build after ggml_rope change (#2016 )	2023-06-27 08:07:13 +03:00
YellowRoseCx	8afa800fb6	Expose low_vram for CUDA Enabling --lowvram instructs the program to not allocate a VRAM scratch buffer for holding temporary results. Reduces VRAM usage at the cost of performance, particularly prompt processing speed. Requires CUDA	2023-06-26 16:47:22 -05:00
Georgi Gerganov	181e8d9755	llama : fix rope usage after ChatGLM change	2023-06-27 00:37:33 +03:00
Georgi Gerganov	d9779021bd	ggml : add support for ChatGLM RoPE	2023-06-27 00:06:51 +03:00
Roman Parykin	d38e451578	readme : add Scala 3 bindings repo (#2010 )	2023-06-26 22:47:59 +03:00
David Yang	eaa6ca5a61	ggml : increase max tensor name + clean up compiler warnings in train-text (#1988 ) * Clean up compiler warnings in train-text Some brackets to disambiguate order of operations * Increase GGML_MAX_NAME Avoiding strncpy danger in train-text-from-scratch and reducing potential future name length issues	2023-06-26 22:45:32 +03:00
Gustavo Rocha Dias	aa777abbb7	readme : LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux (#2007 ) * docs - Alternative way to build at Android, with CLBlast. * doc - LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux. * doc- fix typo	2023-06-26 22:34:45 +03:00

1645 commits