llama.cpp

Author	SHA1	Message	Date
xaedes	410a47a79e	minor code format improvement	2023-05-01 14:43:46 +02:00
xaedes	124fdca973	successfully test view backward	2023-05-01 14:43:46 +02:00
xaedes	cecd6c7665	bug fix for add_at forward, required for view backward pass: src0 values must be copied to dst because, in contrast to the normal add function, the addition does not touch all dst elements.	2023-05-01 14:43:46 +02:00
xaedes	83fa6b3bcb	fix ggml_compute_forward_dup_same_cont for when nelements < nthreads: when more threads are used than elements exist, ie1 was less than ie0, resulting in an invalid negative byte count argument in memcpy	2023-05-01 14:43:46 +02:00
xaedes	c1a8893de3	de-duplicate ggml_forward_dup code taking care of contiguous tensors of same type: with this we can duplicate tensors of any type as long as they are contiguous.	2023-05-01 02:42:27 +02:00
xaedes	38675e537c	add shape annotations for llama	2023-05-01 02:42:13 +02:00
xaedes	93106504fd	align shape annotations	2023-05-01 02:42:05 +02:00
xaedes	fea42be47a	successfully test soft_max backward	2023-05-01 02:41:58 +02:00
xaedes	1a80e9a0fa	correctly implement softmax backward pass using new operation ggml_diag: ggml_diag constructs diagonal matrices from the given entries. ggml_diag(shape[a,1,c,d]) -> shape[a,a,c,d]	2023-05-01 02:41:30 +02:00
xaedes	54ab300cc4	add test-opt.c: this uses ggml_opt to train a,b for minimal e=sum(sqr(c - a*b)) for random initial a,b,c	2023-05-01 02:41:30 +02:00
xaedes	ecf949b10f	successfully test reshape backward	2023-05-01 02:41:30 +02:00
xaedes	c483a7dac5	bug fix for reshape backward pass	2023-05-01 02:41:30 +02:00
xaedes	b2bd8222da	successfully test cpy backward	2023-05-01 02:41:30 +02:00
xaedes	0ea8201c86	bug fix for cpy backward pass	2023-05-01 02:41:30 +02:00
xaedes	7571147242	successfully test rope backward	2023-05-01 02:41:30 +02:00
xaedes	b583136cfa	improve performance of sqr backward pass: use scale(x,y) instead of mul(x,repeat(y,x))	2023-05-01 02:41:30 +02:00
xaedes	bfe507213c	improve performance of sum backward pass: use add1(x,y) instead of add(x,repeat(y,x))	2023-05-01 02:41:30 +02:00
xaedes	0197bcb0ff	successfully test scale backward	2023-05-01 02:41:29 +02:00
xaedes	a367eb9eda	bug fix for scale backward pass: use sum instead of mean for gradient of scalar scale parameter	2023-05-01 02:41:29 +02:00
xaedes	671e5922e2	successfully test silu backward	2023-05-01 02:41:29 +02:00
xaedes	6fb08b4554	bug fixes for silu_back	2023-05-01 02:41:29 +02:00
xaedes	9d6fc28f18	disable graph dot export as it floods console	2023-05-01 02:41:29 +02:00
xaedes	9345f4c3a5	test both gradients of mul_mat	2023-05-01 02:41:29 +02:00
xaedes	20e3c1d2b4	use GGML_PRINT_DEBUG for debug messages which will otherwise flood the console	2023-05-01 02:41:29 +02:00
xaedes	0da26753fd	add test-grad0.c	2023-05-01 02:41:29 +02:00
xaedes	4e1f81d32f	implement backward pass for ggml_get_rows and for new operation ggml_get_rows_back	2023-05-01 02:41:29 +02:00
xaedes	488decfdc5	implement backward pass of ggml_rope and ggml_rope_back	2023-05-01 02:41:28 +02:00
xaedes	36d8a051d4	remove already resolved TODO	2023-05-01 02:41:28 +02:00
xaedes	b908007471	norm & rms_norm can not be threaded: after investigating rms_norm for quite some time I came to the conclusion that neither norm nor rms_norm can be threaded, because we need the mean over all items, not just over the slices each thread sees.	2023-05-01 02:41:28 +02:00
xaedes	b164343529	implement 5 of 6 missing backward pass operations used by llama: GGML_OP_DIAG_MASK_INF, GGML_OP_GET_ROWS, GGML_OP_RMS_NORM, GGML_OP_SILU, GGML_OP_SOFT_MAX. add necessary ggml operations GGML_OP_ADD1, GGML_OP_SILU_BACK, GGML_OP_RMS_NORM_BACK, GGML_OP_DIAG_MASK_ZERO, and GGML_OP_ROPE_BACK. GGML_OP_ADD1 is necessary to add a scalar value in the backward pass of GGML_OP_SOFT_MAX. GGML_OP_ADD1 could also be replaced by using GGML_OP_ADD and GGML_OP_REPEAT, but the performance would be worse. additionally GGML_OP_REPEAT will return an unexpected value when the input to GGML_OP_SOFT_MAX contains only a single scalar: in this case GGML_OP_REPEAT will not return the value that should be repeated (src1) but the value whose shape the result should take (src0). So in this case it can not replace GGML_OP_ADD1. GGML_OP_SILU_BACK, GGML_OP_RMS_NORM_BACK and GGML_OP_ROPE_BACK are necessary for backward pass of GGML_OP_SILU, GGML_OP_RMS_NORM and GGML_OP_ROPE. The backward pass for these functions cannot be easily composed of existing operations. Since the backward pass builds a computation graph, we need operations that are forward pass implementations of the required backward passes. Sounds a bit confusing at first, I know... GGML_OP_DIAG_MASK_ZERO is necessary for backward pass of GGML_OP_DIAG_MASK_INF. Some operations were previously inplace-only. for backward pass there need to be non-inplace variants. staying consistent with other operations that have non-inplace and inplace variants, the operations are changed to non-inplace and functions with "_inplace" are added which are inplace. in llama we need to call the inplace variants so that it is implemented as before. for llama backward pass we need to use the non-inplace variants. still not completely implemented backward passes for llama: GGML_OP_ROPE (needs forward pass for GGML_OP_ROPE_BACK), GGML_OP_GET_ROWS (only necessary for tokenizer)	2023-05-01 02:41:28 +02:00
xaedes	73ac18d856	implement 8 of 14 missing backward pass operations used by llama: GGML_OP_ADD_AT, GGML_OP_CPY, GGML_OP_MUL_MAT (src0.grad), GGML_OP_PERMUTE, GGML_OP_RESHAPE, GGML_OP_SCALE, GGML_OP_TRANSPOSE, GGML_OP_VIEW. implement additional ggml operation GGML_OP_ADD_AT, which is necessary for backward pass of GGML_OP_VIEW. this operation adds src1 to src0 with data offset, i.e. to view(src0, ..., offset). the values are returned in a tensor the size of src0. values outside of [data+offset:data+offset+nbytes(src1)] are just the original values from src0. still missing backward passes for llama: GGML_OP_DIAG_MASK_INF, GGML_OP_GET_ROWS, GGML_OP_RMS_NORM, GGML_OP_ROPE, GGML_OP_SILU, GGML_OP_SOFT_MAX	2023-05-01 02:41:28 +02:00
Georgi Gerganov	7ff0dcd320	ggml : fix UB (int << 31)	2023-04-30 22:28:51 +03:00
Pavol Rusnak	6f79699286	build: add armv{6,7,8} support to cmake (#1251) - flags copied from Makefile - updated comments in both CMakeLists.txt and Makefile to match reality	2023-04-30 20:48:38 +02:00
jon-chuang	a5d30b1f53	common : better default number of threads (#934) * commit * fix * try-catch * apply code review * improve * improve * add macos headers * done * remove color * fix windows * minor * fix * Apply suggestions from code review Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com> * remove * minor * minor --------- Co-authored-by: jon-chuang <jon-chuang@users.noreply.github.com> Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>	2023-04-30 21:41:35 +03:00
0cc4m	76a884920a	ggml : add CLBlast q5_0, q5_1, q8_0 dequant kernels (#1225) * Implement q5_0, q5_1 and q8_0 * Work around q5_0 OpenCL issue * Fix q8_0 dequant kernel * Move cl kernels into ggml-opencl.c * Use two memcpy calls for q5_0 buffer transfer	2023-04-30 21:34:52 +03:00
Georgi Gerganov	6bc4400e67	ggml : add Q5 WASM SIMD + GGML_FTYPE	2023-04-30 19:07:43 +03:00
Stephan Walter	f0d70f147d	Various fixes to mat_mul benchmark (#1253)	2023-04-30 12:32:37 +00:00
Georgi Gerganov	3e5aa8a1c4	ggml : fix labels for GGML_OP_ALIBI	2023-04-30 10:25:46 +03:00
Georgi Gerganov	c3ca7a5f05	ggml : fix 32-bit ARM NEON	2023-04-29 21:34:23 +03:00
Georgi Gerganov	e8c051611a	ggml : use vzip instead of vuzp for consistency	2023-04-29 21:12:56 +03:00
Georgi Gerganov	0b5a935099	ggml : fix visibility and unused warnings	2023-04-29 19:28:36 +03:00
Georgi Gerganov	ec728e44d7	ggml : fix #if for f32_f32 mul_mat (CLBlast) (#1229)	2023-04-29 18:43:42 +03:00
Georgi Gerganov	214b6a3570	ggml : adjust mul_mat_f16 work memory (#1226) * llama : minor - remove explicit int64_t cast * ggml : reduce memory buffer for F16 mul_mat when not using cuBLAS * ggml : add asserts to guard for incorrect wsize	2023-04-29 18:43:28 +03:00
Georgi Gerganov	305eb5afd5	build : fix reference to old llama_util.h	2023-04-29 13:53:12 +03:00
Georgi Gerganov	84ca9c2ecf	examples : fix save-load-state + rename llama-util.h	2023-04-29 13:48:11 +03:00
Georgi Gerganov	334637e43e	common : change default parameters to pre-#1126 (#1223)	2023-04-29 09:51:06 +03:00
Ivan Stepanov	dd7eff57d8	llama : new sampling algorithms (#1126) * Sample interface, new samplers. New samplers: - locally typical sampling - tail free sampling - frequency and presence penalty - mirostat Ignore EOS fix: -inf should be used. * mirostat * Added --logit-bias and --no-penalize-nl, removed std::span * Use C++11, clarify llama API documentation, rename Mirostat parameters to --mirostat_lr and --mirostat_ent, add temperature sampling for Mirostat, simplify Mirostat sampling API parameters (removed N and k) * Save and load example adjust * Tests * Windows build fix * Windows test fix	2023-04-29 08:34:41 +03:00
slaren	7fc50c051a	cuBLAS: use host pinned memory and dequantize while copying (#1207) * cuBLAS: dequantize simultaneously while copying memory * cuBLAS: use host pinned memory * cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory * cuBLAS: also pin kv cache * fix rebase	2023-04-29 02:04:18 +02:00
Henri Vasserman	b1ee8f59b4	cuBLAS: non-contiguous tensor support (#1215) * Cuda: non-contiguous tensor support * remove extra stuff * rename * fix error * more fixes, now OpenBLAS and CLBlast build too * now then?	2023-04-29 01:31:56 +02:00
Stephan Walter	36d19a603b	Remove Q4_3 which is no better than Q5 (#1218)	2023-04-28 23:10:43 +00:00
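
Note on 83fa6b3bcb (nelements < nthreads): the message describes a per-thread element range where ie1 could fall below ie0, giving memcpy a negative byte count. The sketch below shows one way such a range can be clamped. It assumes a ceiling-division split across threads; `thread_range` and `thread_bytes` are hypothetical helpers for illustration, not ggml functions, and the actual fix in the commit may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of ggml): computes the half-open element
 * range [*ie0, *ie1) that thread `ith` out of `nth` threads should copy
 * when `ne` elements are split by ceiling division. */
static void thread_range(int ne, int nth, int ith, int *ie0, int *ie1) {
    assert(ne >= 0 && nth > 0 && ith >= 0 && ith < nth);
    const int dr    = (ne + nth - 1) / nth;              /* elements per thread, rounded up */
    const int start = dr * ith;                          /* may exceed ne when nth > ne */
    const int end   = (start + dr < ne) ? start + dr : ne;
    *ie0 = (start < ne) ? start : ne;                    /* clamp start so that ie0 <= ie1 */
    *ie1 = end;
}

/* Byte count a thread would pass to memcpy; never negative after clamping. */
static size_t thread_bytes(int ne, int nth, int ith, size_t elem_size) {
    int ie0, ie1;
    thread_range(ne, nth, ith, &ie0, &ie1);
    return (size_t)(ie1 - ie0) * elem_size;
}
```

Without the clamp on the start index, 3 elements split over 8 threads would give thread 5 the range [5, 3), which is the kind of inverted range the commit message reports.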

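
Note on 73ac18d856 and cecd6c7665 (GGML_OP_ADD_AT): the messages describe an operation that returns a tensor the size of src0 in which src1 has been added at an offset and every other value is the original src0 value. A minimal sketch of that behaviour on plain float arrays follows. It is illustrative only: `add_at_f32` is a hypothetical name, and it takes an element offset, whereas the commit message speaks of a data offset on ggml tensors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only (not the ggml implementation):
 * dst[i] = src0[i] + src1[i - offset] for offset <= i < offset + n1,
 * dst[i] = src0[i]                    everywhere else. */
static void add_at_f32(float *dst, const float *src0, size_t n0,
                       const float *src1, size_t n1, size_t offset) {
    assert(offset <= n0 && n1 <= n0 - offset);
    /* Copy src0 first: the addition below only touches part of dst,
     * so the remaining elements must already hold the src0 values. */
    memcpy(dst, src0, n0 * sizeof(float));
    for (size_t i = 0; i < n1; i++) {
        dst[offset + i] += src1[i];
    }
}
```

The initial copy is the step the bug fix in cecd6c7665 refers to: a plain add writes every dst element, while an offset add does not.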

504 commits
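
Note on 1a80e9a0fa (softmax backward): for y = softmax(x), the Jacobian is diag(y) - y*y^T, which is where a diagonal-matrix operation enters. Applied to an upstream gradient dy, this reduces to dx_i = y_i * (dy_i - sum_j y_j * dy_j). The sketch below evaluates that formula for a single row. `soft_max_back_f32` is a hypothetical name, and this is not how ggml composes the backward pass from graph operations.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative single-row softmax backward pass:
 * dx = (diag(y) - y*y^T) * dy, written without forming the matrix. */
static void soft_max_back_f32(float *dx, const float *y, const float *dy, size_t n) {
    assert(dx != NULL && y != NULL && dy != NULL);
    float dot = 0.0f;
    for (size_t i = 0; i < n; i++) {
        dot += y[i] * dy[i];            /* sum_j y_j * dy_j */
    }
    for (size_t i = 0; i < n; i++) {
        dx[i] = y[i] * (dy[i] - dot);   /* y_i * (dy_i - dot) */
    }
}
```

Because the entries of y sum to 1, the resulting gradient entries sum to 0, which makes a quick sanity check.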