xaedes
05cb629c8e
replace inefficient repeat backward pass with dedicated repeat_back operation
2023-05-28 18:00:17 +02:00
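The commit above names a dedicated repeat_back operation without further detail, so as a hedged note only: the backward of a repeat/broadcast is a sum-reduction of the incoming gradient over the repeated copies. The helper below is a hypothetical 1-D sketch, not ggml's implementation.

```cpp
// Sketch: the gradient of repeat is accumulated back over all repeated tiles.
#include <cstdio>
#include <vector>

// grad_out has size n * repeats; accumulate it into grad_in of size n.
static void repeat_back_1d(const std::vector<float>& grad_out,
                           std::vector<float>& grad_in, int repeats) {
    const int n = (int)grad_in.size();
    for (int r = 0; r < repeats; ++r) {
        for (int i = 0; i < n; ++i) {
            grad_in[i] += grad_out[r*n + i];
        }
    }
}

int main() {
    std::vector<float> grad_out = {1, 2, 3,  4, 5, 6}; // two repeats of a length-3 vector
    std::vector<float> grad_in(3, 0.0f);
    repeat_back_1d(grad_out, grad_in, 2);
    printf("%g %g %g\n", grad_in[0], grad_in[1], grad_in[2]); // 5 7 9
}
```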
xaedes
c47df09842
simplify backward pass for SQRT
2023-05-28 17:32:01 +02:00
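The commit gives no details, so this is a hedged guess at the usual simplification rather than a claim about the exact change: for y = sqrt(x) the derivative 1/(2*sqrt(x)) equals 1/(2*y), so the backward pass can reuse the forward output instead of recomputing the square root.

```cpp
// Sketch of the identity only, not the commit's code.
#include <cmath>
#include <cstdio>

int main() {
    const float x = 4.0f;
    const float y = std::sqrt(x);             // forward result
    const float grad_y = 1.0f;                // incoming gradient dL/dy
    const float grad_x = grad_y / (2.0f * y); // backward reuses y, no extra sqrt
    printf("grad_x = %g\n", grad_x);          // 0.25
}
```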
xaedes
6d40cc3a44
remove trailing whitespace
2023-05-22 20:56:35 +02:00
xaedes
d3acbf644e
simplify code
2023-05-22 20:53:57 +02:00
xaedes
0651679302
save checkpoint only when it was trained
2023-05-22 16:56:28 +02:00
xaedes
cc440bd438
fix bug in get_samples which corrupted training targets
2023-05-22 16:55:52 +02:00
xaedes
b763d6f1f2
remove unused functions
2023-05-22 16:54:21 +02:00
xaedes
42d9b4cfc2
store optimizer state in training checkpoint and add learning schedule
...
persistent optimizer state allows resuming training without resetting the optimizer
the learning schedule consists of a linear warmup ramp followed by cosine decay with restarts
2023-05-21 21:36:04 +02:00
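A minimal sketch of such a "linear warmup ramp followed by cosine decay with restarts" schedule; the function name, parameters, and defaults below are illustrative, not the example's actual values.

```cpp
#include <cmath>
#include <cstdio>

// Returns a learning-rate multiplier in [min_mult, 1] for iteration `it`.
static float learning_schedule(int it, int warmup_iters, int cycle_iters,
                               float cycle_mult, float min_mult) {
    const float pi = 3.14159265358979f;
    if (it < warmup_iters) {
        return (float)(it + 1) / (float)warmup_iters;     // linear warmup ramp
    }
    float t   = (float)(it - warmup_iters);               // position after warmup
    float len = (float)cycle_iters;
    while (t >= len) { t -= len; len *= cycle_mult; }     // restart; each cycle may grow
    return min_mult + (1.0f - min_mult) * 0.5f * (1.0f + std::cos(pi * t / len));
}

int main() {
    for (int it = 0; it < 200; it += 20) {
        printf("it=%3d mult=%.3f\n", it, learning_schedule(it, 40, 60, 2.0f, 0.1f));
    }
}
```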
xaedes
37c69435f0
print suppressed newline tokens as string "\n"
...
printing too many actual newlines is suppressed to avoid flooding the console.
2023-05-21 21:17:46 +02:00
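A schematic version of that console-flood guard (illustrative only, not the example's code): once a few real newlines have been printed in a row, further newline tokens are printed as the literal string "\n".

```cpp
#include <cstdio>
#include <string>
#include <vector>

static void print_tokens(const std::vector<std::string>& toks, int max_consecutive_newlines) {
    int run = 0;
    for (const auto& t : toks) {
        if (t == "\n") {
            if (++run > max_consecutive_newlines) { printf("\\n"); continue; }
        } else {
            run = 0;
        }
        printf("%s", t.c_str());
    }
}

int main() {
    print_tokens({"hello", "\n", "\n", "\n", "\n", "world", "\n"}, 2);
    printf("\n");
}
```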
xaedes
93eb8f7752
add forward function without using cache, for more performant training
...
during training on whole samples, no cache is required.
removing the cache and simplifying the remaining code results in performance and memory usage improvements.
2023-05-21 21:14:49 +02:00
xaedes
2afd218479
fix bug in llama_sample_token_mirostat_v2
...
when all candidates are filtered out by the mu threshold, the following soft_max operation will fail,
so keep at least one candidate.
2023-05-21 21:12:10 +02:00
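A schematic version of that guard, using plain structs rather than llama.cpp's llama_token_data API: if the mu threshold would remove every candidate, the single best one is kept so the following softmax still has something to normalize.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

struct Candidate { int id; float surprise; };   // surprise = -log2(p)

static void truncate_by_mu(std::vector<Candidate>& cands, float mu) {
    std::sort(cands.begin(), cands.end(),
              [](const Candidate& a, const Candidate& b) { return a.surprise < b.surprise; });
    size_t keep = 0;
    while (keep < cands.size() && cands[keep].surprise <= mu) { ++keep; }
    if (keep == 0) { keep = 1; }                // the fix: never drop *all* candidates
    cands.resize(keep);
}

int main() {
    std::vector<Candidate> cands = {{0, 5.0f}, {1, 7.0f}, {2, 9.0f}};
    truncate_by_mu(cands, 1.0f);                // threshold below every candidate
    printf("kept %zu candidate(s), best id=%d\n", cands.size(), cands[0].id);
}
```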
xaedes
ec1783c3e0
add ggml_opt_context, so that we can properly resume training
...
otherwise the optimizer states, tracking statistics about the error function and its derivatives,
will reset to zero each time ggml_opt is called, hindering convergence on resumed training.
now the optimizer context and all its memory are stored in a separate struct.
2023-05-21 21:10:16 +02:00
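A sketch of why a persistent optimizer context matters, written as generic Adam over plain vectors rather than ggml_opt_context's actual layout: the moment estimates m and v and the step counter must survive across calls, otherwise every resumed run effectively restarts from iteration 0.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

struct AdamContext {
    std::vector<float> m, v;   // first/second moment estimates, kept between calls
    int64_t t = 0;             // step counter, kept between calls
};

static void adam_step(AdamContext& ctx, std::vector<float>& x, const std::vector<float>& g,
                      float alpha, float beta1 = 0.9f, float beta2 = 0.999f, float eps = 1e-8f) {
    if (ctx.m.empty()) { ctx.m.assign(x.size(), 0.0f); ctx.v.assign(x.size(), 0.0f); }
    ++ctx.t;
    for (size_t i = 0; i < x.size(); ++i) {
        ctx.m[i] = beta1*ctx.m[i] + (1.0f - beta1)*g[i];
        ctx.v[i] = beta2*ctx.v[i] + (1.0f - beta2)*g[i]*g[i];
        const float mhat = ctx.m[i] / (1.0f - std::pow(beta1, (float)ctx.t));
        const float vhat = ctx.v[i] / (1.0f - std::pow(beta2, (float)ctx.t));
        x[i] -= alpha * mhat / (std::sqrt(vhat) + eps);
    }
}

int main() {
    AdamContext ctx;                              // persisting ctx is what enables clean resume
    std::vector<float> x = {1.0f}, g = {0.5f};
    for (int i = 0; i < 3; ++i) adam_step(ctx, x, g, 1e-2f);
}
```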
xaedes
1eee9255e7
add missing default parameters for adam optimizer
2023-05-21 15:03:51 +02:00
xaedes
57c2f4f909
fix random weight initialization scale
2023-05-21 12:18:47 +02:00
xaedes
96514971dd
use inplace operations in cross_entropy_loss
2023-05-21 12:17:57 +02:00
xaedes
ef17d99f65
implement AdamW in ggml_opt_adam by adding weight decay parameter (default 0.001f)
...
also add a schedule parameter (default 1.0f) that can be used to scale alpha and decay according to the learning schedule.
setting the decay parameter to zero disables AdamW, resulting in the standard Adam optimizer.
since the difference between Adam and AdamW is minimal, it is not implemented as a separate optimizer but integrated into the existing Adam optimizer.
2023-05-20 14:54:57 +02:00
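An illustrative AdamW step showing the two parameters described above; this is a sketch, not ggml_opt_adam's actual code. `sched` scales both the step size and the weight decay, and decay == 0.0f reduces the update to plain Adam.

```cpp
#include <cmath>
#include <vector>

static void adamw_step(std::vector<float>& x, const std::vector<float>& g,
                       std::vector<float>& m, std::vector<float>& v, int t,
                       float alpha, float decay, float sched,
                       float beta1 = 0.9f, float beta2 = 0.999f, float eps = 1e-8f) {
    for (size_t i = 0; i < x.size(); ++i) {
        m[i] = beta1*m[i] + (1.0f - beta1)*g[i];
        v[i] = beta2*v[i] + (1.0f - beta2)*g[i]*g[i];
        const float mhat = m[i] / (1.0f - std::pow(beta1, (float)t));
        const float vhat = v[i] / (1.0f - std::pow(beta2, (float)t));
        x[i] -= sched * alpha * mhat / (std::sqrt(vhat) + eps); // Adam step, scaled by the schedule
        x[i] -= sched * decay * x[i];                           // decoupled weight decay (AdamW)
    }
}

int main() {
    std::vector<float> x = {1.0f}, g = {0.1f}, m = {0.0f}, v = {0.0f};
    adamw_step(x, g, m, v, /*t=*/1, /*alpha=*/1e-3f, /*decay=*/0.001f, /*sched=*/1.0f);
}
```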
xaedes
f4e9ce7998
enable gradient propagation for inplace add1 and scale operations
...
those functions' backward passes don't need the original src0, so they also work when the forward pass is inplace
2023-05-20 14:49:30 +02:00
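Why the inplace forward is safe here, shown schematically with plain floats rather than ggml tensors: for y = scale(x, s) the gradient is dL/dx = s * dL/dy, and for y = add1(x, c) it is simply dL/dx = dL/dy. Neither formula reads the original x, so x may be overwritten in place.

```cpp
#include <cstdio>

int main() {
    float x = 3.0f;
    const float s = 2.0f;
    x = s * x;                        // forward "scale" done inplace: x now holds y
    const float grad_y = 1.0f;        // incoming gradient dL/dy
    const float grad_x = s * grad_y;  // backward needs only s, not the original x
    printf("y=%g grad_x=%g\n", x, grad_x);
}
```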
xaedes
a6aafdd719
add ggml_add1_inplace to header
2023-05-20 14:47:56 +02:00
xaedes
08a330a136
add cmake target for baby-llama-text
2023-05-19 18:41:26 +02:00
xaedes
332003584e
sample with non-greedy sampling parameters at the end of training
2023-05-19 18:41:06 +02:00
xaedes
e19ead6e3f
print used memory before and after optimization
2023-05-19 18:40:20 +02:00
xaedes
da86a1d736
fix cross entropy loss
...
- add target probabilities for each sample, which are then used in the cross entropy loss
2023-05-19 18:39:38 +02:00
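A sketch of a cross entropy loss over full target probability distributions (one probability vector per sample), not llama.cpp's exact implementation: L = -(1/N) * sum_n sum_i p_target[n][i] * log softmax(logits[n])[i].

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

static float cross_entropy(const std::vector<std::vector<float>>& logits,
                           const std::vector<std::vector<float>>& p_target) {
    float loss = 0.0f;
    for (size_t n = 0; n < logits.size(); ++n) {
        // numerically stable log-softmax: subtract the max before exponentiating
        float maxv = logits[n][0];
        for (float l : logits[n]) maxv = std::max(maxv, l);
        float sum = 0.0f;
        for (float l : logits[n]) sum += std::exp(l - maxv);
        const float logsum = std::log(sum) + maxv;
        for (size_t i = 0; i < logits[n].size(); ++i) {
            loss -= p_target[n][i] * (logits[n][i] - logsum);
        }
    }
    return loss / (float)logits.size();
}

int main() {
    printf("%f\n", cross_entropy({{2.0f, 0.5f, 0.1f}}, {{1.0f, 0.0f, 0.0f}}));
}
```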
xaedes
09b304d015
remove duplicate include
2023-05-19 18:36:05 +02:00
xaedes
37f5b76df1
ggml fixes to support backward pass on inplace operations
2023-05-19 18:35:40 +02:00
xaedes
44d83558bc
use different arguments for input and output checkpoint
2023-05-19 18:34:18 +02:00
xaedes
d8b0666429
initialize rng with srand
2023-05-19 18:29:47 +02:00
xaedes
25fe1c3815
use inplace functions where possible
2023-05-19 14:53:21 +02:00
xaedes
b241b9cb6c
save trained model to checkpoint and load the model to be trained from checkpoint
2023-05-17 13:49:32 +02:00
xaedes
d328472f16
fix get_samples call, add model tensor names, increase model size, start training samples after newline
2023-05-17 12:52:20 +02:00
xaedes
e063135d0b
add llama sampler, shuffle samples and constrain sampling to tokens occurring in train data
2023-05-15 21:12:28 +02:00
xaedes
ec881156f6
improve ggml_out_prod performance
...
- change iteration order (>15s -> 10s runtime)
- parallelize over one more dimension: over dst matrix rows (10s -> <5s runtime)
2023-05-15 14:42:24 +02:00
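A schematic outer-product accumulation C += a * b^T, parallelized over dst rows as the notes above describe; this uses plain std::thread and is not ggml's actual kernel or threading model.

```cpp
#include <functional>
#include <thread>
#include <vector>

static void out_prod_rows(std::vector<float>& C, const std::vector<float>& a,
                          const std::vector<float>& b, size_t row_begin, size_t row_end) {
    const size_t n = b.size();
    for (size_t i = row_begin; i < row_end; ++i) {   // each thread owns a disjoint block of dst rows
        for (size_t j = 0; j < n; ++j) {             // innermost loop walks one dst row contiguously
            C[i*n + j] += a[i] * b[j];
        }
    }
}

int main() {
    const size_t m = 512, n = 512, nthreads = 4;
    std::vector<float> C(m*n, 0.0f), a(m, 1.0f), b(n, 2.0f);
    std::vector<std::thread> workers;
    for (size_t t = 0; t < nthreads; ++t) {
        const size_t r0 = m*t/nthreads, r1 = m*(t + 1)/nthreads;
        workers.emplace_back(out_prod_rows, std::ref(C), std::cref(a), std::cref(b), r0, r1);
    }
    for (auto& w : workers) w.join();
}
```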
xaedes
19fb91899b
better weight initialization improves training convergence at start
2023-05-15 14:19:38 +02:00
xaedes
f3cf7df21f
better weight initialization improves training convergence at start
2023-05-15 14:18:57 +02:00
xaedes
efa4bb78ea
add ggml_out_prod and use it for mul_mat backward pass for improved performance
...
performance stats report an improvement from 37 seconds to 16 seconds of runtime during my training tests
2023-05-15 14:17:42 +02:00
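Why an outer-product primitive fits the matmul backward, shown as schematic math rather than ggml's tensor layout: for C = A*B with A (m x k) and B (k x n), the gradient dA = dC * B^T can be accumulated as a sum of outer products of matching columns, dA += dC[:,j] * B[:,j]^T for each column j, which avoids materializing a transposed copy of B. The function below is a hypothetical reference version.

```cpp
#include <vector>

static void matmul_backward_dA(std::vector<float>& dA,          // m x k
                               const std::vector<float>& dC,    // m x n
                               const std::vector<float>& B,     // k x n
                               size_t m, size_t k, size_t n) {
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            for (size_t l = 0; l < k; ++l) {
                dA[i*k + l] += dC[i*n + j] * B[l*n + j];  // outer product of column j of dC and B
            }
        }
    }
}

int main() {
    const size_t m = 2, k = 3, n = 2;
    std::vector<float> dA(m*k, 0.0f), dC(m*n, 1.0f), B(k*n, 1.0f);
    matmul_backward_dA(dA, dC, B, m, k, n);   // every entry of dA ends up equal to 2
}
```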
xaedes
a703d7a85f
activate threading in baby-llama-text
2023-05-14 21:00:55 +02:00
xaedes
d9b5268728
avoid printing too many newlines in baby-llama-text
2023-05-14 20:57:47 +02:00
xaedes
c054079fb8
improve performance of mul_mat backward pass
...
avoid transpose by using mul_mat with swapped arguments
2023-05-14 20:56:50 +02:00
xaedes
1f2b76de01
fix bug in ggml_compute_forward_soft_max_back_f32 on DEBUG build
2023-05-14 20:55:24 +02:00
xaedes
69108167cd
fix race condition bug in non-inplace ggml_compute_forward_diag_mask_f32
...
memcpy needs to be synchronized across threads to avoid race conditions.
=> do it in the INIT phase
2023-05-14 20:54:57 +02:00
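A schematic version of that fix using generic C++20 threads and a barrier, not ggml's task scheduler: the copy of the source into dst happens once, in an init-style phase, before any thread starts writing its slice, so the copier and the writers cannot race.

```cpp
#include <barrier>
#include <cmath>
#include <cstring>
#include <thread>
#include <vector>

int main() {
    const int n = 1024, nthreads = 4, n_past = 8;
    std::vector<float> src(n, 1.0f), dst(n, 0.0f);
    std::barrier sync(nthreads);

    auto worker = [&](int ith) {
        if (ith == 0) {
            std::memcpy(dst.data(), src.data(), n*sizeof(float)); // "INIT" phase: single-threaded copy
        }
        sync.arrive_and_wait();                                   // all threads wait for the copy
        for (int i = ith; i < n_past; i += nthreads) {            // compute phase: disjoint writes
            dst[i] = -INFINITY;                                   // e.g. a diag-mask style update
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t) threads.emplace_back(worker, t);
    for (auto& t : threads) t.join();
}
```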
xaedes
4339f8cf28
improve softmax backward pass
...
go from quadratic runtime to linear runtime by simplifying the formulas
2023-05-14 17:55:02 +02:00
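The simplification in one formula, shown as a generic softmax backward sketch: with y = softmax(x), the full Jacobian is n x n, but the Jacobian-vector product collapses to dx_i = y_i * (dy_i - sum_j dy_j * y_j), which needs only one dot product and one pass over the vector, O(n) instead of O(n^2). The same identity is why the dedicated soft_max_back in the entry below can avoid the big n x n intermediates.

```cpp
#include <vector>

static void soft_max_back(const std::vector<float>& y,   // softmax output from the forward pass
                          const std::vector<float>& dy,  // incoming gradient dL/dy
                          std::vector<float>& dx) {      // outgoing gradient dL/dx
    float dot = 0.0f;
    for (size_t j = 0; j < y.size(); ++j) dot += dy[j] * y[j];
    for (size_t i = 0; i < y.size(); ++i) dx[i] = y[i] * (dy[i] - dot);
}

int main() {
    std::vector<float> y = {0.7f, 0.2f, 0.1f}, dy = {1.0f, 0.0f, 0.0f}, dx(3);
    soft_max_back(y, dy, dx);
}
```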
xaedes
ec1aea09ec
implement ggml_soft_max_back for more performant backward pass of soft_max
...
avoids creating big intermediate matrices of size n_embd x n_embd for llama layers and n_vocab x n_vocab for cross entropy loss
2023-05-14 17:16:26 +02:00
xaedes
f89c278d83
fix race condition bug in ggml_compute_forward_diag_mask_f32
2023-05-14 17:00:19 +02:00
xaedes
6e968d22b0
add text generating baby-llama from scratch example
2023-05-14 16:07:08 +02:00
xaedes
6e88dc93bd
update python bindings
2023-05-13 19:05:24 +02:00
xaedes
ed6b64fb98
add python bindings for functions to get and set the whole llama state
...
(rng, logits, embedding and kv_cache)
2023-05-13 16:32:08 +02:00
xaedes
5f6b715071
fix decoding error: add errors=ignore parameter
2023-05-13 16:31:13 +02:00
xaedes
bc9e84daca
add python wrapper
...
https://gist.github.com/abetlen/2b90e5f153f6efd00931d098de5c73ce
2023-05-13 16:31:13 +02:00
Georgi Gerganov
5a5aeb1e91
llama : fix unused warning
2023-05-13 16:55:14 +03:00
Georgi Gerganov
66841fdb0e
ggml : multi-thread mul and diag_mask ops ( #1428 )
2023-05-13 16:48:03 +03:00
Johannes Gäßler
905d87b70a
ggml : GPU-accelerated token generation ( #1412 )
...
* CUDA kernel for q4_0 dequant. + mat. vec. mult.
* Added q4_1 via template
* Added missing __syncthreads();
* --gpu_layers -> --gpu-layers
* Shorter dequantize_mul_mat_vec line
* q5_0 dequantize_mul_mat kernel
* More readable dequantize_mul_mat_vec logic
* dequantize_mul_mat_vec kernels for q5_1, q8_0, f16
* llama : offload "output" tensor to GPU too + coding style fixes
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-13 16:38:36 +03:00