* ggml : add graph tensor allocator
* ggml : don't calculate data pointer of unallocated tensors when creating a view with an offset
* ggml : refactor ggml_view_Nd into ggml_view_tensor_offset
allows a dynamic learning schedule and different batch data for each iteration, without relying on low n_iter and high n_examples parameters
reduces runtime by avoiding restarts of the optimization function and improves training convergence by providing a different batch for each iteration
this callback is called before each iteration with the custom data and a pointer to the learning schedule parameter (only used in Adam(W)).
it can be used for a dynamic learning schedule and for setting the input data of the next batch before each iteration
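A minimal sketch of such a callback, assuming the shape described above (a custom data pointer plus a pointer to the learning schedule value); the struct, the cosine decay, and all names are illustrative and not taken from the patch:

```c
#include <math.h>

struct train_state {
    int   iter;       // current iteration, advanced by the callback
    int   n_iters;    // total planned iterations
    float min_sched;  // lower bound of the schedule
};

// called before each optimizer iteration
static void opt_callback(void * data, float * sched) {
    struct train_state * st = (struct train_state *) data;

    // dynamic learning schedule: cosine decay from 1.0 down to min_sched
    float t = (float) st->iter / (float) st->n_iters;
    *sched = st->min_sched + 0.5f*(1.0f - st->min_sched)*(1.0f + cosf(3.14159265f*t));

    // this is also the place to fill the input tensors with a fresh batch,
    // so that every iteration trains on different data
    st->iter++;
}
```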
--enable-restart N Only for Adam optimizer. Enable restarts of cos-decay
--disable-restart N Only for Adam optimizer. Disable restarts of cos-decay
--opt-past N Number of optimization iterations to track for delta convergence test. Disabled when zero.
--opt-delta N Maximum delta for delta convergence test. Disabled when <= zero.
--opt-max-no-improvement N Maximum number of optimization iterations with no improvement. Disabled when <= zero.
--adam-epsf N AdamW epsilon for convergence test. Disabled when <= zero.
--adam-min-alpha N Adam minimum learning rate alpha, usually 0.1 * alpha
sqrt(n_layers) is only the optimal checkpoint step when the memory size of a checkpoint and the memory size of a layer are equal.
since layers require more memory than the single-tensor checkpoints we use, the optimal value is computed differently:
```
given: n (number of layers), u (memory per checkpoint), v (memory per layer)
objective: minimize(a*u+b*v) where a*b=n, a>0, b>0
b=n/a
minimize(a*u+v*n/a)
diff(a*u+v*n/a, a) = u - (v*n/a)/a
diff(a*u+v*n/a, a) == 0
u - (v*n/a)/a == 0
u == v*n/(a*a)
u*a*a = v*n
a*a = v*n/u
a = sqrt(n*v/u)
```
this change results in more checkpoints, requiring fewer layers to be stored between checkpoints, which overall improves memory usage.
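A minimal illustration of the resulting formula (function name and rounding are mine, not from the patch):

```c
#include <math.h>

// n_layers       : n - number of layers to cover with checkpoints
// mem_checkpoint : u - memory cost of one checkpoint
// mem_layer      : v - memory cost of keeping one layer between checkpoints
// minimize a*u + b*v with a*b = n  =>  a = sqrt(n*v/u)
static int optimal_n_checkpoints(int n_layers, double mem_checkpoint, double mem_layer) {
    double a = sqrt((double) n_layers * mem_layer / mem_checkpoint);
    return a < 1.0 ? 1 : (int) (a + 0.5);
}
```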
don't use only the sum as aggregation, because the sum of a softmax output is always 1, so finite differences would not work
instead use sum(log(soft_max()*(1-eps)+eps)); eps avoids log(0)
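A small sketch of that aggregation (illustrative only, not the actual test code):

```c
#include <math.h>

// reduce a softmax output to a scalar that still reacts to perturbations of
// the inputs; a plain sum would always be 1.0, so finite differences in the
// gradient check would see no change at all
static double aggregate_softmax(const float * probs, int n, double eps) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        // eps keeps the argument of log strictly positive, avoiding log(0)
        sum += log((double) probs[i]*(1.0 - eps) + eps);
    }
    return sum;
}
```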
ggml_cross_entropy_loss: sums were not correctly added across the workloads of the individual threads
ggml_cross_entropy_loss_back: simplify backward process, reducing numerical issues
guard usage of the exp f16 lookup in cross entropy with #define GGML_CROSS_ENTROPY_EXP_FP16
cross entropy loss is only used once during training, but it is quite sensitive to the numerical errors introduced by the exp-f16 lookup.
so the exp-f16 lookup for cross entropy loss is disabled by default, trading a very small amount of runtime performance for better gradients.
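Roughly how such a compile-time guard looks (the real code in ggml.c differs in its details; `fast_exp_f16_lookup` is a hypothetical stand-in for the table-based exp):

```c
#include <math.h>

float fast_exp_f16_lookup(float x);  // hypothetical f16 lookup-table exp

static float cross_entropy_exp(float x) {
#ifdef GGML_CROSS_ENTROPY_EXP_FP16
    // opt-in fast path: cheaper, but introduces rounding error
    return fast_exp_f16_lookup(x);
#else
    // default: full-precision expf for better gradients
    return expf(x);
#endif
}
```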
It is now relative to the Adam learning rate `alpha*sched`; before, it was relative to `sched` only.
`alpha` is the maximum learning rate and `sched` is a scaling parameter in [0..1].
reduces optimizer memory overhead from 7*modelsize to 2*modelsize.
additionally allows optimizing models with more than 2^31 parameters by replacing int with int64_t.
bumps training checkpoint file version, but old checkpoints can still be read.
the new version, with fewer tensors, is saved.
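An illustration of why the counter type matters (not the actual ggml code): with a 32-bit int, any parameter count above 2^31 - 1 overflows.

```c
#include <stdint.h>

static int64_t count_parameters(const int64_t * n_elements, int n_tensors) {
    int64_t nx = 0;  // was a plain int before the change
    for (int i = 0; i < n_tensors; ++i) {
        nx += n_elements[i];
    }
    return nx;
}
```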