Commit graph

954 commits

Author SHA1 Message Date
xaedes
24a4b099f3
change sampling parameters for prediction after training to the defaults from common.h
and clarify which tokens are the context for prediction and which are generated tokens
2023-07-28 23:13:19 +02:00
xaedes
17a0898d50
fix increase of model.train_samples and model.train_tokens
now that each optimizer iteration gets its own batch, the counts need to be multiplied by the number of opt iterations (see the sketch below)
2023-07-28 23:13:19 +02:00
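
A minimal sketch of the corrected accounting; the function and variable names are illustrative assumptions, not the actual training code:

```
// illustrative sketch: each call into the optimizer now runs n_iter iterations, each on
// its own batch of n_batch examples, so the counters advance by n_batch*n_iter per call
// instead of just n_batch
static void update_train_counters(long * train_samples, long * train_tokens,
                                  int n_batch, int n_iter, int n_tokens_per_example) {
    *train_samples += (long) n_batch * n_iter;
    *train_tokens  += (long) n_batch * n_iter * n_tokens_per_example;
}
```
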
xaedes
58024d3e5f
rename training parameter cos-decay-alpha to cos-decay-min and clarify that adam-min-alpha also applies to warmup 2023-07-28 23:13:19 +02:00
xaedes
e6ff0728e0
add minimum number of tensor dimensions to apply weight decay (default 2)
this makes it possible to exclude bias parameters from weight decay (see the sketch below)
2023-07-28 23:13:19 +02:00
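
A minimal sketch of the idea; the helper and parameter names are illustrative, not the actual training code:

```
// illustrative sketch: apply weight decay only to tensors with at least `min_dim_to_decay`
// dimensions; bias (and other 1-D) parameters fall below the default of 2 and get no decay
static float effective_decay(int n_dims_of_tensor, float decay, int min_dim_to_decay) {
    return (n_dims_of_tensor >= min_dim_to_decay) ? decay : 0.0f;
}
```
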
xaedes
d7aa4d9576
use optimization callback in training
allows a dynamic learning schedule and different batch data for each iteration, without relying on low n_iter and high n_examples parameters

reduces runtime by avoiding restarts of the optimization function and improves training convergence by providing a different batch for each iteration
2023-07-28 23:13:19 +02:00
xaedes
bfc3119139
add optimization callback to ggml_opt_resume_g
this callback is called before each iteration with custom data and a pointer to the learning schedule parameter (only used by Adam(W)).

can be used for a dynamic learning schedule and for setting batch input data before each iteration (see the sketch below)
2023-07-28 23:13:18 +02:00
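
A hedged sketch of what such a callback could look like. The `void (*)(void * data, float * sched)` shape is assumed from the description above; the struct, the warmup/cosine schedule and all names are purely illustrative:

```
#include <math.h>

// illustrative callback, invoked before each optimizer iteration: it computes the learning
// schedule value written through `sched` and would also load the next batch into the inputs
struct train_cb_data {
    int   iter;       // current iteration, advanced by the callback
    int   warmup;     // number of warmup iterations
    int   total;      // total iterations for the cosine decay
    float min_sched;  // minimum schedule value (cos-decay-min / warmup floor)
};

static void opt_callback(void * vdata, float * sched) {
    struct train_cb_data * d = (struct train_cb_data *) vdata;

    if (d->iter < d->warmup) {
        // linear warmup from min_sched up to 1
        *sched = d->min_sched + (1.0f - d->min_sched) * (float) d->iter / (float) d->warmup;
    } else {
        // cosine decay from 1 down to min_sched
        const float t = (float)(d->iter - d->warmup) / (float)(d->total - d->warmup);
        *sched = d->min_sched + (1.0f - d->min_sched) * 0.5f * (1.0f + cosf(3.14159265f * t));
    }

    // ...here the next batch of input tokens/targets would be written into the graph inputs...
    d->iter++;
}
```
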
xaedes
e843d6e71c
measure and print total training time 2023-07-28 23:13:18 +02:00
xaedes
ff759d957c
remove unused function argument from get_example_targets_batch 2023-07-28 23:13:18 +02:00
xaedes
ce937bc431
replace memcpy with a reshape operation so that the graph is not cut at the input
this makes it possible to store other values into the input tensor and then simply recompute the graph without rebuilding it (see the sketch below)
2023-07-28 23:13:18 +02:00
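
A hedged sketch of the idea in ggml terms; the function and tensor names are illustrative and the actual training code may differ:

```
#include "ggml.h"

// illustrative sketch: instead of memcpy'ing token data into a fresh tensor (which severs
// the graph at that point), build the graph on a reshaped *view* of a persistent input
// tensor; later iterations just overwrite the input tensor's data and recompute the graph
static struct ggml_tensor * view_token_input(struct ggml_context * ctx,
                                             struct ggml_tensor  * tokens_input, // persistent 1-D input
                                             int n_tokens, int n_batch) {
    // ggml_reshape_2d is a real graph operation, so tokens_input stays reachable from the graph root
    return ggml_reshape_2d(ctx, tokens_input, n_tokens, n_batch);
}
```
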
xaedes
c6a18e15c1
add more training parameters:
--enable-restart N         Only for Adam optimizer. Enable restarts of cos-decay
--disable-restart N        Only for Adam optimizer. Disable restarts of cos-decay
--opt-past N               Number of optimization iterations to track for delta convergence test. Disabled when zero.
--opt-delta N              Maximum delta for delta convergence test. Disabled when <= zero.
--opt-max-no-improvement N Maximum number of optimization iterations with no improvement. Disabled when <= zero.
--adam-epsf N              AdamW epsilon for convergence test. Disabled when <= zero.
--adam-min-alpha N         Adam minimum learning rate alpha, usually 0.1 * alpha
2023-07-28 23:13:18 +02:00
xaedes
d0fbb7d328
llama : fix rope usage in train-text-from-scratch after ChatGLM change 2023-07-28 23:13:17 +02:00
xaedes
fc379a2de3
disable gradient checkpointing debug output 2023-07-28 23:13:17 +02:00
xaedes
3744a9be74
improve gradient checkpointing
sqrt(n_layers) is only the best checkpoint step when the memory size of checkpoints and the memory size of layers are equal.
since layers require more memory than the single-tensor checkpoints we use, the optimal value is computed differently:

```
  given: n, u, v
  objective: minimize(a*u+b*v) where a*b=n, a>0, b>0
  b=n/a
  minimize(a*u+v*n/a)
  diff(a*u+v*n/a, a) = u - (v*n/a)/a
  diff(a*u+v*n/a, a) == 0
  u - (v*n/a)/a == 0
  u == v*n/(a*a)
  u*a*a = v*n
  a*a = v*n/u
  a = sqrt(n*v/u)
```

this change results in more checkpoints, requiring fewer layers to be stored between checkpoints, improving memory usage overall (a worked example follows below).
2023-07-28 23:13:17 +02:00
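
A small worked example of the resulting formula; the numbers are made up for illustration, with n the number of layers, u the memory cost of one checkpoint and v the memory cost of one stored layer:

```
#include <math.h>
#include <stdio.h>

int main(void) {
    const double n = 32.0;  // layers (illustrative)
    const double u =  1.0;  // relative memory cost of one checkpoint (illustrative)
    const double v =  4.0;  // relative memory cost of one stored layer (illustrative)

    const double a = sqrt(n*v/u); // optimal number of checkpoints per the derivation above
    const double b = n/a;         // layers stored between checkpoints

    // with v > u this gives more checkpoints (a ~ 11.3) than plain sqrt(n) ~ 5.7,
    // and correspondingly fewer layers (b ~ 2.8) held between them
    printf("a = %.1f checkpoints, b = %.1f layers between checkpoints\n", a, b);
    return 0;
}
```
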
xaedes
51dc77092f
change cross_entropy_loss to output average over all rows
this helps keep the loss and gradients in a sane range
2023-07-28 23:13:17 +02:00
xaedes
87febeec91
improve finite differences of test-grad0 by using double instead of float 2023-07-28 23:13:17 +02:00
xaedes
864e7e3aa1
fix test-grad0 for soft_max
don't use plain sum as the aggregation, because the sum of a softmax is always 1, so finite differences yield no usable gradient
instead use sum(log(soft_max()*(1-eps)+eps)); eps avoids log(0) (see the sketch below)
2023-07-28 23:13:17 +02:00
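
A minimal sketch of that aggregation (illustrative, not the actual test code): the sum of a softmax is identically 1, so finite differences of it carry no information, while sum(log(softmax(x)*(1-eps)+eps)) varies with x and eps keeps the log argument positive:

```
#include <math.h>

static double softmax_check_value(const float * x, int n, float eps) {
    // numerically stable softmax
    double max = x[0];
    for (int i = 1; i < n; ++i) if (x[i] > max) max = x[i];
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += exp((double) x[i] - max);

    // aggregate: sum(log(softmax(x)*(1-eps)+eps)) - a scalar that actually depends on x
    double val = 0.0;
    for (int i = 0; i < n; ++i) {
        const double p = exp((double) x[i] - max) / sum;
        val += log(p*(1.0 - eps) + eps);
    }
    return val;
}
```
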
xaedes
2d1e6e0675
fix test-grad0 for cross_entropy_loss
the second argument to cross_entropy_loss must sum to 1 for each row (see the sketch below)
2023-07-28 23:13:17 +02:00
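
A small illustrative sketch of how such a valid target row can be produced, e.g. by taking the softmax of random values, since each row of the second argument must form a probability distribution:

```
#include <math.h>
#include <stdlib.h>

// illustrative: fill one target row so that it sums to 1
static void make_target_row(float * row, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        row[i] = (float) rand() / (float) RAND_MAX; // random logits in [0, 1]
        sum += exp((double) row[i]);
    }
    for (int i = 0; i < n; ++i) {
        row[i] = (float) (exp((double) row[i]) / sum); // softmax -> row sums to 1
    }
}
```
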
xaedes
2c6985f79e
bug fixes for cross entropy loss
ggml_cross_entropy_loss: sums were not correctly added across the per-thread workloads
ggml_cross_entropy_loss_back: simplify the backward process, reducing numerical issues

guard usage of the exp f16 lookup in cross entropy behind #define GGML_CROSS_ENTROPY_EXP_FP16 (see the sketch below)

cross entropy loss is only used once during training, but it is quite sensitive to the numerical errors introduced by the exp f16 lookup.
so the exp f16 lookup is disabled for cross entropy loss by default, trading slightly worse runtime performance for better gradients.
2023-07-28 23:13:16 +02:00
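
A hedged sketch of the guard pattern only; the f16 lookup branch is a placeholder, not ggml's actual table code:

```
#include <math.h>

// cross entropy uses the fast f16 exp lookup only when explicitly requested
static inline float ce_exp(float x) {
#ifdef GGML_CROSS_ENTROPY_EXP_FP16
    return exp_via_f16_lookup(x); // placeholder name for ggml's f16 lookup path
#else
    return expf(x);               // default: exact expf, better gradients
#endif
}
```
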
xaedes
97964a4cc9
change default AdamW weight decay parameter defined in ggml to 0.0, making plain Adam the default instead of AdamW
for reference: the default weight decay parameter of torch.optim.AdamW is 0.01
2023-07-28 23:13:16 +02:00
xaedes
f175ead6ef
change default AdamW weight decay parameter used in training to 0.1 as used in nanoGPT 2023-07-28 23:13:16 +02:00
xaedes
a80f184e6d
change AdamW decay parameter to work like the torch AdamW decay parameter
It is now relative to the Adam learning rate `alpha*sched`.
Before, it was relative to `sched` only.

`alpha` is the maximum learning rate and `sched` is a scaling parameter in [0..1] (see the sketch below)
2023-07-28 23:13:16 +02:00
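
A hedged sketch of a standard decoupled AdamW-style update showing the new scaling; the variable names and exact update form are illustrative, not the ggml implementation:

```
#include <math.h>

static float adamw_param_step(float p, float m_hat, float v_hat,
                              float alpha, float sched, float decay, float eps) {
    const float lr = alpha * sched;             // effective learning rate
    p -= lr * (m_hat / (sqrtf(v_hat) + eps));   // Adam step
    p -= lr * decay * p;                        // weight decay now scales with alpha*sched (as in torch)
    return p;
}
```
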
xaedes
ed4319e1a7
add and use function ggml_build_backward_expand to avoid stack overflows with a large maximum number of nodes (usage sketch below)
GGML_API void ggml_build_backward_expand(struct ggml_context * ctx, struct ggml_cgraph * gf, struct ggml_cgraph * gb, bool keep);
2023-07-28 23:13:16 +02:00
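
A hedged usage sketch based on the declaration above; how the forward/backward graphs are allocated and initialized here is an assumption and may differ from the actual training code:

```
#include "ggml.h"

// expand the forward graph for `loss`, then expand the backward graph in place instead of
// returning a large ggml_cgraph by value (which can overflow the stack for big node limits)
static void build_graphs(struct ggml_context * ctx, struct ggml_tensor * loss,
                         struct ggml_cgraph * gf, struct ggml_cgraph * gb) {
    ggml_build_forward_expand(gf, loss);
    *gb = *gf; // backward graph starts as a copy of the forward graph (assumed pattern)
    ggml_build_backward_expand(ctx, gf, gb, /*keep =*/ true);
}
```
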
xaedes
e05e4414ac
remove unused compute buffer 3 2023-07-28 23:12:00 +02:00
xaedes
6e3f95bf06
implement gradient checkpointing for training
reduces memory overhead from O(n_layer) to O(sqrt(n_layer))

as explained in the readme of https://github.com/cybertronai/gradient-checkpointing
2023-07-28 23:11:59 +02:00
xaedes
d7003a98cc
Fix reset of unused g->nodes and g->grads to NULL 2023-07-28 21:30:22 +02:00
xaedes
d395b19c8c
add gradient clipping to AdamW 2023-07-28 21:18:41 +02:00
xaedes
d39c8e6863
remove unnecessary Adam(W) optimizer tensors.
reduces optimizer memory overhead from 7*modelsize to 2*modelsize.

additionally makes it possible to optimize models with more than 2^31 parameters by replacing int with int64_t.

bumps the training checkpoint file version, but old checkpoints can still be read.
the new version with fewer tensors is saved.
2023-07-28 21:17:57 +02:00
xaedes
5d124d0cb4
fix track_max_mem in forward_batch_wo_cache_flash_attn_train 2023-07-28 21:17:56 +02:00
klosax
8a88e5855c
perplexity : add Hellaswag calculation (#2389)
* common.h : add hellaswag / remove perplexity-lines

* common.cpp : add hellaswag / remove perplexity-lines

* perplexity.cpp : add hellaswag scores / remove perplexity-lines

* perplexity.cpp : clean up

* common.h : change default param value

* common.cpp : Change default param

* perplexity.cpp : alter wording

* common.h : alter wording

* common.cpp : alter wording
2023-07-28 21:25:36 +03:00
Lee
a9559bf77b
ggml : workaround for missing _mm256_setr_m128i in GCC < 8 in k_quants.c (#2405) 2023-07-28 21:17:45 +03:00
eric8607242
ee1b497c98
llama : support more diverse tokenizers? (#2420)
* supporting more diverse tokenizers

* Update llama.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-28 21:10:05 +03:00
Georgi Gerganov
d73b8d48b4
examples : fix whitespace 2023-07-28 21:05:08 +03:00
nhamanasu
34ae1caf7f
examples : server chat mode with llama2 (#2400)
* add: server chat mode with llama2

* fix: remove the unnecessary last \n
2023-07-28 21:02:10 +03:00
Weird Constructor
d91f3f0c55
readme : fix the description of the Tail free sampling (TFS) method (#2431) 2023-07-28 11:44:43 +03:00
Rand Xie
65cdf34bdc
llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433) 2023-07-28 11:42:53 +03:00
niansa/tuxifan
edcc7ae7d2
Obtaining LLaMA 2 instructions (#2308)
* Obtaining LLaMA 2 instructions

* Removed sharing warning for LLaMA 2

* Linked TheBloke's GGML repos

* Add LLaMA 2 to list of supported models

* Added LLaMA 2 usage instructions

* Added links to LLaMA 2 70B models
2023-07-28 03:14:11 +02:00
mj-shifu
7c529cede6
convert.py : Update to support 70B HF format model files (#2427)
* convert.py : fix llama 2 70b conversion from Huggingface
2023-07-27 14:39:17 -06:00
Georgi Gerganov
1a941869cb
metal : disable graph concurrency optimization due to bug (#2413) 2023-07-27 11:00:54 +03:00
slaren
b5472ea0ad
ggml : fix assert in ggml_set_unary_op (#2410) 2023-07-26 23:57:23 +02:00
Cebtenzzre
6df1f5940f
make : build with -Wmissing-prototypes (#2394) 2023-07-26 21:00:04 +03:00
slaren
5488fb789e
ggml : allocate graphs in a context (#2392)
* ggml : graph allocation in contexts

* allocate work buffer as a ggml_object in ggml_graph_compute_with_ctx

* llama.cpp : allocate graph in the context

* add GGML_PAD

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-26 15:56:53 +02:00
Kawrakow
eb542d3932
Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-25 18:35:53 +03:00
slaren
07aaa0f63f
ggml : fix ggml_flash_attn to use op_params (#2387)
* ggml : fix ggml_flash_attn to use op_params
2023-07-25 16:20:12 +02:00
ldwang
fce48caf9a
convert.py : support bpe tokenizer (#2228)
* support bpe tokenizer in convert

Signed-off-by: ldwang <ftgreat@gmail.com>

* support bpe tokenizer in convert

Signed-off-by: ldwang <ftgreat@gmail.com>

* support bpe tokenizer in convert, fix

Signed-off-by: ldwang <ftgreat@gmail.com>

---------

Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-07-25 16:22:09 +03:00
Jiahao Li
875086bdb9
ggml : relax contiguous constraints in activation function (#2371) 2023-07-25 15:58:32 +03:00
slaren
da1889834a
ggml : improve graph build time via hash table lookup (#2329)
* improve graph build time

* ggml_tensor : use 1 bit per flag

* use a hash table instead
2023-07-25 15:32:20 +03:00
Hesen Peng
82552b7f54
build : fix line breaking error in build-info.sh (#2349)
* fix line breaking

* build number line break removal
2023-07-25 15:24:09 +03:00
Xiao-Yong Jin
0c06204fb3
main : add --in-prefix-bos to prefix BOS to user inputs; keep EOS (#2304)
* add `--in-prefix-bos` to prefix BOS to user inputs; keep EOS

The BOS precedes the string specified by `--in-prefix`.
Model generated EOS is now kept in the context.

It provides a way to strictly follow the prompt format used in
Llama-2-chat.

The EOS handling also benefits some existing finetunes that use
EOS to mark the end of a turn.

* examples/common: move input_prefix_bos to other bools
2023-07-25 15:19:11 +03:00
Eve
1fed755b1f
ci : add non-AVX scalar build/test (#2356)
* noavx build and test

* we don't need to remove f16c in windows
2023-07-25 15:16:13 +03:00
katsu560
be2301bcda
k_quants : add AVX support to dot functions with QK_K as 64 (#2339)
* add AVX to ggml_vec_dot_q2_K_q8_K()

* add AVX to ggml_vec_dot_q3_K_q8_K()

* add AVX to ggml_vec_dot_q4_K_q8_K()

* add AVX to ggml_vec_dot_q5_K_q8_K()

* add AVX to ggml_vec_dot_q6_K_q8_K()

* refactor AVX code in ggml_vec_dot_q6_K_q8_K()
2023-07-25 15:13:41 +03:00