Commit graph

749 commits

Each entry lists the author, short commit SHA, commit message, and date.
Maxime
503db28849
llama : fix name shadowing and C4146 (#1526)
* Fix name shadowing and C4146

* Fix if macros not using defined when required

* Update llama-util.h

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update llama-util.h

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Code style

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-20 10:22:37 +03:00
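
For context: C4146 is MSVC's warning "unary minus operator applied to unsigned type, result still unsigned". A minimal sketch of the pattern and a typical warning-free rewrite (illustrative, not the code from this commit):

    #include <cstdint>

    // C4146 fires here: negating an unsigned value wraps around and stays
    // unsigned, which is rarely what was intended.
    uint32_t negate_warns(uint32_t x) { return -x; }

    // Equivalent modular negation, written so MSVC does not warn.
    uint32_t negate_clean(uint32_t x) { return ~x + 1u; }
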
Georgi Gerganov
8a203f9fa1 llama : fix compile warnings in llama_set_state_data() 2023-05-20 10:14:43 +03:00
Georgi Gerganov
4fd3e29297 ggml : fix scalar implementation of Q4_1 dot 2023-05-20 10:13:19 +03:00
Georgi Gerganov
2d5db48371
ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508)
* ggml : use F16 instead of F32 in Q4_0, Q4_1 and Q8_0

* llama : bump LLAMA_FILE_VERSION to 3

* cuda : update Q4 and Q8 dequantize kernels

* ggml : fix AVX dot products

* readme : update performance table + hot topics
2023-05-19 22:17:18 +03:00
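
The format change stores each block's scale in half precision instead of a 32-bit float, which is why LLAMA_FILE_VERSION was bumped and the dequantize kernels updated. A sketch of the resulting block layouts, assuming the usual 32 values per block (field names follow the ggml convention, but treat the details as illustrative):

    #include <cstdint>

    typedef uint16_t ggml_fp16_t;  // IEEE-754 half stored as raw bits

    #define QK 32                  // values per quantization block

    struct block_q4_0 {
        ggml_fp16_t d;             // scale, previously a 32-bit float
        uint8_t     qs[QK / 2];    // 32 4-bit quants packed two per byte
    };                             // 18 bytes per block instead of 20

    struct block_q4_1 {
        ggml_fp16_t d;             // scale
        ggml_fp16_t m;             // minimum, also demoted from float
        uint8_t     qs[QK / 2];
    };                             // 20 bytes per block instead of 24

    // Q8_0 likewise stores its scale d as ggml_fp16_t next to 32 int8 quants.
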
Georgi Gerganov
6986c7835a
tests : add missing header 2023-05-19 21:17:28 +03:00
Evan Jones
943e6081cc
examples : add persistent chat (#1495)
* examples : add persistent chat

* examples : fix whitespace

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-19 20:39:51 +03:00
Jason McCartney
7694b52b9a
main : make reverse prompt option act as a stop token in non-interactive mode (#1032)
* Make reverse prompt option act as a stop token in non-interactive scenarios

* Making requested review changes

* Update gpt_params_parse and fix a merge error

* Revert "Update gpt_params_parse and fix a merge error"

This reverts commit 2bb2ff1748.

* Update gpt_params_parse and fix a merge error take 2
2023-05-19 20:24:59 +03:00
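
The gist of the change: in non-interactive runs, a matched reverse prompt now ends generation instead of handing control back to a user that is not there. A hypothetical helper sketching the check, not the PR's code:

    #include <string>

    // True when the accumulated output ends with the reverse prompt,
    // i.e. the model has produced the stop sequence.
    static bool hit_stop_sequence(const std::string & output,
                                  const std::string & antiprompt) {
        return output.size() >= antiprompt.size() &&
               output.compare(output.size() - antiprompt.size(),
                              antiprompt.size(), antiprompt) == 0;
    }

    // In the generation loop (sketch):
    //   if (hit_stop_sequence(output, antiprompt) && !interactive) break;
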
David Kennedy
79e3efb0e9
readme : adds WizardLM to the list of supported models (#1485) 2023-05-19 20:16:30 +03:00
Georgi Gerganov
4b7e245adf
minor : fix compile warnings 2023-05-19 20:14:51 +03:00
xaedes
08a330a136
add cmake target for baby-llama-text 2023-05-19 18:41:26 +02:00
xaedes
332003584e
sample with non-greedy sampling parameters at the end of training 2023-05-19 18:41:06 +02:00
xaedes
e19ead6e3f
print used memory before and after optimization 2023-05-19 18:40:20 +02:00
xaedes
da86a1d736
fix cross entropy loss
- add target probabilities for each sample, which are then used in the cross entropy loss
2023-05-19 18:39:38 +02:00
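
With explicit target probabilities p and predicted probabilities q, the loss is L = -Σᵢ pᵢ log qᵢ over the vocabulary. A minimal sketch of that computation (not the training example's code):

    #include <cmath>

    // Cross entropy between target probabilities p and predicted
    // probabilities q over n classes: L = -sum_i p[i] * log(q[i]).
    float cross_entropy(const float * p, const float * q, int n) {
        float loss = 0.0f;
        for (int i = 0; i < n; ++i) {
            loss -= p[i] * std::log(q[i] + 1e-10f); // epsilon guards log(0)
        }
        return loss;
    }
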
xaedes
09b304d015
remove duplicate include 2023-05-19 18:36:05 +02:00
xaedes
37f5b76df1
ggml fixes to support backward pass on inplace operations 2023-05-19 18:35:40 +02:00
xaedes
44d83558bc
use different arguments for input and output checkpoint 2023-05-19 18:34:18 +02:00
xaedes
d8b0666429
initialize rng with srand 2023-05-19 18:29:47 +02:00
xaedes
25fe1c3815
use inplace functions where possible 2023-05-19 14:53:21 +02:00
Erik Scholz
5ea4339273
make kv_f16 the default for api users (#1517) 2023-05-18 19:31:01 +02:00
DannyDaemonic
ee9654138a
Fixes #1511 lambda issue for w64devkit (mingw) (#1513)
* Fix for w64devkit and mingw
2023-05-18 19:30:40 +02:00
Stephan Walter
dc271c52ed
Remove unused n_parts parameter (#1509) 2023-05-17 22:12:01 +00:00
rankaiyx
c238b5873a
benchmark-matmul: Print the average of the test results (#1490) 2023-05-17 16:47:58 +02:00
xaedes
b241b9cb6c
save trained model to checkpoint and load model to be trained from checkpoint 2023-05-17 13:49:32 +02:00
xaedes
d328472f16
fix get_samples call, add model tensor names, increase model size, start training samples after newline 2023-05-17 12:52:20 +02:00
Tom Jobbins
2b2646931b
convert.py: Support models which are stored in a single pytorch_model.bin (#1469)
* Support models in a single pytorch_model.bin

* Remove spurious line with typo
2023-05-17 00:04:35 +02:00
Ilya Kurdyukov
42627421ec
~7% faster Q5_1 AVX2 code (#1477) 2023-05-16 18:36:47 +00:00
András Salamon
9560655409
define default model path once, sync path with readme (#1366) 2023-05-16 17:46:34 +02:00
sandyiscool
2a5ee023ad
Add alternate include path for openblas (#1476)
In some Linux distributions (Fedora, for example), the OpenBLAS headers are installed under '/usr/local/include'
2023-05-16 10:30:15 +02:00
xaedes
e063135d0b
add llama sampler, shuffle samples and constrain sampling to tokens occurring in train data 2023-05-15 21:12:28 +02:00
xaedes
ec881156f6
improve ggml_out_prod performance
- change iteration order (>15s -> 10s runtime)
- parallelize over one more dimension: over dst matrix rows (10s -> <5s runtime)
2023-05-15 14:42:24 +02:00
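
Both speedups are standard matmul-style optimizations: order the loops so the innermost writes are contiguous, and stripe dst rows across threads so no synchronization is needed. A simplified sketch of the idea, not the ggml kernel:

    #include <cstddef>

    // Accumulate the outer product dst += a ⊗ b, with dst rows striped
    // across nth threads (this thread is ith).
    void out_prod_sketch(float * dst, const float * a, const float * b,
                         int rows, int cols, int ith, int nth) {
        for (int i = ith; i < rows; i += nth) { // parallel over dst rows
            const float ai = a[i];
            float * row = dst + (size_t) i * cols;
            for (int j = 0; j < cols; ++j) {
                row[j] += ai * b[j];            // contiguous inner loop
            }
        }
    }
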
xaedes
19fb91899b
better weight initialization improves training convergence at start 2023-05-15 14:19:38 +02:00
xaedes
f3cf7df21f
better weight initialization improves training convergence at start 2023-05-15 14:18:57 +02:00
xaedes
efa4bb78ea
add ggml_out_prod and use it for mul_mat backward pass for improved performance
performance stats report an improvement from 37 seconds to 16 seconds of runtime during my training tests
2023-05-15 14:17:42 +02:00
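
For reference, the identity behind the change: if C = A·B, then the gradients are dA = dC·Bᵀ and dB = Aᵀ·dC, and both right-hand sides decompose into sums of rank-1 (outer-product) terms, so an out_prod primitive can accumulate the gradients directly without materializing any transposed copies.
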
zrm
63d20469b8
fix get_num_physical_cores() (#1436)
* fix get_num_physical_cores()
it had been broken on complex topologies because the "cpu cores" field in /proc/cpuinfo is reported per "physical id"

* Add spaces to maintain consistent formatting

---------

Co-authored-by: slaren <ddevesa@gmail.com>
2023-05-15 04:25:42 +02:00
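
The underlying idea of the fix: on Linux, each physical core exposes one distinct thread-siblings mask in sysfs, shared by its SMT threads, so counting distinct masks counts physical cores regardless of socket count. A simplified sketch of that approach (limits and error handling reduced for brevity):

    #include <fstream>
    #include <string>
    #include <unordered_set>

    int num_physical_cores() {
        std::unordered_set<std::string> siblings; // one mask per physical core
        for (int cpu = 0; cpu < 4096; ++cpu) {
            std::ifstream f("/sys/devices/system/cpu/cpu" + std::to_string(cpu)
                            + "/topology/thread_siblings");
            if (!f.is_open()) break;              // ran past the last CPU
            std::string mask;
            if (std::getline(f, mask)) siblings.insert(mask);
        }
        return (int) siblings.size();
    }
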
slaren
b5c9295eef
benchmark-matmul: fix clang-tidy issues, report results in GFLOPS (#1458)
* benchmark-matmul: fix command line parsing, replace macros with functions, report results in GFLOPS
2023-05-14 22:46:00 +02:00
xaedes
a703d7a85f
activate threading in baby-llama-text 2023-05-14 21:00:55 +02:00
xaedes
d9b5268728
avoid printing too many newlines in baby-llama-text 2023-05-14 20:57:47 +02:00
xaedes
c054079fb8
improve performance of mul_mat backward pass
avoid transpose by using mul_mat with swapped arguments
2023-05-14 20:56:50 +02:00
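
The trick rests on the transpose identity (A·B)ᵀ = Bᵀ·Aᵀ: a product that would otherwise need a transposed operand can be computed by calling mul_mat with the operands swapped and reading the result as transposed, avoiding an explicit transposed copy.
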
xaedes
1f2b76de01
fix bug in ggml_compute_forward_soft_max_back_f32 on DEBUG build 2023-05-14 20:55:24 +02:00
xaedes
69108167cd
fix race condition bug in non-inplace ggml_compute_forward_diag_mask_f32
memcpy needs to be synchronized across threads to avoid race conditions.
=> do it in INIT phase
2023-05-14 20:54:57 +02:00
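
Background: ggml executes each op in phases, and the INIT phase finishes before the parallel COMPUTE phase starts, so a copy placed in INIT cannot race with threads that later read the destination. A condensed, self-contained analogue of the pattern (not the ggml function itself):

    #include <cmath>
    #include <cstring>

    enum task_type { TASK_INIT, TASK_COMPUTE };

    void diag_mask_sketch(task_type type, int ith, int nth,
                          float * dst, const float * src, int n, int n_past) {
        if (type == TASK_INIT) {
            if (ith == 0) {
                // lone writer; no COMPUTE thread is running yet
                std::memcpy(dst, src, sizeof(float) * n * n);
            }
            return;
        }
        // TASK_COMPUTE: threads partition rows and apply the causal mask
        for (int r = ith; r < n; r += nth) {
            for (int c = n_past + 1 + r; c < n; ++c) {
                dst[r * n + c] = -INFINITY;
            }
        }
    }
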
Johannes Gäßler
eb363627fd
cuda : deduplicated dequantization code (#1453) 2023-05-14 21:53:23 +03:00
xaedes
4339f8cf28
improve softmax backward pass
go from quadratic runtime to linear runtime by simplifying the formulas
2023-05-14 17:55:02 +02:00
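
The simplification is the closed form of the softmax Jacobian-vector product: with y = softmax(x) and upstream gradient dy, dx[i] = y[i] * (dy[i] - Σⱼ dy[j]·y[j]), one dot product instead of an n x n Jacobian. A sketch:

    // Softmax backward in linear time.
    void soft_max_back_sketch(float * dx, const float * dy,
                              const float * y, int n) {
        float dot = 0.0f;
        for (int i = 0; i < n; ++i) dot += dy[i] * y[i];
        for (int i = 0; i < n; ++i) dx[i] = y[i] * (dy[i] - dot);
    }
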
xaedes
79b2d5b69d
ggml : alternative fix for race condition bug in non-inplace ggml_compute_forward_diag_mask_f32 (#1454)
* fix race condition bug in non-inplace ggml_compute_forward_diag_mask_f32

memcpy needs to be synchronized across threads to avoid race conditions.
=> do it in INIT phase

* remove trailing whitespace

* Update ggml.c

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-14 18:55:02 +03:00
Georgi Gerganov
13c351ad72
ggml : various fixes (#1450)
- `ggml_rope()`
- `ggml_diag_mask_inf()` multi-threaded
- compatibility with scratch buffers
2023-05-14 18:22:50 +03:00
xaedes
ec1aea09ec
implement ggml_soft_max_back for more performant backward pass of soft_max
avoids creating big intermediate matrices of size n_embd x n_embd for llama layers and n_vocab x n_vocab for cross entropy loss
2023-05-14 17:16:26 +02:00
xaedes
f89c278d83
fix race condition bug in ggml_compute_forward_diag_mask_f32 2023-05-14 17:00:19 +02:00
xaedes
6e968d22b0
add text-generating baby-llama-from-scratch example 2023-05-14 16:07:08 +02:00
katsu560
60f8c361ca
ggml : add AVX support based on AVX2 code (#1430) 2023-05-14 10:03:51 +00:00
Georgi Gerganov
601a033475
ggml : add GGML_QNT_VERSION to track quantization format changes
https://github.com/ggerganov/ggml/issues/150#issuecomment-1546625668
2023-05-14 10:20:19 +03:00
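
The constant enables loader-side guards of the following shape. GGML_QNT_VERSION_FACTOR matches the ggml convention of packing the quantization version into the ftype field; the check itself is an illustrative sketch:

    #include <cstdio>

    #define GGML_QNT_VERSION        1    // illustrative value; bumped on format changes
    #define GGML_QNT_VERSION_FACTOR 1000 // ftype = base ftype + qnt version * factor

    // Refuse files quantized under a different format revision instead of
    // silently misreading their blocks.
    bool check_qnt_version(int ftype) {
        const int qnt_version = ftype / GGML_QNT_VERSION_FACTOR;
        if (qnt_version != GGML_QNT_VERSION) {
            std::fprintf(stderr, "quantization version %d, expected %d\n",
                         qnt_version, GGML_QNT_VERSION);
            return false;
        }
        return true;
    }
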
xaedes
6e88dc93bd
update python bindings 2023-05-13 19:05:24 +02:00