llama.cpp

Author	SHA1	Message	Date
xaedes	d9626743ac	add option to use scratch buffers in training or not. make it configurable because currently training with scratch buffers implies flash attention and optimization over all parameters.	2023-06-01 20:59:19 +02:00
xaedes	0d4b87de3d	improve training memory usage with scratch buffers. instead of relying on the automatic backward pass, we manually create the graph for the backward pass. it turns out that all backward pass operations need only temporary memory which can be reused after each layer. will compute backward pass for ALL model parameters	2023-06-01 19:50:48 +02:00
xaedes	765b290010	bug fix for ggml_compute_forward_get_rows_back_f32: the result should be set to zero, not to whatever data is in opt0	2023-06-01 19:42:51 +02:00
xaedes	3164f93381	fix formulas in comments	2023-06-01 19:41:55 +02:00
xaedes	0e269665cd	add ggml_opt_resume_g which accepts forward and backward cgraphs	2023-06-01 19:41:28 +02:00
xaedes	83a34444af	remove trailing whitespace	2023-05-31 15:02:38 +02:00
xaedes	01fc3faf71	add explicit cast to fix compile error "error: non-constant-expression cannot be narrowed from type 'int64_t' (aka 'long long') to 'uint32_t' (aka 'unsigned int') in initializer list [-Wc++11-narrowing]"	2023-05-31 15:00:54 +02:00
xaedes	f88fb2bdc5	add #include <climits>	2023-05-31 12:38:26 +02:00
xaedes	7f172c1070	replace auto parameters in lambda function	2023-05-31 00:26:24 +02:00
xaedes	8fd8599f61	rename baby-llama-text to train-text-from-scratch	2023-05-30 17:07:03 +02:00
xaedes	21b11b55d4	remove python bindings	2023-05-30 17:03:09 +02:00
xaedes	a5317498c2	Merge branch 'master' into text-from-scratch # Conflicts: # ggml.c : number of operations and GGML_ASSERT vs assert	2023-05-30 16:57:17 +02:00
xaedes	1074a81e81	add train params to specify memory size	2023-05-30 16:06:20 +02:00
xaedes	ad966da955	remove unnecessary comments	2023-05-30 15:58:22 +02:00
xaedes	ec8e262d1d	add train_params and command line option parser	2023-05-30 15:53:55 +02:00
xaedes	fcbc4457d6	add option to train with flash attention and move options to the top of the main function. training from scratch also works with flash attention. training convergence and generation results after a fixed number of iterations are worse than when not using flash attention. maybe a bug still lingers in the flash attention backward pass? but training works, just with slower convergence. flash attention is still worth using, because it requires way less memory and is faster with high n_ctx	2023-05-30 13:18:17 +02:00
xaedes	70c08318af	test flash attention backward pass. need to set loose error bounds to pass. the finite differences are close to numeric limits and often return quite different values than the backward pass. reducing eps further lets the gradients vanish completely. likewise, setting eps too big results in less accurate values. the softmax in the middle of the function is probably most responsible for the numeric issues with finite differences.	2023-05-29 23:51:40 +02:00
xaedes	38560b6d51	bugfixes for backward pass of flash attention	2023-05-29 23:45:58 +02:00
xaedes	22a7279ffb	implement backward pass of flash attention	2023-05-29 22:00:40 +02:00
Georgi Gerganov	7552ac5863	ggml : sync cgraph import / export API	2023-05-29 19:31:44 +03:00
Georgi Gerganov	5d1830b99d	ggml : fix bug in ggml_alibi	2023-05-29 19:30:49 +03:00
DannyDaemonic	248367605e	Work around for recalculating logits in cached prompts (Fixes #1585) (#1609)	2023-05-29 05:13:40 -07:00
Jiří Podivín	0e730dd23b	Adding git in container package dependencies (#1621) Git added to build packages for version information in docker image Signed-off-by: Jiri Podivin <jpodivin@gmail.com>	2023-05-28 21:45:50 -07:00
xaedes	56895e28f6	get vocabulary for exporting training checkpoint to llama compatible model file	2023-05-29 02:25:18 +02:00
xaedes	4b81c32d5b	add export of training checkpoint to llama compatible model file	2023-05-29 01:27:09 +02:00
xaedes	2da5c8cf24	set default model.type for unknown models with few layers	2023-05-29 01:21:01 +02:00
xaedes	bf4d9b3b81	add llama_get_vocab to get the vocabulary as output parameters	2023-05-29 01:20:26 +02:00
xaedes	89475fb320	slightly improve how cross entropy loss is computed. btw: directly implemented cross entropy loss seems to have way lower magnitudes than when implemented with softmax and log. probably the input to log gets closer to zero due to float numerics. maybe the multiplication by (1.0-eps)/sum is more accurate.	2023-05-28 22:40:58 +02:00
xaedes	5f5aa20078	remove trailing whitespace	2023-05-28 22:00:56 +02:00
xaedes	1fbd19abe1	use ggml_cross_entropy_loss in text training example	2023-05-28 22:00:26 +02:00
xaedes	f056a04a80	add tests for cross_entropy_loss backward pass. finite differences regularly result in an estimated gradient of zero, despite the backward pass giving a nonzero gradient. _probably_ the finite differences fail due to numerical issues	2023-05-28 21:59:17 +02:00
xaedes	71aaf8dedf	add ggml_cross_entropy_loss with backward pass for faster training. cross entropy loss can also be implemented using softmax and log, but as a dedicated operation it is faster and especially avoids unnecessary memory overhead.	2023-05-28 21:57:38 +02:00
Johannes Gäßler	3b126f654f	LLAMA_DEBUG adds debug symbols (#1617)	2023-05-28 21:01:02 +02:00
Kerfuffle	1b78ed2081	Only show -ngl option when relevant + other doc/arg handling updates (#1625) 1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS) 2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible. 3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation 4. Update `main` and `server` examples documentation to use the new style dash separator argument format 5. Update the `server` example to use dash separators for its arguments and add `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility. 6. Add a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356	2023-05-28 11:48:57 -06:00
Vladimir Zorin	337aea1139	examples : add --alias option to gpt_params to set user-friendly model name (#1614)	2023-05-28 20:14:24 +03:00
Howard Su	bb051d9723	opencl : no need to allocate cl_mem on heap (#1612)	2023-05-28 20:13:36 +03:00
Howard Su	ca74884f66	opencl : use strstr to check if fp16 supported (#1611) * Ensure ext_buffer is null terminated	2023-05-28 20:09:56 +03:00
xaedes	05cb629c8e	replace inefficient repeat backward pass with dedicated repeat_back operation	2023-05-28 18:00:17 +02:00
xaedes	c47df09842	simplify backward pass for SQRT	2023-05-28 17:32:01 +02:00
apcameron	a6704643b6	ggml : add support for the RISCV architecture (#1616)	2023-05-27 23:03:25 +03:00
Kerfuffle	0df7d63e5b	Include server in releases + other build system cleanups (#1610) Set `LLAMA_BUILD_SERVER` in workflow so the `server` example gets built. This currently only applies to Windows builds because it seems like only Windows binary artifacts are included in releases. Add `server` example target to `Makefile` (still uses `LLAMA_BUILD_SERVER` define and does not build by default). Fix issue where `vdot` binary wasn't removed when running `make clean`. Fix compile warnings in `server` example. Add `.hpp` files to trigger workflow (the server example has one).	2023-05-27 11:04:14 -06:00
Henri Vasserman	97c9b77c4f	Add documentation about CLBlast (#1604) Installing, compiling and using.	2023-05-27 18:47:55 +03:00
Henri Vasserman	0ecb1bbbeb	[CI] Fix openblas (#1613) * Fix OpenBLAS build * Fix `LLAMA_BLAS_VENDOR` CMake variable that should be a string and not a boolean.	2023-05-27 17:24:06 +03:00
Georgi Gerganov	93618031c7	ggml : add ggml_tensor_overhead()	2023-05-27 16:19:56 +03:00
Henri Vasserman	83c54e6da5	[CI] CLBlast: Fix directory name (#1606)	2023-05-27 14:18:25 +02:00
Georgi Gerganov	bdbda1b17a	ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name())	2023-05-27 12:23:16 +03:00
Kerfuffle	66874d4fbc	Some improvements to loading the session with --prompt-cache (#1550) Improvements to loading the session with `--prompt-cache` in the `main` example. 1. Fix an issue where the `--seed` parameter was ignored when loading a cached prompt. 2. When loading a cached prompt, you previously had to specify the saved prompt (or a prefix of it) again. This pull changes that behavior to default to the prompt that was cached if a prompt wasn't specified by the user.	2023-05-25 20:18:01 -06:00
Johannes Gäßler	1fcdcc28b1	cuda : performance optimizations (#1530) * xor hack * block y dim * loop unrolling * Fixed cmake LLAMA_CUDA_BY option * Removed hipblas compatibility code * Define GGML_CUDA_DMMV_BLOCK_Y if not defined * Fewer iters, more ops per iter * Renamed DMMV X/Y compilation options	2023-05-26 00:07:29 +03:00
Henri Vasserman	ac7876ac20	Update CLBlast to 1.6.0 (#1580)	2023-05-24 10:30:09 +03:00
Evan Jones	c31bbe934b	readme : add docs for chat-persistent.sh (#1568) * Update README.md	2023-05-24 09:24:01 +03:00

681 commits