Commit graph

682 commits

Author SHA1 Message Date
Georgi Gerganov
b58d73ca8c
ci : disable temporarily 2023-06-02 18:10:14 +02:00
xaedes
d9626743ac
add option to use scratch buffers in training or not
make it configurable because currently training with scratch buffers implies flash attention and optimization over all parameters.
2023-06-01 20:59:19 +02:00
xaedes
0d4b87de3d
improve training memory usage with scratch buffers
instead of relying on the automatic backward pass, we manually create the graph for the backward pass.
it turns out that all backward pass operations need only temporary memory which can be reused after each layer.

will compute backward pass for ALL model parameters
2023-06-01 19:50:48 +02:00
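The scratch-buffer idea described above can be sketched outside of ggml. This is a hypothetical Python illustration (names and sizes are made up, and this is not the ggml API): per-layer backward temporaries are written into one fixed scratch region that is reused for the next layer, instead of allocating fresh memory for every intermediate tensor.

```python
import numpy as np

# Hypothetical sketch (not the ggml API): backward-pass temporaries for one
# layer are carved out of a fixed scratch buffer, which is reused for the
# next layer instead of allocating new memory per intermediate tensor.
SCRATCH_FLOATS = 1 << 16

def run_backward(n_layers, temps_per_layer, temp_size):
    scratch = np.empty(SCRATCH_FLOATS, dtype=np.float32)
    peak_without_reuse = n_layers * temps_per_layer * temp_size
    high_water = 0
    for _ in range(n_layers):
        offset = 0  # reset per layer: the previous layer's temporaries are dead
        for _ in range(temps_per_layer):
            assert offset + temp_size <= scratch.size, "scratch buffer too small"
            temp = scratch[offset:offset + temp_size]  # view, no new allocation
            temp[:] = 0.0  # stand-in for the real backward computation
            offset += temp_size
        high_water = max(high_water, offset)
    return high_water, peak_without_reuse

peak_reused, peak_naive = run_backward(n_layers=32, temps_per_layer=4, temp_size=1024)
```

With 32 layers the reused peak stays at one layer's worth of temporaries, while naive allocation grows linearly with depth.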
xaedes
765b290010
bug fix for ggml_compute_forward_get_rows_back_f32
the result should be set to zero, not to whatever data is in opt0
2023-06-01 19:42:51 +02:00
xaedes
3164f93381
fix formulas in comments 2023-06-01 19:41:55 +02:00
xaedes
0e269665cd
add ggml_opt_resume_g which accepts forward and backward cgraphs 2023-06-01 19:41:28 +02:00
xaedes
83a34444af
remove trailing whitespace 2023-05-31 15:02:38 +02:00
xaedes
01fc3faf71
add explicit cast to fix compile error
"error: non-constant-expression cannot be narrowed from type 'int64_t' (aka 'long long') to 'uint32_t' (aka 'unsigned int') in initializer list [-Wc++11-narrowing]"
2023-05-31 15:00:54 +02:00
xaedes
f88fb2bdc5
add #include <climits> 2023-05-31 12:38:26 +02:00
xaedes
7f172c1070
replace auto parameters in lambda function 2023-05-31 00:26:24 +02:00
xaedes
8fd8599f61
rename baby-llama-text to train-text-from-scratch 2023-05-30 17:07:03 +02:00
xaedes
21b11b55d4
remove python bindings 2023-05-30 17:03:09 +02:00
xaedes
a5317498c2
Merge branch 'master' into text-from-scratch
# Conflicts:
#	ggml.c : number of operations and GGML_ASSERT vs assert
2023-05-30 16:57:17 +02:00
xaedes
1074a81e81
add train params to specify memory size 2023-05-30 16:06:20 +02:00
xaedes
ad966da955
remove unnecessary comments 2023-05-30 15:58:22 +02:00
xaedes
ec8e262d1d
add train_params and command line option parser 2023-05-30 15:53:55 +02:00
xaedes
fcbc4457d6
add option to train with flash attention and move options to the top of the main function
training from scratch also works with flash attention
training convergence and generation results after a fixed number of iterations are worse than without flash attention.
maybe a bug still lingers in the flash attention backward pass?
but training works, just with slower convergence.

flash attention is still worth using, because it requires far less memory and is faster at high n_ctx
2023-05-30 13:18:17 +02:00
xaedes
70c08318af
test flash attention backward pass
need to set loose error bounds to pass.
the finite differences are close to numeric limits and often return values quite different from the backward pass.
reducing eps further lets the gradients vanish completely.
likewise, setting eps too large results in less accurate values.
the softmax in the middle of the function is probably most responsible for the numeric issues with finite differences.
2023-05-29 23:51:40 +02:00
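The eps sensitivity described above is easy to reproduce outside the flash attention test. A minimal float32 sketch (assuming numpy and a toy function, not the actual ggml test harness): with eps below the float32 spacing at x, the perturbed inputs round back to x and the finite-difference gradient estimate collapses to zero.

```python
import numpy as np

def central_diff(f, x, eps):
    # finite-difference gradient estimate, all in float32 like the ggml tests
    x, eps = np.float32(x), np.float32(eps)
    return (f(x + eps) - f(x - eps)) / (np.float32(2) * eps)

f = lambda x: x * x          # true gradient at x is 2*x
x = np.float32(1e4)          # true gradient = 20000

grad_ok   = central_diff(f, x, 1.0)    # eps well above the float32 spacing at x
grad_gone = central_diff(f, x, 1e-8)   # eps below the spacing: x +/- eps rounds to x
```

Here `grad_ok` recovers the true gradient, while `grad_gone` is exactly zero because the gradient "vanishes" into rounding.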
xaedes
38560b6d51
bugfixes for backward pass of flash attention 2023-05-29 23:45:58 +02:00
xaedes
22a7279ffb
implement backward pass of flash attention 2023-05-29 22:00:40 +02:00
Georgi Gerganov
7552ac5863
ggml : sync cgraph import / export API 2023-05-29 19:31:44 +03:00
Georgi Gerganov
5d1830b99d
ggml : fix bug in ggml_alibi 2023-05-29 19:30:49 +03:00
DannyDaemonic
248367605e
Work around for recalculating logits in cached prompts (Fixes #1585) (#1609)
* Work around for recalculating logits in cached prompts
2023-05-29 05:13:40 -07:00
Jiří Podivín
0e730dd23b
Adding git in container package dependencies (#1621)
Git added to build packages to provide version information in the docker image

Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
2023-05-28 21:45:50 -07:00
xaedes
56895e28f6
get vocabulary for exporting training checkpoint to llama compatible model file 2023-05-29 02:25:18 +02:00
xaedes
4b81c32d5b
add export of training checkpoint to llama compatible model file 2023-05-29 01:27:09 +02:00
xaedes
2da5c8cf24
set default model.type for unknown models with few layers 2023-05-29 01:21:01 +02:00
xaedes
bf4d9b3b81
add llama_get_vocab to get the vocabulary as output parameters 2023-05-29 01:20:26 +02:00
xaedes
89475fb320
slightly improve how cross entropy loss is computed
btw: the directly implemented cross entropy loss seems to have much lower magnitudes than when implemented with softmax and log.
probably the input to log gets closer to zero due to float numerics.
maybe the multiplication by (1.0-eps)/sum is more accurate.
2023-05-28 22:40:58 +02:00
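The float-numerics effect mentioned here can be illustrated with a small sketch (numpy float32 and toy logits, not the ggml code): when softmax is computed explicitly and its result is fed to log, an entry can underflow to zero and the loss blows up, while a fused log-sum-exp formulation never passes a near-zero value into log.

```python
import numpy as np

x = np.array([0.0, -200.0], dtype=np.float32)   # toy logits
t = 1                                           # target class

# naive: explicit softmax, then log -- the target's softmax entry underflows to 0.0
p = np.exp(x) / np.sum(np.exp(x))
with np.errstate(divide="ignore"):
    loss_naive = -np.log(p[t])                  # log(0) -> -inf, loss -> +inf

# fused: loss = -(x_t - logsumexp(x)); nothing near zero ever reaches log
m = np.max(x)
loss_fused = -(x[t] - (m + np.log(np.sum(np.exp(x - m)))))
```

The fused form returns the correct loss of 200, while the naive form overflows to infinity.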
xaedes
5f5aa20078
remove trailing whitespace 2023-05-28 22:00:56 +02:00
xaedes
1fbd19abe1
use ggml_cross_entropy_loss in text training example 2023-05-28 22:00:26 +02:00
xaedes
f056a04a80
add tests for cross_entropy_loss backward pass
finite differences regularly result in an estimated gradient of zero, despite the backward pass giving a non-zero gradient.
_probably_ the finite differences fail due to numerical issues
2023-05-28 21:59:17 +02:00
xaedes
71aaf8dedf
add ggml_cross_entropy_loss with backward pass for faster training
cross entropy loss can also be implemented using softmax and log, but as a dedicated operation it is faster and, in particular, avoids unnecessary memory overhead.
2023-05-28 21:57:38 +02:00
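A minimal sketch of why a dedicated operation helps (plain numpy, not the ggml implementation): the backward pass of cross entropy over logits collapses to `softmax(logits) - one_hot(target)`, so no intermediate softmax or log tensors have to be materialized and kept for the backward pass.

```python
import numpy as np

def cross_entropy_loss(logits, target):
    # forward: -log softmax(logits)[target], via a stable log-sum-exp
    m = logits.max()
    lse = m + np.log(np.exp(logits - m).sum())
    return lse - logits[target]

def cross_entropy_loss_back(logits, target):
    # backward in one step: d loss / d logits = softmax(logits) - one_hot(target)
    m = logits.max()
    p = np.exp(logits - m)
    p /= p.sum()
    p[target] -= 1.0
    return p

logits = np.array([2.0, 1.0, 0.1])
grad = cross_entropy_loss_back(logits, 0)
```

In float64 the analytic gradient agrees with central differences to well below 1e-5, and its components sum to zero, as expected for a softmax-based loss.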
Johannes Gäßler
3b126f654f
LLAMA_DEBUG adds debug symbols (#1617) 2023-05-28 21:01:02 +02:00
Kerfuffle
1b78ed2081
Only show -ngl option when relevant + other doc/arg handling updates (#1625)
1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS)
2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible.
3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation
4. Update `main` and `server` examples documentation to use the new style dash separator argument format
5. Update the `server` example to use dash separators for its arguments and adds `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility.
6. Add a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356
2023-05-28 11:48:57 -06:00
Vladimir Zorin
337aea1139
examples : add --alias option to gpt_params to set a user-friendly model name (#1614) 2023-05-28 20:14:24 +03:00
Howard Su
bb051d9723
opencl : no need to allocate cl_mem on heap (#1612) 2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported (#1611)
* Use strstr to check if fp16 supported

* Ensure ext_buffer is null terminated
2023-05-28 20:09:56 +03:00
xaedes
05cb629c8e
replace inefficient repeat backward pass with dedicated repeat_back operation 2023-05-28 18:00:17 +02:00
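A 1-D sketch of the idea (numpy, not the actual ggml code): the backward of a repeat operation sums the gradients of all copies back into the source shape, which a dedicated repeat_back can do as a single reduction instead of many per-copy additions.

```python
import numpy as np

def repeat(a, n):
    # forward: tile a n times (simplified 1-D stand-in for a repeat op)
    return np.tile(a, n)

def repeat_back(grad_out, n):
    # backward of repeat: every copy contributed a, so the copies' gradients
    # are summed back into a's shape -- one reduction instead of n adds
    return grad_out.reshape(n, -1).sum(axis=0)

a = np.array([1.0, 2.0, 3.0])
y = repeat(a, 4)                 # shape (12,)
grad_y = np.ones_like(y)
grad_a = repeat_back(grad_y, 4)  # each source element gathers gradient from 4 copies
```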
xaedes
c47df09842
simplify backward pass for SQRT 2023-05-28 17:32:01 +02:00
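For reference, the standard simplification for a sqrt backward pass (a sketch, not the ggml code): since d sqrt(x)/dx = 1/(2*sqrt(x)), the gradient can reuse the forward output y = sqrt(x) instead of recomputing the square root.

```python
import math

def sqrt_forward(x):
    return math.sqrt(x)

def sqrt_backward(y, grad_y):
    # d sqrt(x) / dx = 1 / (2 * sqrt(x)); reuse the forward output y = sqrt(x)
    return grad_y / (2.0 * y)

y = sqrt_forward(9.0)          # 3.0
grad_x = sqrt_backward(y, 1.0) # 1 / (2 * 3) = 1/6
```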
apcameron
a6704643b6
ggml : add support for the RISCV architecture (#1616) 2023-05-27 23:03:25 +03:00
Kerfuffle
0df7d63e5b
Include server in releases + other build system cleanups (#1610)
Set `LLAMA_BUILD_SERVER` in workflow so the `server` example gets built. This currently only applies to Windows builds because it seems like only Windows binary artifacts are included in releases.

Add `server` example target to `Makefile` (still uses `LLAMA_BUILD_SERVER` define and does not build by default)

Fix issue where `vdot` binary wasn't removed when running `make clean`.

Fix compile warnings in `server` example.

Add `.hpp` files to trigger workflow (the server example has one).
2023-05-27 11:04:14 -06:00
Henri Vasserman
97c9b77c4f
Add documentation about CLBlast (#1604)
Installing, compiling and using.
2023-05-27 18:47:55 +03:00
Henri Vasserman
0ecb1bbbeb
[CI] Fix openblas (#1613)
* Fix OpenBLAS build

* Fix `LLAMA_BLAS_VENDOR` CMake variable that should be a string and not a boolean.
2023-05-27 17:24:06 +03:00
Georgi Gerganov
93618031c7
ggml : add ggml_tensor_overhead() 2023-05-27 16:19:56 +03:00
Henri Vasserman
83c54e6da5
[CI] CLBlast: Fix directory name (#1606) 2023-05-27 14:18:25 +02:00
Georgi Gerganov
bdbda1b17a
ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name()) 2023-05-27 12:23:16 +03:00
Kerfuffle
66874d4fbc
Some improvements to loading the session with --prompt-cache (#1550)
Improvements to loading the session with `--prompt-cache` in the `main` example.

1. Fix an issue where the `--seed` parameter was ignored when loading a cached prompt.
2. When loading a cached prompt, you previously had to specify the saved prompt (or a prefix of it) again. This pull changes that behavior to default to the prompt that was cached if a prompt wasn't specified by the user.
2023-05-25 20:18:01 -06:00
Johannes Gäßler
1fcdcc28b1
cuda : performance optimizations (#1530)
* xor hack

* block y dim

* loop unrolling

* Fixed cmake LLAMA_CUDA_BY option

* Removed hipblas compatibility code

* Define GGML_CUDA_DMMV_BLOCK_Y if not defined

* Fewer iters, more ops per iter

* Renamed DMMV X/Y compilation options
2023-05-26 00:07:29 +03:00
Henri Vasserman
ac7876ac20
Update CLBlast to 1.6.0 (#1580)
* Update CLBlast to 1.6.0
2023-05-24 10:30:09 +03:00