Commit graph

1015 commits

Author SHA1 Message Date
slaren
a113689571
ggml : add graph tensor allocator (#2411)
* ggml : add graph tensor allocator

* ggml : don't calculate data pointer of unallocated tensors when creating a view with an offset

* ggml : refactor ggml_view_Nd into ggml_view_tensor_offset
2023-07-30 15:58:01 +02:00
Johannes Gäßler
11f3ca06b8
CUDA: Quantized matrix matrix multiplication (#2160)
* mmq implementation for non k-quants

* q6_K

* q2_K

* q3_k

* q4_K

* vdr

* q5_K

* faster q8_1 loading

* loop unrolling

* add __restrict__

* q2_K sc_high

* GGML_CUDA_MMQ_Y

* Updated Makefile

* Update Makefile

* DMMV_F16 -> F16

* Updated README, CMakeLists

* Fix CMakeLists.txt

* Fix CMakeLists.txt

* Fix multi GPU out-of-bounds
2023-07-29 23:04:44 +02:00
Johannes Gäßler
9baf9ef304
CUDA: faster multi GPU synchronization (#2448) 2023-07-29 23:04:10 +02:00
xaedes
22cb368dd9
remove trailing whitespace 2023-07-28 23:55:30 +02:00
xaedes
c1a5e116a4
llama training : fix ggml_rms_norm_back calls to pass configurable eps 2023-07-28 23:13:20 +02:00
xaedes
ecdc16163e
ggml : update ggml_rms_norm_back with configurable eps 2023-07-28 23:13:20 +02:00
xaedes
87035b96f7
remove out-commented vectorized code of opt_adam
the vectorized code might be a bit faster for a low number of parameters, but it had a large memory usage overhead
2023-07-28 23:13:20 +02:00
xaedes
0f6a8ab519
tighten abs error bounds for sqrt in test-grad0 2023-07-28 23:13:20 +02:00
xaedes
47055c929f
tighten abs error bounds for flash_attn in test-grad0 2023-07-28 23:13:20 +02:00
xaedes
dbbc263313
add conditional compilation of using F16 exp in flash attention
uncomment `// #define GGML_FLASH_ATTN_EXP_FP16` to enable usage of f16 exp in flash attention
2023-07-28 23:13:20 +02:00
xaedes
1065c3b7b9
tighten abs error bounds for cross_entropy_loss in test-grad0 2023-07-28 23:13:20 +02:00
xaedes
24a4b099f3
change sampling parameters for prediction after training to the defaults of common.h
and clarify which tokens are context for prediction and which are generated
2023-07-28 23:13:19 +02:00
xaedes
17a0898d50
fix increase of model.train_samples and model.train_tokens
now that each optimizer iteration gets its own batch, we need to multiply by the number of opt iterations (see the sketch below)
2023-07-28 23:13:19 +02:00
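A minimal sketch of the adjusted bookkeeping described in the commit above; the counter fields come from the commit message, while `n_batch`, `n_tokens`, and `n_opt_iterations` are illustrative names:

```
#include <stdint.h>

// Hedged sketch: with one fresh batch per optimizer iteration, the counters
// must advance by the batch size times the number of iterations actually run.
static void update_train_counters(uint64_t * train_samples, uint64_t * train_tokens,
                                  int n_batch, int n_tokens, int n_opt_iterations) {
    *train_samples += (uint64_t) n_batch * n_opt_iterations;
    *train_tokens  += (uint64_t) n_batch * n_tokens * n_opt_iterations;
}
```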
xaedes
58024d3e5f
rename training parameter cos-decay-alpha to cos-decay-min and clarify that adam-min-alpha also applies to warmup 2023-07-28 23:13:19 +02:00
xaedes
e6ff0728e0
add a minimum number of tensor dimensions required to apply weight decay (default 2)
this allows weight decay to be skipped for bias parameters (see the sketch below)
2023-07-28 23:13:19 +02:00
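A hedged sketch of the check, assuming the ggml of this period where tensors carry an `n_dims` field; parameter names are illustrative, not the actual training code:

```
#include "ggml.h"

// Hedged sketch: apply weight decay only to tensors with at least `min_dims`
// dimensions (default 2), so 1-D bias vectors are left undecayed.
static float decay_for(const struct ggml_tensor * t, float wd, int min_dims) {
    return t->n_dims >= min_dims ? wd : 0.0f;
}
```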
xaedes
d7aa4d9576
use optimization callback in training
allows a dynamic learning schedule and different batch data for each iteration without relying on low n_iter and high n_examples parameters

reduces runtime by avoiding restarts of the optimization function and improves training convergence by providing a different batch for each iteration
2023-07-28 23:13:19 +02:00
xaedes
bfc3119139
add optimization callback to ggml_opt_resume_g
this callback is called before each iteration with custom data and a pointer to the learning schedule parameter (only used in Adam(W)).

can be used for a dynamic learning schedule and for setting batch input data before each iteration (see the sketch below)
2023-07-28 23:13:18 +02:00
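A hedged sketch of how such a callback could look; the commit does not show the exact ggml typedef, so the signature below is an assumption for illustration only:

```
// Hypothetical callback shape (assumption, not the verbatim ggml API):
// called before each iteration with user data and a pointer to the learning
// schedule parameter `sched` in [0..1], which only Adam(W) uses.
typedef void (*opt_callback_t)(void * data, float * sched);

struct train_state {
    int iter;    // current optimizer iteration
    // ... batch buffers, RNG state, etc. would live here
};

// Example: linear warmup to the full learning rate; loading a fresh batch
// into the input tensors for this iteration would also happen here.
static void my_opt_callback(void * data, float * sched) {
    struct train_state * st = (struct train_state *) data;
    const int warmup = 100;
    *sched = st->iter < warmup ? (float) st->iter / warmup : 1.0f;
    st->iter++;
}
```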
xaedes
e843d6e71c
measure and print total training time 2023-07-28 23:13:18 +02:00
xaedes
ff759d957c
remove unused function argument from get_example_targets_batch 2023-07-28 23:13:18 +02:00
xaedes
ce937bc431
replace memcpy with a reshape operation so that the graph is not cut at the input
this makes it possible to store other values in the input tensor and then simply recompute the graph without rebuilding it (see the sketch below)
2023-07-28 23:13:18 +02:00
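A hedged sketch of the idea; tensor and variable names are illustrative, not the actual training code:

```
#include "ggml.h"

// Hedged sketch: build the model's batch input as a reshape of the persistent
// tokens tensor instead of memcpy-ing into a detached tensor. The graph then
// still reaches tokens_input, so writing new values into tokens_input->data
// and recomputing the same graph is enough.
static struct ggml_tensor * make_batch_input(struct ggml_context * ctx,
                                             struct ggml_tensor * tokens_input,
                                             int64_t n_tokens, int64_t n_batch) {
    return ggml_reshape_2d(ctx, tokens_input, n_tokens, n_batch);
}
```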
xaedes
c6a18e15c1
add more training parameters:
--enable-restart N         Only for Adam optimizer. Enable restarts of cos-decay
--disable-restart N        Only for Adam optimizer. Disable restarts of cos-decay
--opt-past N               Number of optimization iterations to track for delta convergence test. Disabled when zero.
--opt-delta N              Maximum delta for delta convergence test. Disabled when <= zero.
--opt-max-no-improvement N Maximum number of optimization iterations with no improvement. Disabled when <= zero.
--adam-epsf N              AdamW epsilon for convergence test. Disabled when <= zero.
--adam-min-alpha N         Adam minimum learning rate alpha, usually 0.1 * alpha
2023-07-28 23:13:18 +02:00
xaedes
d0fbb7d328
llama : fix rope usage in train-text-from-scratch after ChatGLM change 2023-07-28 23:13:17 +02:00
xaedes
fc379a2de3
disable gradient checkpointing debug output 2023-07-28 23:13:17 +02:00
xaedes
3744a9be74
improve gradient checkpointing
sqrt(n_layers) is only the best checkpoint step when the memory size of checkpoints and the memory size of layers are equal.
since layers require more memory than the single-tensor checkpoints we use, the optimal value is computed differently:

```
  given: n, u, v
  objective: minimize(a*u+b*v) where a*b=n, a>0, b>0
  b=n/a
  minimize(a*u+v*n/a)
  diff(a*u+v*n/a, a) = u - (v*n/a)/a
  diff(a*u+v*n/a, a) == 0
  u - (v*n/a)/a == 0
  u == v*n/(a*a)
  u*a*a = v*n
  a*a = v*n/u
  a = sqrt(n*v/u)
```

this change results in more checkpoints, requiring fewer layers to be stored between checkpoints, improving memory usage overall (see the sketch below).
2023-07-28 23:13:17 +02:00
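A small sketch of the resulting formula, with n, u, and v as in the derivation above (n layers, u memory per checkpoint, v memory per layer); the numbers in the comment are an illustrative example, not measurements:

```
#include <math.h>

// Hedged sketch: optimal number of checkpoints a = sqrt(n*v/u) from the
// derivation above; with u == v this reduces to the classic sqrt(n).
static double optimal_checkpoints(double n, double u, double v) {
    return sqrt(n * v / u);
}
// e.g. n = 32 layers and layers 4x larger than checkpoints (v = 4*u)
// -> a = sqrt(32*4) ≈ 11.3 checkpoints instead of sqrt(32) ≈ 5.7
```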
xaedes
51dc77092f
change cross_entropy_loss to output average over all rows
this helps keep the loss and gradients in a sane range
2023-07-28 23:13:17 +02:00
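In formula form (a sketch only): with p the target distribution, q the predicted distribution, and N the number of rows, the loss is now averaged rather than summed over rows,

```
L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j} p_{ij} \log q_{ij}
```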
xaedes
87febeec91
improve finite differences of test-grad0 by using double instead of float 2023-07-28 23:13:17 +02:00
xaedes
864e7e3aa1
fix test-grad0 for soft_max
don't use only sum as aggregation, because the sum of softmax is always 1 -> finite differences would not work
instead use sum(log(soft_max()*(1-eps)+eps)); eps avoids log(0) (see the sketch below)
2023-07-28 23:13:17 +02:00
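A hedged sketch of that aggregation as a scalar function suitable for finite differences; `eps` is used exactly as in the commit above, everything else is illustrative:

```
#include <math.h>

// Hedged sketch: aggregate softmax output as sum(log(softmax(x)*(1-eps)+eps))
// instead of a plain sum, which is always 1 and therefore invisible to
// finite differences.
static double softmax_aggregate(const float * x, int n, double eps) {
    double max = x[0], sum = 0.0, agg = 0.0;
    for (int i = 1; i < n; ++i) if (x[i] > max) max = x[i];
    for (int i = 0; i < n; ++i) sum += exp((double) x[i] - max);
    for (int i = 0; i < n; ++i) {
        const double s = exp((double) x[i] - max) / sum;  // softmax(x)_i
        agg += log(s * (1.0 - eps) + eps);                // eps avoids log(0)
    }
    return agg;
}
```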
xaedes
2d1e6e0675
fix test-grad0 for cross_entropy_loss
the second argument to cross_entropy_loss must sum up to 1 for each row
2023-07-28 23:13:17 +02:00
xaedes
2c6985f79e
bug fixes for cross entropy loss
ggml_cross_entropy_loss: sums were not correctly added in the workload of each thread
ggml_cross_entropy_loss_back: simplify backward process, reducing numerical issues

guard usage of exp f16 lookup in cross entropy by #define GGML_CROSS_ENTROPY_EXP_FP16

cross entropy loss is only used once during training, but it is quite sensitive to numerical errors introduced by exp-f16-lookup.
so exp-f16-lookup for cross entropy loss is disabled by default, trading very slightly worse runtime performance for better gradients.
2023-07-28 23:13:16 +02:00
xaedes
97964a4cc9
change the default AdamW weight decay parameter defined in ggml to 0.0, making Adam the default instead of AdamW
btw: the default weight decay parameter for torch.optim.AdamW is 0.01
2023-07-28 23:13:16 +02:00
xaedes
f175ead6ef
change default AdamW weight decay parameter used in training to 0.1 as used in nanoGPT 2023-07-28 23:13:16 +02:00
xaedes
a80f184e6d
change AdamW decay parameter to work like the torch AdamW decay parameter
It is now relative to the Adam learning rate `alpha*sched`.
Before, it was relative to `sched` only.

Here `alpha` is the maximum learning rate and `sched` is a scaling parameter in [0..1] (see the sketch below).
2023-07-28 23:13:16 +02:00
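As a formula sketch of torch-style AdamW (with wd the decay parameter and m̂, v̂ the bias-corrected moments), the decay term is now scaled by the full learning rate alpha*sched instead of sched alone:

```
\theta \leftarrow \theta - \alpha \cdot \mathrm{sched}
\left( \frac{\hat m}{\sqrt{\hat v} + \epsilon} + \mathrm{wd} \cdot \theta \right)
```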
xaedes
ed4319e1a7
add and use the function ggml_build_backward_expand to avoid stack overflows with a large maximum number of nodes (usage sketch below)
GGML_API void ggml_build_backward_expand(struct ggml_context * ctx, struct ggml_cgraph * gf, struct ggml_cgraph * gb, bool keep);
2023-07-28 23:13:16 +02:00
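A hedged usage sketch, assuming the ggml API of this period (ggml_build_forward returning a cgraph by value); only the ggml_build_backward_expand signature is taken from the commit above:

```
#include "ggml.h"

// Hedged sketch: the backward graph is expanded in place instead of being
// returned by value, avoiding a large ggml_cgraph copy on the stack.
static void build_graphs(struct ggml_context * ctx, struct ggml_tensor * loss,
                         struct ggml_cgraph * gf, struct ggml_cgraph * gb) {
    *gf = ggml_build_forward(loss);  // forward graph ending at the loss tensor
    *gb = *gf;                       // backward graph starts as a copy of the forward graph
    ggml_build_backward_expand(ctx, gf, gb, /*keep=*/true);
}
```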
xaedes
e05e4414ac
remove unused compute buffer 3 2023-07-28 23:12:00 +02:00
xaedes
6e3f95bf06
implement gradient checkpointing for training
reduces memory overhead from O(n_layer) to O(sqrt(n_layer))

as explained in the readme of https://github.com/cybertronai/gradient-checkpointing
2023-07-28 23:11:59 +02:00
xaedes
d7003a98cc
Fix reset of unused g->nodes and g->grads to NULL 2023-07-28 21:30:22 +02:00
xaedes
d395b19c8c
add gradient clipping to AdamW 2023-07-28 21:18:41 +02:00
xaedes
d39c8e6863
remove unnecessary Adam(W) optimizer tensors.
reduces optimizer memory overhead from 7*modelsize to 2*modelsize.

additionally allows optimizing models with more than 2^31 parameters by replacing int with int64_t.

bumps the training checkpoint file version, but old checkpoints can still be read.
the new version with fewer tensors is saved (see the sketch below).
2023-07-28 21:17:57 +02:00
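For reference, a sketch of why 2*modelsize suffices: besides the gradients, Adam(W) only strictly needs the two per-parameter moment estimates,

```
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2
```

so one first-moment tensor and one second-moment tensor per model tensor are enough.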
xaedes
5d124d0cb4
fix track_max_mem in forward_batch_wo_cache_flash_attn_train 2023-07-28 21:17:56 +02:00
klosax
8a88e5855c
perplexity : add Hellaswag calculation (#2389)
* common.h : add hellaswag / remove perplexity-lines

* common.cpp : add hellaswag / remove perplexity-lines

* perplexity.cpp : add hellaswag scores / remove perplexity-lines

* perplexity.cpp : clean up

* common.h : change default param value

* common.cpp : Change default param

* perplexity.cpp : alter wording

* common.h : alter wording

* common.cpp : alter wording
2023-07-28 21:25:36 +03:00
Lee
a9559bf77b
ggml : workaround for missing _mm256_setr_m128i in GCC < 8 in k_quants.c (#2405) 2023-07-28 21:17:45 +03:00
eric8607242
ee1b497c98
llama : support more diverse tokenizers? (#2420)
* supporting more diverse tokenizers

* Update llama.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-28 21:10:05 +03:00
Georgi Gerganov
d73b8d48b4
examples : fix whitespace 2023-07-28 21:05:08 +03:00
nhamanasu
34ae1caf7f
examples : server chat mode with llama2 (#2400)
* add: server chat mode with llama2

* fix: remove the unnecessary last \n
2023-07-28 21:02:10 +03:00
Weird Constructor
d91f3f0c55
readme : fix the description of the Tail free sampling (TFS) method (#2431) 2023-07-28 11:44:43 +03:00
Rand Xie
65cdf34bdc
llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433) 2023-07-28 11:42:53 +03:00
niansa/tuxifan
edcc7ae7d2
Obtaining LLaMA 2 instructions (#2308)
* Obtaining LLaMA 2 instructions

* Removed sharing warning for LLaMA 2

* Linked TheBloke's GGML repos

* Add LLaMA 2 to list of supported models

* Added LLaMA 2 usage instructions

* Added links to LLaMA 2 70B models
2023-07-28 03:14:11 +02:00
mj-shifu
7c529cede6
convert.py : Update to support 70B HF format model files (#2427)
* convert.py : fix llama 2 70b conversion from Huggingface
2023-07-27 14:39:17 -06:00
Georgi Gerganov
1a941869cb
metal : disable graph concurrency optimization due to bug (#2413) 2023-07-27 11:00:54 +03:00
slaren
b5472ea0ad
ggml : fix assert in ggml_set_unary_op (#2410) 2023-07-26 23:57:23 +02:00