xaedes
05cb629c8e
replace inefficient repeat backward pass with dedicated repeat_back operation
2023-05-28 18:00:17 +02:00
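The commit above names a dedicated repeat_back operation without further detail, so as a hedged note only: the backward of a repeat/broadcast is a sum-reduction of the incoming gradient over the repeated copies. The helper below is a hypothetical 1-D sketch, not ggml's implementation.

```cpp
// Sketch: the gradient of repeat is accumulated back over all repeated tiles.
#include <cstdio>
#include <vector>

// grad_out has size n * repeats; accumulate it into grad_in of size n.
static void repeat_back_1d(const std::vector<float>& grad_out,
                           std::vector<float>& grad_in, int repeats) {
    const int n = (int)grad_in.size();
    for (int r = 0; r < repeats; ++r) {
        for (int i = 0; i < n; ++i) {
            grad_in[i] += grad_out[r*n + i];
        }
    }
}

int main() {
    std::vector<float> grad_out = {1, 2, 3,  4, 5, 6}; // two repeats of a length-3 vector
    std::vector<float> grad_in(3, 0.0f);
    repeat_back_1d(grad_out, grad_in, 2);
    printf("%g %g %g\n", grad_in[0], grad_in[1], grad_in[2]); // 5 7 9
}
```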
xaedes
c47df09842
simplify backward pass for SQRT
2023-05-28 17:32:01 +02:00
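The commit gives no details, so this is a hedged guess at the usual simplification rather than a claim about the exact change: for y = sqrt(x) the derivative 1/(2*sqrt(x)) equals 1/(2*y), so the backward pass can reuse the forward output instead of recomputing the square root.

```cpp
// Sketch of the identity only, not the commit's code.
#include <cmath>
#include <cstdio>

int main() {
    const float x = 4.0f;
    const float y = std::sqrt(x);             // forward result
    const float grad_y = 1.0f;                // incoming gradient dL/dy
    const float grad_x = grad_y / (2.0f * y); // backward reuses y, no extra sqrt
    printf("grad_x = %g\n", grad_x);          // 0.25
}
```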
xaedes
6d40cc3a44
remove trailing whitespace
2023-05-22 20:56:35 +02:00
xaedes
d3acbf644e
simplify code
2023-05-22 20:53:57 +02:00
xaedes
0651679302
save checkpoint only when it was trained
2023-05-22 16:56:28 +02:00
xaedes
cc440bd438
fix bug in get_samples which corrupted training targets
2023-05-22 16:55:52 +02:00
xaedes
b763d6f1f2
remove unused functions
2023-05-22 16:54:21 +02:00
xaedes
42d9b4cfc2
store optimizer state in training checkpoint and add learning schedule
...
persistent optimizer state allows resuming training without resetting the optimizer
the learning schedule consists of a linear warmup ramp followed by cosine decay with restarts
2023-05-21 21:36:04 +02:00
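A minimal sketch of such a "linear warmup ramp followed by cosine decay with restarts" schedule; the function name, parameters, and defaults below are illustrative, not the example's actual values.

```cpp
#include <cmath>
#include <cstdio>

// Returns a learning-rate multiplier in [min_mult, 1] for iteration `it`.
static float learning_schedule(int it, int warmup_iters, int cycle_iters,
                               float cycle_mult, float min_mult) {
    const float pi = 3.14159265358979f;
    if (it < warmup_iters) {
        return (float)(it + 1) / (float)warmup_iters;     // linear warmup ramp
    }
    float t   = (float)(it - warmup_iters);               // position after warmup
    float len = (float)cycle_iters;
    while (t >= len) { t -= len; len *= cycle_mult; }     // restart; each cycle may grow
    return min_mult + (1.0f - min_mult) * 0.5f * (1.0f + std::cos(pi * t / len));
}

int main() {
    for (int it = 0; it < 200; it += 20) {
        printf("it=%3d mult=%.3f\n", it, learning_schedule(it, 40, 60, 2.0f, 0.1f));
    }
}
```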
xaedes
37c69435f0
print suppressed newline tokens as string "\n"
...
printing too many actual newlines is suppressed to avoid flooding the console.
2023-05-21 21:17:46 +02:00
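A schematic version of that console-flood guard (illustrative only, not the example's code): once a few real newlines have been printed in a row, further newline tokens are printed as the literal string "\n".

```cpp
#include <cstdio>
#include <string>
#include <vector>

static void print_tokens(const std::vector<std::string>& toks, int max_consecutive_newlines) {
    int run = 0;
    for (const auto& t : toks) {
        if (t == "\n") {
            if (++run > max_consecutive_newlines) { printf("\\n"); continue; }
        } else {
            run = 0;
        }
        printf("%s", t.c_str());
    }
}

int main() {
    print_tokens({"hello", "\n", "\n", "\n", "\n", "world", "\n"}, 2);
    printf("\n");
}
```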
xaedes
93eb8f7752
add forward function without using cache, for more performant training
...
during training on whole samples, no cache is required.
removing the cache and simplifying the remaining code results in performance and memory usage improvements.
2023-05-21 21:14:49 +02:00
xaedes
2afd218479
fix bug in llama_sample_token_mirostat_v2
...
when all candidates are filtered out by the mu threshold, the following soft_max operation will fail,
so keep at least one candidate.
2023-05-21 21:12:10 +02:00
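A schematic version of that guard, using plain structs rather than llama.cpp's llama_token_data API: if the mu threshold would remove every candidate, the single best one is kept so the following softmax still has something to normalize.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

struct Candidate { int id; float surprise; };   // surprise = -log2(p)

static void truncate_by_mu(std::vector<Candidate>& cands, float mu) {
    std::sort(cands.begin(), cands.end(),
              [](const Candidate& a, const Candidate& b) { return a.surprise < b.surprise; });
    size_t keep = 0;
    while (keep < cands.size() && cands[keep].surprise <= mu) { ++keep; }
    if (keep == 0) { keep = 1; }                // the fix: never drop *all* candidates
    cands.resize(keep);
}

int main() {
    std::vector<Candidate> cands = {{0, 5.0f}, {1, 7.0f}, {2, 9.0f}};
    truncate_by_mu(cands, 1.0f);                // threshold below every candidate
    printf("kept %zu candidate(s), best id=%d\n", cands.size(), cands[0].id);
}
```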
xaedes
ec1783c3e0
add ggml_opt_context, so that we can properly resume training
...
otherwise the optimizer states, tracking statistics about the error function and its derivatives,
will reset to zero each time ggml_opt is called, hindering convergence on resumed training.
now the optimizer context and all its memory are stored in a separate struct.
2023-05-21 21:10:16 +02:00
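A sketch of why a persistent optimizer context matters, written as generic Adam over plain vectors rather than ggml_opt_context's actual layout: the moment estimates m and v and the step counter must survive across calls, otherwise every resumed run effectively restarts from iteration 0.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

struct AdamContext {
    std::vector<float> m, v;   // first/second moment estimates, kept between calls
    int64_t t = 0;             // step counter, kept between calls
};

static void adam_step(AdamContext& ctx, std::vector<float>& x, const std::vector<float>& g,
                      float alpha, float beta1 = 0.9f, float beta2 = 0.999f, float eps = 1e-8f) {
    if (ctx.m.empty()) { ctx.m.assign(x.size(), 0.0f); ctx.v.assign(x.size(), 0.0f); }
    ++ctx.t;
    for (size_t i = 0; i < x.size(); ++i) {
        ctx.m[i] = beta1*ctx.m[i] + (1.0f - beta1)*g[i];
        ctx.v[i] = beta2*ctx.v[i] + (1.0f - beta2)*g[i]*g[i];
        const float mhat = ctx.m[i] / (1.0f - std::pow(beta1, (float)ctx.t));
        const float vhat = ctx.v[i] / (1.0f - std::pow(beta2, (float)ctx.t));
        x[i] -= alpha * mhat / (std::sqrt(vhat) + eps);
    }
}

int main() {
    AdamContext ctx;                              // persisting ctx is what enables clean resume
    std::vector<float> x = {1.0f}, g = {0.5f};
    for (int i = 0; i < 3; ++i) adam_step(ctx, x, g, 1e-2f);
}
```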
xaedes
1eee9255e7
add missing default parameters for adam optimizer
2023-05-21 15:03:51 +02:00
xaedes
57c2f4f909
fix random weight initialization scale
2023-05-21 12:18:47 +02:00
xaedes
96514971dd
use inplace operations in cross_entropy_loss
2023-05-21 12:17:57 +02:00
xaedes
ef17d99f65
implement AdamW in ggml_opt_adam by adding weight decay parameter (default 0.001f)
...
also add a schedule parameter (default 1.0f) that can be used to scale alpha and decay according to the learning schedule.
setting the decay parameter to zero disables AdamW, resulting in the standard Adam optimizer.
since the difference between Adam and AdamW is minimal, it is not implemented as a separate optimizer but integrated into the existing Adam optimizer.
2023-05-20 14:54:57 +02:00
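An illustrative AdamW step showing the two parameters described above; this is a sketch, not ggml_opt_adam's actual code. `sched` scales both the step size and the weight decay, and decay == 0.0f reduces the update to plain Adam.

```cpp
#include <cmath>
#include <vector>

static void adamw_step(std::vector<float>& x, const std::vector<float>& g,
                       std::vector<float>& m, std::vector<float>& v, int t,
                       float alpha, float decay, float sched,
                       float beta1 = 0.9f, float beta2 = 0.999f, float eps = 1e-8f) {
    for (size_t i = 0; i < x.size(); ++i) {
        m[i] = beta1*m[i] + (1.0f - beta1)*g[i];
        v[i] = beta2*v[i] + (1.0f - beta2)*g[i]*g[i];
        const float mhat = m[i] / (1.0f - std::pow(beta1, (float)t));
        const float vhat = v[i] / (1.0f - std::pow(beta2, (float)t));
        x[i] -= sched * alpha * mhat / (std::sqrt(vhat) + eps); // Adam step, scaled by the schedule
        x[i] -= sched * decay * x[i];                           // decoupled weight decay (AdamW)
    }
}

int main() {
    std::vector<float> x = {1.0f}, g = {0.1f}, m = {0.0f}, v = {0.0f};
    adamw_step(x, g, m, v, /*t=*/1, /*alpha=*/1e-3f, /*decay=*/0.001f, /*sched=*/1.0f);
}
```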
xaedes
f4e9ce7998
enable gradient propagation for inplace add1 and scale operations
...
those functions' backward passes don't need the original src0, so they also work when the forward pass is inplace
2023-05-20 14:49:30 +02:00
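Why the inplace forward is safe here, shown schematically with plain floats rather than ggml tensors: for y = scale(x, s) the gradient is dL/dx = s * dL/dy, and for y = add1(x, c) it is simply dL/dx = dL/dy. Neither formula reads the original x, so x may be overwritten in place.

```cpp
#include <cstdio>

int main() {
    float x = 3.0f;
    const float s = 2.0f;
    x = s * x;                        // forward "scale" done inplace: x now holds y
    const float grad_y = 1.0f;        // incoming gradient dL/dy
    const float grad_x = s * grad_y;  // backward needs only s, not the original x
    printf("y=%g grad_x=%g\n", x, grad_x);
}
```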
xaedes
a6aafdd719
add ggml_add1_inplace to header
2023-05-20 14:47:56 +02:00
xaedes
08a330a136
add cmake target for baby-llama-text
2023-05-19 18:41:26 +02:00
xaedes
332003584e
sample with non-greedy sampling parameters at the end of training
2023-05-19 18:41:06 +02:00
xaedes
e19ead6e3f
print used memory before and after optimization
2023-05-19 18:40:20 +02:00
xaedes
da86a1d736
fix cross entropy loss
...
- add target probabilities for each sample, which are then used in the cross entropy loss
2023-05-19 18:39:38 +02:00
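A sketch of a cross entropy loss over full target probability distributions (one probability vector per sample), not llama.cpp's exact implementation: L = -(1/N) * sum_n sum_i p_target[n][i] * log softmax(logits[n])[i].

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

static float cross_entropy(const std::vector<std::vector<float>>& logits,
                           const std::vector<std::vector<float>>& p_target) {
    float loss = 0.0f;
    for (size_t n = 0; n < logits.size(); ++n) {
        // numerically stable log-softmax: subtract the max before exponentiating
        float maxv = logits[n][0];
        for (float l : logits[n]) maxv = std::max(maxv, l);
        float sum = 0.0f;
        for (float l : logits[n]) sum += std::exp(l - maxv);
        const float logsum = std::log(sum) + maxv;
        for (size_t i = 0; i < logits[n].size(); ++i) {
            loss -= p_target[n][i] * (logits[n][i] - logsum);
        }
    }
    return loss / (float)logits.size();
}

int main() {
    printf("%f\n", cross_entropy({{2.0f, 0.5f, 0.1f}}, {{1.0f, 0.0f, 0.0f}}));
}
```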
xaedes
09b304d015
remove duplicate include
2023-05-19 18:36:05 +02:00
xaedes
37f5b76df1
ggml fixes to support backward pass on inplace operations
2023-05-19 18:35:40 +02:00
xaedes
44d83558bc
use different arguments for input and output checkpoint
2023-05-19 18:34:18 +02:00
xaedes
d8b0666429
initialize rng with srand
2023-05-19 18:29:47 +02:00
xaedes
25fe1c3815
use inplace functions where possible
2023-05-19 14:53:21 +02:00
xaedes
b241b9cb6c
save trained model to checkpoint and load the model to be trained from checkpoint
2023-05-17 13:49:32 +02:00
xaedes
d328472f16
fix get_samples call, add model tensor names, increase model size, start training samples after newline
2023-05-17 12:52:20 +02:00
xaedes
e063135d0b
add llama sampler, shuffle samples and constrain sampling to tokens occurring in train data
2023-05-15 21:12:28 +02:00
xaedes
ec881156f6
improve ggml_out_prod performance
...
- change iteration order (>15s -> 10s runtime)
- parallelize over one more dimension: over dst matrix rows (10s -> <5s runtime)
2023-05-15 14:42:24 +02:00
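A schematic outer-product accumulation C += a * b^T, parallelized over dst rows as the notes above describe; this uses plain std::thread and is not ggml's actual kernel or threading model.

```cpp
#include <functional>
#include <thread>
#include <vector>

static void out_prod_rows(std::vector<float>& C, const std::vector<float>& a,
                          const std::vector<float>& b, size_t row_begin, size_t row_end) {
    const size_t n = b.size();
    for (size_t i = row_begin; i < row_end; ++i) {   // each thread owns a disjoint block of dst rows
        for (size_t j = 0; j < n; ++j) {             // innermost loop walks one dst row contiguously
            C[i*n + j] += a[i] * b[j];
        }
    }
}

int main() {
    const size_t m = 512, n = 512, nthreads = 4;
    std::vector<float> C(m*n, 0.0f), a(m, 1.0f), b(n, 2.0f);
    std::vector<std::thread> workers;
    for (size_t t = 0; t < nthreads; ++t) {
        const size_t r0 = m*t/nthreads, r1 = m*(t + 1)/nthreads;
        workers.emplace_back(out_prod_rows, std::ref(C), std::cref(a), std::cref(b), r0, r1);
    }
    for (auto& w : workers) w.join();
}
```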
xaedes
19fb91899b
better weight initialization improves training convergence at start
2023-05-15 14:19:38 +02:00
xaedes
f3cf7df21f
better weight initialization improves training convergence at start
2023-05-15 14:18:57 +02:00
xaedes
efa4bb78ea
add ggml_out_prod and use it for mul_mat backward pass for improved performance
...
performance stats report an improvement from 37 seconds to 16 seconds of runtime during my training tests
2023-05-15 14:17:42 +02:00
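Why an outer-product primitive fits the matmul backward, shown as schematic math rather than ggml's tensor layout: for C = A*B with A (m x k) and B (k x n), the gradient dA = dC * B^T can be accumulated as a sum of outer products of matching columns, dA += dC[:,j] * B[:,j]^T for each column j, which avoids materializing a transposed copy of B. The function below is a hypothetical reference version.

```cpp
#include <vector>

static void matmul_backward_dA(std::vector<float>& dA,          // m x k
                               const std::vector<float>& dC,    // m x n
                               const std::vector<float>& B,     // k x n
                               size_t m, size_t k, size_t n) {
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            for (size_t l = 0; l < k; ++l) {
                dA[i*k + l] += dC[i*n + j] * B[l*n + j];  // outer product of column j of dC and B
            }
        }
    }
}

int main() {
    const size_t m = 2, k = 3, n = 2;
    std::vector<float> dA(m*k, 0.0f), dC(m*n, 1.0f), B(k*n, 1.0f);
    matmul_backward_dA(dA, dC, B, m, k, n);   // every entry of dA ends up equal to 2
}
```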
xaedes
a703d7a85f
activate threading in baby-llama-text
2023-05-14 21:00:55 +02:00
xaedes
d9b5268728
avoid printing too many newlines in baby-llama-text
2023-05-14 20:57:47 +02:00
xaedes
c054079fb8
improve performance of mul_mat backward pass
...
avoid transpose by using mul_mat with swapped arguments
2023-05-14 20:56:50 +02:00
xaedes
1f2b76de01
fix bug in ggml_compute_forward_soft_max_back_f32 on DEBUG build
2023-05-14 20:55:24 +02:00
xaedes
69108167cd
fix race condition bug in non-inplace ggml_compute_forward_diag_mask_f32
...
memcpy needs to be synchronized across threads to avoid race conditions.
=> do it in the INIT phase
2023-05-14 20:54:57 +02:00
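A schematic version of that fix using generic C++20 threads and a barrier, not ggml's task scheduler: the copy of the source into dst happens once, in an init-style phase, before any thread starts writing its slice, so the copier and the writers cannot race.

```cpp
#include <barrier>
#include <cmath>
#include <cstring>
#include <thread>
#include <vector>

int main() {
    const int n = 1024, nthreads = 4, n_past = 8;
    std::vector<float> src(n, 1.0f), dst(n, 0.0f);
    std::barrier sync(nthreads);

    auto worker = [&](int ith) {
        if (ith == 0) {
            std::memcpy(dst.data(), src.data(), n*sizeof(float)); // "INIT" phase: single-threaded copy
        }
        sync.arrive_and_wait();                                   // all threads wait for the copy
        for (int i = ith; i < n_past; i += nthreads) {            // compute phase: disjoint writes
            dst[i] = -INFINITY;                                   // e.g. a diag-mask style update
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t) threads.emplace_back(worker, t);
    for (auto& t : threads) t.join();
}
```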
xaedes
4339f8cf28
improve softmax backward pass
...
go from quadratic runtime to linear runtime by simplifying the formulas
2023-05-14 17:55:02 +02:00
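The simplification in one formula, shown as a generic softmax backward sketch: with y = softmax(x), the full Jacobian is n x n, but the Jacobian-vector product collapses to dx_i = y_i * (dy_i - sum_j dy_j * y_j), which needs only one dot product and one pass over the vector, O(n) instead of O(n^2). The same identity is why the dedicated soft_max_back in the entry below can avoid the big n x n intermediates.

```cpp
#include <vector>

static void soft_max_back(const std::vector<float>& y,   // softmax output from the forward pass
                          const std::vector<float>& dy,  // incoming gradient dL/dy
                          std::vector<float>& dx) {      // outgoing gradient dL/dx
    float dot = 0.0f;
    for (size_t j = 0; j < y.size(); ++j) dot += dy[j] * y[j];
    for (size_t i = 0; i < y.size(); ++i) dx[i] = y[i] * (dy[i] - dot);
}

int main() {
    std::vector<float> y = {0.7f, 0.2f, 0.1f}, dy = {1.0f, 0.0f, 0.0f}, dx(3);
    soft_max_back(y, dy, dx);
}
```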
xaedes
ec1aea09ec
implement ggml_soft_max_back for more performant backward pass of soft_max
...
avoids creating big intermediate matrices of size n_embd x n_embd for llama layers and n_vocab x n_vocab for cross entropy loss
2023-05-14 17:16:26 +02:00
xaedes
f89c278d83
fix race condition bug in ggml_compute_forward_diag_mask_f32
2023-05-14 17:00:19 +02:00
xaedes
6e968d22b0
add text generating baby-llama from scratch example
2023-05-14 16:07:08 +02:00
xaedes
6e88dc93bd
update python bindings
2023-05-13 19:05:24 +02:00
xaedes
ed6b64fb98
add python bindings for functions to get and set the whole llama state
...
(rng, logits, embedding and kv_cache)
2023-05-13 16:32:08 +02:00
xaedes
5f6b715071
fix decoding error: add errors=ignore parameter
2023-05-13 16:31:13 +02:00
xaedes
bc9e84daca
add python wrapper
...
https://gist.github.com/abetlen/2b90e5f153f6efd00931d098de5c73ce
2023-05-13 16:31:13 +02:00
Georgi Gerganov
5a5aeb1e91
llama : fix unused warning
2023-05-13 16:55:14 +03:00
Georgi Gerganov
66841fdb0e
ggml : multi-thread mul and diag_mask ops ( #1428 )
2023-05-13 16:48:03 +03:00
Johannes Gäßler
905d87b70a
ggml : GPU-accelerated token generation ( #1412 )
...
* CUDA kernel for q4_0 dequant. + mat. vec. mult.
* Added q4_1 via template
* Added missing __syncthreads();
* --gpu_layers -> --gpu-layers
* Shorter dequantize_mul_mat_vec line
* q5_0 dequantize_mul_mat kernel
* More readable dequantize_mul_mat_vec logic
* dequantize_mul_mat_vec kernels for q5_1, q8_0, f16
* llama : offload "output" tensor to GPU too + coding style fixes
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-13 16:38:36 +03:00