change AdamW decay parameter to work like the torch AdamW decay parameter

The decay is now relative to the Adam learning rate `alpha*sched`. Before this change it was relative to `sched` only.

Here `alpha` is the maximum learning rate and `sched` is a scaling parameter in [0..1]; a sketch of both conventions follows the diff below.
xaedes 2023-06-29 21:31:25 +02:00
parent ed4319e1a7
commit a80f184e6d

ggml.c

@@ -17351,8 +17351,8 @@ static enum ggml_opt_result ggml_opt_adam(
     // constants
     const float sched = params.adam.sched;
-    const float decay = params.adam.decay * sched;
     const float alpha = params.adam.alpha * sched;
+    const float decay = params.adam.decay * alpha;
     const float beta1 = params.adam.beta1;
     const float beta2 = params.adam.beta2;
     const float eps = params.adam.eps;
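
For illustration, a minimal sketch of one decoupled weight-decay step under both conventions. This is not code from ggml.c; `w_old`, `w_new`, and `decay_param` are hypothetical names standing in for a weight and `params.adam.decay`:

```c
#include <stdio.h>

int main(void) {
    const float alpha       = 1e-3f; // maximum learning rate
    const float sched       = 0.5f;  // schedule scaling parameter in [0..1]
    const float decay_param = 1e-2f; // weight decay hyperparameter

    float w_old = 1.0f; // weight updated with the old convention
    float w_new = 1.0f; // weight updated with the new convention

    // before this commit: per-step decay scaled by sched only
    w_old *= 1.0f - decay_param * sched;

    // after this commit: per-step decay scaled by the full learning rate
    // alpha*sched, matching torch AdamW's p *= 1 - lr * weight_decay
    w_new *= 1.0f - decay_param * alpha * sched;

    printf("old: %.8f  new: %.8f\n", w_old, w_new);
    return 0;
}
```

With the new scaling, the effective per-step decay is `params.adam.decay * alpha * sched`, which matches torch AdamW, where each step multiplies the weight by `1 - lr * weight_decay`.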