change AdamW decay parameter to work like the torch AdamW decay parameter
It is now relative to the Adam learning rate `alpha*sched`. Before, it was relative to `sched` only, where `alpha` is the maximum learning rate and `sched` is a scaling parameter in [0..1].
This commit is contained in:
parent
ed4319e1a7
commit
a80f184e6d
1 changed file with 1 addition and 1 deletion

ggml.c
@@ -17351,8 +17351,8 @@ static enum ggml_opt_result ggml_opt_adam(
         // constants
         const float sched = params.adam.sched;
-        const float decay = params.adam.decay * sched;
         const float alpha = params.adam.alpha * sched;
+        const float decay = params.adam.decay * alpha;
         const float beta1 = params.adam.beta1;
         const float beta2 = params.adam.beta2;
         const float eps = params.adam.eps;
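For context, here is a minimal sketch of how these constants enter a single AdamW update, assuming the usual decoupled-weight-decay form; the function name and loop are illustrative, not the ggml implementation. Because `decay` now already contains the factor `alpha` (itself scaled by `sched`), the shrink term `x*(1 - decay)` matches torch.optim.AdamW, where the decay step is `param *= (1 - lr * weight_decay)`.

#include <math.h>

// Illustrative AdamW step (not the ggml code): m and v are assumed to
// hold the biased first/second moment estimates after step t.
static void adamw_step_sketch(float * x, const float * m, const float * v,
                              int n, int t, float alpha, float decay,
                              float beta1, float beta2, float eps) {
    const float mh_scale = 1.0f/(1.0f - powf(beta1, t)); // bias correction
    const float vh_scale = 1.0f/(1.0f - powf(beta2, t));
    for (int i = 0; i < n; ++i) {
        const float mh = m[i]*mh_scale;
        const float vh = v[i]*vh_scale;
        // decoupled weight decay: decay already includes the factor
        // alpha = params.adam.alpha * sched, so x*(1 - decay) is the
        // same shrink as torch's param *= (1 - lr*weight_decay)
        x[i] = x[i]*(1.0f - decay) - alpha*mh/(sqrtf(vh) + eps);
    }
}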