change AdamW decay parameter to work like the torch AdamW decay parameter

The decay is now relative to the Adam learning rate `alpha*sched`. Before this change it was relative to `sched` only.

Here `alpha` is the maximum learning rate and `sched` is a scaling parameter in [0..1]; a sketch of both conventions follows the diff below.
xaedes 2023-06-29 21:31:25 +02:00
parent ed4319e1a7
commit a80f184e6d

ggml.c

@@ -17351,8 +17351,8 @@ static enum ggml_opt_result ggml_opt_adam(
     // constants
     const float sched = params.adam.sched;
-    const float decay = params.adam.decay * sched;
     const float alpha = params.adam.alpha * sched;
+    const float decay = params.adam.decay * alpha;
     const float beta1 = params.adam.beta1;
     const float beta2 = params.adam.beta2;
     const float eps = params.adam.eps;
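
For illustration, a minimal sketch of one decoupled weight-decay step under both conventions. This is not code from ggml.c; `w_old`, `w_new`, and `decay_param` are hypothetical names standing in for a weight and `params.adam.decay`:

```c
#include <stdio.h>

int main(void) {
    const float alpha       = 1e-3f; // maximum learning rate
    const float sched       = 0.5f;  // schedule scaling parameter in [0..1]
    const float decay_param = 1e-2f; // weight decay hyperparameter

    float w_old = 1.0f; // weight updated with the old convention
    float w_new = 1.0f; // weight updated with the new convention

    // before this commit: per-step decay scaled by sched only
    w_old *= 1.0f - decay_param * sched;

    // after this commit: per-step decay scaled by the full learning rate
    // alpha*sched, matching torch AdamW's p *= 1 - lr * weight_decay
    w_new *= 1.0f - decay_param * alpha * sched;

    printf("old: %.8f  new: %.8f\n", w_old, w_new);
    return 0;
}
```

With the new scaling, the effective per-step decay is `params.adam.decay * alpha * sched`, which matches torch AdamW, where each step multiplies the weight by `1 - lr * weight_decay`.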