Commit graph

5 commits

Author SHA1 Message Date
Georgi Gerganov
a1616e9f72
Merge branch 'master' into gg/flash-attn
ggml-ci
2024-04-29 17:19:25 +03:00
DAN™
e00b4a8f81
Fix more int overflow during quant (PPL/CUDA). (#6563)
* Fix more int overflow during quant.

* Fix some more int overflow in softmax.

* Revert back to int64_t.
2024-04-29 00:38:44 +02:00
Georgi Gerganov
f725ca90fb
ggml : ggml_soft_max support F16/F32 mask/pos
ggml-ci
2024-04-22 14:53:11 +03:00
Georgi Gerganov
08e69c5008
cuda : adapt soft_max to F16 mask and pos
2024-03-28 19:40:11 +02:00
slaren
ae1f211ce2
cuda : refactor into multiple files (#6269)
2024-03-25 13:50:23 +01:00