Georgi Gerganov | a1616e9f72 | 2024-04-29 17:19:25 +03:00
Merge branch 'master' into gg/flash-attn
ggml-ci

DAN™ | e00b4a8f81 | 2024-04-29 00:38:44 +02:00
Fix more int overflow during quant (PPL/CUDA). (#6563)
* Fix more int overflow during quant.
* Fix some more int overflow in softmax.
* Revert back to int64_t.

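The int overflow fixes in #6563 concern index arithmetic on large tensors. The sketch below is a generic illustration of that class of bug, not the actual patch: it assumes a row-major buffer whose row, column and width arrive as 32-bit `int`, and `flat_index` and `product_fits_int32` are hypothetical helpers, not ggml functions. The point it shows is that the widening to `int64_t` has to happen before the multiplication.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of ggml): flat offset of (row, col) in a
 * row-major buffer with ncols columns. The arguments are 32-bit ints, as
 * loop counters and kernel indices often are. */
static int64_t flat_index(int row, int col, int ncols) {
    assert(row >= 0 && col >= 0 && ncols > 0);
    /* Widen BEFORE multiplying: the cast binds tighter than '*', so row is
     * converted to int64_t and ncols is promoted with it. Writing
     * (int64_t)(row * ncols) instead would multiply in int and overflow
     * (undefined behaviour) once the product exceeds INT32_MAX. */
    return (int64_t) row * ncols + col;
}

/* Hypothetical check: would row * ncols still fit in a 32-bit int? */
static int product_fits_int32(int row, int ncols) {
    return (int64_t) row * ncols <= INT32_MAX;
}
```

For example, 70000 rows of 40000 columns is 2.8e9 elements, past `INT32_MAX` (about 2.1e9), so only the 64-bit product addresses the tail of such a tensor correctly.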
Georgi Gerganov | f725ca90fb | 2024-04-22 14:53:11 +03:00
ggml : ggml_soft_max support F16/F32 mask/pos
ggml-ci

Georgi Gerganov | 08e69c5008 | 2024-03-28 19:40:11 +02:00
cuda : adapt soft_max to F16 mask and pos

slaren | ae1f211ce2 | 2024-03-25 13:50:23 +01:00
cuda : refactor into multiple files (#6269)