llama.cpp

Author	SHA1	Message	Date
Georgi Gerganov	a1616e9f72	Merge branch 'master' into gg/flash-attn ggml-ci	2024-04-29 17:19:25 +03:00
DAN™	e00b4a8f81	Fix more int overflow during quant (PPL/CUDA). (#6563) * Fix more int overflow during quant. * Fix some more int overflow in softmax. * Revert back to int64_t.	2024-04-29 00:38:44 +02:00
Georgi Gerganov	f725ca90fb	ggml : ggml_soft_max support F16/F32 mask/pos ggml-ci	2024-04-22 14:53:11 +03:00
Georgi Gerganov	08e69c5008	cuda : adapt soft_max to F16 mask and pos	2024-03-28 19:40:11 +02:00
slaren	ae1f211ce2	cuda : refactor into multiple files (#6269)	2024-03-25 13:50:23 +01:00