uvos
396856b400
CUDA/HIP: add support for selectable warp size to mmv ( #11519 )
CUDA/HIP: add support for selectable warp size to mmv
2025-02-02 22:40:09 +01:00
Johannes Gäßler
864a0b67a6
CUDA: use mma PTX instructions for FlashAttention ( #11583 )
* CUDA: use mma PTX instructions for FlashAttention
* __shfl_sync workaround for movmatrix
* add __shfl_sync to HIP
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-02 19:31:09 +01:00
uvos
5f0db9522f
hip : Add hipGraph and VMM support to ROCM ( #11362 )
* Add hipGraph support
* Enable VMM on rocm
2025-01-25 00:02:23 +01:00
Johannes Gäßler
46e3556e01
CUDA: add BF16 support ( #11093 )
* CUDA: add BF16 support
2025-01-06 02:33:52 +01:00
uvos
3ad5451f3b
Add some minimal optimizations for CDNA ( #10498 )
* Add some minimal optimizations for CDNA
* ggml_cuda: set launch bounds also for GCN as it helps there too
2024-11-27 17:10:08 +01:00
R0CKSTAR
c35e586ea5
musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) ( #9526 )
* mtgpu: add mp_21 support
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* mtgpu: disable flash attention on qy1 (MTT S80); disable q3_k and mul_mat_batched_cublas
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* mtgpu: enable unified memory
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* mtgpu: map cublasOperation_t to mublasOperation_t (sync code to latest)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-09-22 16:55:49 +02:00
Georgi Gerganov
d13edb17ed
ggml : fix builds ( #0 )
ggml-ci
2024-09-20 21:15:05 +03:00
R0CKSTAR
b34e023480
musa: remove Clang builtins mapping ( #9421 )
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-09-11 03:46:55 +02:00
R0CKSTAR
439b3fc75a
cuda : organize vendor-specific headers into vendors directory ( #8746 )
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-07-29 14:56:12 +02:00