CUDA: faster FlashAttention, kernel for bs == 1
parent 08e69c5008
commit 75aa7b4b18
1 changed file with 937 additions and 482 deletions
ggml-cuda/fattn.cu  1271
File diff suppressed because it is too large
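Since the diff itself is too large to display, here is a minimal sketch of the idea behind a FlashAttention-style kernel for the single-sequence (bs == 1) case: one streaming pass over K/V with an online softmax, never materializing the full row of attention scores. This is plain C++ for illustration only; the function and variable names are hypothetical and this is not the actual ggml-cuda/fattn.cu code, which is a CUDA kernel.

```cpp
#include <cmath>
#include <vector>

// Illustrative single-query attention with online (streaming) softmax,
// the core trick of FlashAttention. Hypothetical names, not llama.cpp API.
std::vector<float> attend_one_query(const std::vector<float>& q,
                                    const std::vector<std::vector<float>>& K,
                                    const std::vector<std::vector<float>>& V) {
    const size_t d = q.size();
    const float scale = 1.0f / std::sqrt((float)d);
    float m = -INFINITY;              // running maximum of the scores
    float l = 0.0f;                   // running softmax denominator
    std::vector<float> acc(d, 0.0f);  // running weighted sum of V rows
    for (size_t i = 0; i < K.size(); ++i) {
        // score for this K row
        float s = 0.0f;
        for (size_t j = 0; j < d; ++j) s += q[j] * K[i][j];
        s *= scale;
        // rescale previous partial sums to the new running maximum
        float m_new = std::max(m, s);
        float corr  = std::exp(m - m_new);
        float p     = std::exp(s - m_new);
        l = l * corr + p;
        for (size_t j = 0; j < d; ++j) acc[j] = acc[j] * corr + p * V[i][j];
        m = m_new;
    }
    for (size_t j = 0; j < d; ++j) acc[j] /= l;  // final normalization
    return acc;
}
```

In the real kernel this loop is tiled across threads and warps, but the rescale-and-accumulate structure is what lets the pass stay in fast on-chip memory instead of writing the score matrix out.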