CUDA: faster FlashAttention, kernel for bs == 1
parent 08e69c5008
commit 75aa7b4b18
1 changed file with 937 additions and 482 deletions
ggml-cuda/fattn.cu  1271
File diff suppressed because it is too large
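Since the diff itself is too large to display, here is a minimal sketch of the idea behind a FlashAttention-style kernel for the single-sequence (bs == 1) case: one streaming pass over K/V with an online softmax, never materializing the full row of attention scores. This is plain C++ for illustration only; the function and variable names are hypothetical and this is not the actual ggml-cuda/fattn.cu code, which is a CUDA kernel.

```cpp
#include <cmath>
#include <vector>

// Illustrative single-query attention with online (streaming) softmax,
// the core trick of FlashAttention. Hypothetical names, not llama.cpp API.
std::vector<float> attend_one_query(const std::vector<float>& q,
                                    const std::vector<std::vector<float>>& K,
                                    const std::vector<std::vector<float>>& V) {
    const size_t d = q.size();
    const float scale = 1.0f / std::sqrt((float)d);
    float m = -INFINITY;              // running maximum of the scores
    float l = 0.0f;                   // running softmax denominator
    std::vector<float> acc(d, 0.0f);  // running weighted sum of V rows
    for (size_t i = 0; i < K.size(); ++i) {
        // score for this K row
        float s = 0.0f;
        for (size_t j = 0; j < d; ++j) s += q[j] * K[i][j];
        s *= scale;
        // rescale previous partial sums to the new running maximum
        float m_new = std::max(m, s);
        float corr  = std::exp(m - m_new);
        float p     = std::exp(s - m_new);
        l = l * corr + p;
        for (size_t j = 0; j < d; ++j) acc[j] = acc[j] * corr + p * V[i][j];
        m = m_new;
    }
    for (size_t j = 0; j < d; ++j) acc[j] /= l;  // final normalization
    return acc;
}
```

In the real kernel this loop is tiled across threads and warps, but the rescale-and-accumulate structure is what lets the pass stay in fast on-chip memory instead of writing the score matrix out.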