Georgi Gerganov | ce281b904c | llama : disable FA for AMD | 2024-04-24 17:54:32 +03:00
Georgi Gerganov | c70bfd7bcb | cuda : "constexpr dim3" -> "const dim3" ggml-ci | 2024-04-22 20:31:23 +03:00
Georgi Gerganov | 5408d55506 | cuda : uint -> uint32_t | 2024-04-22 19:12:06 +03:00
Johannes Gäßler | 87968de9a9 | fix KQ FP32 precision for parallel_blocks > 1 | 2024-04-18 13:15:32 +02:00
Johannes Gäßler | 0bc67dd1c8 | Calculate KQ as FP32 if KQV has GGML_PREC_F32 | 2024-04-18 13:15:32 +02:00
Johannes Gäßler | a5b0e2dea0 | store temp KQ in registers | 2024-04-18 13:15:32 +02:00
Johannes Gäßler | ef9e1593f3 | flush softmax exp below threshold to 0 | 2024-04-18 13:15:32 +02:00
Johannes Gäßler | 6a3b84236d | fix flash_attn_vec_f16 race condition | 2024-04-18 13:15:32 +02:00
Johannes Gäßler | 34f93bbb39 | CUDA: refactor host code, dyn. par. blocks | 2024-04-18 13:15:32 +02:00
Johannes Gäßler | ee19a4ab7e | fix KV cache padding, NaN from INFINITY (#6438) | 2024-04-02 17:26:22 +02:00
Johannes Gäßler | c63dfdf765 | fix cmake build | 2024-04-02 13:48:13 +03:00
Johannes Gäßler | bb0d51accd | fix excessive KQ_b loads | 2024-04-02 13:48:13 +03:00
Johannes Gäßler | e1ecd3b129 | fix compile warnings | 2024-04-02 13:48:13 +03:00
Johannes Gäßler | 3f777acf06 | Multiple parallel blocks for batch size 1 | 2024-04-02 13:48:13 +03:00
Johannes Gäßler | 68d793bee8 | no ncols == 64 | 2024-04-02 13:48:13 +03:00
Johannes Gäßler | cca6d027a3 | 4 warps, 256 stride for all D | 2024-04-02 13:48:13 +03:00
Johannes Gäßler | 269374ed81 | adjust kernel selection logic | 2024-04-02 13:48:13 +03:00
Johannes Gäßler | 81da919864 | no vec for hs, no hs==256 ncols==32 for Volta | 2024-04-02 13:48:13 +03:00
Johannes Gäßler | d59ac670bf | 16 cols for Phi-2 | 2024-04-02 13:48:13 +03:00
Johannes Gäßler | 75aa7b4b18 | CUDA: faster FlashAttention, kernel for bs == 1 | 2024-04-02 13:48:13 +03:00
Georgi Gerganov | 6be02b5969 | cuda : fix build | 2024-03-27 10:31:52 +02:00
Georgi Gerganov | 013721df2b | Merge branch 'master' into gg/flash-attn | 2024-03-27 10:24:09 +02:00