Commit graph

4 commits

Author SHA1 Message Date
Jeff Bolz
466300fe14
vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (#11206)
Do masking on whole dwords, fetch all scales at once.
2025-01-16 22:23:49 +01:00
Jeff Bolz
206bc53422
vulkan: optimize coopmat2 q2_k dequant function (#11130)
2025-01-16 22:16:39 +01:00
Jeff Bolz
a91a41364b
vulkan: optimize coopmat2 dequant functions (#10855)
Change the code to do 16b loads when possible and extract the appropriate
component late, so the code is effectively decoding a pair of elements and
then selecting one. This can allow more commoning to happen in the compiler
when neighboring elements are loaded.
2024-12-21 08:04:45 +01:00
Jeff Bolz
c9c6e01dae
vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and flash attention (#10206)
2024-12-05 20:15:05 +01:00
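
The message for a91a41364b describes loading a wider word, decoding a pair of elements, and selecting one late. The sketch below illustrates that idea in plain C++ rather than in the shader code the commit changes, and it reads "16b" as 16-bit. The `Block` layout and the names `dequant_scalar` and `dequant_pair` are hypothetical and not taken from the repository, and the byte order within the 16-bit word assumes a little-endian host.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical block layout, for illustration only:
// one scale followed by 32 signed 8-bit quantized values.
struct Block {
    float d;
    int8_t qs[32];
};

// Baseline: one 8-bit load per requested element.
float dequant_scalar(const Block& b, uint32_t idx) {
    return b.d * static_cast<float>(b.qs[idx]);
}

// Pair variant: one 16-bit load covering the even/odd pair that contains
// idx, decode both components, and select the requested one at the end.
// Two neighboring indices issue the same load and the same decode, which
// gives a compiler the chance to share that work between them.
float dequant_pair(const Block& b, uint32_t idx) {
    uint16_t packed;
    std::memcpy(&packed, &b.qs[idx & ~1u], sizeof(packed)); // 16-bit load

    // Assumes little-endian: low byte is the even element, high byte the odd.
    float lo = b.d * static_cast<float>(static_cast<int8_t>(packed & 0xFF));
    float hi = b.d * static_cast<float>(static_cast<int8_t>(packed >> 8));

    return (idx & 1u) ? hi : lo; // late selection
}
```

Both functions return the same value for every index; the difference is only in how the load and decode are expressed, so any speedup depends on the compiler and hardware.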