vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)

* vulkan: initial support for IQ3_S

* vulkan: initial support for IQ3_XXS

* vulkan: initial support for IQ2_XXS

* vulkan: initial support for IQ2_XS

* vulkan: optimize Q3_K by removing branches

* vulkan: implement dequantize variants for coopmat2

* vulkan: initial support for IQ2_S

* vulkan: vertically realign code

* port failing dequant callbacks from mul_mm

* Fix array length mismatches

* vulkan: avoid using workgroup size before it is referenced

* tests: increase timeout for Vulkan llvmpipe backend

---------

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
Rémy Oudompheng 2025-01-29 18:29:39 +01:00 committed by GitHub
parent e51c47b401
commit 66ee4f297c
19 changed files with 1616 additions and 40 deletions

@@ -104,8 +104,8 @@ ACC_TYPE Max(const in uint32_t row, const in uint32_t col, const in ACC_TYPE ele
 #endif
 void main() {
-#if defined(DATA_A_IQ4_NL)
-    init_iq4nl_shmem();
+#if defined(DATA_A_IQ2_XXS) || defined(DATA_A_IQ2_XS) || defined(DATA_A_IQ2_S) || defined(DATA_A_IQ3_XXS) || defined(DATA_A_IQ3_S) || defined(DATA_A_IQ4_NL)
+    init_iq_shmem(gl_WorkGroupSize);
 #endif
     const uint32_t N = p.N;