iq1_s: we can do even better

Spent one of the 4 scale bits on the sign of a 0.125 shift. I.e., quants are now -1 + delta, delta, 1 + delta, where delta is +/- 0.125. CUDA works, with the same performance as before. PPL(LLaMA-v2-7B) is now 11.85!
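A minimal C sketch of what the shifted dequantization amounts to for one group of 8 weights, assuming each grid entry packs eight signed bytes (0xff = -1, 0x00 = 0, 0x01 = +1, as in the `iq1s_grid` table) and assuming a 16-bit word `qh` whose bits 12..14 carry the 3 remaining scale bits and whose bit 15 carries the delta sign. The bit layout, the odd scale multiplier `2*s + 1`, and the function names `grid_value` and `iq1s_dequant8_sketch` are illustrative assumptions, not the actual ggml code:

```c
#include <stdint.h>

#define IQ1S_DELTA 0.125f  /* same value as the define added in ggml-common.h */

/* Extract signed byte j of a packed grid entry: 0xff -> -1, 0x00 -> 0, 0x01 -> +1.
   Uses shifts, so the result does not depend on host byte order. */
int grid_value(uint64_t entry, int j) {
    int v = (int)((entry >> (8 * j)) & 0xff);
    return v >= 128 ? v - 256 : v;
}

/* Hypothetical sketch: dequantize the 8 weights that share one grid entry.
   ASSUMED layout of qh: bits 12..14 = 3-bit scale, bit 15 = sign of the shift. */
void iq1s_dequant8_sketch(uint64_t entry, float d, uint16_t qh, float out[8]) {
    /* odd scale multiplier 1, 3, ..., 15 (assumed encoding) */
    const float dl = d * (float)(2 * ((qh >> 12) & 7) + 1);
    /* the bit taken from the old 4-bit scale selects +0.125 or -0.125 */
    const float delta = (qh & 0x8000) ? -IQ1S_DELTA : IQ1S_DELTA;
    for (int j = 0; j < 8; ++j) {
        /* each weight becomes (q + delta) * scale, with q in {-1, 0, +1} */
        out[j] = dl * ((float)grid_value(entry, j) + delta);
    }
}
```

For example, with `d = 1.0f`, `qh = 0` and the entry `0xffffffffffffff01`, the first weight decodes to 1.125 and the rest to -0.875; setting bit 15 of `qh` flips these to 0.875 and -1.125. The whole group shares one shift, so the three levels stay evenly spaced while their center moves by 1/8 of the scale.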
2024-03-11 13:12:33 +02:00 · 2024-03-11 13:12:33 +02:00 · 82380acf10
commit 82380acf10
parent be858f6205
3 changed files with 44 additions and 33 deletions
--- a/ggml-common.h
+++ b/ggml-common.h
@@ -645,6 +645,7 @@ GGML_TABLE_BEGIN(uint32_t, iq3s_grid, 512)
 GGML_TABLE_END()

 #define NGRID_IQ1S 2048
+#define IQ1S_DELTA 0.125f
 #if defined(GGML_COMMON_IMPL_C)
 GGML_TABLE_BEGIN(uint64_t, iq1s_grid, NGRID_IQ1S)
    0xffffffffffffffff, 0xffffffffffffff01, 0xffffffffffff0000, 0xffffffffffff01ff,