ggml : remove q1_3 and q2_2

* llama : remove the separate scale tensors of BitNet b1.58

They won't be needed, since the remaining ternary quant types have
built-in scales.
2024-08-02 19:52:19 -04:00
commit 04eec58112
parent 45719a2472
12 changed files with 45 additions and 693 deletions
--- a/include/llama.h
+++ b/include/llama.h
@@ -168,8 +168,6 @@ extern "C" {
        LLAMA_FTYPE_MOSTLY_Q4_0_8_8      = 35, // except 1d tensors
        LLAMA_FTYPE_MOSTLY_TQ1_0         = 36, // except 1d tensors
        LLAMA_FTYPE_MOSTLY_TQ2_0         = 37, // except 1d tensors
-        LLAMA_FTYPE_MOSTLY_Q1_3          = 38, // except 1d tensors
-        LLAMA_FTYPE_MOSTLY_Q2_2          = 39, // except 1d tensors

        LLAMA_FTYPE_GUESSED = 1024, // not specified in the model file
    };