better sanity check skipping for QRWKV6 in llama-quant
thanks @compilade

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
parent d8a304c2ef
commit 324afba5cc
1 changed file with 2 additions and 2 deletions
@@ -621,8 +621,8 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s
 
     qs.n_ffn_down = qs.n_ffn_gate = qs.n_ffn_up = (int)model.hparams.n_layer;
 
-    // sanity checks
-    if (!llama_model_is_recurrent(&model))
+    // sanity checks for models that have attention layers
+    if (qs.n_attention_wv != 0)
     {
         const auto & n_head_kv_iter = model.hparams.n_head_kv_arr.begin();
         // attention layers have a non-zero number of kv heads
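For context, a minimal sketch (not the actual llama.cpp code) of how the guarded sanity check plays out after this change: the expected number of attention layers is derived from n_head_kv_arr (attention layers have a non-zero kv-head count) and compared against the attention wv tensors counted during quantization, but only when qs.n_attention_wv != 0. The struct layouts, helper name, and assertion below are illustrative assumptions.

#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>

// Illustrative stand-ins; field names follow the diff context, but these
// struct layouts are assumptions, not the real llama.cpp types.
struct hparams_t {
    uint32_t n_layer = 0;
    std::array<uint32_t, 512> n_head_kv_arr{}; // kv heads per layer; 0 => recurrent / non-attention layer
};

struct quantize_state_t {
    int n_attention_wv = 0; // number of attention wv tensors seen while quantizing
};

// Sketch of the guarded check after this commit: only models that actually
// produced attention wv tensors are validated.
static void sanity_check_attention(const hparams_t & hparams, const quantize_state_t & qs) {
    // sanity checks for models that have attention layers
    if (qs.n_attention_wv != 0)
    {
        const auto & n_head_kv_iter = hparams.n_head_kv_arr.begin();
        // attention layers have a non-zero number of kv heads
        const int n_attn_layer = (int) hparams.n_layer -
            (int) std::count(n_head_kv_iter, n_head_kv_iter + hparams.n_layer, 0u);
        assert(qs.n_attention_wv == n_attn_layer && "n_attention_wv is unexpected");
        (void) n_attn_layer;
    }
    // a model with no attention layers (e.g. QRWKV6) has qs.n_attention_wv == 0 and skips the check
}

int main() {
    hparams_t hp;
    hp.n_layer = 4;
    hp.n_head_kv_arr = {}; // all zero: every layer is recurrent, no attention layers

    quantize_state_t qs;
    qs.n_attention_wv = 0; // no attention wv tensors were quantized

    sanity_check_attention(hp, qs); // check is skipped, no assertion fires
    std::puts("sanity check skipped for recurrent-only model");
    return 0;
}

The design point is that the guard now keys off what was actually counted (qs.n_attention_wv) rather than off an architecture flag such as llama_model_is_recurrent, so a model that contributes no attention wv tensors skips the comparison while any model that does contribute them is still validated.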