ggml : avoid multiply by D in GGML_OP_SSM_SCAN

This makes the weight buft detection in src/llama.cpp simpler.

* convert : transpose Mamba-2 A, D and reshape SSM_NORM

This breaks existing conversions of Mamba-2 models
to avoid some reshapes.

Not sure if it's a good idea,
but it makes the graph slightly cleaner.

* llama : more appropriate SSM_SCAN and SSM_CONV buft support checks
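Illustration of the title change (not part of the commit): dropping D from the scan op presumably means the caller adds the skip term `D * x` after the scan. The toy below is a scalar-state sketch under that assumption. All function names, the recurrence, and the input values are hypothetical and are not the ggml kernel or its API. It only shows that fusing the D multiply into the scan and applying it afterwards give the same output.

```c
#include <stddef.h>

/* Hypothetical scalar-state recurrence standing in for the scan:
 *   h_t = a * h_{t-1} + b[t] * x[t],   y[t] = c[t] * h_t
 * A toy with the same shape as an SSM scan, NOT the ggml kernel. */
void toy_scan(size_t n, float a, const float *b, const float *c,
              const float *x, float *y) {
    float h = 0.0f;
    for (size_t t = 0; t < n; ++t) {
        h = a * h + b[t] * x[t];
        y[t] = c[t] * h;
    }
}

/* Variant with the skip term d * x[t] fused inside the scan loop. */
void toy_scan_fused_d(size_t n, float a, const float *b, const float *c,
                      float d, const float *x, float *y) {
    float h = 0.0f;
    for (size_t t = 0; t < n; ++t) {
        h = a * h + b[t] * x[t];
        y[t] = c[t] * h + d * x[t];
    }
}

/* Variant where the scan knows nothing about d and the caller
 * adds the skip term as a separate elementwise step afterwards. */
void toy_scan_then_d(size_t n, float a, const float *b, const float *c,
                     float d, const float *x, float *y) {
    toy_scan(n, a, b, c, x, y);
    for (size_t t = 0; t < n; ++t) {
        y[t] += d * x[t];
    }
}

/* Runs both variants on a small made-up input and returns the
 * largest absolute difference between their outputs. */
float toy_max_diff(void) {
    const float b[4] = {0.5f, 1.0f, -0.25f, 2.0f};
    const float c[4] = {1.0f, 0.5f, 2.0f, -1.0f};
    const float x[4] = {1.0f, -2.0f, 3.0f, 0.5f};
    float y_fused[4];
    float y_split[4];
    float max_diff = 0.0f;

    toy_scan_fused_d(4, 0.9f, b, c, 0.75f, x, y_fused);
    toy_scan_then_d(4, 0.9f, b, c, 0.75f, x, y_split);

    for (size_t t = 0; t < 4; ++t) {
        float diff = y_fused[t] - y_split[t];
        if (diff < 0.0f) {
            diff = -diff;
        }
        if (diff > max_diff) {
            max_diff = diff;
        }
    }
    return max_diff;
}
```

The two variants should agree up to floating-point rounding, so `toy_max_diff()` is expected to be very close to zero. Keeping D out of the scan leaves the op with fewer weight inputs, which fits the message's point about simpler weight buft detection.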
2024-11-04 11:36:37 -05:00
commit 3bc7103d2e
parent 7d16e1bc8c
7 changed files with 98 additions and 95 deletions
--- a/ggml/include/ggml.h
+++ b/ggml/include/ggml.h
@@ -1828,7 +1828,6 @@ extern "C" {
            struct ggml_tensor  * A,
            struct ggml_tensor  * B,
            struct ggml_tensor  * C,
-            struct ggml_tensor  * D,
            struct ggml_tensor  * ids);

    // partition into non-overlapping windows with padding if needed