ggml : avoid multiply by D in GGML_OP_SSM_SCAN

This makes the weight buft detection in src/llama.cpp simpler.
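
To illustrate why the multiply by D can live outside the scan, here is a minimal sketch using a toy scalar recurrence. It assumes the skip term `d * x` is applied as a separate step after the scan. All names (`toy_scan`, `toy_add_skip`, `toy_scan_fused`, `toy_forms_agree`) are hypothetical and not part of ggml, and the real op works on multi-dimensional tensors rather than scalars.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Toy scalar recurrence (illustrative only, not ggml code):
 *   h_t = a * h_{t-1} + b * x_t
 *   y_t = c * h_t + d * x_t
 */

/* Fused form: the skip term d * x_t is added inside the scan. */
static void toy_scan_fused(float a, float b, float c, float d,
                           const float * x, float * y, size_t n) {
    assert(x != NULL && y != NULL);
    float h = 0.0f;
    for (size_t t = 0; t < n; ++t) {
        h    = a * h + b * x[t];
        y[t] = c * h + d * x[t];
    }
}

/* Split form, step 1: the scan no longer takes d. */
static void toy_scan(float a, float b, float c,
                     const float * x, float * y, size_t n) {
    assert(x != NULL && y != NULL);
    float h = 0.0f;
    for (size_t t = 0; t < n; ++t) {
        h    = a * h + b * x[t];
        y[t] = c * h;
    }
}

/* Split form, step 2: the skip term is a plain elementwise multiply-add. */
static void toy_add_skip(float d, const float * x, float * y, size_t n) {
    assert(x != NULL && y != NULL);
    for (size_t t = 0; t < n; ++t) {
        y[t] += d * x[t];
    }
}

/* Returns 1 when the fused and split forms agree on a small input. */
static int toy_forms_agree(void) {
    const float x[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float y_fused[4];
    float y_split[4];

    toy_scan_fused(0.5f, 1.0f, 2.0f, 0.25f, x, y_fused, 4);
    toy_scan(0.5f, 1.0f, 2.0f, x, y_split, 4);
    toy_add_skip(0.25f, x, y_split, 4);

    for (size_t t = 0; t < 4; ++t) {
        if (fabsf(y_fused[t] - y_split[t]) > 1e-6f) {
            return 0;
        }
    }
    return 1;
}
```

Since the skip term does not depend on the recurrent state, moving it out leaves the scan with one fewer weight input, which is in line with the simpler buft detection mentioned above.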

* convert : transpose Mamba-2 A, D and reshape SSM_NORM

This breaks existing conversions of Mamba-2 models
in order to avoid some reshapes.

Not sure if it's a good idea,
but it makes the graph slightly cleaner.

* llama : more appropriate SSM_SCAN and SSM_CONV buft support checks
Francis Couture-Harpin 2024-11-04 11:36:37 -05:00
parent 7d16e1bc8c
commit 3bc7103d2e
7 changed files with 98 additions and 95 deletions

@@ -1828,7 +1828,6 @@ extern "C" {
struct ggml_tensor * A,
struct ggml_tensor * B,
struct ggml_tensor * C,
-struct ggml_tensor * D,
struct ggml_tensor * ids);
// partition into non-overlapping windows with padding if needed