llama: remove redundant reshape in build_kv_store

This commit removes the reshape of the V matrix in the build_kv_store.

The motivation for this is that V matrix has the shape:
```console
(gdb) p *v_cur
$46 = {type = GGML_TYPE_F32, backend = GGML_BACKEND_TYPE_CPU,
       buffer = 0x0, ne = {4096, 512, 1, 1}, nb = {4, 16384, 8388608,
       8388608}, op = GGML_OP_MUL_MAT, op_params = {
       0 <repeats 16 times>}, flags = 0, grad = 0x0,
       src = {0xb496b0, 0x7ffef1c40950, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
       0x0, 0x0}, perf_runs = 0, perf_cycles = 0, perf_time_us = 0,
       view_src = 0x0, view_offs = 0, data = 0x0,
       name = "Vcur-0", '\000' <repeats 57 times>, extra = 0x0,
       padding = "\000\000\000\000\000\000\000"}
```
And after reshaping this tensor we get:
```console
gdb) p *ggml_reshape_2d(ctx, v_cur, n_embd_v_gqa, n_tokens)
$44 = {type = GGML_TYPE_F32, backend = GGML_BACKEND_TYPE_CPU,
       buffer = 0x0, ne = {4096, 512, 1, 1}, nb = {4, 16384, 8388608,
       8388608}, op = GGML_OP_RESHAPE, op_params = {
       0 <repeats 16 times>}, flags = 0, grad = 0x0,
       src = {0x7ffef1c40e00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
       0x0}, perf_runs = 0, perf_cycles = 0, perf_time_us = 0,
       view_src = 0x7ffef1c40e00, view_offs = 0, data = 0x0,
       name = "Vcur-0 (reshaped)", '\000' <repeats 46 times>, extra = 0x0,
       padding = "\000\000\000\000\000\000\000"}
```
I noticed that the `src` and `view_src` fields are different but that the
dimensions are the same. From the code comment it seems like the reshape
call is not needed and perhaps the above can motivate the removal of the
reshape call.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
This commit is contained in:
Daniel Bevenius 2024-03-28 15:17:43 +01:00
parent 0308f5e3d7
commit 61f0ae73ef
Failed to extract signature

View file

@ -5523,8 +5523,7 @@ static void llm_build_kv_store(
GGML_ASSERT(kv.size == n_ctx);
// compute the transposed [n_tokens, n_embd] V matrix
struct ggml_tensor * v_cur_t = ggml_transpose(ctx, ggml_reshape_2d(ctx, v_cur, n_embd_v_gqa, n_tokens));
//struct ggml_tensor * v_cur_t = ggml_transpose(ctx, v_cur); // TODO: reshape above is likely not needed
struct ggml_tensor * v_cur_t = ggml_transpose(ctx, v_cur);
cb(v_cur_t, "v_cur_t", il);
struct ggml_tensor * k_cache_view = ggml_view_1d(ctx, kv.k_l[il], n_tokens*n_embd_k_gqa,