Convert vector to f16 for dequantize mul mat vec (#1913)
* Convert vector to f16 for dmmv
* compile option
* Added compilation option description to README
* Changed cmake CUDA_ARCHITECTURES from "OFF" to "native"
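The commit converts the activation vector from f32 to f16 before the dequantize-mul-mat-vec (dmmv) CUDA kernel consumes it, gated behind the new compile option. As an illustrative, standalone sketch of the underlying conversion — the real code uses CUDA's `half` type and conversion intrinsics, not this helper, and proper rounding is round-to-nearest-even rather than the truncation shown — a scalar float → IEEE-754 binary16 conversion looks like:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Truncating float -> IEEE-754 half (binary16) conversion.
// Illustrative only: denormals flush to zero, mantissa is truncated.
static uint16_t f32_to_f16(float f) {
    uint32_t x;
    std::memcpy(&x, &f, sizeof x);                        // get raw f32 bits
    uint32_t sign = (x >> 16) & 0x8000u;                  // sign into bit 15
    int32_t  exp  = (int32_t)((x >> 23) & 0xFFu) - 127 + 15; // re-bias exponent
    uint32_t mant = x & 0x7FFFFFu;                        // 23-bit mantissa
    if (exp <= 0)  return (uint16_t)sign;                 // underflow -> signed zero
    if (exp >= 31) return (uint16_t)(sign | 0x7C00u);     // overflow  -> infinity
    return (uint16_t)(sign | ((uint32_t)exp << 10) | (mant >> 13));
}
```

Halving the vector's storage width reduces memory traffic for the kernel at the cost of mantissa precision, which is why the conversion is opt-in via a compile option.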
Parent: b24c3049d9
Commit: 16b9cd1939
5 changed files with 158 additions and 68 deletions
```diff
@@ -1620,7 +1620,7 @@ static bool llama_eval_internal(
                     model.layers[il].w1,
                     cur);
             offload_func(cur);
-            ggml_set_name(cur, "result_w2");
+            ggml_set_name(cur, "result_w1");

             // SILU activation
             cur = ggml_silu(ctx0, cur);
```
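The touched lines sit in the SILU activation path of the feed-forward block. For reference, SILU (also called swish) is defined as x·sigmoid(x); a scalar sketch of what `ggml_silu` applies element-wise over the tensor:

```cpp
#include <cassert>
#include <cmath>

// SILU (sigmoid-weighted linear unit): silu(x) = x * sigmoid(x)
//                                              = x / (1 + exp(-x)).
// ggml_silu applies this element-wise; shown here in scalar form.
static float silu(float x) {
    return x / (1.0f + std::exp(-x));
}
```

SILU is near-identity for large positive inputs and decays smoothly to zero for large negative inputs, which is the non-linearity LLaMA uses between the w1/w3 and w2 projections.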