Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183)
* k_cache: be able to use Q5_0 * k_cache: be able to use Q5_1 on CODA * k_cache: be able to use Q5_0 on Metal * k_cache: be able to use Q5_1 on Metal * k_cache: be able to use IQ4_NL - just CUDA for now * k_cache: be able to use IQ4_NL on Metal * k_cache: add newly added supported types to llama-bench and CUDA supports_op --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
This commit is contained in:
parent
c5b8595e3f
commit
76aa30a263
5 changed files with 424 additions and 15 deletions
|
@ -249,6 +249,9 @@ static ggml_type ggml_type_from_name(const std::string & s) {
|
|||
if (s == "q5_1") {
|
||||
return GGML_TYPE_Q5_1;
|
||||
}
|
||||
if (s == "iq4_nl") {
|
||||
return GGML_TYPE_IQ4_NL;
|
||||
}
|
||||
|
||||
return GGML_TYPE_COUNT;
|
||||
}
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue