2-bit quantizations (#4897)

* imatrix: load

* imatrix: WIP

* imatrix: Add Q2_K quantization

* imatrix: also guard against Q2_K_S quantization without importance matrix

* imatrix: guard even more against low-bit quantization misuse

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Kawrakow 2024-01-14 09:45:56 +02:00 committed by GitHub
parent 807179ec58
commit 147b17ac94
GPG key ID: 4AEE18F83AFDEB23
9 changed files with 1149 additions and 82 deletions

@@ -249,6 +249,7 @@ extern "C" {
     bool quantize_output_tensor; // quantize output.weight
     bool only_copy; // only copy tensors - ftype, allow_requantize and quantize_output_tensor are ignored
     bool pure; // disable k-quant mixtures and quantize all tensors to the same type
+    void * imatrix; // pointer to importance matrix data
 } llama_model_quantize_params;
 // grammar types
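
The guard named in the commit message (refusing Q2_K_S quantization when no importance matrix is supplied) could be sketched as below. This is a minimal illustration, not the code from this commit. `sketch_quantize_params`, `sketch_ftype`, `sketch_needs_imatrix` and `sketch_can_quantize` are hypothetical names; the struct mirrors only the fields visible in the hunk above. Which types require the matrix is an assumption: the message mentions only Q2_K_S, and "even more" guarding is left unspecified. Treating `only_copy` as exempt is also an assumption, based on the comment that `ftype` is ignored in that mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the tail of llama_model_quantize_params.
 * Only the fields shown in the hunk are mirrored; the real struct has more. */
typedef struct {
    bool   quantize_output_tensor; /* quantize output.weight */
    bool   only_copy;              /* only copy tensors, ftype is ignored */
    bool   pure;                   /* quantize all tensors to the same type */
    void * imatrix;                /* pointer to importance matrix data, NULL if none */
} sketch_quantize_params;

/* Hypothetical type tags, not the real llama_ftype values. */
typedef enum {
    SKETCH_FTYPE_Q4_K,
    SKETCH_FTYPE_Q2_K,
    SKETCH_FTYPE_Q2_K_S
} sketch_ftype;

/* Assumed policy: only Q2_K_S is treated as requiring an importance matrix. */
static bool sketch_needs_imatrix(sketch_ftype t) {
    return t == SKETCH_FTYPE_Q2_K_S;
}

/* Returns true when the request is acceptable under the assumed policy. */
static bool sketch_can_quantize(const sketch_quantize_params * p, sketch_ftype t) {
    assert(p != NULL);
    if (p->only_copy) {
        return true; /* nothing is quantized, so the guard does not apply */
    }
    return p->imatrix != NULL || !sketch_needs_imatrix(t);
}
```

A caller would run such a check before starting any work, so that a missing importance matrix is reported up front rather than producing a poor low-bit model.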