Allow quantize to only copy tensors, some other improvements (#2931)

* Allow the quantize tool to only copy tensors, so existing models can be repackaged without requantizing.

* Slightly better logic when requantizing.

* Change help message to go to `stdout`.
Kerfuffle 2023-09-01 08:02:48 -06:00 committed by GitHub
parent 0d58936686
commit 5d6f19f16b
3 changed files with 37 additions and 13 deletions


llama.h
@@ -164,6 +164,7 @@ extern "C" {
         enum llama_ftype ftype;      // quantize to this llama_ftype
         bool allow_requantize;       // allow quantizing non-f32/f16 tensors
         bool quantize_output_tensor; // quantize output.weight
+        bool only_copy;              // only copy tensors - ftype, allow_requantize and quantize_output_tensor are ignored
     } llama_model_quantize_params;

     // grammar types
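
As a quick illustration of what the new field enables, here is a minimal sketch (not part of the commit) of repackaging a model through the C API. It assumes the llama_model_quantize_default_params() and llama_model_quantize() declarations from llama.h as of this change, mirrors the backend init/free calls used by the quantize example, and uses placeholder file names.

#include <stdio.h>
#include "llama.h"

int main(void) {
    llama_backend_init(false); // bool argument (NUMA) per the API at the time of this commit

    // Start from the library defaults, then request a pure copy: tensors are
    // written out unchanged, so ftype, allow_requantize and
    // quantize_output_tensor are ignored (see the comment in the header).
    struct llama_model_quantize_params params = llama_model_quantize_default_params();
    params.only_copy = true;

    // Placeholder paths: repackage an existing model file without requantizing.
    int ret = llama_model_quantize("model-in.gguf", "model-out.gguf", &params);
    if (ret != 0) {
        fprintf(stderr, "repack failed\n");
    }

    llama_backend_free();
    return ret == 0 ? 0 : 1;
}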