recommend Q4_K_M quantization method

Eve authored 2024-02-07 02:24:26 +00:00, committed by GitHub
parent 0eff982f61
commit 6f2014a029


@@ -680,18 +680,18 @@ python3 convert.py models/mymodel/
# [Optional] for models using BPE tokenizers
python convert.py models/mymodel/ --vocabtype bpe
-# quantize the model to 4-bits (using q4_0 method)
-./quantize ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-q4_0.gguf q4_0
+# quantize the model to 4-bits (using Q4_K_M method)
+./quantize ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-Q4_K_M.gguf Q4_K_M
# update the gguf filetype to current version if older version is now unsupported
-./quantize ./models/mymodel/ggml-model-q4_0.gguf ./models/mymodel/ggml-model-q4_0-v2.gguf COPY
+./quantize ./models/mymodel/ggml-model-Q4_K_M.gguf ./models/mymodel/ggml-model-Q4_K_M-v2.gguf COPY
```
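
Q4_K_M is one of the k-quant presets, which generally give a better quality-to-size trade-off than the legacy q4_0 type at a similar file size. To see every quantization type supported by your build, the quantize tool prints them in its usage text when run without the required arguments (a quick check, not part of this commit):

```bash
# running quantize with no arguments prints its usage text,
# including the list of allowed quantization types (Q4_0, Q4_K_M, Q5_K_M, ...)
./quantize
```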
### Run the quantized model
```bash
# start inference on a gguf model
-./main -m ./models/mymodel/ggml-model-q4_0.gguf -n 128
+./main -m ./models/mymodel/ggml-model-Q4_K_M.gguf -n 128
```
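
For a quick sanity check beyond the default run above, main also accepts a prompt via -p (a minimal usage sketch; the prompt text is just an example):

```bash
# generate 128 tokens from a fixed prompt to confirm the quantized model loads and produces sensible text
./main -m ./models/mymodel/ggml-model-Q4_K_M.gguf -p "Building a website can be done in 10 simple steps:" -n 128
```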
When running the larger models, make sure you have enough disk space to store all the intermediate files.
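
A quick way to check this before converting is to compare the free space on the volume against the size of the intermediate files (standard POSIX tools; paths match the examples above):

```bash
# free space on the volume holding the model files
df -h ./models/mymodel/
# sizes of the f16 and quantized intermediates
ls -lh ./models/mymodel/
```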