recommend Q4_K_M quantization method

Eve 2024-02-07 02:24:26 +00:00 committed by GitHub
parent 0eff982f61
commit 6f2014a029
GPG key ID: B5690EEEBB952194

@@ -680,18 +680,18 @@ python3 convert.py models/mymodel/
 # [Optional] for models using BPE tokenizers
 python convert.py models/mymodel/ --vocabtype bpe
 
-# quantize the model to 4-bits (using q4_0 method)
-./quantize ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-q4_0.gguf q4_0
+# quantize the model to 4-bits (using Q4_K_M method)
+./quantize ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-Q4_K_M.gguf Q4_K_M
 
 # update the gguf filetype to current version if older version is now unsupported
-./quantize ./models/mymodel/ggml-model-q4_0.gguf ./models/mymodel/ggml-model-q4_0-v2.gguf COPY
+./quantize ./models/mymodel/ggml-model-Q4_K_M.gguf ./models/mymodel/ggml-model-Q4_K_M-v2.gguf COPY
 ```
 
 ### Run the quantized model
 
 ```bash
 # start inference on a gguf model
-./main -m ./models/mymodel/ggml-model-q4_0.gguf -n 128
+./main -m ./models/mymodel/ggml-model-Q4_K_M.gguf -n 128
 ```
 
 When running the larger models, make sure you have enough disk space to store all the intermediate files.
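Taken together, the commands this commit updates form a short quantize-then-run pipeline. A minimal sketch of that pipeline as a shell script, written as a dry run because the llama.cpp `./quantize` and `./main` binaries and the model files are assumed rather than present here (the `run` helper and model path are illustrative):

```shell
#!/bin/sh
# Dry-run sketch of the quantization workflow from this commit.
# ./quantize and ./main come from a llama.cpp build; the model path
# is illustrative. run() echoes each command instead of executing it,
# so the sequence can be inspected without the binaries installed.
MODEL=./models/mymodel/ggml-model

run() { printf '+ %s\n' "$*"; }

# quantize the f16 model to 4 bits with the Q4_K_M method
run ./quantize "${MODEL}-f16.gguf" "${MODEL}-Q4_K_M.gguf" Q4_K_M
# re-export an older gguf to the current filetype without requantizing (COPY)
run ./quantize "${MODEL}-Q4_K_M.gguf" "${MODEL}-Q4_K_M-v2.gguf" COPY
# start inference on the quantized model
run ./main -m "${MODEL}-Q4_K_M.gguf" -n 128
```

Dropping the `run` wrapper (so the commands execute directly) reproduces the steps shown in the diff above.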