quantize
You can also use the GGUF-my-repo space on Hugging Face to build your own quants without any setup.
Note: The space is synced from the llama.cpp main branch every 6 hours.
Llama 2 7B
| Quantization | Bits per Weight (BPW) | 
|---|---|
| Q2_K | 3.35 | 
| Q3_K_S | 3.50 | 
| Q3_K_M | 3.91 | 
| Q3_K_L | 4.27 | 
| Q4_K_S | 4.58 | 
| Q4_K_M | 4.84 | 
| Q5_K_S | 5.52 | 
| Q5_K_M | 5.68 | 
| Q6_K | 6.56 | 
Llama 2 13B
| Quantization | Bits per Weight (BPW) | 
|---|---|
| Q2_K | 3.34 | 
| Q3_K_S | 3.48 | 
| Q3_K_M | 3.89 | 
| Q3_K_L | 4.26 | 
| Q4_K_S | 4.56 | 
| Q4_K_M | 4.83 | 
| Q5_K_S | 5.51 | 
| Q5_K_M | 5.67 | 
| Q6_K | 6.56 | 
Llama 2 70B
| Quantization | Bits per Weight (BPW) | 
|---|---|
| Q2_K | 3.40 | 
| Q3_K_S | 3.47 | 
| Q3_K_M | 3.85 | 
| Q3_K_L | 4.19 | 
| Q4_K_S | 4.53 | 
| Q4_K_M | 4.80 | 
| Q5_K_S | 5.50 | 
| Q5_K_M | 5.65 | 
| Q6_K | 6.56 |
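
The BPW figures above can be turned into a rough file-size estimate: multiply the number of weights by the bits per weight and divide by 8 to get bytes. The sketch below shows that arithmetic for a few rows of the Llama 2 7B table. The parameter count (about 6.74 billion) is an assumption that does not come from this README, and the helper name `estimated_size_gib` is made up for illustration. Real GGUF files also carry metadata and tokenizer data, so treat the result as an approximation only.

```python
def estimated_size_gib(n_params: float, bpw: float) -> float:
    """Approximate size in GiB of n_params weights stored at bpw bits each."""
    total_bits = n_params * bpw
    total_bytes = total_bits / 8
    return total_bytes / (1024 ** 3)


# BPW values copied from the Llama 2 7B table above.
LLAMA2_7B_BPW = {"Q2_K": 3.35, "Q4_K_M": 4.84, "Q6_K": 6.56}

# Assumption: roughly 6.74e9 weights for Llama 2 7B (not stated in this README).
N_PARAMS_7B = 6.74e9

for name, bpw in LLAMA2_7B_BPW.items():
    size = estimated_size_gib(N_PARAMS_7B, bpw)
    print(f"{name}: ~{size:.2f} GiB")
```

To estimate another model, swap in its parameter count and the matching BPW column.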