In addition to the KL divergence the following statistics are calculated with `--kl-divergence`:
* The root mean square (RMS) of the change in token probabilities. If you assume that the quantization simply adds Gaussian noise to the token probabilities, then this value is the standard deviation of that noise. The uncertainty on the value is calculated under the assumption that the change in token probabilities follows a Gaussian distribution (see the sketch after this list). Related discussion: https://github.com/ggerganov/llama.cpp/discussions/2875 .
* Same top p: the percentage of tokens for which both models assign the highest probability to the same token. The uncertainty is calculated from the Gaussian approximation of the binomial distribution (see the sketch after this list).
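
A minimal sketch of how these two uncertainties could be computed, assuming the per-token quantities are already available as NumPy arrays (the array and function names here are illustrative, not taken from the llama.cpp code):

```python
# Illustrative sketch only; not the llama.cpp implementation.
import numpy as np

def rms_delta_p(delta_p: np.ndarray) -> tuple[float, float]:
    """RMS of the per-token probability changes, with uncertainty.

    If quantization is modeled as Gaussian noise on the token
    probabilities, the RMS estimates that noise's standard deviation.
    The uncertainty is the standard error of a sample standard
    deviation under the same Gaussian assumption: sigma / sqrt(2(n-1)).
    """
    n = delta_p.size
    rms = float(np.sqrt(np.mean(delta_p ** 2)))
    err = rms / np.sqrt(2.0 * (n - 1))
    return rms, float(err)

def same_top_p(same_top: np.ndarray) -> tuple[float, float]:
    """Fraction of tokens where both models agree on the top-1 token.

    `same_top` holds one boolean per evaluated token. The uncertainty
    uses the Gaussian approximation of the binomial distribution:
    sqrt(p * (1 - p) / n).
    """
    n = same_top.size
    p = float(same_top.mean())
    err = np.sqrt(p * (1.0 - p) / n)
    return p, float(err)
```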

## LLaMA 3 8b Scoreboard

Results are sorted by Kullback-Leibler divergence relative to FP16.
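
For reference, the ranked quantity is the mean per-token Kullback-Leibler divergence of the quantized model's next-token distribution from the FP16 one. A minimal sketch of that computation from raw logits follows; the logits arrays are placeholders, and llama.cpp computes this internally rather than through code like this:

```python
# Sketch: mean per-token KL divergence KL(FP16 || quantized) from logits.
# `logits_f16` and `logits_quant` are placeholder arrays of shape
# (n_tokens, vocab_size).
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def mean_kld(logits_f16: np.ndarray, logits_quant: np.ndarray) -> float:
    logp = log_softmax(logits_f16)    # reference (FP16) distribution
    logq = log_softmax(logits_quant)  # quantized-model distribution
    # KL(P||Q) = sum_i p_i * (log p_i - log q_i), averaged over tokens.
    kld_per_token = (np.exp(logp) * (logp - logq)).sum(axis=-1)
    return float(kld_per_token.mean())
```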
The "WT 2.7m" importance matrix was created using 2.7 million Wikitext tokens and can be found [here](https://huggingface.co/JohannesGaessler/llama.cpp_importance_matrices/blob/main/imatrix-llama_3-8b-f16-2.7m_tokens.dat).
|
||||||
|
|
||||||
|
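
As a rough sketch of how such an importance matrix could be produced and applied: the binary names, flags, and file paths below are assumptions based on the llama.cpp `imatrix` and `quantize` examples and may differ between versions.

```python
# Hypothetical workflow sketch; verify tool names and flags against your
# llama.cpp build before running.
import subprocess

# Collect activation statistics over calibration text (~2.7M tokens here).
subprocess.run([
    "./imatrix",
    "-m", "llama_3-8b-f16.gguf",     # FP16 base model (placeholder path)
    "-f", "wikitext_train.txt",      # calibration text (placeholder path)
    "-o", "imatrix-llama_3-8b.dat",  # output importance matrix
], check=True)

# Quantize with the importance matrix guiding the rounding decisions.
subprocess.run([
    "./quantize",
    "--imatrix", "imatrix-llama_3-8b.dat",
    "llama_3-8b-f16.gguf",
    "llama_3-8b-q4_K_M.gguf",
    "q4_K_M",
], check=True)
```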

| Quantization | imatrix | Model size [GiB] | PPL | ΔPPL | KLD | RMS Δp |
|--------------|---------|------------------|-------------------|----------------------|---------------------|------------------|
| f16 | None | 14.97 | 6.7684 ± 0.04278 | - | - | - |
| q8_0 | None | 7.96 | 6.7687 ± 0.04277 | 0.005872 ± 0.001347 | 0.001391 ± 0.000007 | 1.210 ± 0.007 % |
| q6_K | None | 6.14 | 6.8007 ± 0.04312 | 0.037777 ± 0.002294 | 0.005669 ± 0.000046 | 2.343 ± 0.026 % |
| q5_K_M | None | 5.33 | 6.8308 ± 0.04330 | 0.067952 ± 0.003060 | 0.011093 ± 0.000086 | 3.173 ± 0.030 % |
| q5_K_S | None | 5.21 | 6.8877 ± 0.04378 | 0.124777 ± 0.003891 | 0.017177 ± 0.000135 | 3.947 ± 0.037 % |
| q5_1 | None | 5.65 | 6.8888 ± 0.04373 | 0.125879 ± 0.004015 | 0.018485 ± 0.000141 | 4.089 ± 0.039 % |
| q5_0 | None | 5.21 | 6.8988 ± 0.04373 | 0.135923 ± 0.004525 | 0.022964 ± 0.000170 | 4.631 ± 0.042 % |
| q4_K_M | WT 2.7m | 4.58 | 6.9164 ± 0.04390 | 0.153559 ± 0.005115 | 0.029126 ± 0.000256 | 5.270 ± 0.050 % |
| q4_K_M | None | 4.58 | 6.9593 ± 0.04415 | 0.196383 ± 0.005343 | 0.032032 ± 0.000248 | 5.531 ± 0.050 % |
| q4_K_S | WT 2.7m | 4.37 | 6.9393 ± 0.04396 | 0.176470 ± 0.005377 | 0.032768 ± 0.000266 | 5.630 ± 0.052 % |
| iq4_NL | WT 2.7m | 4.35 | 7.0114 ± 0.04468 | 0.248562 ± 0.005915 | 0.036482 ± 0.000286 | 5.965 ± 0.053 % |
| iq4_XS | WT 2.7m | 4.14 | 7.0091 ± 0.04459 | 0.246254 ± 0.005918 | 0.037087 ± 0.000292 | 6.009 ± 0.053 % |
| q4_K_S | None | 4.37 | 7.0545 ± 0.04481 | 0.291578 ± 0.006429 | 0.044040 ± 0.000320 | 6.511 ± 0.055 % |
| q4_1 | None | 4.78 | 7.2571 ± 0.04658 | 0.494238 ± 0.009036 | 0.072530 ± 0.000507 | 8.368 ± 0.062 % |
| q4_0 | None | 4.34 | 7.2927 ± 0.04665 | 0.529800 ± 0.009048 | 0.073598 ± 0.000486 | 8.395 ± 0.061 % |
| q3_K_L | WT 2.7m | 4.03 | 7.2330 ± 0.04666 | 0.470087 ± 0.009268 | 0.074345 ± 0.000530 | 8.577 ± 0.064 % |
| q3_K_M | WT 2.7m | 3.74 | 7.2941 ± 0.04699 | 0.531254 ± 0.010144 | 0.085849 ± 0.000596 | 9.236 ± 0.065 % |
| q3_K_L | None | 4.03 | 7.3483 ± 0.04729 | 0.585400 ± 0.010379 | 0.088558 ± 0.000611 | 9.333 ± 0.066 % |
| q3_K_M | None | 3.74 | 7.4524 ± 0.04789 | 0.689517 ± 0.011427 | 0.103797 ± 0.000675 | 10.111 ± 0.068 % |
| iq3_M | WT 2.7m | 3.53 | 7.5051 ± 0.04715 | 0.742584 ± 0.010752 | 0.104464 ± 0.000676 | 10.383 ± 0.066 % |
| iq3_S | WT 2.7m | 3.42 | 7.5693 ± 0.04794 | 0.806473 ± 0.011620 | 0.113201 ± 0.000719 | 10.669 ± 0.067 % |
| iq3_XS | WT 2.7m | 3.28 | 7.8058 ± 0.04967 | 1.042930 ± 0.013767 | 0.140704 ± 0.000846 | 11.979 ± 0.070 % |
| iq3_XXS | WT 2.7m | 3.05 | 8.0537 ± 0.05169 | 1.290849 ± 0.016815 | 0.187044 ± 0.001042 | 13.722 ± 0.073 % |
| q3_K_S | WT 2.7m | 3.41 | 8.4003 ± 0.05409 | 1.637409 ± 0.018650 | 0.208394 ± 0.001018 | 15.201 ± 0.070 % |
| q3_K_S | None | 3.41 | 8.6701 ± 0.05627 | 1.907244 ± 0.020902 | 0.236401 ± 0.001084 | 15.601 ± 0.069 % |
| iq2_M | WT 2.7m | 2.74 | 9.4260 ± 0.06254 | 2.663082 ± 0.028667 | 0.331202 ± 0.001611 | 18.368 ± 0.079 % |
| q2_K | WT 2.7m | 2.96 | 9.4737 ± 0.06303 | 2.710844 ± 0.029119 | 0.342129 ± 0.001565 | 18.996 ± 0.078 % |
| iq2_S | WT 2.7m | 2.56 | 10.6301 ± 0.07237 | 3.867287 ± 0.039162 | 0.446305 ± 0.001972 | 21.324 ± 0.082 % |
| q2_K | None | 2.96 | 10.6450 ± 0.07158 | 3.882171 ± 0.038471 | 0.457258 ± 0.001851 | 21.416 ± 0.078 % |
| iq2_XS | WT 2.7m | 2.43 | 11.8063 ± 0.08064 | 5.043388 ± 0.048007 | 0.556747 ± 0.002136 | 23.752 ± 0.082 % |
| iq2_XXS | WT 2.7m | 2.24 | 15.6064 ± 0.11301 | 8.843541 ± 0.081477 | 0.830947 ± 0.002749 | 28.363 ± 0.084 % |
| iq1_M | WT 2.7m | 2.01 | 28.6561 ± 0.21012 | 21.893176 ± 0.180729 | 1.413517 ± 0.003550 | 37.785 ± 0.084 % |
| iq1_S | WT 2.7m | 1.88 | 69.6303 ± 0.56051 | 62.867391 ± 0.535295 | 2.290167 ± 0.004882 | 45.826 ± 0.086 % |

## LLaMA 2 vs. LLaMA 3 Quantization comparison

| Metric | L2 7b q2_K | L3 8b q2_K | L2 7b q4_K_M | L3 8b q4_K_M | L2 7b q6_K | L3 8b q6_K | L2 7b q8_0 | L3 8b q8_0 |
|-----------------|---------------------|----------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|
| RMS Δp | 9.762 ± 0.053 % | 21.393 ± 0.078 % | 3.252 ± 0.024 % | 5.429 ± 0.051 % | 1.339 ± 0.010 % | 2.096 ± 0.029 % | 0.618 ± 0.011 % | 0.867 ± 0.007 % |
| Same top p | 85.584 ± 0.086 % | 70.419 ± 0.120 % | 94.665 ± 0.055 % | 92.162 ± 0.071 % | 97.520 ± 0.038 % | 96.586 ± 0.048 % | 98.846 ± 0.026 % | 98.467 ± 0.032 % |

| Metric | L2 70b q2_K | L3 70b q2_K | L2 70b q4_K_M | L3 70b q4_K_M | L2 70b q6_K | L3 70b q6_K | L2 70b q8_0 | L3 70b q8_0 |
|-----------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|
| Mean PPL | 4.172530 ± 0.020805 | 5.902798 ± 0.035278 | 3.475398 ± 0.016580 | 3.193431 ± 0.016621 | 3.440612 ± 0.016372 | 3.052153 ± 0.015746 | 3.434686 ± 0.016346 | 3.039482 ± 0.015687 |
| Mean PPL ratio | 1.215161 ± 0.002103 | 1.942461 ± 0.007686 | 1.012136 ± 0.000413 | 1.050877 ± 0.001032 | 1.002006 ± 0.000193 | 1.004386 ± 0.000413 | 1.000280 ± 0.000119 | 1.000217 ± 0.000264 |
| Mean ΔPPL | 0.738805 ± 0.007888 | 2.863974 ± 0.025573 | 0.041672 ± 0.001433 | 0.154607 ± 0.003206 | 0.006887 ± 0.000664 | 0.013329 ± 0.001256 | 0.000961 ± 0.000408 | 0.000658 ± 0.000803 |
| PPL correlation | 93.80% | 75.67% | 99.63% | 98.21% | 99.92% | 99.68% | 99.97% | 99.87% |
| Mean KLD | 0.186386 ± 0.001134 | 0.674716 ± 0.003267 | 0.013168 ± 0.000095 | 0.055418 ± 0.000506 | 0.002736 ± 0.000018 | 0.009148 ± 0.000100 | 0.000878 ± 0.000006 | 0.003088 ± 0.000040 |
| Mean Δp | -5.417 ± 0.040 % | -17.236 ± 0.078 % | -0.350 ± 0.010 % | -1.678 ± 0.026 % | -0.076 ± 0.005 % | -0.202 ± 0.010 % | -0.005 ± 0.003 % | -0.007 ± 0.006 % |
| Maximum Δp | 95.064% | 95.799% | 80.018% | 91.140% | 28.193% | 63.263% | 25.395% | 50.187% |
| 99.9% Δp | 46.526% | 60.640% | 23.562% | 47.583% | 10.424% | 24.634% | 6.548% | 14.033% |
| 99.0% Δp | 21.251% | 26.948% | 10.161% | 18.666% | 5.339% | 10.273% | 3.337% | 6.323% |
| Median Δp | -0.447% | -3.780% | -0.004% | -0.022% | -0.001% | -0.002% | -0.000% | 0.000% |
| 1.0% Δp | -81.379% | -98.506% | -15.142% | -47.638% | -5.866% | -13.230% | -3.333% | -6.609% |
| 0.1% Δp | -97.547% | -99.873% | -37.914% | -82.914% | -13.351% | -30.683% | -6.096% | -15.564% |
| Minimum Δp | -99.965% | -99.993% | -81.378% | -98.505% | -46.213% | -82.746% | -34.335% | -63.634% |
| RMS Δp | 17.237 ± 0.077 % | 34.361 ± 0.094 % | 4.154 ± 0.032 % | 9.915 ± 0.067 % | 1.899 ± 0.015 % | 3.721 ± 0.030 % | 1.085 ± 0.007 % | 2.124 ± 0.018 % |
| Same top p | 85.001 ± 0.087 % | 71.991 ± 0.118 % | 95.632 ± 0.050 % | 92.881 ± 0.068 % | 97.651 ± 0.037 % | 96.538 ± 0.048 % | 98.502 ± 0.030 % | 97.825 ± 0.038 % |
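
The Δp rows above summarize the distribution of per-token probability changes as quantiles. A minimal sketch of how those rows could be derived, assuming `delta_p` holds the per-token change (quantized minus FP16) in the probability assigned to the token that actually occurs in the text (an assumption about these rows, not confirmed by this document):

```python
# Sketch of the quantile-style Δp summary rows; illustrative only.
import numpy as np

def delta_p_summary(delta_p: np.ndarray) -> dict[str, float]:
    quantile = lambda q: float(np.quantile(delta_p, q))
    return {
        "Mean Δp":    float(delta_p.mean()),
        "Maximum Δp": float(delta_p.max()),
        "99.9% Δp":   quantile(0.999),
        "99.0% Δp":   quantile(0.99),
        "Median Δp":  quantile(0.5),
        "1.0% Δp":    quantile(0.01),
        "0.1% Δp":    quantile(0.001),
        "Minimum Δp": float(delta_p.min()),
    }
```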

## Old Numbers

<details>
<summary>Llama 2 70B Scorechart</summary>

| Quantization | Model size (GiB) | Perplexity | Delta to fp16 |
|--------------|------------------|------------|---------------|