update: benchmark results for llama2-7b

Trần Đức Nam 2023-12-16 11:38:42 +07:00
parent 8a3ceced04
commit 0adf4c73bc

@@ -38,9 +38,10 @@ The perplexity measurements in table above are done against the `wikitext2` test
 ## Results
-### Memory/Disk Requirements
-Llama 7B
+### Llama 7B
+Build with OpenBLAS
+#### Memory/Disk Requirements
 | Model | Original | AWQ-4bit |
 |------:|--------------:|--------------:|
@@ -49,19 +50,49 @@ Llama 7B
 | q4_1 | 4.041 GB | 4.041 GB |
 | q2_k | 2.649 GB | 2.649 GB |
-### Quantization
+#### Quantization
 Several quantization methods are supported. They differ in the resulting model disk size and inference speed.
 | Model | Measure | F16 | Q4_0 | Q4_1 | Q2_K |
 |-----------:|--------------|-------:|-------:|-------:|-------:|
-|Llama 7B | perplexity | 5.9066 | 6.1214 | 6.0643 | xxxxxx |
+|Llama 7B | perplexity | 5.9066 | 6.1214 | 6.0643 | 6.5808 |
 |Llama 7B | file size | 12.9G | 3.5G | 3.9G | 2.7G |
 |Llama 7B | ms/tok @ 4th | xxx | xx | xx | xx |
 |Llama 7B | ms/tok @ 8th | xxx | xx | xx | xx |
 |Llama 7B | bits/weight | 16.0 | 4.5 | 5.0 | 2.6 |
-|AWQ-LLama 7B| perplexity | 5.9175 | 6.0252 | 5.9987 | xxxxx |
+|AWQ-LLama 7B| perplexity | 5.9175 | 6.0252 | 5.9987 | 6.3692 |
 |AWQ-LLama 7B| file size | 12.9G | 3.5G | 3.9G | 2.7G |
 |AWQ-LLama 7B| ms/tok @ 4th | xxx | xxx | xxx | xxx |
 |AWQ-LLama 7B| ms/tok @ 8th | xxx | xx | xx | xx |
 |AWQ-LLama 7B| bits/weight | 16.0 | 4.5 | 5.0 | 2.6 |
+### Llama2 7B
+Build with CuBLAS
+#### Memory/Disk Requirements
+| Model | Original | AWQ-4bit |
+|------:|--------------:|--------------:|
+| fp16 | 12.853 GB | 12.853 GB |
+| q4_0 | 3.647 GB | 3.647 GB |
+| q4_1 | 4.041 GB | 4.041 GB |
+| q2_k | 2.649 GB | 2.649 GB |
+#### Quantization
+Several quantization methods are supported. They differ in the resulting model disk size and inference speed.
+| Model | Measure | F16 | Q4_0 | Q4_1 | Q2_K |
+|------------:|--------------|-------:|-------:|-------:|-------:|
+|Llama2 7B | perplexity | 5.8664 | 6.0260 | 6.0656 | 6.4496 |
+|Llama2 7B | file size | 12.9G | 3.5G | 3.9G | 2.7G |
+|Llama2 7B | ms/tok @ 4th | xxx | xx | xx | xx |
+|Llama2 7B | ms/tok @ 8th | xxx | xx | xx | xx |
+|Llama2 7B | bits/weight | 16.0 | 4.5 | 5.0 | 2.6 |
+|AWQ-LLama2 7B| perplexity | 5.8801 | 6.0054 | 5.9849 | 6.3650 |
+|AWQ-LLama2 7B| file size | 12.9G | 3.5G | 3.9G | 2.7G |
+|AWQ-LLama2 7B| ms/tok @ 4th | xxx | xxx | xxx | xxx |
+|AWQ-LLama2 7B| ms/tok @ 8th | xxx | xx | xx | xx |
+|AWQ-LLama2 7B| bits/weight | 16.0 | 4.5 | 5.0 | 2.6 |
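As a rough sanity check, the `bits/weight` and `file size` rows in the tables above are tied together by simple arithmetic: file size is approximately parameter count times bits per weight. A minimal sketch, assuming a parameter count of about 6.74 billion for Llama 7B (not stated in the tables) and ignoring tokenizer/metadata overhead, so real model files come out somewhat larger:

```python
def estimated_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough model file size: parameters * bits/weight, in GiB.

    Ignores metadata and mixed-precision tensors, so actual
    quantized model files are slightly larger than this estimate.
    """
    return n_params * bits_per_weight / 8 / 2**30

# Assumed parameter count for Llama 7B: ~6.74e9.
for name, bpw in [("f16", 16.0), ("q4_0", 4.5), ("q4_1", 5.0), ("q2_k", 2.6)]:
    print(f"{name}: ~{estimated_size_gib(6.74e9, bpw):.2f} GiB")
```

Under these assumptions the estimates land close to the reported f16, q4_0, and q4_1 sizes; q2_k comes out lower than the reported 2.7G because its 2.6 bits/weight figure excludes the higher-precision tensors mixed into k-quant files.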