From e851199ad32bc5fa1123a91f080be332449a174c Mon Sep 17 00:00:00 2001
From: Trần Đức Nam
Date: Mon, 18 Dec 2023 11:37:09 +0700
Subject: [PATCH] update: mistral 7b v1 benchmark

---
 examples/awqutils/README.md | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/examples/awqutils/README.md b/examples/awqutils/README.md
index dca87090c..357f406c0 100644
--- a/examples/awqutils/README.md
+++ b/examples/awqutils/README.md
@@ -96,3 +96,33 @@ Several quantization methods are supported. They differ in the resulting model d
 |AWQ-LLama2 7B| ms/tok @ 4th | xxx| xxx | xxx | xxx |
 |AWQ-LLama2 7B| ms/tok @ 8th | xxx| xx | xx | xx |
 |AWQ-LLama2 7B| bits/weight | 16.0 | 4.5 | 5.0 | 2.6 |
+
+
+### Mistral 7B v0.1
+Built with cuBLAS
+
+#### Memory/Disk Requirements
+
+| Format | Original | AWQ-4bit |
+|------:|--------------:|--------------:|
+| fp16 | 12.853 GB | 12.853 GB |
+| q4_0 | 3.647 GB | 3.647 GB |
+| q4_1 | 4.041 GB | 4.041 GB |
+| q2_k | 2.649 GB | 2.649 GB |
+
+#### Quantization
+
+Several quantization methods are supported. They differ in the resulting model disk size and inference speed.
+
+| Model | Measure | F16 | Q4_0 | Q4_1 | Q2_K |
+|-------------:|--------------|-------:|-------:|-------:|-------:|
+|Mistral 7B | perplexity | 5.6931 | 5.8202 | 5.8268 | 6.1645 |
+|Mistral 7B | file size | 12.9G | 3.5G | 3.9G | 2.7G |
+|Mistral 7B | ms/tok @ 4th | xxx | xx | xx | xx |
+|Mistral 7B | ms/tok @ 8th | xxx | xx | xx | xx |
+|Mistral 7B | bits/weight | 16.0 | 4.5 | 5.0 | 2.6 |
+|AWQ-Mistral 7B| perplexity | 5.6934 | 5.8020 | 5.7691 | 6.0426 |
+|AWQ-Mistral 7B| file size | 12.9G | 3.5G | 3.9G | 2.7G |
+|AWQ-Mistral 7B| ms/tok @ 4th | xxx| xxx | xxx | xxx |
+|AWQ-Mistral 7B| ms/tok @ 8th | xxx| xx | xx | xx |
+|AWQ-Mistral 7B| bits/weight | 16.0 | 4.5 | 5.0 | 2.6 |
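
A note on reading the bits/weight rows in the tables above: for a whole file, the figure is just the file size expressed in bits divided by the parameter count. A minimal sketch of that arithmetic (the ~7.24B parameter count for Mistral 7B and the GiB unit for the file sizes are assumptions, not stated in the patch):

```python
def bits_per_weight(file_size_gib: float, n_params: float) -> float:
    """Rough bits stored per model weight: file size in bits / parameter count."""
    return file_size_gib * (1024 ** 3) * 8 / n_params

# Assuming ~7.24e9 parameters, the 3.5 GiB q4_0 file works out to about
# 4.15 bits/weight. The table's 4.5 is the format's nominal rate (4-bit
# weights plus a per-block scale); whole-file arithmetic differs because
# some tensors are stored in other formats.
print(round(bits_per_weight(3.5, 7.24e9), 2))
```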