From 0adf4c73bc7575651ff8a38dad66bdde39f57185 Mon Sep 17 00:00:00 2001
From: Trần Đức Nam
Date: Sat, 16 Dec 2023 11:38:42 +0700
Subject: [PATCH] update: benchmark results for llama2-7b

---
 examples/awqutils/README.md | 41 ++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/examples/awqutils/README.md b/examples/awqutils/README.md
index 481cbcde0..dca87090c 100644
--- a/examples/awqutils/README.md
+++ b/examples/awqutils/README.md
@@ -38,9 +38,10 @@ The perplexity measurements in table above are done against the `wikitext2` test

 ## Results

-### Memory/Disk Requirements
+### Llama 7B
+Built with OpenBLAS

-Llama 7B
+#### Memory/Disk Requirements

 | Model | Original | AWQ-4bit |
 |------:|--------------:|--------------:|
@@ -49,19 +50,49 @@ Llama 7B
 | q4_1 | 4.041 GB | 4.041 GB |
 | q2_k | 2.649 GB | 2.649 GB |

-### Quantization
+#### Quantization

 Several quantization methods are supported. They differ in the resulting model disk size and inference speed.

 | Model | Measure | F16 | Q4_0 | Q4_1 | Q2_K |
 |-----------:|--------------|-------:|-------:|-------:|-------:|
-|Llama 7B | perplexity | 5.9066 | 6.1214 | 6.0643 | xxxxxx|
+|Llama 7B | perplexity | 5.9066 | 6.1214 | 6.0643 | 6.5808 |
 |Llama 7B | file size | 12.9G | 3.5G | 3.9G | 2.7G |
 |Llama 7B | ms/tok @ 4th | xxx | xx | xx | xx |
 |Llama 7B | ms/tok @ 8th | xxx | xx | xx | xx |
 |Llama 7B | bits/weight | 16.0 | 4.5 | 5.0 | 2.6 |
-|AWQ-LLama 7B| perplexity | 5.9175 | 6.0252 | 5.9987 | xxxxx |
+|AWQ-LLama 7B| perplexity | 5.9175 | 6.0252 | 5.9987 | 6.3692 |
 |AWQ-LLama 7B| file size | 12.9G | 3.5G | 3.9G | 2.7G |
 |AWQ-LLama 7B| ms/tok @ 4th | xxx| xxx | xxx | xxx |
 |AWQ-LLama 7B| ms/tok @ 8th | xxx| xx | xx | xx |
 |AWQ-LLama 7B| bits/weight | 16.0 | 4.5 | 5.0 | 2.6 |
+
+
+### Llama2 7B
+Built with cuBLAS
+
+#### Memory/Disk Requirements
+
+| Model | Original | AWQ-4bit |
+|------:|--------------:|--------------:|
+| fp16 | 12.853 GB | 12.853 GB |
+| q4_0 | 3.647 GB | 3.647 GB |
+| q4_1 | 4.041 GB | 4.041 GB |
+| q2_k | 2.649 GB | 2.649 GB |
+
+#### Quantization
+
+Several quantization methods are supported. They differ in the resulting model disk size and inference speed.
+
+| Model | Measure | F16 | Q4_0 | Q4_1 | Q2_K |
+|------------:|--------------|-------:|-------:|-------:|-------:|
+|Llama2 7B | perplexity | 5.8664 | 6.0260 | 6.0656 | 6.4496 |
+|Llama2 7B | file size | 12.9G | 3.5G | 3.9G | 2.7G |
+|Llama2 7B | ms/tok @ 4th | xxx | xx | xx | xx |
+|Llama2 7B | ms/tok @ 8th | xxx | xx | xx | xx |
+|Llama2 7B | bits/weight | 16.0 | 4.5 | 5.0 | 2.6 |
+|AWQ-LLama2 7B| perplexity | 5.8801 | 6.0054 | 5.9849 | 6.3650 |
+|AWQ-LLama2 7B| file size | 12.9G | 3.5G | 3.9G | 2.7G |
+|AWQ-LLama2 7B| ms/tok @ 4th | xxx| xxx | xxx | xxx |
+|AWQ-LLama2 7B| ms/tok @ 8th | xxx| xx | xx | xx |
+|AWQ-LLama2 7B| bits/weight | 16.0 | 4.5 | 5.0 | 2.6 |
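A note on the `bits/weight` rows in the tables this patch touches: the figure is, to a first approximation, the model file size in bits divided by the parameter count, which is why every quantization method in a column shares one value regardless of AWQ scaling. A minimal sketch of that arithmetic (the ~6.74e9 parameter count for Llama 7B and the GiB interpretation of the `G` file sizes are assumptions for illustration, not values from the patch):

```python
def bits_per_weight(file_size_gib: float, n_params: float) -> float:
    """Approximate bits/weight: total bits in the model file
    divided by the number of parameters."""
    total_bits = file_size_gib * 1024**3 * 8  # GiB -> bits
    return total_bits / n_params

# Llama 7B has ~6.74e9 parameters (assumed); the Q4_0 file is ~3.5G.
print(round(bits_per_weight(3.5, 6.74e9), 1))  # -> 4.5, the table's Q4_0 figure
print(round(bits_per_weight(3.9, 6.74e9), 1))  # -> 5.0, the table's Q4_1 figure
```

The file sizes in the tables are rounded to one decimal, so the reconstructed values only match the `bits/weight` row to within that rounding.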