From 6ae8567a304cb9895be11a2767f73958afebe410 Mon Sep 17 00:00:00 2001
From: John <78893154+cmp-nct@users.noreply.github.com>
Date: Sat, 17 Jun 2023 16:23:40 +0200
Subject: [PATCH] Update README.md

---
 README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index b1aa85f74..ae56b6d1a 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,12 @@
 llama.cpp modification to run Falcon (work in progress)
-Status:
-* Quantization works except for Q_K_ types
-* CUDA not yet functional
-*
+Status:
+* Quantization works except for Q_K_ types
+* CUDA not yet functional
 
-It appears the Q5 Falcon 40B inference time on CPU is as fast as the A100 fp16 inference time at 2 tk/second
-CPU inference examples:
+
+It appears the Q5 Falcon 40B inference time on CPU is as fast as the A100 fp16 inference time at 2 tk/second
+CPU inference examples:
 ```
 Q:\ggllm.cpp> .\build\bin\Release\falcon_main.exe -t 31 -m Q:\models\falcon-40b\q5_1 -p "Love relates to hate like" -n 50 -ngl 0
 main: build = 677 (dd3d346)