Update README.md

This commit is contained in:
John 2023-06-17 16:23:01 +02:00 committed by JohannesGaessler
parent d0c460629d
commit 9d4d26554a


@@ -1,10 +1,12 @@
llama.cpp modification to run Falcon (work in progress)
Status:
* Quantization works except for Q_K_ types
* CUDA not yet functional
* It appears the Q5 Falcon 40B inference time on CPU matches the A100 fp16 inference time, at about 2 tokens/second
CPU inference examples:
```
Q:\ggllm.cpp> .\build\bin\Release\falcon_main.exe -t 31 -m Q:\models\falcon-40b\q5_1 -p "Love relates to hate like" -n 50 -ngl 0
main: build = 677 (dd3d346)