Update README.md

2023-06-17 16:51:34 +02:00
commit 2797754843
parent f9118b0ca5
1 changed file with 1 addition and 2 deletions
--- a/README.md
+++ b/README.md
@@ -3,8 +3,7 @@ llama.cpp modification to run Falcon (work in progress)
 Status:  
 * Quantization works except for Q_K_ types  
 * CUDA not yet functional
-* context size calculation not proper (cuda as well as cpu)  
-
+* python conversion script is very basic (produces ggml v0)

 It appears the Q5 Falcon 40B inference time on CPU is as fast as the A100 fp16 inference time at 2 tk/second  
 CPU inference examples: