From f75125615af0889cec38d16ff2aafde7a77ee64d Mon Sep 17 00:00:00 2001
From: John <78893154+cmp-nct@users.noreply.github.com>
Date: Sat, 17 Jun 2023 18:57:40 +0200
Subject: [PATCH] Update README.md

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index be2eefeba..e122703e5 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,10 @@
 llama.cpp modification to run Falcon (work in progress)
 
-Status:
+Status/Bugs:
 * Quantization works except for Q_K_ types
 * CUDA not yet functional
 * python conversion script is very basic (produces ggml v0)
+* On linux Q5_1 7B user reports a batch token ingestion context memory issue, with -b 1 it's gone. Not reproduced on Windows
 
 It appears the Q5 Falcon 40B inference time on CPU is as fast as the A100 fp16 inference time at 2 tk/second
 
 CPU inference examples:
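
The `-b 1` workaround added by this patch could be exercised along these lines; a hypothetical sketch assuming a llama.cpp-style `main` binary with the usual `-m`/`-p`/`-b` flags, and a made-up model filename:

```shell
# Hypothetical invocation; the model path is an assumption, not from the patch.
# -b 1 forces single-token batch ingestion, which the bug report above says
# avoids the Q5_1 7B context memory issue observed on Linux.
./main -m models/falcon-7b-q5_1.bin -p "Hello Falcon" -b 1
```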