Update README.md

John 2023-06-17 16:23:40 +02:00 committed by GitHub
parent d4b9423560
commit 7c5f607287
GPG key ID: 4AEE18F83AFDEB23

@@ -1,12 +1,12 @@
llama.cpp modification to run Falcon (work in progress)

Status:
* Quantization works except for Q_K_ types
* CUDA not yet functional

It appears the Q5 Falcon 40B inference time on CPU is as fast as the A100 fp16 inference time at 2 tk/second

CPU inference examples:
```
Q:\ggllm.cpp> .\build\bin\Release\falcon_main.exe -t 31 -m Q:\models\falcon-40b\q5_1 -p "Love relates to hate like" -n 50 -ngl 0
main: build = 677 (dd3d346)