Update README.md

parent f9118b0ca5
commit 2797754843

1 changed file with 1 addition and 2 deletions

@@ -3,8 +3,7 @@ llama.cpp modification to run Falcon (work in progress)
Status:
* Quantization works except for Q_K_ types
* CUDA not yet functional
* context size calculation not proper (cuda as well as cpu)
* python conversion script is very basic (produces ggml v0)
It appears the Q5 Falcon 40B inference time on CPU matches the A100 fp16 inference time, at 2 tokens/second.
CPU inference examples:
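The examples heading above ends without a command in this snapshot. A typical invocation for a llama.cpp-style build might look like the following; the binary name `falcon_main`, the model path, and the flag set are assumptions carried over from llama.cpp conventions, not taken from this commit.

```shell
# Hypothetical llama.cpp-style invocation; binary name, model path,
# and flags are illustrative assumptions, not from this commit.
./falcon_main \
    -m ./models/falcon-40b/q5_0.bin \   # quantized Falcon 40B model file
    -p "A short note about GPUs:" \     # prompt text
    -n 128 \                            # number of tokens to generate
    -t 8                                # CPU threads for inference
```

Thread count (`-t`) is the main knob for the CPU-only path described above; it would normally be set to the number of physical cores.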