Update README.md
parent d0c460629d
commit 9d4d26554a
1 changed file with 4 additions and 2 deletions
@@ -1,10 +1,12 @@
 llama.cpp modification to run Falcon (work in progress)
 
 Status:
-Quantization works except for Q_K_ types
-CUDA not yet functional
+* Quantization works except for Q_K_ types
+* CUDA not yet functional
+*
 
 It appears the Q5 Falcon 40B inference time on CPU is as fast as the A100 fp16 inference time at 2 tk/second
+CPU inference examples:
 ```
 Q:\ggllm.cpp> .\build\bin\Release\falcon_main.exe -t 31 -m Q:\models\falcon-40b\q5_1 -p "Love relates to hate like" -n 50 -ngl 0
 main: build = 677 (dd3d346)