Update demo in README.md (#6)
* Update demo video in README.md
* Update demo at README.md
parent 64d83e1fd5
commit a81d4a5ea0
1 changed file with 5 additions and 3 deletions
@@ -1,11 +1,13 @@
# PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
---
*Demo* 🔥
## Demo 🔥
https://github.com/hodlen/PowerInfer/assets/34213478/b782ccc8-0a2a-42b6-a6aa-07b2224a66f7
https://github.com/SJTU-IPADS/PowerInfer/assets/34213478/d26ae05b-d0cf-40b6-8788-bda3fe447e28
<sub>The demo is running with a single 24G 4090 GPU, the model is Falcon (ReLU)-40B, and the precision is FP16.</sub>
PowerInfer v.s. llama.cpp on a single RTX 4090(24G) running Falcon(ReLU)-40B-FP16 with a 11x speedup!
<sub>Both PowerInfer and llama.cpp were running on the same hardware and fully utilized VRAM on RTX 4090.</sub>
---
## Abstract