test anchor link
parent ffb06a345e
commit 89b377d7f8
3 changed files with 9 additions and 0 deletions
@ -293,6 +293,8 @@ Building the program with BLAS support may lead to some performance improvements
cmake --build . --config Release
```
<a name="cublas"></a>
- **cuBLAS**
This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).
7
docs/token_generation_performance_tips.md
Normal file
@ -0,0 +1,7 @@
# Token generation performance tips

## Verifying that the model is running on the GPU
Make sure you compiled llama with the correct environment variables, as described in [this guide](../README.md#cublas).

When running, `llama.cpp` outputs some helpful diagnostic information to stderr.
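Because the diagnostics go to stderr, they can be separated from the generated text with ordinary shell redirection. The sketch below illustrates only the redirection pattern: `fake_llama` is a hypothetical stand-in function, not a real llama.cpp command, and the diagnostic line it prints is a placeholder rather than actual program output.

```shell
# Hypothetical stand-in for a real invocation (not part of llama.cpp):
# writes one placeholder diagnostic line to stderr and one line of text to stdout.
fake_llama() {
  echo "diagnostic: backend information would appear here" >&2
  echo "generated text"
}

# Send stderr to a log file and stdout to a separate file.
fake_llama 2> diagnostics.log > output.txt

# The diagnostics can now be read without the generated text mixed in.
cat diagnostics.log
```

With a real binary, the same `2> diagnostics.log` redirection would apply to whatever command line you normally use.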
To verify that the workload is