diff --git a/README.md b/README.md
index 15a972772..ab876bceb 100644
--- a/README.md
+++ b/README.md
@@ -338,7 +338,7 @@ Building the program with BLAS support may lead to some performance improvements
 ```
 
 The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) can be used to specify which GPU(s) will be used. The following compilation options are also available to tweak performance:
-
+
 | Option | Legal values | Default | Description |
 |-------------------------|------------------------|---------|-------------|
 | LLAMA_CUDA_DMMV_X | Positive integer >= 32 | 32 | Number of values in x direction processed by the CUDA dequantization + matrix vector multiplication kernel per iteration. Increasing this value can improve performance on fast GPUs. Power of 2 heavily recommended. Does not affect k-quants. |
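The `LLAMA_CUDA_DMMV_X` row in this hunk mixes a hard rule (positive integer >= 32) with a soft recommendation (power of 2). A minimal sketch that keeps the two apart, assuming Python 3.9+; `check_dmmv_x` and `DEFAULT_DMMV_X` are hypothetical names for illustration only, not part of llama.cpp or its build system.

```python
# Hypothetical helper (not part of llama.cpp): checks a candidate
# LLAMA_CUDA_DMMV_X value against the rules stated in the README table.
DEFAULT_DMMV_X = 32  # default listed in the table


def check_dmmv_x(value: int) -> tuple[bool, bool]:
    """Return (is_legal, is_power_of_two) for a candidate value.

    Legal values per the table: positive integer >= 32.
    A power of 2 is recommended but not required.
    """
    # Reject bools explicitly, since bool is a subclass of int in Python.
    if isinstance(value, bool) or not isinstance(value, int):
        return False, False
    is_legal = value >= 32
    # A positive integer is a power of two exactly when it has one set bit,
    # i.e. clearing the lowest set bit (value & (value - 1)) leaves zero.
    is_pow2 = is_legal and (value & (value - 1)) == 0
    return is_legal, is_pow2


for candidate in (DEFAULT_DMMV_X, 48, 64, 16):
    print(candidate, check_dmmv_x(candidate))
```

Under these checks, 48 is accepted but flagged as not a power of 2, while 16 is rejected outright. Returning a tuple rather than a single boolean mirrors the table's distinction between the legal range and the power-of-2 advice.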