main : update README documentation for batch size

Diego Devesa 2025-01-22 17:19:11 +01:00 committed by GitHub
parent 96f4053934
commit c1ea8ec6c9

@@ -310,9 +310,9 @@ These options help improve the performance and memory usage of the LLaMA models.
 ### Batch Size
-- `-b N, --batch-size N`: Set the batch size for prompt processing (default: `2048`). This large batch size benefits users who have BLAS installed and enabled it during the build. If you don't have BLAS enabled ("BLAS=0"), you can use a smaller number, such as 8, to see the prompt progress as it's evaluated in some situations.
-- `-ub N`, `--ubatch-size N`: physical maximum batch size. This is for pipeline parallelization. Default: `512`.
+- `-ub N`, `--ubatch-size N`: Physical batch size. This is the maximum number of tokens that may be processed at a time. Increasing this value may improve performance during prompt processing, at the expense of higher memory usage. Default: `512`.
+- `-b N, --batch-size N`: Logical batch size. Increasing this value above the value of the physical batch size may improve prompt processing performance when using multiple GPUs with pipeline parallelism. Default: `2048`
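The relationship between the logical and the physical batch size described above can be illustrated with a small conceptual sketch. It is not the project's actual scheduling code: the function name `split_into_batches` and its parameters `n_tokens`, `n_batch`, and `n_ubatch` are hypothetical, and the defaults are simply taken from the documented values (`2048` and `512`). The sketch assumes a prompt is first cut into logical batches and each logical batch is then cut into physical batches of at most `n_ubatch` tokens.

```python
def split_into_batches(n_tokens, n_batch=2048, n_ubatch=512):
    """Conceptual sketch (hypothetical helper, not the real implementation).

    Splits a prompt of n_tokens into logical batches of at most n_batch
    tokens, then splits each logical batch into physical batches of at
    most n_ubatch tokens. Returns a list of logical batches, each a list
    of (start, end) token ranges with an exclusive end index.
    """
    if n_batch <= 0 or n_ubatch <= 0:
        raise ValueError("batch sizes must be positive")

    schedule = []
    for b_start in range(0, n_tokens, n_batch):
        b_end = min(b_start + n_batch, n_tokens)
        # Each physical batch holds at most n_ubatch tokens and never
        # crosses the boundary of its logical batch.
        ubatches = [
            (u_start, min(u_start + n_ubatch, b_end))
            for u_start in range(b_start, b_end, n_ubatch)
        ]
        schedule.append(ubatches)
    return schedule


if __name__ == "__main__":
    # Assumed example: a 5000-token prompt with the documented defaults.
    for i, ubatches in enumerate(split_into_batches(5000)):
        print(f"logical batch {i}: {len(ubatches)} physical batch(es)")
```

Under these assumptions, a larger `n_ubatch` means fewer but bigger physical batches (more memory per step), while a larger `n_batch` only changes how many physical batches are grouped together.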
 ### Prompt Caching