From 9006401b2ab522cd59be2ccd40ea34d074e5ab21 Mon Sep 17 00:00:00 2001
From: Diego Devesa
Date: Wed, 22 Jan 2025 17:21:46 +0100
Subject: [PATCH] minor

---
 examples/main/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/main/README.md b/examples/main/README.md
index 484999409..46f92eb7a 100644
--- a/examples/main/README.md
+++ b/examples/main/README.md
@@ -312,7 +312,7 @@ These options help improve the performance and memory usage of the LLaMA models.
 
 - `-ub N`, `--ubatch-size N`: Physical batch size. This is the maximum number of tokens that may be processed at a time. Increasing this value may improve performance during prompt processing, at the expense of higher memory usage. Default: `512`.
 
-- `-b N`, `--batch-size N`: Logical batch size. Increasing this value above the value of the physical batch size may improve prompt processing performance when using multiple GPUs with pipeline parallelism. Default: `2048`
+- `-b N`, `--batch-size N`: Logical batch size. Increasing this value above the value of the physical batch size may improve prompt processing performance when using multiple GPUs with pipeline parallelism. Default: `2048`.
 
 ### Prompt Caching
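
For context on the two flags the patched lines document, here is a minimal usage sketch; the binary name, model path, and prompt are illustrative assumptions and are not part of this patch:

```sh
# Hypothetical invocation (paths are placeholders): process the prompt in
# logical batches of up to 2048 tokens (-b) while capping the physical batch
# at 512 tokens (-ub), matching the defaults in the patched README lines.
./llama-cli -m models/7B/ggml-model.gguf -p "Once upon a time" -b 2048 -ub 512
```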