From 92e728d570b60e6d25ee138ae777bf6dc62daedd Mon Sep 17 00:00:00 2001
From: Fattire <528174+fat-tire@users.noreply.github.com>
Date: Wed, 3 Apr 2024 10:35:03 -0700
Subject: [PATCH] Update README.md

Remove an extra space before newline.
---
 examples/server/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/server/README.md b/examples/server/README.md
index 29e679d40..626d55d4b 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -53,7 +53,7 @@ page cache before using this. See https://github.com/ggerganov/llama.cpp/issues/
 - `-cb`, `--cont-batching`: Enable continuous batching (a.k.a dynamic batching). Default: disabled
 - `-spf FNAME`, `--system-prompt-file FNAME` Set a file to load a system prompt (initial prompt of all slots). This is useful for chat applications. [See more](#change-system-prompt-on-runtime)
 - `--mmproj MMPROJ_FILE`: Path to a multimodal projector file for LLaVA.
-- `--grp-attn-n`: Set the group attention factor to extend context size through self-extend. Used together with group attention width `--grp-attn-w`. Default: `1`, which is disabled. 
+- `--grp-attn-n`: Set the group attention factor to extend context size through self-extend. Used together with group attention width `--grp-attn-w`. Default: `1`, which is disabled.
 - `--grp-attn-w`: Set the group attention width to extend context size through self-extend. Used together with group attention factor `--grp-attn-n`. Default: `512`
 - `-n N, --n-predict N`: Set the maximum tokens to predict. Default: `-1`
 - `--slots-endpoint-disable`: To disable slots state monitoring endpoint. Slots state may contain user data, prompts included.
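For context, the README lines this hunk touches document the server's self-extend group-attention options. A minimal launch sketch using those flags, assuming a `server` binary built from `examples/server`; the model path and `-c` value are placeholders for illustration, not part of this patch:

```bash
# Self-extend group attention: --grp-attn-n is the grouping factor,
# --grp-attn-w the window width it is applied with (see the README lines above).
# The model path and context size below are placeholder values.
./server -m models/model.gguf \
  -c 8192 \
  --grp-attn-n 4 \
  --grp-attn-w 512
```

Leaving `--grp-attn-n` at its default of `1` keeps self-extend disabled, matching the documented behavior.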