de-duplicate sections

ngxson 2024-07-05 17:12:51 +02:00
parent 8ea02fced1
commit 81b8bab516


@@ -294,7 +294,11 @@ llama-cli -m your_model.gguf -p "I believe the meaning of life is" -n 128
# I believe the meaning of life is to find your own truth and to live in accordance with it. For me, this means being true to myself and following my passions, even if they don't align with societal expectations. I think that's what I love about yoga it's not just a physical practice, but a spiritual one too. It's about connecting with yourself, listening to your inner voice, and honoring your own unique journey.
```
Or, you can launch it in conversation mode (or chat mode):
See [this page](./examples/main/README.md) for a full list of parameters.
### Conversation mode
If you want a more ChatGPT-like experience, you can run in conversation mode by passing `-cnv` as a parameter:
```bash
llama-cli -m your_model.gguf -p "You are a helpful assistant" -cnv
@@ -307,15 +311,7 @@ llama-cli -m your_model.gguf -p "You are a helpful assistant" -cnv
# Easy peasy! The answer to 1+1 is... 2!
```
### Conversation mode
If you want a more ChatGPT-like experience, you can run in conversation mode by passing `-cnv` as a parameter:
```bash
./llama-cli -m your_model.gguf -p "You are a helpful assistant" -cnv
```
By default, the chat template will be taken from the input model. If you want to use a custom chat template, pass `--chat-template NAME` as a parameter. See the list of [supported templates](https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template)
By default, the chat template will be taken from the input model. If you want to use another chat template, pass `--chat-template NAME` as a parameter. See the list of [supported templates](https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template)
```bash
./llama-cli -m your_model.gguf -p "You are a helpful assistant" -cnv --chat-template chatml
@@ -324,7 +320,20 @@ By default, the chat template will be taken from the input model. If you want to
You can also use your own template via in-prefix, in-suffix and reverse-prompt parameters:
```bash
./llama-cli -m your_model.gguf -p "You are a helpful assistant" -cnv --in-prefix 'User:' --reverse-prompt 'User:'
./llama-cli -m your_model.gguf -p "You are a helpful assistant" -cnv --in-prefix 'User: ' --reverse-prompt 'User:'
```
### Web server
[llama.cpp web server](./examples/server/README.md) is a lightweight [OpenAI API](https://github.com/openai/openai-openapi) compatible HTTP server that can be used to serve local models and easily connect them to existing clients.
Example usage:
```bash
./llama-server -m your_model.gguf --port 8080
# Basic web UI can be accessed via browser: http://localhost:8080
# Chat completion endpoint: http://localhost:8080/v1/chat/completions
```
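Because the server exposes an OpenAI-compatible API, any OpenAI-style client can talk to it. As a minimal sketch (the exact set of accepted request fields may vary by server version), a chat completion request can be sent with `curl`:
```bash
# Send an OpenAI-style chat completion request to the local server
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user",   "content": "What is 1+1?"}
        ]
    }'
```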
### Interactive mode
@@ -370,18 +379,6 @@ PROMPT_TEMPLATE=./prompts/chat-with-bob.txt PROMPT_CACHE_FILE=bob.prompt.bin \
CHAT_SAVE_DIR=./chat/bob ./examples/chat-persistent.sh
```
### Web server
[llama.cpp web server](./examples/server/README.md) is a lightweight [OpenAI API](https://github.com/openai/openai-openapi) compatible HTTP server that can be used to serve local models and easily connect them to existing clients.
Example usage:
```bash
./llama-server -m your_model.gguf --port 8080
# Then, open in browser: http://localhost:8080
```
### Constrained output with grammars
`llama.cpp` supports grammars to constrain model output. For example, you can force the model to output JSON only:
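As a minimal sketch, assuming the JSON grammar file shipped in the repository's `grammars/` directory and the `--grammar-file` flag:
```bash
# Constrain generation to valid JSON using the bundled GBNF grammar (illustrative prompt)
./llama-cli -m your_model.gguf -n 256 --grammar-file grammars/json.gbnf -p 'Request: schedule a call at 8pm; Command:'
```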