Update README.md

Denis Spasyuk 2024-07-05 08:20:03 -06:00 committed by GitHub
parent 4e8d3bde75
commit b0589e6672

@@ -17,6 +17,11 @@ This example program allows you to use various LLaMA language models easily and
To get started right away, run the following command, making sure to use the correct path for the model you have:
First, we will need to download a model. In these examples, we will use the Gemma model from the ggml-org repo on Hugging Face:
[https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/resolve/main/gemma-1.1-7b-it.Q4_K_M.gguf?download=true](https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/resolve/main/gemma-1.1-7b-it.Q4_K_M.gguf?download=true)
Once downloaded, place the model in the `models` folder of your llama.cpp directory.
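One way to fetch the file from the command line (a minimal sketch using `curl`; the output path assumes you run it from the llama.cpp root):

```bash
# Download the quantized Gemma model into the models folder.
# -L follows Hugging Face's redirect to the actual file location.
curl -L -o models/gemma-1.1-7b-it.Q4_K_M.gguf \
  "https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/resolve/main/gemma-1.1-7b-it.Q4_K_M.gguf?download=true"
```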
### Unix-based systems (Linux, macOS, etc.):
##### Input prompt (One-and-done)
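A representative one-shot invocation might look like this (a sketch only; the prompt text is illustrative and the binary path assumes a build in the repo root):

```bash
# Generate a completion for a single prompt, then exit.
./llama-cli -m models/gemma-1.1-7b-it.Q4_K_M.gguf -p "Once upon a time" -n 128
```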
@@ -58,7 +63,7 @@ llama-cli.exe -m models\gemma-1.1-7b-it.Q4_K_M.gguf --ignore-eos -n -1
In this section, we cover the most commonly used options for running the `llama-cli` program with LLaMA models (a combined example follows the list):
- `-m FNAME, --model FNAME`: Specify the path to the LLaMA model file (e.g., `models/gemma-1.1-7b-it.Q4_K_M.gguf`; inferred from `--model-url` if set).
-- `-mu MODEL_URL --model-url MODEL_URL`: Specify a remote http url to download the file (e.g [https://huggingface.co/ggml-org/models/resolve/main/phi-2/ggml-model-q4_0.gguf](https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/blob/main/gemma-1.1-7b-it.Q4_K_M.gguf)).
+- `-mu MODEL_URL --model-url MODEL_URL`: Specify a remote http url to download the file (e.g. [https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/resolve/main/gemma-1.1-7b-it.Q4_K_M.gguf?download=true](https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/resolve/main/gemma-1.1-7b-it.Q4_K_M.gguf?download=true)).
- `-i, --interactive`: Run the program in interactive mode, allowing you to provide input directly and receive real-time responses.
- `-n N, --n-predict N`: Set the number of tokens to predict when generating text. Adjusting this value can influence the length of the generated text.
- `-c N, --ctx-size N`: Set the size of the prompt context. The default is 512, but LLaMA models were built with a context of 2048, which will provide better results for longer input/inference.
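Putting several of these options together (a sketch only; the flag values are illustrative, not recommendations):

```bash
# Interactive session with a 2048-token context and responses capped at 256 tokens.
./llama-cli -m models/gemma-1.1-7b-it.Q4_K_M.gguf -i -c 2048 -n 256
```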