From 0e41441fa114a62ffc672ef7e96ada3392bcbd8b Mon Sep 17 00:00:00 2001
From: TheNotary
Date: Wed, 26 Apr 2023 12:55:45 -0500
Subject: [PATCH 1/6] moves ggml-vocab.bin into the tests folder where it's
 used

It appears this file is only used during tests as of now. Removing it
from the models folder makes it more flexible for how users load their
model data into the project (e.g. are they using docker bind-mounts,
are they using symlinks, or are they downloading models directly into
this folder?).

By moving this, the instructions for getting started can be safely
simplified to:

$ rm models/.gitkeep
$ rmdir models
$ ln -s /mnt/c/ai/models/LLaMA $(pwd)/models

I think this is a good idea because the model files are quite large
and can be useful across multiple projects, so symlinks shine in this
use case without creating too much confusion for the onboardee.
---
 .gitignore                       |   2 +-
 tests/CMakeLists.txt             |   2 +-
 {models => tests}/ggml-vocab.bin | Bin
 3 files changed, 2 insertions(+), 2 deletions(-)
 rename {models => tests}/ggml-vocab.bin (100%)
 mode change 100644 => 100755

diff --git a/.gitignore b/.gitignore
index c7573bb3b..87a71ac96 100644
--- a/.gitignore
+++ b/.gitignore
@@ -20,7 +20,7 @@ build-no-accel/
 build-sanitize-addr/
 build-sanitize-thread/

-models/*
+models

 /main
 /quantize
diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt
index 81eadbc4d..51b77faaf 100644
--- a/tests/CMakeLists.txt
+++ b/tests/CMakeLists.txt
@@ -8,4 +8,4 @@ endfunction()
 # llama_add_test(test-double-float.c) # SLOW
 llama_add_test(test-quantize-fns.cpp)
 llama_add_test(test-quantize-perf.cpp)
-llama_add_test(test-tokenizer-0.cpp ${CMAKE_CURRENT_SOURCE_DIR}/../models/ggml-vocab.bin)
+llama_add_test(test-tokenizer-0.cpp ${CMAKE_CURRENT_SOURCE_DIR}/ggml-vocab.bin)
diff --git a/models/ggml-vocab.bin b/tests/ggml-vocab.bin
old mode 100644
new mode 100755
similarity index 100%
rename from models/ggml-vocab.bin
rename to tests/ggml-vocab.bin

From 5b36ab5afdebf044728078844adcabc7fc355fd6 Mon Sep 17 00:00:00 2001
From: TheNotary
Date: Wed, 26 Apr 2023 16:19:06 -0500
Subject: [PATCH 2/6] documentation: reflow the README to make the setup
 easier to follow

This minor (though time consuming) change:

1) moves the models/ggml-vocab.bin file into the tests folder
2) changes the order in which information is presented to the user
3) recommends using symlinks to link model data into the right place
   in the repo
4) adds some clarification around the importance of the model weights

1 is handy because it enables the 'automation' behind 3: the models/
folder can be safely removed and replaced with a symlink, and the
commands to do so are clearly listed and described in the README.md.

2 is ultimately the only important aspect of this change. The readme
currently must be read in full by the user, cached, and then returned
to in order to follow along with all the steps in the documentation.

3 is (I think) handy because these files are pretty huge and not
exclusive to this repo. Symlinks shine here in that many symlinks can
be created across multiple projects, all pointing to the same source
location. If researchers were copying/pasting these files into each
project, it would get out of hand fast.

4 seems valuable; the AI world looks really opaque to people just
getting started. I did my best to be accurate with my statements in
the hope that it makes it more possible for humans to become aware of
this technology and what's happening to the internet and the world.
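
For illustration, the kind of sharing 3 enables (these paths are
hypothetical):

$ ln -s /mnt/c/ai/models/LLaMA ~/src/llama.cpp/models
$ ln -s /mnt/c/ai/models/LLaMA ~/src/alpaca.cpp/models

Both checkouts then read the same weights without keeping a second
copy on disk.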
---
 README.md | 97 +++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 66 insertions(+), 31 deletions(-)

diff --git a/README.md b/README.md
index be0e49e47..c50d13e24 100644
--- a/README.md
+++ b/README.md
@@ -150,7 +150,7 @@ And here is another demo of running both LLaMA-7B and [whisper.cpp](https://gith

 https://user-images.githubusercontent.com/1991296/224442907-7693d4be-acaa-4e01-8b4f-add84093ffff.mp4

-## Usage
+## Getting Started

 Here are the steps for the LLaMA-7B model.

@@ -161,17 +161,25 @@ git clone https://github.com/ggerganov/llama.cpp
 cd llama.cpp
 ```

-### Build
+### Build on Linux/macOS Using Make

-In order to build llama.cpp you have three different options.
+These commands are specific to Ubuntu Linux, but OS-specific variants are easy to find given this handy dependency list. Also, if you're on your Windows gaming machine, some users have reported great success using [WSL2](https://github.com/ggerganov/llama.cpp/issues/103#issuecomment-1470440202) to install Ubuntu within Windows and then following the Linux build instructions to run this project.
+
+```bash
+# Install dependencies (these are for Linux; use brew equivalents on macOS)
+sudo apt-get update
+sudo apt-get install make cmake build-essential python3 python3-pip git-lfs
+
+# Build llama.cpp
+make
+
+# Install python dependencies
+python3 -m pip install -r requirements.txt
+```
+
+### Build for Windows

-- Using `make`:
-  - On Linux or MacOS:
-
-    ```bash
-    make
-    ```
   - On Windows:

     1. Download the latest fortran version of [w64devkit](https://github.com/skeeto/w64devkit/releases).
@@ -198,6 +206,8 @@ In order to build llama.cpp you have three different options.
     zig build -Drelease-fast
     ```

+Don't forget to install the Python dependencies (e.g. `python3 -m pip install -r requirements.txt`).
+
 ### BLAS Build

 Building the program with BLAS support may lead to some performance improvements in prompt processing using batch sizes higher than 32 (the default is 512). BLAS doesn't affect the normal generation performance. There are currently three different implementations of it:
@@ -257,15 +267,45 @@ Building the program with BLAS support may lead to some performance improvements
     cmake --build . --config Release
     ```

-### Prepare Data & Run
+### Acquiring and Setting Up the 7B Model Weights
+
+You can use this system to conduct research on an AI chatbot vaguely comparable to ChatGPT, and it will even run on your local machine without needing massive amounts of hardware. But to do so you **must** install the Alpaca 7B model weights into the models folder.
+
+Because these resources belong to Facebook, their official path to obtaining the data should be followed. While it's true that most researchers using the Alpaca weights obtained them from a magnet link to a torrent file, linking or sharing that magnet link should not be done in this repo due to the legal questionability of violating Facebook's IP rights and also (not to be an alarmist here) the potential for the popularization of these weights to cause harm.
+
+- **Under no circumstances should IPFS, magnet links, or any other links to model downloads be shared anywhere in this repository, including in issues, discussions, or pull requests. They will be immediately deleted.**
+- The LLaMA models are officially distributed by Facebook and will **never** be provided through this repository.
+- Refer to [Facebook's LLaMA repository](https://github.com/facebookresearch/llama/pull/73/files) if you need to request access to the model data.
+
+#### Putting the Model Weights in the Right Spot
+
+This guide will assume that you've downloaded the files to an arbitrary folder, `/mnt/c/ai/models/LLaMA`, using some responsible means described above.
+
+Because model weight files are so large, you may find it convenient to use symlinks to make them appear within the `models/` folder instead of having to copy them around on your HDD. Otherwise, if you prefer, you can just download the files directly into the `models/` folder and skip the symlink command below:

 ```bash
-# obtain the original LLaMA model weights and place them in ./models
+# On Linux, run this from the root of this repo
+rm models/.gitkeep
+
+# Now that the folder is empty, this command can safely remove the models/ folder; it errors out if anything is still inside
+rmdir models/
+
+# Create a symlink to the folder where the LLaMA weights are located
+ln -s /mnt/c/ai/models/LLaMA $(pwd)/models
+```
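+
+As a quick, optional sanity check that the link worked (exact output will vary with your setup, and this assumes the `7B` files are already in place):
+
+```bash
+# A symlink lists with an arrow pointing at the real weights folder
+ls -ld models
+
+# The model subfolders should be reachable through the link
+ls models/7B
+```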
+
+### Prepare Data
+
+```bash
+# Verify the original LLaMA model data
 ls ./models
 65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model

-# install Python dependencies
-python3 -m pip install -r requirements.txt
+# Check for file corruption and wrong data
+sha256sum --ignore-missing -c SHA256SUMS
+
+# On macOS, use this command instead
+shasum -a 256 --ignore-missing -c SHA256SUMS

 # convert the 7B model to ggml FP16 format
 python3 convert.py models/7B/
@@ -277,6 +317,13 @@ python3 convert.py models/7B/
 ./main -m ./models/7B/ggml-model-q4_0.bin -n 128
 ```

+### Run the Alpaca 7B Model
+
+```bash
+# run the inference
+./main -m ./models/7B/ggml-model-q4_0.bin -n 128
+```
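+
+If you'd rather hold a conversation than take a single completion, interactive mode looks something like this (a sketch; run `./main --help` to see which flags your build supports):
+
+```bash
+# -i enters interactive mode; -r returns control to you at the reverse prompt
+./main -m ./models/7B/ggml-model-q4_0.bin -n 256 --repeat_penalty 1.0 --color -i -r "User:"
+```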
+
 When running the larger models, make sure you have enough disk space to store all the intermediate files.

 ### Memory/Disk Requirements
@@ -374,29 +421,17 @@ python3 convert.py models/gpt4all-7B/gpt4all-lora-quantized.bin

 - The newer GPT4All-J model is not yet supported!

-### Obtaining and verifying the Facebook LLaMA original model and Stanford Alpaca model data
-
-- **Under no circumstances should IPFS, magnet links, or any other links to model downloads be shared anywhere in this repository, including in issues, discussions, or pull requests. They will be immediately deleted.**
-- The LLaMA models are officially distributed by Facebook and will **never** be provided through this repository.
-- Refer to [Facebook's LLaMA repository](https://github.com/facebookresearch/llama/pull/73/files) if you need to request access to the model data.
-- Please verify the [sha256 checksums](SHA256SUMS) of all downloaded model files to confirm that you have the correct model data files before creating an issue relating to your model files.
-- The following command will verify if you have all possible latest files in your self-installed `./models` subdirectory:
-  - `sha256sum --ignore-missing -c SHA256SUMS` on Linux
-  - or
-  - `shasum -a 256 --ignore-missing -c SHA256SUMS` on macOS
+### Additional Notes on the original Facebook LLaMA model and Stanford Alpaca model data

 - If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
 - LLaMA:
-- [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
-- [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
+  - [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
+  - [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
 - GPT-3
-- [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
+  - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
 - GPT-3.5 / InstructGPT / ChatGPT:
-- [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
-- [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
+  - [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
+  - [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)

 ### Perplexity (measuring model quality)

From e9c3a82bd65d51479e01498a9691a3197d30275e Mon Sep 17 00:00:00 2001
From: TheNotary
Date: Wed, 26 Apr 2023 17:43:50 -0500
Subject: [PATCH 3/6] adds missing .gitkeep

---
 models/.gitkeep | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 models/.gitkeep

diff --git a/models/.gitkeep b/models/.gitkeep
new file mode 100644
index 000000000..e69de29bb

From 42b63c575e4d39906e9584d086557a376c120d3e Mon Sep 17 00:00:00 2001
From: TheNotary
Date: Wed, 26 Apr 2023 17:55:56 -0500
Subject: [PATCH 4/6] fixes some typos

---
 README.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index c50d13e24..0e7ea819a 100644
--- a/README.md
+++ b/README.md
@@ -281,7 +281,7 @@ Because these resources belong to Facebook, their official path to obtaining the

 This guide will assume that you've downloaded the files to an arbitrary folder, `/mnt/c/ai/models/LLaMA`, using some responsible means described above.

-Because model weight files are so large, you may find it convenient to use symlinks to make them appear within the `models/` folder instead of having to copy them around on your HDD. Otherwise, if you prefer, you can just download the files directly into the `models/` folder and skip the symlink command below:
+Because the weight files are so large, you may find it convenient to use symlinks to make them appear within the `models/` folder instead of having to copy them around on your HDD. Otherwise, if you prefer, you can just download the files directly into the `models/` folder and skip the symlink command below:

 ```bash
 # On Linux, run this from the root of this repo
@@ -423,7 +423,8 @@ python3 convert.py models/gpt4all-7B/gpt4all-lora-quantized.bin

 ### Additional Notes on the original Facebook LLaMA model and Stanford Alpaca model data

-- If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
+If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
+
 - LLaMA:
   - [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
   - [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)

From 4487694d799240eb1bae02f7a2a4f56fcb5c2f51 Mon Sep 17 00:00:00 2001
From: TheNotary
Date: Thu, 27 Apr 2023 12:49:05 -0500
Subject: [PATCH 5/6] updates readme per feedback

---
 README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 0e7ea819a..f0b03cc66 100644
--- a/README.md
+++ b/README.md
@@ -267,11 +267,11 @@ Building the program with BLAS support may lead to some performance improvements
     cmake --build . --config Release
     ```

-### Acquiring and Setting Up the 7B Model Weights
+### Acquiring the Publicly Released LLaMA Models

-You can use this system to conduct research on an AI chatbot vaguely comparable to ChatGPT, and it will even run on your local machine without needing massive amounts of hardware. But to do so you **must** install the Alpaca 7B model weights into the models folder.
+You can use llama.cpp for many purposes. Perhaps you're here to conduct research on an AI chatbot vaguely comparable to ChatGPT that can run on your local machine without needing massive amounts of hardware. llama.cpp can do this, but you **must** first install some kind of model (none are included in this repo). Due to their wild popularity, this guide walks you through obtaining Facebook's LLaMA model weights.

-Because these resources belong to Facebook, their official path to obtaining the data should be followed. While it's true that most researchers using the Alpaca weights obtained them from a magnet link to a torrent file, linking or sharing that magnet link should not be done in this repo due to the legal questionability of violating Facebook's IP rights and also (not to be an alarmist here) the potential for the popularization of these weights to cause harm.
+Because these resources belong to Facebook, their official path to obtaining the data should be followed. While it's true that most researchers using the LLaMA weights obtained them from a magnet link to a torrent file, linking or sharing that magnet link should not be done in this repo due to the legal questionability of violating Facebook's IP rights and also (not to be an alarmist here) the potential for the popularization of these weights to cause harm.

 - **Under no circumstances should IPFS, magnet links, or any other links to model downloads be shared anywhere in this repository, including in issues, discussions, or pull requests. They will be immediately deleted.**
 - The LLaMA models are officially distributed by Facebook and will **never** be provided through this repository.
@@ -279,7 +279,7 @@ Because these resources belong to Facebook, their official path to obtaining the

 #### Putting the Model Weights in the Right Spot

-This guide will assume that you've downloaded the files to an arbitrary folder, `/mnt/c/ai/models/LLaMA`, using some responsible means described above.
+This guide will assume that you've downloaded the files to an arbitrary folder, `/some/path/to/LLaMA`, using some responsible means described above.

 Because the weight files are so large, you may find it convenient to use symlinks to make them appear within the `models/` folder instead of having to copy them around on your HDD. Otherwise, if you prefer, you can just download the files directly into the `models/` folder and skip the symlink command below:

@@ -291,7 +291,7 @@ rm models/.gitkeep
 rmdir models/

 # Create a symlink to the folder where the LLaMA weights are located
-ln -s /mnt/c/ai/models/LLaMA $(pwd)/models
+ln -s /some/path/to/LLaMA $(pwd)/models
 ```

 ### Prepare Data

From 5bac24d2f70d880acbadeba8cab1b742dbc3151e Mon Sep 17 00:00:00 2001
From: TheNotary
Date: Thu, 27 Apr 2023 14:47:57 -0500
Subject: [PATCH 6/6] adds performance disclaimer on 9P file sharing via
 symlinks

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index f0b03cc66..23c62a07a 100644
--- a/README.md
+++ b/README.md
@@ -163,7 +163,7 @@ cd llama.cpp

 ### Build on Linux/macOS Using Make

-These commands are specific to Ubuntu Linux, but OS-specific variants are easy to find given this handy dependency list. Also, if you're on your Windows gaming machine, some users have reported great success using [WSL2](https://github.com/ggerganov/llama.cpp/issues/103#issuecomment-1470440202) to install Ubuntu within Windows and then following the Linux build instructions to run this project.
+These commands are specific to Ubuntu Linux, but OS-specific variants are easy to find given this handy dependency list. Also, if you're on your Windows gaming machine, some users have reported great success using [WSL2](https://github.com/ggerganov/llama.cpp/issues/103#issuecomment-1470440202) to install Ubuntu within Windows and then following the Linux build instructions to run this project, but the CMake path is also quite easy.

 ```bash
 # Install dependencies (these are for Linux; use brew equivalents on macOS)
 sudo apt-get update
@@ -283,6 +283,8 @@ This guide will assume that you've downloaded the files to an arbitrary folder,

 Because the weight files are so large, you may find it convenient to use symlinks to make them appear within the `models/` folder instead of having to copy them around on your HDD. Otherwise, if you prefer, you can just download the files directly into the `models/` folder and skip the symlink command below:

+**Note:** Be conscious of how you mount the data, as this can have significant performance impacts. WSL2 users should note that the 9P file-sharing protocol will significantly slow things down when symlinking to the host Windows filesystem.
+
 ```bash
 # On Linux, run this from the root of this repo
 rm models/.gitkeep