make build instructions generic
parent
3ff93c9c3a
commit
0eff982f61
1 changed file with 23 additions and 17 deletions
40
README.md
@@ -33,12 +33,13 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
 <li><a href="#get-the-code">Get the Code</a></li>
 <li><a href="#build">Build</a></li>
 <li><a href="#blas-build">BLAS Build</a></li>
-<li><a href="#prepare-data--run">Prepare Data & Run</a></li>
+<li><a href="#prepare-and-quantize">Prepare and Quantize</a></li>
+<li><a href="#run-the-quantized-model">Run the quantized model</a></li>
 <li><a href="#memorydisk-requirements">Memory/Disk Requirements</a></li>
 <li><a href="#quantization">Quantization</a></li>
 <li><a href="#interactive-mode">Interactive mode</a></li>
 <li><a href="#constrained-output-with-grammars">Constrained output with grammars</a></li>
-<li><a href="#instruction-mode-with-alpaca-and-similar-instruct-models">Instruction mode with Alpaca and similar Instruct models</a></li>
+<li><a href="#instruct-mode">Instruct mode</a></li>
 <li><a href="#obtaining-and-using-the-facebook-llama-2-model">Obtaining and using the Facebook LLaMA 2 model</a></li>
 <li><a href="#seminal-papers-and-background-on-the-models">Seminal papers and background on the models</a></li>
 <li><a href="#perplexity-measuring-model-quality">Perplexity (measuring model quality)</a></li>
@@ -86,8 +87,6 @@ Typically finetunes of the base models below are supported as well.
 - [X] [Mistral 7B](https://huggingface.co/mistralai/Mistral-7B-v0.1)
 - [x] [Mixtral MoE](https://huggingface.co/models?search=mistral-ai/Mixtral)
 - [X] Falcon
-- [X] [Alpaca](https://github.com/ggerganov/llama.cpp#instruction-mode-with-alpaca)
-- [X] [GPT4All](https://github.com/ggerganov/llama.cpp#using-gpt4all)
 - [X] [Chinese LLaMA / Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) and [Chinese LLaMA-2 / Alpaca-2](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2)
 - [X] [Vigogne (French)](https://github.com/bofenghuang/vigogne)
 - [X] [Koala](https://bair.berkeley.edu/blog/2023/04/03/koala/)
@@ -243,7 +242,7 @@ https://user-images.githubusercontent.com/1991296/224442907-7693d4be-acaa-4e01-8
 
 ## Usage
 
-Here are the end-to-end binary build and model conversion steps for the LLaMA 2 7B model.
+Here are the end-to-end binary build and model conversion steps for most supported models.
 
 ### Get the Code
 
@@ -657,9 +656,9 @@ Building the program with BLAS support may lead to some performance improvements
 # ggml_vulkan: Using Intel(R) Graphics (ADL GT2) | uma: 1 | fp16: 1 | warp size: 32
 ```
 
-### Prepare Data & Run
+### Prepare and Quantize
 
-To obtain the official LLaMA 2 weights please see the <a href="#obtaining-and-using-the-facebook-llama-2-model">Obtaining and using the Facebook LLaMA 2 model</a> section.
+To obtain the official LLaMA 2 weights please see the <a href="#obtaining-and-using-the-facebook-llama-2-model">Obtaining and using the Facebook LLaMA 2 model</a> section. There is also a large selection of pre-quantized `gguf` models available on Hugging Face.
 
 ```bash
 # obtain the official LLaMA model weights and place them in ./models
@@ -667,25 +666,32 @@ ls ./models
 llama-2-7b tokenizer_checklist.chk tokenizer.model
 # [Optional] for models using BPE tokenizers
 ls ./models
-<folder containing .pth weights> vocab.json
+<folder containing weights and tokenizer json> vocab.json
+# [Optional] for PyTorch .bin models like Mistral-7B
+ls ./models
+<folder containing weights and tokenizer json>
 
 # install Python dependencies
 python3 -m pip install -r requirements.txt
 
-# convert the 7B model to ggml FP16 format
-python3 convert.py models/llama-2-7b/
+# convert the model to ggml FP16 format
+python3 convert.py models/mymodel/
 
 # [Optional] for models using BPE tokenizers
-python convert.py models/llama-2-7b/ --vocabtype bpe
+python convert.py models/mymodel/ --vocabtype bpe
 
 # quantize the model to 4-bits (using q4_0 method)
-./quantize ./models/llama-2-7b/ggml-model-f16.gguf ./models/llama-2-7b/ggml-model-q4_0.gguf q4_0
+./quantize ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-q4_0.gguf q4_0
 
-# update the gguf filetype to current if older version is unsupported by another application
-./quantize ./models/llama-2-7b/ggml-model-q4_0.gguf ./models/llama-2-7b/ggml-model-q4_0-v2.gguf COPY
+# update the gguf filetype to current version if older version is now unsupported
+./quantize ./models/mymodel/ggml-model-q4_0.gguf ./models/mymodel/ggml-model-q4_0-v2.gguf COPY
+```
 
-# run the inference
-./main -m ./models/llama-2-7b/ggml-model-q4_0.gguf -n 128
+### Run the quantized model
+
+```bash
+# start inference on a gguf model
+./main -m ./models/mymodel/ggml-model-q4_0.gguf -n 128
 ```
 
 When running the larger models, make sure you have enough disk space to store all the intermediate files.
@@ -822,7 +828,7 @@ The `grammars/` folder contains a handful of sample grammars. To write your own,
 
 For authoring more complex JSON grammars, you can also check out https://grammar.intrinsiclabs.ai/, a browser app that lets you write TypeScript interfaces which it compiles to GBNF grammars that you can save for local use. Note that the app is built and maintained by members of the community, please file any issues or FRs on [its repo](http://github.com/intrinsiclabsai/gbnfgen) and not this one.
 
-### Instruction mode with Alpaca and similar Instruct models
+### Instruct mode
 
 1. First, download and place the `ggml` model into the `./models` folder
 2. Run the `main` tool like this:
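The generic steps introduced by this change can be sketched as a dry-run shell helper that only prints the convert, quantize and run commands for a chosen model directory. The function names and the `MODEL_DIR` variable are hypothetical and not part of llama.cpp; the printed commands and file names are copied from the README text in this diff, and nothing is executed.

```bash
#!/bin/sh
# Dry-run sketch (assumption: not part of llama.cpp). It prints the README's
# convert / quantize / run commands for one model directory without running them.

# Path of the FP16 gguf file that the README's quantize step reads.
f16_path() { printf '%s/ggml-model-f16.gguf\n' "${1%/}"; }

# Path of the q4_0 gguf file that the README's quantize step writes.
q4_path() { printf '%s/ggml-model-q4_0.gguf\n' "${1%/}"; }

# Print the three commands, in order, for the given model directory.
print_steps() {
    dir="${1%/}"   # drop one trailing slash so paths join cleanly
    echo "python3 convert.py $dir/"
    echo "./quantize $(f16_path "$dir") $(q4_path "$dir") q4_0"
    echo "./main -m $(q4_path "$dir") -n 128"
}

# MODEL_DIR is a hypothetical override; the default mirrors the README placeholder.
print_steps "${MODEL_DIR:-./models/mymodel}"
```

Saved under a hypothetical name such as `steps.sh`, running `MODEL_DIR=./models/mistral-7b sh steps.sh` prints the commands for that directory so they can be reviewed before being run by hand.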