convert-no-torch -> convert-legacy-llama
This commit is contained in:
parent
85e4e2b777
commit
88e06d04e5
7 changed files with 26 additions and 26 deletions
|
@ -690,7 +690,7 @@ Building the program with BLAS support may lead to some performance improvements
|
|||
|
||||
To obtain the official LLaMA 2 weights please see the <a href="#obtaining-and-using-the-facebook-llama-2-model">Obtaining and using the Facebook LLaMA 2 model</a> section. There is also a large selection of pre-quantized `gguf` models available on Hugging Face.
|
||||
|
||||
Note: `convert.py` has been moved to `examples/convert-no-torch.py` and shouldn't be used for anything other than `Llama/Llama2/Mistral` models and their derievatives.
|
||||
Note: `convert.py` has been moved to `examples/convert-legacy-llama.py` and shouldn't be used for anything other than `Llama/Llama2/Mistral` models and their derievatives.
|
||||
It does not support LLaMA 3, you can use `convert-hf-to-gguf.py` with LLaMA 3 downloaded from Hugging Face.
|
||||
|
||||
```bash
|
||||
|
|
|
@ -282,7 +282,7 @@ function gg_run_open_llama_3b_v2 {
|
|||
(time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} -DLLAMA_QKK_64=1 .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
|
||||
(time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log
|
||||
|
||||
python3 ../examples/convert-no-torch.py ${path_models}
|
||||
python3 ../examples/convert-legacy-llama.py ${path_models}
|
||||
|
||||
model_f16="${path_models}/ggml-model-f16.gguf"
|
||||
model_q8_0="${path_models}/ggml-model-q8_0.gguf"
|
||||
|
@ -417,7 +417,7 @@ function gg_run_open_llama_7b_v2 {
|
|||
(time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} -DLLAMA_CUDA=1 .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
|
||||
(time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log
|
||||
|
||||
python3 ../examples/convert-no-torch.py ${path_models}
|
||||
python3 ../examples/convert-legacy-llama.py ${path_models}
|
||||
|
||||
model_f16="${path_models}/ggml-model-f16.gguf"
|
||||
model_q8_0="${path_models}/ggml-model-q8_0.gguf"
|
||||
|
|
|
@ -17,7 +17,7 @@ Also, it is important to check that the examples and main ggml backends (CUDA, M
|
|||
### 1. Convert the model to GGUF
|
||||
|
||||
This step is done in python with a `convert` script using the [gguf](https://pypi.org/project/gguf/) library.
|
||||
Depending on the model architecture, you can use either [convert-hf-to-gguf.py](../convert-hf-to-gguf.py) or [examples/convert-no-torch.py](../examples/convert-no-torch.py) (for `llama/llama2` models in `.pth` format).
|
||||
Depending on the model architecture, you can use either [convert-hf-to-gguf.py](../convert-hf-to-gguf.py) or [examples/convert-legacy-llama.py](../examples/convert-legacy-llama.py) (for `llama/llama2` models in `.pth` format).
|
||||
|
||||
The convert script reads the model configuration, tokenizer, tensor names+data and converts them to GGUF metadata and tensors.
|
||||
|
||||
|
|
|
@ -54,10 +54,10 @@ python ./examples/llava/convert-image-encoder-to-gguf \
|
|||
--projector-type ldpv2
|
||||
```
|
||||
|
||||
4. Use `examples/convert-no-torch.py` to convert the LLaMA part of LLaVA to GGUF:
|
||||
4. Use `examples/convert-legacy-llama.py` to convert the LLaMA part of LLaVA to GGUF:
|
||||
|
||||
```sh
|
||||
python ./examples/convert-no-torch.py path/to/MobileVLM-1.7B
|
||||
python ./examples/convert-legacy-llama.py path/to/MobileVLM-1.7B
|
||||
```
|
||||
|
||||
5. Use `quantize` to convert LLaMA part's DataType from `fp16` to `q4_k`
|
||||
|
|
|
@ -50,10 +50,10 @@ python ./examples/llava/llava-surgery.py -m ../llava-v1.5-7b
|
|||
python ./examples/llava/convert-image-encoder-to-gguf.py -m ../clip-vit-large-patch14-336 --llava-projector ../llava-v1.5-7b/llava.projector --output-dir ../llava-v1.5-7b
|
||||
```
|
||||
|
||||
5. Use `examples/convert-no-torch.py` to convert the LLaMA part of LLaVA to GGUF:
|
||||
5. Use `examples/convert-legacy-llama.py` to convert the LLaMA part of LLaVA to GGUF:
|
||||
|
||||
```sh
|
||||
python ./examples/convert-no-torch.py ../llava-v1.5-7b --skip-unknown
|
||||
python ./examples/convert-legacy-llama.py ../llava-v1.5-7b --skip-unknown
|
||||
```
|
||||
|
||||
Now both the LLaMA part and the image encoder are in the `llava-v1.5-7b` directory.
|
||||
|
@ -92,7 +92,7 @@ python ./examples/llava/convert-image-encoder-to-gguf.py -m vit --llava-projecto
|
|||
|
||||
6) Then convert the model to gguf format:
|
||||
```console
|
||||
python ./examples/convert-no-torch.py ../llava-v1.6-vicuna-7b/ --skip-unknown
|
||||
python ./examples/convert-legacy-llama.py ../llava-v1.6-vicuna-7b/ --skip-unknown
|
||||
```
|
||||
|
||||
7) And finally we can run the llava-cli using the 1.6 model version:
|
||||
|
|
|
@ -3,20 +3,20 @@
|
|||
set -e
|
||||
|
||||
# LLaMA v1
|
||||
python3 examples/convert-no-torch.py ../llama1/7B --outfile models/llama-7b/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-no-torch.py ../llama1/13B --outfile models/llama-13b/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-no-torch.py ../llama1/30B --outfile models/llama-30b/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-no-torch.py ../llama1/65B --outfile models/llama-65b/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ../llama1/7B --outfile models/llama-7b/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ../llama1/13B --outfile models/llama-13b/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ../llama1/30B --outfile models/llama-30b/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ../llama1/65B --outfile models/llama-65b/ggml-model-f16.gguf --outtype f16
|
||||
|
||||
# LLaMA v2
|
||||
python3 examples/convert-no-torch.py ../llama2/llama-2-7b --outfile models/llama-7b-v2/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-no-torch.py ../llama2/llama-2-13b --outfile models/llama-13b-v2/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-no-torch.py ../llama2/llama-2-70b --outfile models/llama-70b-v2/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ../llama2/llama-2-7b --outfile models/llama-7b-v2/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ../llama2/llama-2-13b --outfile models/llama-13b-v2/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ../llama2/llama-2-70b --outfile models/llama-70b-v2/ggml-model-f16.gguf --outtype f16
|
||||
|
||||
# Code Llama
|
||||
python3 examples/convert-no-torch.py ../codellama/CodeLlama-7b/ --outfile models/codellama-7b/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-no-torch.py ../codellama/CodeLlama-13b/ --outfile models/codellama-13b/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-no-torch.py ../codellama/CodeLlama-34b/ --outfile models/codellama-34b/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ../codellama/CodeLlama-7b/ --outfile models/codellama-7b/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ../codellama/CodeLlama-13b/ --outfile models/codellama-13b/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ../codellama/CodeLlama-34b/ --outfile models/codellama-34b/ggml-model-f16.gguf --outtype f16
|
||||
|
||||
# Falcon
|
||||
python3 convert-falcon-hf-to-gguf.py ../falcon/falcon-7b 1
|
||||
|
|
|
@ -75,7 +75,7 @@ if [ "$1" -eq "1" ]; then
|
|||
|
||||
cd /workspace/llama.cpp
|
||||
|
||||
python3 examples/convert-no-torch.py ./models/tinyllama-1b --outfile ./models/tinyllama-1b/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ./models/tinyllama-1b --outfile ./models/tinyllama-1b/ggml-model-f16.gguf --outtype f16
|
||||
|
||||
./quantize ./models/tinyllama-1b/ggml-model-f16.gguf ./models/tinyllama-1b/ggml-model-q4_0.gguf q4_0
|
||||
./quantize ./models/tinyllama-1b/ggml-model-f16.gguf ./models/tinyllama-1b/ggml-model-q4_k.gguf q4_k
|
||||
|
@ -90,7 +90,7 @@ if [ "$1" -eq "2" ]; then
|
|||
|
||||
cd /workspace/llama.cpp
|
||||
|
||||
python3 examples/convert-no-torch.py ./models/codellama-7b --outfile ./models/codellama-7b/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ./models/codellama-7b --outfile ./models/codellama-7b/ggml-model-f16.gguf --outtype f16
|
||||
|
||||
./quantize ./models/codellama-7b/ggml-model-f16.gguf ./models/codellama-7b/ggml-model-q4_0.gguf q4_0
|
||||
./quantize ./models/codellama-7b/ggml-model-f16.gguf ./models/codellama-7b/ggml-model-q4_k.gguf q4_k
|
||||
|
@ -105,7 +105,7 @@ if [ "$1" -eq "3" ]; then
|
|||
|
||||
cd /workspace/llama.cpp
|
||||
|
||||
python3 examples/convert-no-torch.py ./models/codellama-13b --outfile ./models/codellama-13b/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ./models/codellama-13b --outfile ./models/codellama-13b/ggml-model-f16.gguf --outtype f16
|
||||
|
||||
./quantize ./models/codellama-13b/ggml-model-f16.gguf ./models/codellama-13b/ggml-model-q4_0.gguf q4_0
|
||||
./quantize ./models/codellama-13b/ggml-model-f16.gguf ./models/codellama-13b/ggml-model-q4_k.gguf q4_k
|
||||
|
@ -120,7 +120,7 @@ if [ "$1" -eq "4" ]; then
|
|||
|
||||
cd /workspace/llama.cpp
|
||||
|
||||
python3 examples/convert-no-torch.py ./models/codellama-34b --outfile ./models/codellama-34b/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ./models/codellama-34b --outfile ./models/codellama-34b/ggml-model-f16.gguf --outtype f16
|
||||
|
||||
./quantize ./models/codellama-34b/ggml-model-f16.gguf ./models/codellama-34b/ggml-model-q4_0.gguf q4_0
|
||||
./quantize ./models/codellama-34b/ggml-model-f16.gguf ./models/codellama-34b/ggml-model-q4_k.gguf q4_k
|
||||
|
@ -135,7 +135,7 @@ if [ "$1" -eq "5" ]; then
|
|||
|
||||
cd /workspace/llama.cpp
|
||||
|
||||
python3 examples/convert-no-torch.py ./models/codellama-7b-instruct --outfile ./models/codellama-7b-instruct/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ./models/codellama-7b-instruct --outfile ./models/codellama-7b-instruct/ggml-model-f16.gguf --outtype f16
|
||||
|
||||
./quantize ./models/codellama-7b-instruct/ggml-model-f16.gguf ./models/codellama-7b-instruct/ggml-model-q4_0.gguf q4_0
|
||||
./quantize ./models/codellama-7b-instruct/ggml-model-f16.gguf ./models/codellama-7b-instruct/ggml-model-q4_k.gguf q4_k
|
||||
|
@ -150,7 +150,7 @@ if [ "$1" -eq "6" ]; then
|
|||
|
||||
cd /workspace/llama.cpp
|
||||
|
||||
python3 examples/convert-no-torch.py ./models/codellama-13b-instruct --outfile ./models/codellama-13b-instruct/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ./models/codellama-13b-instruct --outfile ./models/codellama-13b-instruct/ggml-model-f16.gguf --outtype f16
|
||||
|
||||
./quantize ./models/codellama-13b-instruct/ggml-model-f16.gguf ./models/codellama-13b-instruct/ggml-model-q4_0.gguf q4_0
|
||||
./quantize ./models/codellama-13b-instruct/ggml-model-f16.gguf ./models/codellama-13b-instruct/ggml-model-q4_k.gguf q4_k
|
||||
|
@ -165,7 +165,7 @@ if [ "$1" -eq "7" ]; then
|
|||
|
||||
cd /workspace/llama.cpp
|
||||
|
||||
python3 examples/convert-no-torch.py ./models/codellama-34b-instruct --outfile ./models/codellama-34b-instruct/ggml-model-f16.gguf --outtype f16
|
||||
python3 examples/convert-legacy-llama.py ./models/codellama-34b-instruct --outfile ./models/codellama-34b-instruct/ggml-model-f16.gguf --outtype f16
|
||||
|
||||
./quantize ./models/codellama-34b-instruct/ggml-model-f16.gguf ./models/codellama-34b-instruct/ggml-model-q4_0.gguf q4_0
|
||||
./quantize ./models/codellama-34b-instruct/ggml-model-f16.gguf ./models/codellama-34b-instruct/ggml-model-q4_k.gguf q4_k
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue