# ggml Support for the RedPajama Model

## Acknowledgement

We highly appreciate the great effort behind the [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp) fork. Our support for the RedPajama model is based mainly on that implementation. We extended the model configuration, fixed a bug in the original implementation that was triggered when the use_parallel_residual flag was set to False, and added chat-model support for RedPajama.

## Usage

### RedPajama Chat model

- Build the binaries:

  ```
  make redpajama-chat quantize-gptneox
  ```

- Prepare the RedPajama Chat model (f16 and q4_0) for ggml:

  ```
  bash ./examples/redpajama/scripts/install-RedPajama-INCITE-Chat-3B-v1.sh
  ```

  Note that you may need to install torch and transformers to run the install scripts, e.g.:

  ```
  pip install torch==2.0.0
  pip install transformers==4.28.1
  ```

- Run the RedPajama Chat model (f16):

  ```
  ./redpajama-chat -m ./examples/redpajama/models/pythia/ggml-RedPajama-INCITE-Chat-3B-v1-f16.bin \
       -c 2048 \
       -b 128 \
       -n 1 \
       -t 8 \
       --instruct \
       --color \
       --top_k 30 \
       --top_p 0.95 \
       --temp 0.8 \
       --repeat_last_n 3 \
       --repeat_penalty 1.1 \
       --seed 0
  ```

- Run the RedPajama Chat model (q4_0):

  ```
  ./redpajama-chat -m ./examples/redpajama/models/pythia/ggml-RedPajama-INCITE-Chat-3B-v1-q4_0.bin \
       -c 2048 \
       -b 128 \
       -n 1 \
       -t 8 \
       --instruct \
       --color \
       --top_k 30 \
       --top_p 0.95 \
       --temp 0.8 \
       --repeat_last_n 3 \
       --repeat_penalty 1.1 \
       --seed 0
  ```

- Run other quantized versions of the RedPajama Chat model (make sure the f16 model has been prepared before you run this):

  - Build the quantization tool if you have not already:

    ```
    make quantize-gptneox
    ```

  - Generate the quantized model; the supported types are q4_0, q4_1, q4_2, q5_0, q5_1, and q8_0. For example, to run q4_1 you need the following conversion (a batch-quantization sketch for all supported types follows the Usage section):

    ```
    python ./examples/redpajama/scripts/quantize-gptneox.py ./examples/redpajama/models/pythia/ggml-RedPajama-INCITE-Chat-3B-v1-f16.bin --quantize-output-type q4_1
    ```

  - Then you can chat with the quantized model:

    ```
    ./redpajama-chat -m ./examples/redpajama/models/pythia/ggml-RedPajama-INCITE-Chat-3B-v1-q4_1.bin \
         -c 2048 \
         -b 128 \
         -n 1 \
         -t 8 \
         --instruct \
         --color \
         --top_k 30 \
         --top_p 0.95 \
         --temp 0.8 \
         --repeat_last_n 3 \
         --repeat_penalty 1.1 \
         --seed 0
    ```

### RedPajama Base/Instruct model

- Build the binaries:

  ```
  make redpajama quantize-gptneox
  ```

- Prepare the RedPajama Base/Instruct model (f16 and q4_0) for ggml:

  ```
  bash ./examples/redpajama/scripts/install-RedPajama-INCITE-Base-3B-v1.sh
  # or
  bash ./examples/redpajama/scripts/install-RedPajama-INCITE-Instruct-3B-v1.sh
  ```

- To run other quantized versions of the RedPajama Base/Instruct model, first make sure the f16 model has been prepared, then generate the quantized model; the supported types are q4_0, q4_1, q4_2, q5_0, q5_1, and q8_0. For example, the conversion for RedPajama-Base q8_0 is:

  ```
  python ./examples/redpajama/scripts/quantize-gptneox.py ./examples/redpajama/models/pythia/ggml-RedPajama-INCITE-Base-3B-v1-f16.bin --quantize-output-type q8_0
  ```

- Run the RedPajama Base/Instruct model (e.g., RedPajama-Instruct q8_0):

  ```
  ./redpajama -m ./examples/redpajama/models/pythia/ggml-RedPajama-INCITE-Instruct-3B-v1-q8_0.bin \
       -c 2048 \
       -b 128 \
       -n 1 \
       -t 8 \
       --color \
       --top_k 30 \
       --top_p 0.95 \
       --temp 0.8 \
       --repeat_last_n 3 \
       --repeat_penalty 1.1 \
       --seed 0 \
       --n_predict 256 \
       --verbose-prompt \
       -p "How to schedule a tour to Anfield:"
  ```
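### Optional: batch quantization

If you want to compare several quantization levels, the loop below is a minimal sketch that runs the quantization step once for every supported type. It only reuses the script path, model path, and `--quantize-output-type` flag shown in the steps above; it assumes the f16 chat model has already been produced by the install script, and that each run writes its output next to the f16 file, as the q4_1 example suggests.

```bash
#!/usr/bin/env bash
set -e

# Sketch: quantize the f16 chat model to every supported type in one pass.
# Assumes the f16 model was produced by the install script above.
F16=./examples/redpajama/models/pythia/ggml-RedPajama-INCITE-Chat-3B-v1-f16.bin

for qtype in q4_0 q4_1 q4_2 q5_0 q5_1 q8_0; do
    python ./examples/redpajama/scripts/quantize-gptneox.py "$F16" --quantize-output-type "$qtype"
done
```

Each resulting quantized `.bin` file can then be passed to `./redpajama-chat -m` exactly as in the q4_1 example above.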
## Attribution

The following files are covered by an MIT license and were taken from https://github.com/byroneverson/gptneox.cpp. Thank you, Byron.

```
common-gptneox.cpp
copy-gptneox.cpp
gptneox.cpp
quantize-gptneox.cpp
common-gptneox.h
gptneox-util.h
gptneox.h
convert_gptneox_to_ggml.py
quantize-gptneox.py
```