# llamacpypy

llama.cpp, but wrapped in Python

This allows serving LLaMA through libraries such as FastAPI using the optimized and in particular quantized models of the llama.cpp ecosystem instead of using torch directly, which should decrease resource consumption compared to plain torch. A rough serving sketch is included at the end of the Usage section below.

## Installation

At the moment this is all very raw, so it will require some work on the user's part.

### Clone the repo and pull the external repo for pybind11

```sh
git clone https://github.com/seemanne/llamacpypy.git
cd llamacpypy
git submodule update --init
```

### Set up your venv and install the requirements as always

If you have poetry, there are artifacts in the pyproject file that should allow you to run `poetry install` to set up the venv; however, this won't install the project itself. That can be done by using `poetry shell` and then calling `pip install ./` as below.

If anyone wants to fix the build process to make it less cumbersome, I would be very happy.

If you have another setup, just `pip install` the requirements (`requirements.txt`) in your virtual environment of choice and then continue as described below.
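
As a rough sketch, the two setup paths look like this (assuming a POSIX shell; the venv name and activation command are just examples):

```sh
# Path 1: poetry sets up the venv and dependencies; the package itself is installed afterwards
poetry install
poetry shell
pip install ./

# Path 2: plain venv + pip
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```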

### Run makefile

This isn't actually required, but it will give compile errors if something is wrong.

```sh
make -j
```

### Install the module using pip

```sh
pip install ./
```

## Usage

Initialize the model instance:

```python
from llamacpypy import Llama

llama = Llama('models/7B/ggml-model-q4_0.bin', warm_start=False)
```

Load your model into memory:

```python
llama.load_model()
```

Generate from a given prompt:

```python
var = llama.generate("This is the weather report, we are reporting a clown fiesta happening at backer street. The clowns ")
print(var)
>>> This is the weather report, we are reporting a clown fiesta happening at backer street. The clowns 1st of July parade was going to be in their own neighborhood but they just couldn't contain themselves;
They decided it would look better and probably have more fun if all went into one area which meant that the whole town had to shut down for a little while as all roads were blocked. At least traffic wasnt too bad today because most of people are out shopping, but I did see some shoppers in their car driving away from Backer street with “clowns” on wheels outside their windows…
The kids lined up along the route and waited for the parade to pass by
```
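
To illustrate the serving use case mentioned in the intro, here is a minimal FastAPI sketch. It is not part of this repo: it assumes `fastapi` and `uvicorn` are installed separately, that the model path matches your setup, and the endpoint name and request shape are invented for the example.

```python
# Hypothetical serving sketch, not part of llamacpypy itself.
# Assumes `pip install fastapi uvicorn` and a quantized model at models/7B/ggml-model-q4_0.bin.
from fastapi import FastAPI
from pydantic import BaseModel

from llamacpypy import Llama

app = FastAPI()
llama = Llama('models/7B/ggml-model-q4_0.bin', warm_start=False)
llama.load_model()  # load weights once at startup


class Prompt(BaseModel):
    text: str


@app.post("/generate")
def generate(prompt: Prompt):
    # generate() is blocking, so requests are served one at a time
    return {"completion": llama.generate(prompt.text)}
```

Saved as `app.py`, this would be run with `uvicorn app:app`.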

## Implementation details

This Python module is mainly a wrapper around the llama class in `src/inference.cpp`, so any changes should be made there. Because the llama.cpp code is mostly contained in `main.cpp`, which doesn't expose a good API, this repo will have to be patched manually on an as-needed basis. Changes to ggml should not be a problem. Fixing the API in the main repo would allow this to be set up as a downstream fork rather than the weird sidekick repo it currently is.