llama-for-kobold

A hacky little script from Concedo that exposes llama.cpp function bindings, allowing it to be used via a simulated Kobold API endpoint.

It's not very usable, as there is a fundamental flaw in llama.cpp that causes generation delay to scale linearly with the original prompt length. Nobody knows why or really cares much, so I'm just going to publish whatever I have at this point.

If you care, please contribute to this discussion, which, if resolved, would actually make this project viable.

Considerations

  • Don't want to use pybind11 due to its dependency on MSVC
  • As few changes to main.cpp as possible, ideally ZERO - do not move its function declarations elsewhere!
  • Leave main.cpp UNTOUCHED; we want to be able to update the repo and pull any upstream changes automatically.
  • No dynamic memory allocation! Set up structs with FIXED (known) shapes and sizes for ALL output fields. Python ALWAYS provides the memory; we just write to it (see the sketch after this list).
  • No external libraries or dependencies. That means no Flask, no pybind11, nothing of the sort. All You Need Is Python.
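
To make the memory contract concrete, here is a minimal sketch of how the Python side can bind to the DLL with ctypes and hand it a pre-allocated, fixed-size output struct. The export name generate, its signature, and the buffer size are assumptions for illustration, not the actual expose.cpp interface:

    import ctypes

    # Fixed-shape output struct: Python owns this memory and the C side
    # only ever writes into it - no allocation happens across the boundary.
    class GenerationOutput(ctypes.Structure):
        _fields_ = [("status", ctypes.c_int),
                    ("text", ctypes.c_char * 16384)]  # fixed-size buffer

    lib = ctypes.CDLL("./llamacpp.dll")
    # Hypothetical export; the real entry points live in expose.cpp.
    lib.generate.argtypes = [ctypes.c_char_p, ctypes.POINTER(GenerationOutput)]
    lib.generate.restype = None

    out = GenerationOutput()  # allocated by Python, fixed size
    lib.generate(b"Once upon a time", ctypes.byref(out))
    print(out.text.decode("utf-8", errors="ignore"))

Because the struct's shape is known on both sides ahead of time, the C code never has to return pointers that Python would then be responsible for freeing.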

Usage

  • Windows binaries are provided in the form of llamacpp.dll, but if that worries you, go ahead and rebuild it yourself.
  • Weights are not included; you can use the llama.cpp quantize.exe to generate them from your official weight files (or download them from...places).
  • To run, simply clone the repo and run llama_for_kobold.py [ggml_quant_model.bin] [port], then connect with Kobold or Kobold Lite.
  • By default, you can connect to http://localhost:5001 (you can also use https://lite.koboldai.net/?local=1&port=5001) - a sketch of what this endpoint does is shown below.
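
For the curious, the simulated Kobold endpoint needs nothing beyond the Python standard library. Below is a stripped-down sketch of the idea; the route handling and JSON shape are illustrative stand-ins, not the exact Kobold API contract:

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def generate(prompt):
        # Placeholder standing in for the ctypes call into llamacpp.dll.
        return ""

    class KoboldHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Read the prompt from the request body, hand it to the bindings,
            # and return the text in a Kobold-style JSON envelope.
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            text = generate(payload.get("prompt", ""))
            body = json.dumps({"results": [{"text": text}]}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("localhost", 5001), KoboldHandler).serve_forever()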

License

  • The original GGML library and llama.cpp by ggerganov are licensed under the MIT License
  • However, Kobold Lite is licensed under the AGPL v3.0 License
  • The provided Python ctypes bindings in llamacpp.dll are also under the AGPL v3.0 License