Commit graph

584 commits

Author SHA1 Message Date
Concedo
3d650d0e25 remove dependency of psutil, fixed compile error on WSL, handle exceptions when sending http response, added multiline for embedded kobold 2023-04-06 11:08:19 +08:00
Georgi Gerganov
eeaa7b0492
ggml : multi-thread ggml_rope() (~3-4 times faster on M1) (#781) 2023-04-05 22:11:03 +03:00
Georgi Gerganov
986b6ce9f9
ggml, llama : avoid heavy V transpose + improvements (#775)
ggml :

- added ggml_view_3d()
- ggml_view_tensor() now inherits the stride too
- reimplement ggml_cpy() to account for dst stride
- no longer require tensor->data to be memory aligned

llama :

- compute RoPE on 32-bit tensors (should be more accurate)
- store RoPE-ed K in the KV cache
- store transposed V in the KV cache (significant speed-up)
- avoid unnecessary Q copy
2023-04-05 22:07:33 +03:00
Georgi Gerganov
3416298929
Update README.md 2023-04-05 19:54:30 +03:00
Ivan Stepanov
5a8c4f6240
llama : define non-positive top_k; top_k range check (#779)
* Define non-positive top_k; top_k range check

* minor : brackets

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-05 19:20:05 +03:00
at8u
ff05d05c96
miku.sh : add executable bit (#780) 2023-04-05 18:59:13 +03:00
Georgi Gerganov
62b3e81aae
media : add logos and banners 2023-04-05 18:58:31 +03:00
Georgi Gerganov
8d10406d6e
readme : change logo + add bindings + add uis + add wiki 2023-04-05 18:56:20 +03:00
iacore
ed1c214e66
zig : add build.zig (#773)
Co-authored-by: Locria Cyber <74560659+locriacyber@users.noreply.github.com>
2023-04-05 18:06:02 +03:00
Ivan Stepanov
0c44427df1
make : missing host optimizations in CXXFLAGS (#763) 2023-04-05 17:38:37 +03:00
Adithya Balaji
594cc95fab
readme : update with CMake and windows example (#748)
* README: Update with CMake and windows example

* README: update with code-review for cmake build
2023-04-05 17:36:12 +03:00
at8u
88ed5761b8
examples : add Miku.sh (#724)
* Add Miku.sh to examples

* Add missing line to prompt in Miku.sh

* Add --keep param to Miku.sh

* Remove '[end_of_conversation]' line from Miku.sh

No longer is necessary.
2023-04-05 17:32:42 +03:00
Andrew Duffy
58c438cf7d
Add Accelerate/BLAS when using Swift (#765) 2023-04-05 06:44:24 -04:00
Concedo
5c1920df43 why nobody ever told me the makefile doesnt work outside x86 xD 2023-04-05 17:15:42 +08:00
Concedo
1490cdd71d change GPT-J and GPT2 KVs to use fp16 instead 2023-04-05 15:53:07 +08:00
Concedo
57e9f929ee renamed misnamed ACCELERATE define, and removed all -march=native and -mtune=native flags 2023-04-05 15:22:13 +08:00
Concedo
14273fea7a integrated gpt2 support 2023-04-04 23:15:47 +08:00
Concedo
52de932842 removed main.exe to reduce clutter, added support for rep pen in gptj 2023-04-04 20:43:13 +08:00
Concedo
9c0dbbb08b Merge branch 'master' into concedo 2023-04-04 00:51:05 +08:00
Concedo
dd2abd8bc7 lower default thread threshold 2023-04-04 00:42:49 +08:00
mgroeber9110
53dbba7695
Windows: reactive sigint handler after each Ctrl-C (#736) 2023-04-03 18:00:55 +02:00
SebastianApel
437e77855a
10+% performance improvement of ggml_vec_dot_q4_0 on AVX2 (#654)
* Performance improvement of AVX2 code
* Fixed problem with MSVC compiler
* Reviewer comments: removed double semicolon, deleted empty line 1962
2023-04-03 09:52:28 +02:00
Concedo
06c711d770 Merge branch 'master' into concedo
# Conflicts:
#	.devops/full.Dockerfile
#	README.md
2023-04-03 15:10:08 +08:00
Concedo
eb5b22dda2 rebrand to koboldcpp 2023-04-03 10:35:18 +08:00
Ivan Stepanov
cd7fa95690
Define non-positive temperature behavior (#720) 2023-04-03 02:19:04 +02:00
bsilvereagle
a0c0516416
Remove torch GPU dependencies from the Docker.full image (#665)
By using `pip install torch --index-url https://download.pytorch.org/whl/cpu`
instead of `pip install torch` we can specify we want to install a CPU-only version
of PyTorch without any GPU dependencies. This reduces the size of the Docker image
from 7.32 GB to 1.62 GB
2023-04-03 00:13:03 +02:00
Concedo
8dd8ab1659 Various enhancement and integration pygmalion.cpp 2023-04-03 00:04:43 +08:00
Thatcher Chamberlin
d8d4e865cd
Add a missing step to the gpt4all instructions (#690)
`migrate-ggml-2023-03-30-pr613.py` is needed to get gpt4all running.
2023-04-02 12:48:57 +02:00
Christian Falch
e986f94829
Added api for getting/setting the kv_cache (#685)
The api provides access methods for retrieving the current memory buffer for the kv_cache and its token number.
It also contains a method for setting the kv_cache from a memory buffer.

This makes it possible to load/save history - maybe support --cache-prompt paramater as well?

Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-04-02 12:23:04 +02:00
Marian Cepok
c0bb1d3ce2
ggml : change ne to int64_t (#626) 2023-04-02 13:21:31 +03:00
Concedo
3f4967b827 added new binaries 2023-04-02 17:14:38 +08:00
Concedo
bb965cc120 Merge branch 'master' into concedo
# Conflicts:
#	README.md
2023-04-02 17:13:28 +08:00
Concedo
9aabb0d9db massive refactor completed, GPT-J integrated 2023-04-02 17:03:30 +08:00
Leonardo Neumann
6e7801d08d
examples : add gpt4all script (#658) 2023-04-02 10:56:20 +03:00
Stephan Walter
81040f10aa
llama : do not allocate KV cache for "vocab_only == true" (#682)
Fixes sanitizer CI
2023-04-02 10:18:53 +03:00
Fabian
c4f89d8d73
make : use -march=native -mtune=native on x86 (#609) 2023-04-02 10:17:05 +03:00
Murilo Santana
5b70e7de4c
fix default params for examples/main (#697) 2023-04-02 04:41:12 +02:00
Concedo
b1f08813e3 added support for gpt4all original format 2023-04-02 00:53:46 +08:00
Ikko Eltociear Ashimine
a717cba844
py: huggingface -> Hugging Face (#686) 2023-04-01 18:38:18 +02:00
rimoliga
d0a7f742e7
readme: replace termux links with homepage, play store is deprecated (#680) 2023-04-01 16:57:30 +02:00
Slaren
0d054e292e Show error message when -f fails 2023-04-01 16:08:40 +02:00
Concedo
085a9f90a7 still refactoring 2023-04-01 11:56:34 +08:00
Concedo
6e6125ebdb updated pyinstaller to clean temp dir,removed warning flags from makefile because they are just clutter. 2023-04-01 09:25:41 +08:00
Concedo
9ab6e87b58 Merge branch 'master' into concedo
# Conflicts:
#	CMakeLists.txt
2023-04-01 09:05:45 +08:00
Concedo
801b178f2a still refactoring, but need a checkpoint to prepare build for 1.0.7 2023-04-01 08:55:14 +08:00
Stephan Walter
3525899277
Enable -std= for cmake builds, fix warnings (#598) 2023-03-31 19:19:16 +00:00
Concedo
6b86f5ea22 halfway refactoring, wip adding other model types 2023-04-01 01:13:05 +08:00
slaren
1d08882afa
Optimize AVX2 ggml_vec_dot_q4_0 (#642) 2023-03-31 15:55:52 +00:00
perserk
02c5b27e91
Add AVX acceleration (#617)
* ggml : add AVX quantize_row_q4_0()

* ggml : add AVX ggml_vec_dot_q4_0()

* ggml : refactor AVX part of ggml_vec_dot_q4_0()

https://github.com/ggerganov/llama.cpp/pull/617#issuecomment-1489985645
2023-03-31 13:55:44 +02:00
Concedo
56949197fe added HF converter base 2023-03-31 19:10:21 +08:00