Commit graph

2269 commits

Author SHA1 Message Date
Concedo
9db21757ef update docs 2023-10-06 23:40:21 +08:00
Concedo
2a36c85558 abort has multiuser support via genkey too 2023-10-06 23:27:00 +08:00
Concedo
84eeecb889 updated lite 2023-10-06 23:15:11 +08:00
Concedo
1d1232ffbc show horde job count 2023-10-06 18:42:59 +08:00
Concedo
b5cd935cdb Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	ggml-opencl.cpp
2023-10-06 17:58:08 +08:00
Concedo
9d2a25b12b updated lite, fixed fancy quotes 2023-10-06 15:44:37 +08:00
Concedo
efd0567f10 Merge branch 'concedo' into concedo_experimental
# Conflicts:
#	koboldcpp.py
2023-10-06 11:22:01 +08:00
Concedo
b8f0576c7b updated docs 2023-10-06 11:19:04 +08:00
grawity
9d0dd7ab11
avoid leaving a zombie process for --onready (#462)
Popen() needs to be used with 'with' or have .wait() called or be
destroyed, otherwise there is a zombie child that sticks around until
the object is GC'd.
2023-10-06 11:06:37 +08:00
cebtenzzre
48edda30ee
convert : update Falcon script for new HF config (#3448)
Also adds Falcon-180B support.
Closes #3049

Co-authored-by: jb <jonathan.t.barnard@gmail.com>
2023-10-05 15:00:34 -04:00
Kenvix ⭐
45eba9369f
build : use std::make_tuple() for compatibility with older GCC versions (#3488) 2023-10-05 20:16:39 +03:00
staviq
acec9eaaa9
common : process escape sequences in reverse prompts (#3461) 2023-10-05 19:17:29 +03:00
shibe2
e2583cbc29 CLBlast: Fix handling of on-device tensor data
Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors.
Use correct offsets into data that is already in VRAM.
Correct handling of OpenCL events when multiple commands are queued.
2023-10-05 18:25:23 +04:00
Concedo
da8a09ba10 use filename as default model name 2023-10-05 22:24:20 +08:00
Jhen-Jie Hong
e8b8d32e86
server : fix incorrect num_tokens_predicted (#3480) 2023-10-05 17:02:55 +03:00
Jhen-Jie Hong
8f3a642ec1
swift : disable ACCELERATE_NEW_LAPACK (#3481) 2023-10-05 17:00:07 +03:00
Jhen-Jie Hong
0745384449
ci : add swift build via xcodebuild (#3482) 2023-10-05 16:56:21 +03:00
Concedo
a0c1ba7747 Merge branch 'concedo_experimental' of https://github.com/LostRuins/llamacpp-for-kobold into concedo_experimental
# Conflicts:
#	koboldcpp.py
2023-10-05 21:20:21 +08:00
Concedo
b4b5c35074 add documentation for koboldcpp 2023-10-05 21:17:36 +08:00
teddybear082
f9f4cdf3c0
Implement basic chat/completions openai endpoint (#461)
* Implement basic chat/completions openai endpoint

-Basic support for openai chat/completions endpoint documented at: https://platform.openai.com/docs/api-reference/chat/create

-Tested with example code from openai for chat/completions and chat/completions with stream=True parameter found here: https://cookbook.openai.com/examples/how_to_stream_completions.

-Tested with Mantella, the skyrim mod that turns all the NPC's into AI chattable characters, which uses openai's acreate / async competions method: https://github.com/art-from-the-machine/Mantella/blob/main/src/output_manager.py

-Tested default koboldcpp api behavior with streaming and non-streaming generate endpoints and running GUI and seems to be fine.

-Still TODO / evaluate before merging:

(1) implement rest of openai chat/completion parameters to the extent possible, mapping to koboldcpp parameters

(2) determine if there is a way to use kobold's prompt formats for certain models when translating openai messages format into a prompt string. (Not sure if possible or where these are in the code)

(3) have chat/completions responses include the actual local model the user is using instead of just koboldcpp (Not sure if this is possible)

Note I am a python noob, so if there is a more elegant way of doing this at minimum hopefully I have done some of the grunt work for you to implement on your own.

* Fix typographical error on deleted streaming argument

-Mistakenly left code relating to streaming argument from main branch in experimental.

* add additional openai chat completions parameters

-support stop parameter mapped to koboldai stop_sequence parameter

-make default max_length / max_tokens parameter consistent with default 80 token length in generate function

-add support for providing name of local model in openai responses

* Revert "add additional openai chat completions parameters"

This reverts commit 443a6f7ff6346f41c78b0a6ff59c063999542327.

* add additional openai chat completions parameters

-support stop parameter mapped to koboldai stop_sequence parameter

-make default max_length / max_tokens parameter consistent with default 80 token length in generate function

-add support for providing name of local model in openai responses

* add /n after formatting prompts from openaiformat

to conform with alpaca standard used as default in lite.koboldai.net

* tidy up and simplify code, do not set globals for streaming

* oai endpoints must start with v1

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-10-05 20:13:10 +08:00
Concedo
5beb773320 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
#	tests/test-grad0.cpp
#	tests/test-opt.cpp
#	tests/test-quantize-perf.cpp
2023-10-05 11:44:35 +08:00
Concedo
ce065d39d0 allow drag and drop kcpps file and openwith 2023-10-05 11:38:37 +08:00
Kerfuffle
019ba1dcd0
convert : fix Baichuan2 models by using vocab size in config.json (#3299)
Use local GGUF package when possible in Baichuan converter
2023-10-04 17:20:28 +03:00
Georgi Gerganov
beabc8cfb0
readme : add project status link 2023-10-04 16:50:44 +03:00
Georgi Gerganov
0d152b37fe
ggml : fix build after #3329 2023-10-04 16:25:41 +03:00
ds5t5
f8c90cdbaa
llm : add Refact model (#3329)
* add refact model

* resolve comments

* rebase to the latest

* solve alibi cpu error

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-04 16:23:39 +03:00
Georgi Gerganov
f93af02488
sync : ggml (conv 1d + 2d updates, UB fixes) (#3468)
* sync : ggml (conv 1d + 2d updates)

ggml-ci

* ggml : fix UB in q5_0 and q5_1 quantize code

ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior

ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior

ggml-ci

* tests : fix UB in test-quantize-perf
2023-10-04 15:29:58 +03:00
Merrick Christensen
f72f8f22c9
finetune : readme fix typo (#3465)
Fix small typo
2023-10-04 09:33:13 +03:00
Concedo
47f7ebb632 adjust horde worker and debugmode 2023-10-04 14:00:07 +08:00
Concedo
c7660ab6e6 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	flake.nix
2023-10-04 12:54:55 +08:00
Tameem
79f34abddb
ggml : add RISC-V Vector Support for K-Quants and improved the existing intrinsics (#3453)
* Added RVV intrinsics support for Q8 quantize row and also improved the existing dot product function for risc-v.

The RVV intrinsics is added for the following quantize row functions
   quantize_row_q8_0
   quantize_row_q8_1

The following dot product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1
   ggml_vec_dot_q4_0_q8_0
   ggml_vec_dot_q4_1_q8_1
   ggml_vec_dot_q5_0_q8_0
   ggml_vec_dot_q5_1_q8_1

And vector initialization in Q5 by temporary array is also replaced by the vid intrinsics

Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>

* Added RVV intrinsics support for k_quants

This adds RISC-V Vector intrinsics support for the following K_quants functions for both QKK = 256 and QKK = 64
   ggml_vec_dot_q2_K_q8_K
   ggml_vec_dot_q3_K_q8_K
   ggml_vec_dot_q4_K_q8_K
   ggml_vec_dot_q5_K_q8_K
   ggml_vec_dot_q6_K_q8_K

Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>

---------

Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
2023-10-03 21:38:19 +03:00
h-h-h-h
8186242b6d
main : consistent prefix/suffix coloring (#3425)
* Typo

* No `--in-prefix` coloring

The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.
2023-10-03 21:16:15 +03:00
Georgi Gerganov
ac2219fef3
llama : fix session saving/loading (#3400)
* llama : fix session saving/loading

* llama : temp fix for clearing "future" tokens from the KV cache

* llama : fix handling of "future" tokens when loading sessions

* llama : fix comments for llama_kv_cache API
2023-10-03 21:04:01 +03:00
Alex Klinkhamer
48be797ffb
llama : expose model's rope_freq_scale in the API (#3418)
so it can be scaled further before creating a context.
2023-10-03 20:09:28 +03:00
Jiahao Li
f56e1baec3
metal : alibi for arbitrary number of heads (#3426) 2023-10-03 19:55:21 +03:00
Eve
017efe899d
cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273)
* fix LLAMA_NATIVE

* syntax

* alternate implementation

* my eyes must be getting bad...

* set cmake LLAMA_NATIVE=ON by default

* march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc

* revert 8283237 and only allow LLAMA_NATIVE on x86 like the Makefile

* remove -DLLAMA_MPI=ON

---------

Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>
2023-10-03 19:53:15 +03:00
Concedo
ea726fcffa cleanup threaded horde submit 2023-10-04 00:34:26 +08:00
Concedo
c249f7dbc5 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.dockerignore
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	tests/CMakeLists.txt
2023-10-03 23:51:30 +08:00
Concedo
0cc740115d updated lite, improve horde worker (+1 squashed commits)
Squashed commits:

[a7c25999] improve horde worker
2023-10-03 23:44:27 +08:00
Concedo
ae8ccdc1be Remove old tkinter gui (+1 squashed commits)
Squashed commits:

[0933c1da] Remove old tkinter gui
2023-10-03 22:05:44 +08:00
Concedo
d10470a1e3 Breaking Change: Remove deprecated commands 2023-10-03 17:16:09 +08:00
goerch
ff5a3f0c09
Work on the BPE tokenizer (#3252)
* Work on the BPE tokenizer

Tokenizer tests work for Falcon-7B

* Try to fix build problem

* Fix debug assertion failure

* Fix MSVC Unicode BOM problem

* Cleanup and an improvement

* Fix compiler warning

* Cleanup

* Test doesn't work over the full range of Unicodes

* Update .gitignore and Makefile

* Another Makefile rule

* Testing Aquila

* Moving byte decoding back to `token_to_piece` ...

... because everyone is using it.

* Guarding some unusable code pathes

* Streamlining code and adding some more assertions

Important change: I'm classifying added tokens as control tokens now for BPE.

* Adding a comment

* Adding another assertion

* Fixed vocabulary guarding assertions

* Fix PR for recent change

* Fix PR for recent change

* Fix for compiler warning

* Fix PR for recent change

* Fix PR for recent change

* Fix PR for recent change

* Fix for compiler warning

* Fixes for more compiler warnings

* Remove unused code

* Fix initialization of static maps

* Add scores and token types back, adapt gptneox

* Update llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update unicode.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update unicode.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Ported Starcoder and added some assertions

* Fix coding style

* Apply @jploski 's fix for missing tokens

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-03 09:16:26 +02:00
cebtenzzre
1c84003c08
convert : fix vocab size when not defined in hparams (#3421) 2023-10-02 18:07:24 -04:00
cebtenzzre
e78f0b0d05
cmake : increase minimum version for add_link_options (#3444) 2023-10-02 22:38:43 +03:00
shibe2
665018c749
CLBlast: Add broadcast support for matrix multiplication (#3402)
Broadcast src0 into src1 across dimensions 2 and 3 when needed.
This is required for models that use GQA.
2023-10-02 21:26:15 +02:00
cebtenzzre
29a404a951
gguf : add BERT, MPT, and GPT-J arch info (#3408) 2023-10-02 15:20:28 -04:00
cebtenzzre
0fe321031a
gguf : general usability improvements (#3409) 2023-10-02 14:58:46 -04:00
cebtenzzre
9476b01226
cmake : make CUDA flags more similar to the Makefile (#3420)
* cmake : fix misuse of cxx_flags

* cmake : make CUDA flags more similar to the Makefile

* cmake : fix MSVC build
2023-10-02 16:16:50 +03:00
xaedes
a03ce38455
finetune : fix #3404 (#3437)
the shapes for init model of gqa models was wrong
2023-10-02 16:15:45 +03:00
Concedo
5d3e142145 use_default_badwordsids defaults to false if the parameter is missing 2023-10-02 19:41:07 +08:00