Commit graph

2294 commits

Author SHA1 Message Date
Concedo
a2b8473354 force flush sse 2023-10-08 15:12:07 +08:00
Concedo
133897a558 updated lite (+1 squashed commits)
Squashed commits:

[4d1411df] update lite
2023-10-08 12:17:47 +08:00
Concedo
f797cba377 Merge branch 'master' into concedo_experimental 2023-10-08 10:43:34 +08:00
Kerfuffle
a16e89cec8
Fix trying to strip newline from empty prompt and cfg prompt file content (#3534) 2023-10-07 15:31:41 -06:00
M. Yusuf Sarıgöz
4d03833211
gguf.py : fix CI for publishing GGUF package (#3532)
* Fix CI for publishing GGUF package

* Bump version

* fix

* bump version

* bump version

* bump version
2023-10-07 22:14:10 +03:00
Concedo
e46708eedc updated lite 2023-10-07 23:33:54 +08:00
Concedo
678f31f2fd Merge branch 'master' into concedo_experimental
# Conflicts:
#	.gitignore
#	llama.cpp
2023-10-07 22:00:09 +08:00
Concedo
ca4a8c5dc8 updated lite 2023-10-07 21:50:24 +08:00
Tom C
c47066d833
py : change version of numpy requirement to 1.24.4 (#3515)
Co-authored-by: Lyjia <me@lyjia.us>
2023-10-07 12:56:15 +03:00
cebtenzzre
f1782c68de
quantize : fail fast on write errors (#3521) 2023-10-07 11:41:52 +03:00
Jhen-Jie Hong
c26765a0a1
metal : support default.metallib load & reuse code for swift package (#3522)
* metal : support load default.metallib & reuse code for swift package

* metal : use SWIFT_PACKAGE def instead of define GGML_SWIFT
2023-10-07 11:40:27 +03:00
Phillip Kravtsov
0e797c2fc5
llm : support Adept Persimmon 8B (#3410)
* Produces garbage output

* wip: correct tensors up to RoPE

* correct tensors thru RoPE

* Correct outputs through masked & softmax'd KQ

* fp32 works

* Rename adept->persimmon

* Produces correct outputs

* clean up convert scripts

* remove printing logic from ggml.c

* remove prints from llama.cpp & fix merge

* trivial cleanups

* Add offload funcs

* update conversion script to directly take adept artifacts rather than a .safetensors file

* Fix norm eps bug

* Support sqr and concat on metal, persimmon-8b-q4 runs correctly

* Small changes from review

* Formatting changes

* Minor changes to conversion script

* Remove old script

* Fix editorconfig formatting

* Fix build

* add overlooked offload code ggml-ci
2023-10-07 10:12:43 +03:00
goerch
3a716b4dae
Fix for #3454 (#3455)
Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion
2023-10-07 06:57:01 +02:00
Concedo
6b282271b1 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
2023-10-07 10:24:34 +08:00
Concedo
07a114de63 force debugmode to be indicated on horde, allow 64k context for gguf 2023-10-07 10:23:33 +08:00
BarfingLemurs
1faaae8c2b
readme : update models, cuda + ppl instructions (#3510) 2023-10-06 22:13:36 +03:00
Mihai
cb13d73a72
server : docs fix default values and add n_probs (#3506) 2023-10-06 21:39:33 +03:00
Concedo
d8f7a7077a Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
2023-10-07 01:36:14 +08:00
Concedo
120695ddf7 add update link 2023-10-07 01:33:18 +08:00
Kerfuffle
9ca79d5cbb
kv cache slot search improvements (#3493)
* kv cache slot search improvements

* Use n_ctx in kv find slot for consistency

* Ensure kv cache head points to a valid slot in llama_decode internal

* Add some comments to prevent dumb people (like me) from getting confused.
2023-10-06 10:10:13 -06:00
Concedo
9db21757ef update docs 2023-10-06 23:40:21 +08:00
Concedo
2a36c85558 abort has multiuser support via genkey too 2023-10-06 23:27:00 +08:00
Concedo
84eeecb889 updated lite 2023-10-06 23:15:11 +08:00
Georgi Gerganov
0c731ca403
prompts : fix editorconfig checks after #3416 2023-10-06 16:36:32 +03:00
pudepiedj
a8777ad84e
parallel : add option to load external prompt file (#3416)
* Enable external file and add datestamp

* Add name of external file at end

* Upload ToK2024

* Delete ToK2024.txt

* Experiments with jeopardy

* Move ParallelQuestions to /prompts and rename

* Interim commit

* Interim commit

* Final revision

* Remove trailing whitespace

* remove cmake_all.sh

* Remove cmake_all.sh

* Changed .gitignore

* Improved reporting and new question files.

* Corrected typo

* More LLM questions

* Update LLM-questions.txt

* Yet more LLM-questions

* Remove jeopardy results file

* Reinstate original jeopardy.sh

* Update examples/parallel/parallel.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-06 16:16:38 +03:00
Jhen-Jie Hong
97af49fa39
server : reuse llama_sample_token common util (#3494)
* server : reuse llama_sample_token common function

* common : use n_probs for temperature sampling
2023-10-06 15:44:24 +03:00
l3utterfly
16820a5a0d
llama : correct hparams comparison (#3446)
* fixed floating point comparison issues

* updated implementation for hparam comparison to handle inf and NaN

* fixed code review comments

* minor simplification

* rename is_float_eq -> is_float_close

---------

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-10-06 13:47:59 +03:00
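The commit above describes making the hparams comparison tolerant of inf and NaN. A minimal conceptual sketch of such a comparison (written in Python here, not the project's actual C++ helper; the name and tolerance are only illustrative):

```python
import math

def is_float_close(a: float, b: float, abs_tol: float = 1e-6) -> bool:
    """Illustrative comparison that tolerates rounding but handles inf and NaN."""
    if a == b:
        return True                 # equal finite values, or matching infinities
    if math.isnan(a) or math.isnan(b):
        return False                # NaN is never considered close to anything
    if math.isinf(a) or math.isinf(b):
        return False                # inf vs finite, or opposite-signed infinities
    return abs(a - b) <= abs_tol    # finite values: absolute-tolerance check
```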
Concedo
1d1232ffbc show horde job count 2023-10-06 18:42:59 +08:00
Jhen-Jie Hong
04b2f4386e
ci : fix xcodebuild destinations (#3491)
* ci : fix xcodebuild destinations

* ci : add .swift to paths
2023-10-06 13:36:43 +03:00
Concedo
b5cd935cdb Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	ggml-opencl.cpp
2023-10-06 17:58:08 +08:00
Concedo
9d2a25b12b updated lite, fixed fancy quotes 2023-10-06 15:44:37 +08:00
Concedo
efd0567f10 Merge branch 'concedo' into concedo_experimental
# Conflicts:
#	koboldcpp.py
2023-10-06 11:22:01 +08:00
Concedo
b8f0576c7b updated docs 2023-10-06 11:19:04 +08:00
grawity
9d0dd7ab11
avoid leaving a zombie process for --onready (#462)
Popen() needs to be used with 'with' or have .wait() called or be
destroyed; otherwise there is a zombie child that sticks around until
the object is GC'd.
2023-10-06 11:06:37 +08:00
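The note above reflects standard Python subprocess behaviour: a finished child lingers as a zombie until it is reaped. A minimal sketch of the two safe patterns (the command used here is illustrative, not koboldcpp's actual --onready handling):

```python
import subprocess

# Pattern 1: the context manager waits for the child and reaps it on exit.
with subprocess.Popen(["echo", "server ready"]) as proc:
    proc.communicate()

# Pattern 2: keep the handle, but call wait() so the exited child is reaped
# instead of lingering as a zombie until the Popen object is GC'd.
proc = subprocess.Popen(["echo", "server ready"])
proc.wait()
```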
cebtenzzre
48edda30ee
convert : update Falcon script for new HF config (#3448)
Also adds Falcon-180B support.
Closes #3049

Co-authored-by: jb <jonathan.t.barnard@gmail.com>
2023-10-05 15:00:34 -04:00
Kenvix ⭐
45eba9369f
build : use std::make_tuple() for compatibility with older GCC versions (#3488) 2023-10-05 20:16:39 +03:00
staviq
acec9eaaa9
common : process escape sequences in reverse prompts (#3461) 2023-10-05 19:17:29 +03:00
shibe2
e2583cbc29 CLBlast: Fix handling of on-device tensor data
Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors.
Use correct offsets into data that is already in VRAM.
Correct handling of OpenCL events when multiple commands are queued.
2023-10-05 18:25:23 +04:00
Concedo
da8a09ba10 use filename as default model name 2023-10-05 22:24:20 +08:00
Jhen-Jie Hong
e8b8d32e86
server : fix incorrect num_tokens_predicted (#3480) 2023-10-05 17:02:55 +03:00
Jhen-Jie Hong
8f3a642ec1
swift : disable ACCELERATE_NEW_LAPACK (#3481) 2023-10-05 17:00:07 +03:00
Jhen-Jie Hong
0745384449
ci : add swift build via xcodebuild (#3482) 2023-10-05 16:56:21 +03:00
Concedo
a0c1ba7747 Merge branch 'concedo_experimental' of https://github.com/LostRuins/llamacpp-for-kobold into concedo_experimental
# Conflicts:
#	koboldcpp.py
2023-10-05 21:20:21 +08:00
Concedo
b4b5c35074 add documentation for koboldcpp 2023-10-05 21:17:36 +08:00
teddybear082
f9f4cdf3c0
Implement basic chat/completions openai endpoint (#461)
* Implement basic chat/completions openai endpoint

-Basic support for openai chat/completions endpoint documented at: https://platform.openai.com/docs/api-reference/chat/create

-Tested with example code from openai for chat/completions and chat/completions with stream=True parameter found here: https://cookbook.openai.com/examples/how_to_stream_completions.

-Tested with Mantella, the Skyrim mod that turns all the NPCs into AI-chattable characters, which uses openai's acreate / async completions method: https://github.com/art-from-the-machine/Mantella/blob/main/src/output_manager.py

-Tested default koboldcpp api behavior with streaming and non-streaming generate endpoints and with the GUI running, and it seems to be fine.

-Still TODO / evaluate before merging:

(1) implement rest of openai chat/completion parameters to the extent possible, mapping to koboldcpp parameters

(2) determine if there is a way to use kobold's prompt formats for certain models when translating openai messages format into a prompt string. (Not sure if possible or where these are in the code)

(3) have chat/completions responses include the actual local model the user is using instead of just koboldcpp (Not sure if this is possible)

Note: I am a Python noob, so if there is a more elegant way of doing this, hopefully I have at least done some of the grunt work for you to implement on your own.

* Fix typographical error on deleted streaming argument

-Mistakenly left code relating to the streaming argument from the main branch in experimental.

* add additional openai chat completions parameters

-support stop parameter mapped to koboldai stop_sequence parameter

-make default max_length / max_tokens parameter consistent with default 80 token length in generate function

-add support for providing name of local model in openai responses

* Revert "add additional openai chat completions parameters"

This reverts commit 443a6f7ff6346f41c78b0a6ff59c063999542327.

* add additional openai chat completions parameters

-support stop parameter mapped to koboldai stop_sequence parameter

-make default max_length / max_tokens parameter consistent with default 80 token length in generate function

-add support for providing name of local model in openai responses

* add \n after formatting prompts from openaiformat

to conform with the Alpaca standard used as the default in lite.koboldai.net

* tidy up and simplify code, do not set globals for streaming

* oai endpoints must start with v1

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-10-05 20:13:10 +08:00
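As a rough illustration of how an OpenAI-compatible /v1/chat/completions endpoint like the one above is typically exercised (the localhost port, payload fields, and model name here are assumptions based on the linked OpenAI reference, not details verified against this PR):

```python
import requests

# Hypothetical local koboldcpp instance; adjust host/port to your setup.
url = "http://localhost:5001/v1/chat/completions"

payload = {
    "model": "koboldcpp",  # the PR discusses reporting the actual local model name here
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello."},
    ],
    "max_tokens": 80,  # the PR aligns this default with the 80-token generate default
}

resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```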
Concedo
5beb773320 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
#	tests/test-grad0.cpp
#	tests/test-opt.cpp
#	tests/test-quantize-perf.cpp
2023-10-05 11:44:35 +08:00
Concedo
ce065d39d0 allow drag and drop kcpps file and openwith 2023-10-05 11:38:37 +08:00
Kerfuffle
019ba1dcd0
convert : fix Baichuan2 models by using vocab size in config.json (#3299)
Use local GGUF package when possible in Baichuan converter
2023-10-04 17:20:28 +03:00
Georgi Gerganov
beabc8cfb0
readme : add project status link 2023-10-04 16:50:44 +03:00
Georgi Gerganov
0d152b37fe
ggml : fix build after #3329 2023-10-04 16:25:41 +03:00