Commit graph

2342 commits

Author SHA1 Message Date
Concedo
643902fbbb fixed tensor split save and load 2023-10-13 10:07:22 +08:00
Concedo
7e2f714c9c tensor split only for cuda 2023-10-12 17:01:52 +08:00
Alexander Abushady
11b8f97c1e
Tensor split UI (#471)
* update .gitignore

Remove .idea folder created by Jet Brains products.

* Front end, and partial back-end

Tensor Split pulled in, shows in console, then not respected on model load.

* UI Tweak + Tensor Split Fix

Made Tensor Split input match similar boxes around it. Also, fixed Tensor Split to populate the correct argument.

* Changed int to float for tensor split

Accidentally set int, needed to be float when setting tensor split args
2023-10-12 16:50:21 +08:00
Concedo
601be78a3f kcpp does sampling ourselves, we can do whatever we want 2023-10-12 16:47:56 +08:00
Concedo
a6c3dbc351 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	Makefile
#	README.md
#	build.zig
2023-10-12 16:32:00 +08:00
Concedo
8be043ee38 more horde optimizations 2023-10-12 16:20:52 +08:00
Concedo
8d1cd512e2 missed a flag 2023-10-12 15:00:51 +08:00
Concedo
c6fe820357 improve cors and header handling 2023-10-12 14:53:39 +08:00
Aarni Koskela
b016596d90
server : add completion mode (no chat) (#3582) 2023-10-12 09:51:53 +03:00
Georgi Gerganov
6b3ae4da92
prompts : add mnemonics.txt 2023-10-12 09:35:30 +03:00
Georgi Gerganov
57dd55e2c7
server : fix kv cache management (#3588) 2023-10-12 09:29:04 +03:00
Concedo
f604cffdce multiuser racer bugfix 2023-10-12 13:39:12 +08:00
Georgi Gerganov
b8fe4b5cc9
main : fix session loading bug (#3400) 2023-10-11 23:55:41 +03:00
Michael Coppola
a8bdd65525
server : add parameter -tb N, --threads-batch N (#3584)
Co-authored-by: Michael Coppola <info@michaeljcoppola.com>
2023-10-11 22:42:22 +03:00
Kerfuffle
70c29da118
common : fix mirostat state when using multiple sequences (#3543)
* Fix mirostat state when using multiple sequences

* Fix mirostat by completely refactoring sampling!

* Try to fix zig build.

* Export function to fetch/create default sampler states

Code formatting cleanups and add some comments

Silence a warning about id not being used when logging is disabled

* Apply some renaming suggestions.

Fix comments that were out of sync with the pull.

* Use more consistent naming convention for sampling contexts
2023-10-11 22:35:46 +03:00
Georgi Gerganov
8c70a5ff25
batched : add bench tool (#3545)
* batched : add bench tool

* batched : minor fix table

* batched-bench : add readme + n_kv_max is now configurable

* batched-bench : init warm-up batch

* batched-bench : pass custom set of PP, TG and PL

* batched-bench : add mmq CLI arg
2023-10-11 21:25:33 +03:00
Concedo
a003e3c348 horde auto recovery 2023-10-12 00:57:32 +08:00
Zane Shannon
24ba3d829e
examples : add batched.swift + improve CI for swift (#3562) 2023-10-11 06:14:05 -05:00
Galunid
9f6ede19f3
Add MPT model to supported models in README.md (#3574) 2023-10-10 19:02:49 -04:00
goerch
233fc1c69f
Minor improvements in GPT2 tokenizer (#3567)
* Fixing minor bugs in bpe_gpt2_preprocess

* Don't add bos token in test
2023-10-10 18:59:52 +02:00
Xingchen Song(宋星辰)
c5b49360d0
readme : add bloom (#3570) 2023-10-10 19:28:50 +03:00
Xingchen Song(宋星辰)
02d2875def
llm : add bloom models (#3553)
* feat: Support bloom models

* fix(bloom): fix model size

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-10 17:48:21 +03:00
Jhen-Jie Hong
0aa6595ae0
swift : improvements and fixes (#3564)
* swift : use macOS 12 as minimum requirement

* swift : add missing ggml-backend.c source

* swift : add -O3 -DNDEBUG unsafe flags
2023-10-10 14:31:13 +03:00
Concedo
d74eab0e63 actually for this round, do not include deprecated params. i dont want to have to deal with them (+2 squashed commit)
Squashed commit:

[df2691c2] show context limit

[7c74f52a] prevent old scripts from crashing
2023-10-10 19:20:33 +08:00
Concedo
a723466d50 Merge branch 'master' into concedo_experimental 2023-10-10 17:21:42 +08:00
YellowRoseCx
1b25b21655
Merge pull request #27 from one-lithe-rune/allow-sdk-dll-loading - Allow use of hip SDK (if installed) dlls on windows (#470)
* If the rocm/hip sdk is installed on windows, then include the sdk
as a potential location to load the hipBlas/rocBlas .dlls from. This
allows running koboldcpp.py directly with python after building
on windows, without having to build the .exe and run that or
copy .dlls around.

Co-authored-by: one-lithe-rune <skapusniak@lithe-runes.com>
2023-10-10 17:16:33 +08:00
Jan Ploski
f5f9121de1
llm : add MPT support (#3417)
* CUDA: added support for ggml_clamp (see also: https://github.com/ggerganov/ggml/issues/545)

* mpt : added an implementation based (mostly) on falcon integration, modified with deltas from ggml/examples/mpt

* mpt : protect against "clip_qkv": null in mpt-7b

* mpt : quick fix to avoid "Strange model" warning when quantizing MPT models

* mpt : addendum to changeset:84e30e8 - leave parameter clamp_kqv out from metadata rather than use 0.0 to indicate "no clamping" (more compliant with the current GGUF spec?)

* mpt : standardized all tensor names to follow GGUF spec

* mpt : addendum to changeset:1be89c40 - use "req" parameter of GGUF_GET_KEY macro instead of duplicate code

* mpt : fixed comment s/gptneox/mpt/

* mpt : remove tabs, trailing whitespace

* mpt : removed ne01 + n_past == ne00 assertion from alibi (cuda/f32) and rope_shift from build_mpt

* mpt : updated convert-mpt-hf-to-gguf.py to reflect changes made to convert-gptneox-hf-to-gguf.py in pr:3252

* comment out n_past instead of marking it unused

* mpt : removed hardcoded +178 from convert script in favor of utilizing hparams["vocab_size"]

* mpt : remove unused tokenizer_json in convert script

* ggml : remove obsolete n_past assert in ggml_alibi

* llama : print clamp_kqv and max_alibi_bias hparams

---------

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-10 10:50:23 +03:00
vvhg1
11ea5c7d96
infill : fix tokenization (#3508)
* infill tokens correction

* server infill tokens correction

* removing any leading whitespace from infill suffix and removing leading space token from suffix when params.escape

* removing any leading whitespace from infill suffix and removing leading space token from suffix when params.escape

* only rm when params.escape, rm space if possible which is added back or rm added space token

* only rm when params.escape, rm space if possible which is added back or rm added space token

* Revert "only rm when params.escape, rm space if possible which is added back or rm added space token"

This reverts commit 63ba0b621f.

* fix interactive prompt escaping and fix server infill leading space handling

* rm unnecessary bool check
2023-10-10 10:31:21 +03:00
Concedo
f288c6b5e3 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	build.zig
#	scripts/sync-ggml.sh
2023-10-10 00:09:46 +08:00
Matěj Štágl
96e9539f05
OpenAI compat API adapter (#466)
* feat: oai-adapter

* simplify optional adapter for instruct start and end tags

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-10-09 23:24:48 +08:00
slaren
95bd60a0a6
ggml-alloc : fix assert in debug builds (#3555) 2023-10-09 15:44:58 +03:00
Georgi Gerganov
fcca0a7004
refact : fix convert script + zero out KV cache to avoid nans (#3523)
* refact : fix convert script + zero out KV cache to avoid nans

* ggml : silu(-inf) should never happen

* metal : assert various kernel requirements
2023-10-09 14:32:17 +03:00
Georgi Gerganov
dcc09d2596
metal : do not use mul_mm kernels when ne00 < 64 (#3542) 2023-10-09 14:28:27 +03:00
Georgi Gerganov
db3abcc114
sync : ggml (ggml-backend) (#3548)
* sync : ggml (ggml-backend)

ggml-ci

* zig : add ggml-backend to the build
2023-10-08 20:19:14 +03:00
Concedo
80e53af236 fixed a bug in lite 2023-10-09 00:18:03 +08:00
Concedo
4e5b6293ab adjust streaming timings 2023-10-08 23:12:45 +08:00
Concedo
e967717385 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	build.zig
2023-10-08 22:55:44 +08:00
Concedo
840b244c17 update lite 2023-10-08 22:55:05 +08:00
Matheus C. França
eee42c670e
ci : add Zig CI/CD and fix build (#2996)
* zig CI/CD and fix build

Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com>

* fix build_compiler

* ci : remove trailing whitespace

---------

Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-08 16:59:20 +03:00
Ryder Wishart
8e6716a102
api_like_OAI.py : compat with Microsoft Guidance (#2746)
Check for None in addition to empty string check in all request params

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-08 13:55:58 +03:00
arcrank
9c38d181d4
api_like_OAI.py : simplify function (#2796)
Simplify function
2023-10-08 13:52:57 +03:00
Johannes Rudolph
a1202a31ed
k-quants : fix comments about block sizing (#3499) 2023-10-08 13:21:19 +03:00
Georgi Gerganov
94e502dfb7
ci : enable on obj-c changes + fix metal build (#3540) 2023-10-08 11:24:50 +03:00
Luo Tian
7d8b24932f
zig : fix build by introducing train.cpp (#3539) 2023-10-08 11:24:01 +03:00
Concedo
d8fa5ca230 Merge branch 'master' into concedo_experimental 2023-10-08 15:51:42 +08:00
Concedo
80dfe2ba49 blasthreads should apply for any thread count below 32 2023-10-08 15:51:18 +08:00
Concedo
a2b8473354 force flush sse 2023-10-08 15:12:07 +08:00
Georgi Gerganov
b0ec5218c3
metal : support MTLGPUFamily < Apple7, formatting, style (#3524)
* metal : improve decoding speed for batches of 2-16

* metal : rename kernels mul_mat_ to mul_mv_

* metal : indentations

* minor

* metal : print more GPU info + disable mul_mm for MTLGPUFamily < Apple7
2023-10-08 10:01:53 +03:00
Kerfuffle
63d3b06a43
llama : fix missing break in Persimmon arch case statements (#3535) 2023-10-08 08:22:17 +03:00
Concedo
133897a558 updated lite (+1 squashed commits)
Squashed commits:

[4d1411df] update lite
2023-10-08 12:17:47 +08:00