Commit graph

2371 commits

Author SHA1 Message Date
Concedo
c1ca1de2ac fixed support for old falcon models 2023-10-18 17:20:44 +08:00
Concedo
700951dbd4 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
2023-10-18 16:33:09 +08:00
Concedo
53b7cdf8a3 Merge branch 'concedo' into concedo_experimental 2023-10-18 13:51:13 +08:00
slaren
cb33f43a2a
fix embeddings when using CUDA (#3657) 2023-10-17 22:24:50 +02:00
Georgi Gerganov
e1675d133c
llama : avoid fprintf in favor of LLAMA_LOG (#3538) 2023-10-17 22:34:26 +03:00
BarfingLemurs
8402566a7c
readme : update hot-topics & models, detail windows release in usage (#3615)
* Update README.md

* Update README.md

* Update README.md

* move "Running on Windows" section below "Prepare data and run"

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 21:13:21 +03:00
LostRuins
6e34d31c44
Update README.md (#479) 2023-10-18 01:24:14 +08:00
shibe2
40e5ce054f CLBlast: Fix temporary buffer size for f16 conversion (wsize)
Fix buffer overflow.
Reduce the size to fit just one 2D slice.
Assert sufficient size.
2023-10-17 21:02:30 +04:00
slaren
a5e8c1d8c7
train-text-from-scratch : fix assert failure in ggml-alloc (#3618) 2023-10-17 20:00:58 +03:00
Georgi Gerganov
e74c705e15
editorconfig : remove trailing spaces 2023-10-17 19:52:53 +03:00
coezbek
3ad1e3f1a1
server : documentation of JSON return value of /completion endpoint (#3632)
* Added documentation of JSON return value of /completion endpoint

* Update examples/server/README.md

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 19:51:02 +03:00
Georgi Gerganov
1142013da4
save-load-state : fix example + add ci test (#3655)
* save-load-state : fix example (close #3606)

* ci : add test for save-load-state example

ggml-ci
2023-10-17 19:12:46 +03:00
ldwang
5fe268a4d9
readme : add Aquila2 links (#3610)
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-10-17 18:52:33 +03:00
staviq
1a159553f9
tokenizer : special token handling (#3538)
* Rewrite special token handling from #1931

* shorten param name, add st verification by type

* use offsets instead of copy by substr

* formatting, remove copying iterator on delete

* llama : normalize code-style

* swift fix

* print pfx/sfx if verb, main: split pfx input sfx

* dont add space when using special tokens

* minor : comment + spacing

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 18:11:01 +03:00
Concedo
6f8fe88f10 fix for lite (+5 squashed commit)
Squashed commit:

[f9ce9855] catch more exceptions

[8cdaf149] tweaked horde worker timeouts, updated lite

[619ebef4] fixed abort no response if failed

[a54a66a2] fixed time overflow

[9affdc3e] updated lite
2023-10-17 23:04:32 +08:00
Georgi Gerganov
281ef73c25
k-quants : fix quantization ranges (#3646) 2023-10-17 09:19:28 +03:00
Georgi Gerganov
940efa95fe
llava : fix tokenization to not add bos between image embeddings and user prompt (#3645)
* llava : fix tokenization to not add bos after system prompt

* set seed

---------

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
2023-10-16 23:58:00 +03:00
Concedo
ee0681f0d9 convert some asserts into non-terminating since they are ovezealous 2023-10-15 16:12:20 +08:00
Concedo
5cfabaee25 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	README.md
#	docs/BLIS.md
2023-10-15 15:50:20 +08:00
cebtenzzre
11bff29045
MPT : support GQA for replit-code-v1.5 (#3627) 2023-10-15 09:32:06 +03:00
M. Yusuf Sarıgöz
11dc1091f6
Honor -ngl option for Cuda offloading in llava (#3621) 2023-10-14 04:52:44 -06:00
Daniel Bevenius
2a4bcbacea
llama : remove n_threads from llama_decode_internal (#3614)
This commit removes `n_threads` from the `llama_decode_internal`
functions doc comment as it does not exist anymore.

It looks like this parameter was removed in
Commit 16bc66d947 ("llama.cpp : split
llama_context_params into model and context params").

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2023-10-13 13:33:16 +03:00
slaren
424b6381c4
ggml : add context enumeration functions (#3605)
finetune : fix assert failure in ggml-alloc
2023-10-13 12:23:10 +02:00
Concedo
643902fbbb fixed tensor split save and load 2023-10-13 10:07:22 +08:00
shibe2
1e0e873c37
CLBlast: Fix matrix-vector multiplication (#3544) 2023-10-12 21:59:47 +02:00
M. Yusuf Sarıgöz
370359e5ba
examples: support LLaVA v1.5 (multimodal model) (#3436)
* WIP: start implementing LLaVA

* rm scratch buf for now, will revert after cleanup

* LLaVA image encoder is working. will combine with llama

* Add llava inference code, but it's buggy. debugging

* LLaVA is working e2e, needs to optimize memory allocation + cleanup

* Use ggml_allocr + rm unnecessary code

* fix: crlf -> lf

* fix: new line at EoF

* fix: trailing whitespace

* Add readme

* Update readme

* Some cleanup

* Are you happy editorconfig?

* rm unused batch image preprocessing

* rm unused import

* fix: rm designated initializers

* introduce pad-to-square mode for non-square images

* are you happy editorconfig?

* gitignore /llava

* Handle cases where image file does not exist

* add llava target to Makefile

* add support for 13b model variant

* Maybe seed is unlucky?

* Check if apples are compared to apples

* are you happy editorconfig?

* Use temperature = 0.1 by default

* command line: use gpt_params_parse()

* minor

* handle default n_predict

* fix typo

* llava : code formatting, rename files, fix compile warnings

* do not use Wno-cast-qual for MSVC

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-12 18:23:18 +03:00
uint256_t
9e24cc6e2e
docs : fix typo GOMP_CPU_AFFINITY (#3597) 2023-10-12 16:36:16 +03:00
Georgi Gerganov
d28e572c02
cmake : fix add_compile_options on macOS 2023-10-12 14:31:05 +03:00
Ian Scrivener
f3040beaab
typo : it is --n-gpu-layers not --gpu-layers (#3592)
fixed a typo in the MacOS Metal run doco
2023-10-12 14:10:50 +03:00
Georgi Gerganov
1a8c8795d6
ci : check if there is enough VRAM (#3596)
ggml-ci
2023-10-12 13:44:56 +03:00
Concedo
7e2f714c9c tensor split only for cuda 2023-10-12 17:01:52 +08:00
Alexander Abushady
11b8f97c1e
Tensor split UI (#471)
* update .gitignore

Remove .idea folder created by Jet Brains products.

* Front end, and partial backe-end

Tensor Split pulled in, shows in console, then not respected on model load.

* UI Tweak + Tensor Split Fix

Made Tensor Flow input match similar boxes around it. Also, fixed Tensor Split to populate the correct argument.

* Changed int to float for tensor split

Accidentally set int, needed to be float when setting tensor split args
2023-10-12 16:50:21 +08:00
Concedo
601be78a3f kcpp does sampling ourselves, we can do whatever we want 2023-10-12 16:47:56 +08:00
Concedo
a6c3dbc351 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	Makefile
#	README.md
#	build.zig
2023-10-12 16:32:00 +08:00
Concedo
8be043ee38 more horde optimizations 2023-10-12 16:20:52 +08:00
Concedo
8d1cd512e2 missed a flag 2023-10-12 15:00:51 +08:00
Concedo
c6fe820357 improve cors and header handling 2023-10-12 14:53:39 +08:00
Aarni Koskela
b016596d90
server : add completion mode (no chat) (#3582) 2023-10-12 09:51:53 +03:00
Georgi Gerganov
6b3ae4da92
prompts : add mnemonics.txt 2023-10-12 09:35:30 +03:00
Georgi Gerganov
57dd55e2c7
server : fix kv cache management (#3588) 2023-10-12 09:29:04 +03:00
Concedo
f604cffdce multiuser racer bugfix 2023-10-12 13:39:12 +08:00
Georgi Gerganov
b8fe4b5cc9
main : fix session loading bug (#3400) 2023-10-11 23:55:41 +03:00
Michael Coppola
a8bdd65525
server : add parameter -tb N, --threads-batch N (#3584)
Co-authored-by: Michael Coppola <info@michaeljcoppola.com>
2023-10-11 22:42:22 +03:00
Kerfuffle
70c29da118
common : fix mirostat state when using multiple sequences (#3543)
* Fix mirostat state when using multiple sequences

* Fix mirostat by completely refactoring sampling!

* Try to fix zig build.

* Export function to fetch/create default sampler states

Code formatting cleanups and add some comments

Silence a warning about id not being used when logging is disabled

* Apply some renaming suggestions.

Fix comments that were out of sync with the pull.

* Use more consistant naming convention for sampling contexts
2023-10-11 22:35:46 +03:00
Georgi Gerganov
8c70a5ff25
batched : add bench tool (#3545)
* batched : add bench tool

* batched : minor fix table

* batched-bench : add readme + n_kv_max is now configurable

* batched-bench : init warm-up batch

* batched-bench : pass custom set of PP, TG and PL

* batched-bench : add mmq CLI arg
2023-10-11 21:25:33 +03:00
Concedo
a003e3c348 horde auto recovery 2023-10-12 00:57:32 +08:00
Zane Shannon
24ba3d829e
examples : add batched.swift + improve CI for swift (#3562) 2023-10-11 06:14:05 -05:00
Galunid
9f6ede19f3
Add MPT model to supported models in README.md (#3574) 2023-10-10 19:02:49 -04:00
goerch
233fc1c69f
Minor improvements in GPT2 tokenizer (#3567)
* Fixing minor bugs in bpe_gpt2_preprocess

* Don't add bos token in test
2023-10-10 18:59:52 +02:00
Xingchen Song(宋星辰)
c5b49360d0
readme : add bloom (#3570) 2023-10-10 19:28:50 +03:00