2069 commits

Author SHA1 Message Date
Kawrakow
e8d9158925
metal: somewhat faster f16 x f32 matrix multiply kernel (#2951)
* Somewhat faster f16 x f32 matrix multiply kernel

* Better use 32 thread groups for f16 x f32

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-09-01 11:15:57 +03:00
Concedo
81abd3cb1f Merge remote-tracking branch 'elbios/concat_output_mutex' into concedo_experimental 2023-09-01 15:24:13 +08:00
Concedo
d7fed4732f fix for typical sampler 2023-09-01 15:24:00 +08:00
Elbios
30588617fb Fix race condition by locking concat_output string
Writer thread was appending to concat_output global string without a lock, while another thread could be reading the string invoked by HTTP API.
Appending to std::string is not an atomic operation. Worst case would be if string was reallocated while being read.
Fix it by locking the access in writer and reader with a mutex.
2023-09-01 07:18:48 +02:00
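
A minimal sketch of the locking pattern the commit message above describes, not the actual patch. Only the name `concat_output` comes from the message; `concat_output_mtx`, `append_output`, `read_output` and `demo_concurrent_append` are hypothetical names chosen for illustration. The writer appends under a mutex, and the reader takes a copy under the same mutex, so a reallocation can never overlap a read.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-ins for the shared state described in the commit message.
static std::string concat_output;      // buffer shared by writer and reader
static std::mutex  concat_output_mtx;  // assumed name; guards every access

// Writer side: append while holding the lock, so a reallocation of the
// string's storage cannot happen while a reader is looking at it.
void append_output(const std::string & piece) {
    std::lock_guard<std::mutex> lock(concat_output_mtx);
    concat_output += piece;
}

// Reader side: copy while holding the lock and hand back the copy, so the
// caller never touches the shared buffer outside the critical section.
std::string read_output() {
    std::lock_guard<std::mutex> lock(concat_output_mtx);
    return concat_output;
}

// Small driver: one writer thread appends n one-character pieces while the
// calling thread polls the buffer, much as an HTTP handler might.
std::size_t demo_concurrent_append(std::size_t n) {
    {
        std::lock_guard<std::mutex> lock(concat_output_mtx);
        concat_output.clear();
    }
    std::thread writer([n] {
        for (std::size_t i = 0; i < n; ++i) {
            append_output("x");
        }
    });
    std::size_t last = 0;
    while (last < n) {
        const std::size_t len = read_output().size();
        assert(len >= last);  // the observed length never shrinks
        last = len;
    }
    writer.join();
    return read_output().size();
}
```

Returning a copy from `read_output` trades an allocation for a short critical section. A reader that held a reference past the lock would bring the original race back.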
Cebtenzzre
bce1fef328
convert : fix another python 3.8 issue (#2949) 2023-08-31 22:13:51 -04:00
slaren
528134dd02
remove convert-llama-7b-pth-to-gguf.py and convert-llama-hf-to-gguf.py (#2906) 2023-09-01 01:32:09 +02:00
Kerfuffle
aeefac4ff7
scripts: Use local gguf package when running from repo (#2927)
* scripts: Use local gguf when running from repo
2023-08-31 16:49:24 -06:00
Concedo
0c3a265187 fixed incorrect buffer size values 2023-09-01 01:31:09 +08:00
Concedo
35ba699a7c Merge remote-tracking branch 'vxii/concedo' into concedo_experimental 2023-09-01 01:28:16 +08:00
Concedo
0fe3c9cf96 stronger banning bias 2023-09-01 01:25:23 +08:00
Concedo
fe4a233d79 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.devops/tools.sh
#	llama.cpp
2023-09-01 00:47:06 +08:00
vxiiduu
f2985a070b
Add support for 34B GGML models 2023-09-01 01:29:09 +10:00
DannyDaemonic
e8422de39e
@vxiiduu's fix for PrefetchVirtualMemory (#2930)
Reimplement fix for `PrefetchVirtualMemory`.
Co-authored-by: vxiiduu <73044267+vxiiduu@users.noreply.github.com>
2023-08-31 04:21:45 -07:00
Concedo
bc02f7663f allow sse3 in failsafe 2023-08-31 18:07:17 +08:00
Concedo
07b02af8bc fixed tab ordering, update lite for panel alignment 2023-08-31 16:33:00 +08:00
Concedo
e2fd30b5d1 reverted the failsafe removal, since they dropped support for dll check 2023-08-31 15:39:32 +08:00
Cebtenzzre
92d0b751a7
convert : fix python 3.8 support, modernize type annotations (#2916)
* convert : fix python 3.8 support

* convert : sort imports

* convert : fix required parameters in convert-llama-ggmlv3-to-gguf

* convert : fix mypy errors in convert-llama-ggmlv3-to-gguf

* convert : use PEP 585 generics and PEP 604 unions

Now that we have `from __future__ import annotations`, we can use this
modern syntax in Python 3.7 instead of restricting support to Python 3.9
or 3.10 respectively.

* gguf.py : a tuple is already a tuple

* add mypy.ini

* convert : add necessary `type: ignore` comments

* gguf-py: bump version
2023-08-31 08:02:23 +03:00
Johannes Gäßler
8afe228000
CUDA: mul_mat_q=true llama_context_params default (#2912) 2023-08-30 21:46:19 +02:00
Concedo
b6914ebd04 hotfix to revert the auto ctx scaling first, i didnt do it properly 2023-08-31 00:58:52 +08:00
Henri Vasserman
71d6975559
[Docker] fix tools.sh argument passing. (#2884)
* [Docker] fix tools.sh argument passing.

This should allow passing multiple arguments to containers with
the full image that are using the tools.sh frontend.

Fix from https://github.com/ggerganov/llama.cpp/issues/2535#issuecomment-1697091734
2023-08-30 19:14:53 +03:00
Concedo
5cd0309610 renamed incorrect identifier 2023-08-30 23:06:39 +08:00
Concedo
0ee394ae1b falcon disable offload only for clblast 2023-08-30 22:35:24 +08:00
Concedo
29757de61f cmake disable buggy logs 2023-08-30 22:15:33 +08:00
Concedo
aa4ad830e2 log.h is broken so disable it first
Merge branch 'master' into concedo_experimental

# Conflicts:
#	.github/workflows/build.yml
#	.gitignore
#	Makefile
#	README.md
#	tests/CMakeLists.txt
2023-08-30 21:58:54 +08:00
Concedo
a2a4eefa07 slight change to logits 2023-08-30 21:27:51 +08:00
Georgi Gerganov
b532a69b2f
convert.py : use dir name to name the llama 2023-08-30 13:29:40 +03:00
Concedo
1301bd7e29 Fix to skip GPU offloading so falcon models work correctly 2023-08-30 18:26:41 +08:00
Georgi Gerganov
c90d135eb4
examples : fix underscore in beam-search + .gitignore (close #2900) 2023-08-30 12:53:24 +03:00
M. Yusuf Sarıgöz
0d1c706181
gguf : add workflow for Pypi publishing (#2896)
* gguf : add workflow for Pypi publishing

* gguf : add workflow for Pypi publishing

* fix trailing whitespace
2023-08-30 12:47:40 +03:00
alonfaraj
9509294420
make : add test and update CI (#2897)
* build ci: run make test

* makefile:
- add all
- add test

* enable tests/test-tokenizer-0-llama

* fix path to model

* remove gcc-8 from macos build test

* Update Makefile

* Update Makefile
2023-08-30 12:42:51 +03:00
Concedo
d4c22a8b02 updated lite, added autorope config based on trained ctxlen, hotfix for falcon gpu broken 2023-08-30 16:50:55 +08:00
Gilad S
35092fb547
docs : add node-llama-cpp to README.md (#2885) 2023-08-30 11:40:12 +03:00
Kerfuffle
dc07dc492e
convert : various script cleanups/fixes + merges and special token handling (#2842)
* convert: Fix permute calls and method/func definitions

* Cleanups for gguf-py

* Minor types cleanups.

* Initial implementation of handling merges and special tokens

* convert: Handle special tokens and merges in vocab only mode

convert: Vocab only mode no longer requires loading model tensors

* gguf: Refactor tensor name mapping

* convert: Fix type hint for special_token_types in SpecialVocab

* Use common special vocab handling in various conversion scripts

* First pass at implementing suggested changes

* Second pass

* gguf: SpecialVocab: Fix issue with special token content not in a dict

gguf: SpecialVocab: Allow skipping handling of merges

* convert-falcon-hf-to-gguf: Support --vocab-only option, bail out if no tokenizer.json

* convert-gptneox-hf-to-gguf and convert: Only handle merges for BPE tokenizer

* gguf: SpecialVocab: Actually set load_merges in object

* Uniform args parsing and vocab only mode for convert examples

* convert.py: Set gpt2 as tokenizer model when using BPE

* Squish last type warning in gguf.py - yay!
2023-08-30 11:25:50 +03:00
chaihahaha
ad9ddcff6e
llm.vim : stop generation at multiple linebreaks, bind to <F2> (#2879) 2023-08-30 09:50:55 +03:00
staviq
8341a25957
main : log file (#2748)
* initial, base LOG macro

* add *.log to .gitignore

* added basic log file handler

* reverted log auto endline to better mimic printf

* remove atomics and add dynamic log target

* log_enable/disable, LOG_TEE, basic usage doc

* update .gitignore

* mv include to common, params, help msg

* log tostring helpers, token vectors pretty prints

* main: replaced fprintf/LOG_TEE, some trace logging

* LOG_DISABLE_LOGS compile flag, wrapped f in macros

* fix LOG_TEELN and configchecker

* stub LOG_DUMP_CMDLINE for WIN32 for now

* fix msvc

* cleanup main.cpp:273

* fix stray whitespace after master sync

* log : fix compile warnings

- do not use C++20 stuff
- use PRIu64 to print uint64_t
- avoid string copies by using const ref
- fix ", ##__VA_ARGS__" warnings
- compare strings with == and !=

* log : do not append to existing log + disable file line func by default

* log : try to fix Windows build

* main : wip logs

* main : add trace log

* review: macro f lowercase, str append to sstream

* review: simplify ifs and str comparisons

* fix MSVC, formatting, FMT/VAL placeholders

* review: if/else cleanup

* review: if/else cleanup (2)

* replace _ prefix with _impl suffix

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-08-30 09:29:32 +03:00
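
Three of the "log : fix compile warnings" items in the message above (print `uint64_t` with `PRIu64`, pass strings by const reference, compare strings with `==`) can be illustrated with a short sketch. This is not code from the commit. `format_u64` and `is_disabled` are hypothetical helpers, and the `"disable"` literal is an assumed value.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a label and a 64-bit counter.
// The label is taken by const reference, so no copy is made at the call site.
// PRIu64 expands to the correct printf length modifier for uint64_t on each
// platform, which avoids format-mismatch warnings from "%lu" or "%llu".
std::string format_u64(const std::string & label, std::uint64_t value) {
    char buf[32];  // 20 digits is the maximum for a uint64_t, plus the NUL
    const int n = std::snprintf(buf, sizeof(buf), "%" PRIu64, value);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));
    (void) n;  // silence the unused-variable warning when NDEBUG is defined
    return label + "=" + buf;
}

// Hypothetical helper: std::string supports == directly, so no strcmp call
// or manual character loop is needed.
bool is_disabled(const std::string & target) {
    return target == "disable";
}
```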
Cebtenzzre
849408957c
tests : add a C compliance test (#2848)
* tests : add a C compliance test

* make : build C compliance test by default

* make : fix clean and make sure C test fails on clang

* make : move -Werror=implicit-int to CFLAGS
2023-08-30 09:20:26 +03:00
Concedo
89495c0716 handle token unbanning over api 2023-08-30 10:51:49 +08:00
Concedo
f2c02dd06d Merge branch 'master' into concedo_experimental
# Conflicts:
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	README.md
#	tests/test-grad0.cpp
2023-08-30 10:51:28 +08:00
YellowRoseCx
d7bdfbdd78
Update Makefile for misc amd gpu targetting (#407)
adds the hipBlas gpu_target $(shell $(ROCM_PATH)/llvm/bin/amdgpu-arch)
back to the gpu_target line, possibly allowing misc gpu arch's like gfx1031 or gfx1032 etc to be built
2023-08-30 09:54:15 +08:00
slaren
06abf8eeba
ggml : add view_src and view_offs to ggml_tensor for views (#2874)
* ggml : add view_src and view_offs

* update ggml-alloc to use view_src

* update ggml_diag_mask to work correctly with automatic inplace

* exclude other ops that set an inplace flag from automatic inplace
2023-08-29 23:24:42 +02:00
slaren
c03a243abf
remove outdated references to -eps and -gqa from README (#2881) 2023-08-29 23:17:34 +02:00
Kawrakow
fa3582f509
Tell users attempting to run perplexity with too few tokens to use more (#2882)
Closes #2858

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-29 23:55:45 +03:00
Kawrakow
e37e69dcc3
10X faster BPE tokenizer (#2876)
* 10X faster BPE tokenizer

* Remove comment that no longer applies

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-08-29 23:55:03 +03:00
Concedo
380fa0f0ca fixed broken typical sampler issues 2023-08-29 23:50:59 +08:00
maddes8cht
53885d7256
py : fix "usage" messages (#2873)
convert-to-gguf python scripts
2023-08-29 16:51:02 +03:00
jameswu2014
bcce96ba4d
convert.py : fix baichuan7B support (#2870)
* [Fix]: convert.py support baichuan7B

* convert.py : fix trailing whitespaces

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-08-29 12:48:41 +03:00
Jhen-Jie Hong
74e0caeb82
readme : add react-native binding (#2869) 2023-08-29 12:30:10 +03:00
Cebtenzzre
d4b5e16c32
make : fix clang tests build, add missing examples (#2859)
* make : do not pass headers to the compiler

This fixes building tests with clang.

* make : add missing examples

* make : fix build-info.h dependencies
2023-08-29 11:42:41 +03:00
Georgi Gerganov
3a007648f2
metal : add option to disable debug logs (close #2764) 2023-08-29 11:33:46 +03:00
Georgi Gerganov
611363ac79 scripts : add pipefail 2023-08-29 10:50:30 +03:00