Commit graph

1987 commits

Author SHA1 Message Date
Concedo
eed651494e Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
#	README.md
#	common/log.h
2023-09-02 11:24:28 +08:00
Concedo
8df03ed026 tweaks for rocm blas 2023-09-02 09:22:32 +08:00
Jhen-Jie Hong
571083f508
server : avoid aniprompt in probabilities of final response (#2849) 2023-09-02 08:31:46 +08:00
Engininja2
f04d002844
cuda : vsubss4 for older versions of ROCm/clang (#2942) 2023-09-01 23:33:19 +02:00
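The `__vsubss4` intrinsic this commit backports performs a lane-wise saturating subtraction of four signed bytes packed into a 32-bit integer; older ROCm/clang versions lack it, so a fallback must reproduce that behavior. A hedged host-side reference sketch of what the intrinsic computes (illustrative only, not the actual CUDA/HIP device code in the commit):

```cpp
#include <algorithm>
#include <cstdint>

// Byte-wise saturating subtract of four packed int8 lanes, mimicking
// CUDA's __vsubss4 (host-side reference sketch, names illustrative).
uint32_t vsubss4_ref(uint32_t a, uint32_t b) {
    uint32_t out = 0;
    for (int i = 0; i < 4; ++i) {
        const int8_t x = int8_t((a >> (8 * i)) & 0xFF);
        const int8_t y = int8_t((b >> (8 * i)) & 0xFF);
        // saturate the difference to the int8 range [-128, 127]
        const int d = std::clamp(int(x) - int(y), -128, 127);
        out |= uint32_t(uint8_t(int8_t(d))) << (8 * i);
    }
    return out;
}
```

For example, subtracting 1 from a lane already holding -128 stays saturated at -128 rather than wrapping around.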
Concedo
6d06695c7e initializer syntax 2023-09-02 00:41:50 +08:00
ZHAOKAI WANG
69fdbb9abc
readme : quick start command fix (#2908)
* quick start command fix

* quick start win command fix
2023-09-01 17:06:44 +03:00
Kerfuffle
5d6f19f16b
Allow quantize to only copy tensors, some other improvements (#2931)
* Allow quantize tool to only copy tensors to allow repackaging models.

* Slightly better logic when requantizing.

* Change help message to go to `stdout`.
2023-09-01 08:02:48 -06:00
Georgi Gerganov
0d58936686
llama2c : rename function 2023-09-01 17:01:11 +03:00
Cebtenzzre
6c9c23429b
make : use unaligned vector moves on MinGW (#2945)
Fixes #2922
2023-09-01 16:53:14 +03:00
m3ndax
ee8654bcd0
minor : add const qualifiers (#2853)
* made the methods const

# Conflicts:
#	examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp

* made method const

* Update convert-llama2c-to-ggml.cpp

removed write_raw and write_u32

* llama2c : remove misleading const

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-01 16:47:27 +03:00
Konstantin Herud
49bb9cbe0f
docs : add java-llama.cpp to README.md (#2935) 2023-09-01 16:36:14 +03:00
Cebtenzzre
ef15649972
build : fix most gcc and clang warnings (#2861)
* fix most gcc and clang warnings

* baby-llama : remove commented opt_params_adam

* fix some MinGW warnings

* fix more MinGW warnings
2023-09-01 16:34:50 +03:00
Ben Siraphob
d8d6977f48
examples : add C grammar (#2357) 2023-09-01 16:32:14 +03:00
Tameem
5aec2cfaac
ggml : add RISC-V vector intrinsics support (#2929)
* added support for RISCV CFLAGS & native compile + cross compile options

* Add RISC-V Vector Intrinsics Support

Added RVV intrinsics for following
   ggml_vec_dot_q4_0_q8_0
   ggml_vec_dot_q4_1_q8_1
   ggml_vec_dot_q5_0_q8_0
   ggml_vec_dot_q5_1_q8_1
   ggml_vec_dot_q8_0_q8_0

Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai>
Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>

---------

Signed-off-by: Ahmad Tameem <ahmad.tameem@10xengineers.ai>
Co-authored-by: moiz.hussain <moiz.hussain@10xengineers.ai>
Co-authored-by: Sharafat <sharafat.hussain@10xengineers.ai>
2023-09-01 16:27:40 +03:00
Georgi Gerganov
13268c5331
metal : slight speed-up for add and mul kernels (#2917) 2023-09-01 13:42:41 +03:00
staviq
4dcd47d71d
logs : fix mingw-like builds (fixes #2898) (#2911)
* fix mingw-like builds

* formatting

* make LOG_COMPAT easier to override and extend

* simplify win detection

* fix for #2940
2023-09-01 12:07:06 +03:00
Cebtenzzre
18705a30ef
llama2c : fix segfault and alloc-dealloc-mismatch (#2913)
* llama2c : fix segfault if vocab is not found

* llama2c : fix mismatch between new[] and delete

* llama2c : fix basename on Windows

* llama2c : use a destructor to prevent memory leaks
2023-09-01 12:03:49 +03:00
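The alloc-dealloc-mismatch and leak fixes above follow a standard C++ pattern: pair `new[]` with `delete[]` and let a destructor release the buffer on every exit path. A hypothetical sketch of that RAII pattern (names are illustrative, not the actual llama2c converter code):

```cpp
#include <cstddef>

// RAII wrapper: the destructor runs on every return path, so early
// returns (e.g. vocab not found) no longer leak, and new[] is always
// matched with delete[].
struct VocabBuffer {
    float* scores = nullptr;
    explicit VocabBuffer(size_t n) : scores(new float[n]()) {}
    ~VocabBuffer() { delete[] scores; }         // matches new[]
    VocabBuffer(const VocabBuffer&) = delete;   // prevent double-free
    VocabBuffer& operator=(const VocabBuffer&) = delete;
};

bool load_vocab(size_t n) {
    VocabBuffer buf(n);
    if (n == 0) {
        return false;  // early return: destructor still frees scores
    }
    buf.scores[0] = 1.0f;
    return true;
}
```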
Concedo
5925c23d51 fix for RWKV 2023-09-01 17:02:11 +08:00
Kawrakow
e8d9158925
metal: somewhat faster f16 x f32 matrix multiply kernel (#2951)
* Somewhat faster f16 x f32 matrix multiply kernel

* Better use 32 thread groups for f16 x f32

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-09-01 11:15:57 +03:00
Concedo
81abd3cb1f Merge remote-tracking branch 'elbios/concat_output_mutex' into concedo_experimental 2023-09-01 15:24:13 +08:00
Concedo
d7fed4732f fix for typical sampler 2023-09-01 15:24:00 +08:00
Elbios
30588617fb Fix race condition by locking concat_output string
The writer thread was appending to the global concat_output string without a lock, while another thread, invoked via the HTTP API, could be reading it.
Appending to a std::string is not an atomic operation; in the worst case, the string could be reallocated while it is being read.
Fix this by guarding both the writer's and the reader's access with a mutex.
2023-09-01 07:18:48 +02:00
Cebtenzzre
bce1fef328
convert : fix another python 3.8 issue (#2949) 2023-08-31 22:13:51 -04:00
slaren
528134dd02
remove convert-llama-7b-pth-to-gguf.py and convert-llama-hf-to-gguf.py (#2906) 2023-09-01 01:32:09 +02:00
Kerfuffle
aeefac4ff7
scripts: Use local gguf package when running from repo (#2927)
* scripts: Use local gguf when running from repo
2023-08-31 16:49:24 -06:00
Concedo
0c3a265187 fixed incorrect buffer size values 2023-09-01 01:31:09 +08:00
Concedo
35ba699a7c Merge remote-tracking branch 'vxii/concedo' into concedo_experimental 2023-09-01 01:28:16 +08:00
Concedo
0fe3c9cf96 stronger banning bias 2023-09-01 01:25:23 +08:00
Concedo
fe4a233d79 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.devops/tools.sh
#	llama.cpp
2023-09-01 00:47:06 +08:00
vxiiduu
f2985a070b
Add support for 34B GGML models 2023-09-01 01:29:09 +10:00
DannyDaemonic
e8422de39e
@vxiiduu's fix for PrefetchVirtualMemory (#2930)
Reimplement fix for `PrefetchVirtualMemory`.
Co-authored-by: vxiiduu <73044267+vxiiduu@users.noreply.github.com>
2023-08-31 04:21:45 -07:00
Concedo
bc02f7663f allow sse3 in failsafe 2023-08-31 18:07:17 +08:00
Concedo
07b02af8bc fixed tab ordering, update lite for panel alignment 2023-08-31 16:33:00 +08:00
Concedo
e2fd30b5d1 reverted the failsafe removal, since they dropped support for dll check 2023-08-31 15:39:32 +08:00
Cebtenzzre
92d0b751a7
convert : fix python 3.8 support, modernize type annotations (#2916)
* convert : fix python 3.8 support

* convert : sort imports

* convert : fix required parameters in convert-llama-ggmlv3-to-gguf

* convert : fix mypy errors in convert-llama-ggmlv3-to-gguf

* convert : use PEP 585 generics and PEP 604 unions

Now that we have `from __future__ import annotations`, we can use this
modern syntax in Python 3.7 instead of restricting support to Python 3.9
or 3.10 respectively.

* gguf.py : a tuple is already a tuple

* add mypy.ini

* convert : add necessary `type: ignore` comments

* gguf-py: bump version
2023-08-31 08:02:23 +03:00
Johannes Gäßler
8afe228000
CUDA: mul_mat_q=true llama_context_params default (#2912) 2023-08-30 21:46:19 +02:00
Concedo
b6914ebd04 hotfix to revert the auto ctx scaling first, i didnt do it properly 2023-08-31 00:58:52 +08:00
Henri Vasserman
71d6975559
[Docker] fix tools.sh argument passing. (#2884)
* [Docker] fix tools.sh argument passing.

This should allow passing multiple arguments to containers with
the full image that are using the tools.sh frontend.

Fix from https://github.com/ggerganov/llama.cpp/issues/2535#issuecomment-1697091734
2023-08-30 19:14:53 +03:00
Concedo
5cd0309610 renamed incorrect identifier 2023-08-30 23:06:39 +08:00
Concedo
0ee394ae1b falcon disable offload only for clblast 2023-08-30 22:35:24 +08:00
Concedo
29757de61f cmake disable buggy logs 2023-08-30 22:15:33 +08:00
Concedo
aa4ad830e2 log.h is broken so disable it first
Merge branch 'master' into concedo_experimental

# Conflicts:
#	.github/workflows/build.yml
#	.gitignore
#	Makefile
#	README.md
#	tests/CMakeLists.txt
2023-08-30 21:58:54 +08:00
Concedo
a2a4eefa07 slight change to logits 2023-08-30 21:27:51 +08:00
Georgi Gerganov
b532a69b2f
convert.py : use dir name to name the llama 2023-08-30 13:29:40 +03:00
Concedo
1301bd7e29 Fix to skip GPU offloading so falcon models work correctly 2023-08-30 18:26:41 +08:00
Georgi Gerganov
c90d135eb4
examples : fix underscore in beam-search + .gitignore (close #2900) 2023-08-30 12:53:24 +03:00
M. Yusuf Sarıgöz
0d1c706181
gguf : add workflow for Pypi publishing (#2896)
* gguf : add workflow for Pypi publishing

* gguf : add workflow for Pypi publishing

* fix trailing whitespace
2023-08-30 12:47:40 +03:00
alonfaraj
9509294420
make : add test and update CI (#2897)
* build ci: run make test

* makefile:
- add all
- add test

* enable tests/test-tokenizer-0-llama

* fix path to model

* remove gcc-8 from macos build test

* Update Makefile

* Update Makefile
2023-08-30 12:42:51 +03:00
Concedo
d4c22a8b02 updated lite, added autorope config based on trained ctxlen, hotfix for falcon gpu broken 2023-08-30 16:50:55 +08:00
Gilad S
35092fb547
docs : add node-llama-cpp to README.md (#2885) 2023-08-30 11:40:12 +03:00