Commit graph

1697 commits

Author SHA1 Message Date
Concedo
ba2040d1df compile fix for ARM NEON 2023-08-03 12:52:06 +08:00
Concedo
3fa6befdaf increase max free blocks 2023-08-03 10:50:16 +08:00
Concedo
34e60be41a compile fix 2023-08-03 10:36:14 +08:00
Concedo
b2eaec4261 updated lite 2023-08-02 22:54:17 +08:00
Johannes Gäßler
4f6b60c776
CUDA: Fix models with output size != 32000 (#2480) 2023-08-02 16:48:10 +02:00
Concedo
4c90fdc5cd Merge remote-tracking branch 'johannes/cuda-fix-output-size' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
2023-08-02 22:37:41 +08:00
Concedo
6fe92318f8 Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
#	README.md
#	scripts/sync-ggml.sh
#	tests/CMakeLists.txt
#	tests/test-double-float.cpp
#	tests/test-grad0.cpp
#	tests/test-opt.cpp
2023-08-02 22:36:00 +08:00
JohannesGaessler
1e64d511d5 CUDA: Fix models with output size != 32000 2023-08-02 10:26:53 +02:00
ldwang
220d931864
readme : add Aquila-7B model series to supported models (#2487)
* support bpe tokenizer in convert

Signed-off-by: ldwang <ftgreat@gmail.com>

* support bpe tokenizer in convert

Signed-off-by: ldwang <ftgreat@gmail.com>

* support bpe tokenizer in convert, fix

Signed-off-by: ldwang <ftgreat@gmail.com>

* Add Aquila-7B models in README.md

Signed-off-by: ldwang <ftgreat@gmail.com>

* Up Aquila-7B models in README.md

Signed-off-by: ldwang <ftgreat@gmail.com>

---------

Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-08-02 11:21:11 +03:00
Eve
81844fbcfd
tests : Fix compilation warnings (Linux/GCC) (#2451)
* fix hellaswag print format, cast away warning in test-double-float

* c++11 cannot use designated initializers

* add static to test-grad0.c internal functions

* use memcpy in test-double-float.c

* port c tests to c++

* use initializer list for ggml_init_params
2023-08-02 11:06:19 +03:00
Yiming Cui
a312193e18
readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models (#2475)
* add support for chinese llama-2 / alpaca-2

* remove white spaces
2023-08-02 09:18:31 +03:00
Bono Lv
c574bddb36
fix a typo in examples/server/README.md (#2478) 2023-08-01 14:54:28 +02:00
Concedo
c58ffc92e5 fixed compile error 2023-08-01 18:28:49 +08:00
Concedo
84b28c4282 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
2023-08-01 18:13:27 +08:00
Concedo
46682e5cb3 added mmq launch flag 2023-08-01 17:57:13 +08:00
ebraminio
86aeb27734
server : Support dark mode (#2414)
* server : Support dark mode

So it respects user system light / dark settings.

* Update index.html.hpp by running ./deps.sh
2023-08-01 10:56:23 +02:00
Matteo Boschini
1873ff586b
metal : add gqa8 kernel to allow llama-2-70B on metal (#2459)
* Added gqa8 kernel to allow llama-2-70B on metal

* Update ggml-metal.m

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>

* Extend kernel_mul_mat_f16_f32 to handle gqa broadcast

* Added ne03==ne13 assertion

---------

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-08-01 10:43:12 +03:00
Johannes Gäßler
49e7cb5bb1
CUDA: fixed LLAMA_FAST compilation option (#2473) 2023-07-31 21:02:19 +02:00
Johannes Gäßler
b772bba42e
CUDA: fixed cmake F16 option (#2471) 2023-07-31 19:52:22 +02:00
Concedo
e221843147 trying out mmq
Merge branch 'master' into concedo_experimental

# Conflicts:
#	CMakeLists.txt
#	README.md
2023-07-31 22:51:15 +08:00
Concedo
3e370f83ef Warning: Very experimental merge, do not use until confirmed stable. 2023-07-31 22:33:43 +08:00
Johannes Gäßler
0728c5a8b9
CUDA: mmq CLI option, fixed mmq build issues (#2453) 2023-07-31 15:44:35 +02:00
Johannes Gäßler
1215ed7d5c
CUDA: Implemented row flattening for non-glm RoPE (#2468) 2023-07-31 14:32:30 +02:00
Johannes Gäßler
2dbf518911
CUDA: fewer memory bank conflicts for mul_mat_q (#2458) 2023-07-31 13:18:51 +02:00
Concedo
84ce184c4f layout 2023-07-31 17:33:31 +08:00
slaren
9d2382b3e4
Fix Metal backend broken from the allocator changes (#2455)
* fix Metal backend broken from the allocator changes
2023-07-31 11:02:53 +02:00
YellowRoseCx
f27972777f
correct semantic error in import_vars (#355)
* Hide unavailable backends & Add tooltip over backend count

Hides unavailable backends from the user and if the program is launched without any backends made, it shows an error message to them stating no backends were found and to make them using the 'make' command

Add tooltip when hovering over backend count label

hovering over the new label that shows the backend count will explain what the numbers are, and show the users which backends are not available or built

* add some code comments

* hide "missing" if all are built

move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available
" if len(runopts)==6 else + "

* small typo fix

* remove wrongly added leftover device choosing code

* fix labels

* move tooltip to function

* import vars logic fix

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-07-31 15:51:35 +08:00
Concedo
5ea5d19d6a SSE emoji fix 2023-07-30 22:31:20 +08:00
slaren
a113689571
ggml : add graph tensor allocator (#2411)
* ggml : add graph tensor allocator

* ggml : don't calculate data pointer of unallocated tensors when creating a view with an offset

* ggml : refactor ggml_view_Nd into ggml_view_tensor_offset
2023-07-30 15:58:01 +02:00
Concedo
82d0695f0f Merge commit '9baf9ef304' into concedo_experimental 2023-07-30 18:18:23 +08:00
Concedo
90a37d63d5 up ver, added warning for max context 2023-07-30 18:07:14 +08:00
YellowRoseCx
c8af65760f
Hide unavailable backends & Add tooltip over backend count (#352)
* Hide unavailable backends & Add tooltip over backend count

Hides unavailable backends from the user and if the program is launched without any backends made, it shows an error message to them stating no backends were found and to make them using the 'make' command

Add tooltip when hovering over backend count label

hovering over the new label that shows the backend count will explain what the numbers are, and show the users which backends are not available or built

* add some code comments

* hide "missing" if all are built

move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available
" if len(runopts)==6 else + "

* small typo fix

* remove wrongly added leftover device choosing code

* fix labels

* move tooltip to function

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-07-30 17:50:55 +08:00
Concedo
45456fa6ca switch noavx2 to not use openblas, as it has incompatible instructions 2023-07-30 16:47:33 +08:00
Concedo
23825abee1 fix wrong key 2023-07-30 14:30:46 +08:00
Johannes Gäßler
11f3ca06b8
CUDA: Quantized matrix matrix multiplication (#2160)
* mmq implementation for non k-quants

* q6_K

* q2_K

* q3_k

* q4_K

* vdr

* q5_K

* faster q8_1 loading

* loop unrolling

* add __restrict__

* q2_K sc_high

* GGML_CUDA_MMQ_Y

* Updated Makefile

* Update Makefile

* DMMV_F16 -> F16

* Updated README, CMakeLists

* Fix CMakeLists.txt

* Fix CMakeLists.txt

* Fix multi GPU out-of-bounds
2023-07-29 23:04:44 +02:00
Johannes Gäßler
9baf9ef304
CUDA: faster multi GPU synchronization (#2448) 2023-07-29 23:04:10 +02:00
Concedo
cde3760e52 Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
#	README.md
#	ggml.h
#	llama.cpp
2023-07-29 17:47:00 +08:00
Concedo
9589d52079 added help link 2023-07-29 17:33:15 +08:00
Concedo
e4b42e5b15 fixed gui bugs 2023-07-29 11:15:57 +08:00
klosax
8a88e5855c
perplexity : add Hellaswag calculation (#2389)
* common.h : add hellaswag / remove perplexity-lines

* common.cpp : add hellaswag / remove perplexity-lines

* perplexity.cpp : add hellaswag scores / remove perplexity-lines

* perplexity.cpp : clean up

* common.h : change default param value

* common.cpp : Change default param

* perplexity.cpp : alter wording

* common.h : alter wording

* common.cpp : alter wording
2023-07-28 21:25:36 +03:00
Lee
a9559bf77b
ggml : workaround for missing _mm256_setr_m128i in GCC < 8 in k_quants.c (#2405) 2023-07-28 21:17:45 +03:00
eric8607242
ee1b497c98
llama : support more diverse tokenizers? (#2420)
* supporting more diverse tokenizers

* Update llama.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-28 21:10:05 +03:00
Georgi Gerganov
d73b8d48b4
examples : fix whitespace 2023-07-28 21:05:08 +03:00
nhamanasu
34ae1caf7f
examples : server chat mode with llama2 (#2400)
* add: server chat mode with llama2

* fix: remove the unnecessary last \n
2023-07-28 21:02:10 +03:00
Weird Constructor
d91f3f0c55
readme : fix the description of the Tail free sampling (TFS) method (#2431) 2023-07-28 11:44:43 +03:00
Rand Xie
65cdf34bdc
llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433) 2023-07-28 11:42:53 +03:00
Concedo
b40550cf1a change wiki link 2023-07-28 13:01:12 +08:00
Concedo
31486ebc8d updated readme 2023-07-28 11:32:55 +08:00
niansa/tuxifan
edcc7ae7d2
Obtaining LLaMA 2 instructions (#2308)
* Obtaining LLaMA 2 instructions

* Removed sharing warning for LLaMA 2

* Linked TheBloke's GGML repos

* Add LLaMA 2 to list of supported models

* Added LLaMA 2 usage instructions

* Added links to LLaMA 2 70B models
2023-07-28 03:14:11 +02:00
mj-shifu
7c529cede6
convert.py : Update to support 70B HF format model files (#2427)
* convert.py : fix llama 2 70b conversion from Huggingface
2023-07-27 14:39:17 -06:00