Commit graph

1828 commits

Author SHA1 Message Date
Concedo
e221843147 trying out mmq
Merge branch 'master' into concedo_experimental

# Conflicts:
#	CMakeLists.txt
#	README.md
2023-07-31 22:51:15 +08:00
Concedo
3e370f83ef Warning: Very experimental merge, do not use until confirmed stable. 2023-07-31 22:33:43 +08:00
Johannes Gäßler
0728c5a8b9
CUDA: mmq CLI option, fixed mmq build issues (#2453) 2023-07-31 15:44:35 +02:00
Johannes Gäßler
1215ed7d5c
CUDA: Implemented row flattening for non-glm RoPE (#2468) 2023-07-31 14:32:30 +02:00
Johannes Gäßler
2dbf518911
CUDA: fewer memory bank conflicts for mul_mat_q (#2458) 2023-07-31 13:18:51 +02:00
Concedo
84ce184c4f layout 2023-07-31 17:33:31 +08:00
slaren
9d2382b3e4
Fix Metal backend broken from the allocator changes (#2455)
* fix Metal backend broken from the allocator changes
2023-07-31 11:02:53 +02:00
YellowRoseCx
f27972777f
correct semantic error in import_vars (#355)
* Hide unavailable backends & Add tooltip over backend count

Hides unavailable backends from the user. If the program is launched without any backends built, it shows an error message stating that no backends were found and that they should be built using the 'make' command

Add tooltip when hovering over backend count label

hovering over the new label that shows the backend count will explain what the numbers are, and show the users which backends are not available or built

* add some code comments

* hide "missing" if all are built

move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available
" if len(runopts)==6 else + "

* small typo fix

* remove wrongly added leftover device choosing code

* fix labels

* move tooltip to function

* import vars logic fix

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-07-31 15:51:35 +08:00
Concedo
5ea5d19d6a SSE emoji fix 2023-07-30 22:31:20 +08:00
slaren
a113689571
ggml : add graph tensor allocator (#2411)
* ggml : add graph tensor allocator

* ggml : don't calculate data pointer of unallocated tensors when creating a view with an offset

* ggml : refactor ggml_view_Nd into ggml_view_tensor_offset
2023-07-30 15:58:01 +02:00
Concedo
82d0695f0f Merge commit '9baf9ef304' into concedo_experimental 2023-07-30 18:18:23 +08:00
Concedo
90a37d63d5 up ver, added warning for max context 2023-07-30 18:07:14 +08:00
YellowRoseCx
c8af65760f
Hide unavailable backends & Add tooltip over backend count (#352)
* Hide unavailable backends & Add tooltip over backend count

Hides unavailable backends from the user. If the program is launched without any backends built, it shows an error message stating that no backends were found and that they should be built using the 'make' command

Add tooltip when hovering over backend count label

hovering over the new label that shows the backend count will explain what the numbers are, and show the users which backends are not available or built

* add some code comments

* hide "missing" if all are built

move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available
" if len(runopts)==6 else + "

* small typo fix

* remove wrongly added leftover device choosing code

* fix labels

* move tooltip to function

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-07-30 17:50:55 +08:00
Concedo
45456fa6ca switch noavx2 to not use openblas, as it has incompatible instructions 2023-07-30 16:47:33 +08:00
Concedo
23825abee1 fix wrong key 2023-07-30 14:30:46 +08:00
Johannes Gäßler
11f3ca06b8
CUDA: Quantized matrix matrix multiplication (#2160)
* mmq implementation for non k-quants

* q6_K

* q2_K

* q3_k

* q4_K

* vdr

* q5_K

* faster q8_1 loading

* loop unrolling

* add __restrict__

* q2_K sc_high

* GGML_CUDA_MMQ_Y

* Updated Makefile

* Update Makefile

* DMMV_F16 -> F16

* Updated README, CMakeLists

* Fix CMakeLists.txt

* Fix CMakeLists.txt

* Fix multi GPU out-of-bounds
2023-07-29 23:04:44 +02:00
Johannes Gäßler
9baf9ef304
CUDA: faster multi GPU synchronization (#2448) 2023-07-29 23:04:10 +02:00
Concedo
cde3760e52 Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
#	README.md
#	ggml.h
#	llama.cpp
2023-07-29 17:47:00 +08:00
Concedo
9589d52079 added help link 2023-07-29 17:33:15 +08:00
Concedo
e4b42e5b15 fixed gui bugs 2023-07-29 11:15:57 +08:00
klosax
8a88e5855c
perplexity : add Hellaswag calculation (#2389)
* common.h : add hellaswag / remove perplexity-lines

* common.cpp : add hellaswag / remove perplexity-lines

* perplexity.cpp : add hellaswag scores / remove perplexity-lines

* perplexity.cpp : clean up

* common.h : change default param value

* common.cpp : Change default param

* perplexity.cpp : alter wording

* common.h : alter wording

* common.cpp : alter wording
2023-07-28 21:25:36 +03:00
Lee
a9559bf77b
ggml : workaround for missing _mm256_setr_m128i in GCC < 8 in k_quants.c (#2405) 2023-07-28 21:17:45 +03:00
eric8607242
ee1b497c98
llama : support more diverse tokenizers? (#2420)
* supporting more diverse tokenizers

* Update llama.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-28 21:10:05 +03:00
Georgi Gerganov
d73b8d48b4
examples : fix whitespace 2023-07-28 21:05:08 +03:00
nhamanasu
34ae1caf7f
examples : server chat mode with llama2 (#2400)
* add: server chat mode with llama2

* fix: remove the unnecessary last \n
2023-07-28 21:02:10 +03:00
Weird Constructor
d91f3f0c55
readme : fix the description of the Tail free sampling (TFS) method (#2431) 2023-07-28 11:44:43 +03:00
Rand Xie
65cdf34bdc
llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433) 2023-07-28 11:42:53 +03:00
Concedo
b40550cf1a change wiki link 2023-07-28 13:01:12 +08:00
Concedo
31486ebc8d updated readme 2023-07-28 11:32:55 +08:00
niansa/tuxifan
edcc7ae7d2
Obtaining LLaMA 2 instructions (#2308)
* Obtaining LLaMA 2 instructions

* Removed sharing warning for LLaMA 2

* Linked TheBloke's GGML repos

* Add LLaMA 2 to list of supported models

* Added LLaMA 2 usage instructions

* Added links to LLaMA 2 70B models
2023-07-28 03:14:11 +02:00
mj-shifu
7c529cede6
convert.py : Update to support 70B HF format model files (#2427)
* convert.py : fix llama 2 70b conversion from Huggingface
2023-07-27 14:39:17 -06:00
Georgi Gerganov
1a941869cb
metal : disable graph concurrency optimization due to bug (#2413) 2023-07-27 11:00:54 +03:00
slaren
b5472ea0ad
ggml : fix assert in ggml_set_unary_op (#2410) 2023-07-26 23:57:23 +02:00
Cebtenzzre
6df1f5940f
make : build with -Wmissing-prototypes (#2394) 2023-07-26 21:00:04 +03:00
slaren
5488fb789e
ggml : allocate graphs in a context (#2392)
* ggml : graph allocation in contexts

* allocate work buffer as a ggml_object in ggml_graph_compute_with_ctx

* llama.cpp : allocate graph in the context

* add GGML_PAD

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-26 15:56:53 +02:00
Concedo
94e0a06daf updated lite, up ver (+1 squashed commits)
Squashed commits:

[7d6520f] updated lite, up ver
2023-07-26 11:03:17 +08:00
Concedo
b184380aae Revert "a better default rms_norm_eps"
This reverts commit 0c26799e77.
2023-07-26 10:23:45 +08:00
Concedo
f53d2aabb4 Merge branch 'master' into concedo_experimental 2023-07-26 10:19:59 +08:00
Kawrakow
eb542d3932
Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-25 18:35:53 +03:00
Concedo
6a054b80b0 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	scripts/build-info.sh
2023-07-25 22:55:55 +08:00
Concedo
0c26799e77 a better default rms_norm_eps 2023-07-25 22:51:01 +08:00
slaren
07aaa0f63f
ggml : fix ggml_flash_attn to use op_params (#2387)
* ggml : fix ggml_flash_attn to use op_params
2023-07-25 16:20:12 +02:00
ldwang
fce48caf9a
convert.py : support bpe tokenizer (#2228)
* support bpe tokenizer in convert

Signed-off-by: ldwang <ftgreat@gmail.com>

* support bpe tokenizer in convert

Signed-off-by: ldwang <ftgreat@gmail.com>

* support bpe tokenizer in convert, fix

Signed-off-by: ldwang <ftgreat@gmail.com>

---------

Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-07-25 16:22:09 +03:00
Jiahao Li
875086bdb9
ggml : relax contiguous constraints in activation function (#2371) 2023-07-25 15:58:32 +03:00
slaren
da1889834a
ggml : improve graph build time via hash table lookup (#2329)
* improve graph build time

* ggml_tensor : use 1 bit per flag

* use a hash table instead
2023-07-25 15:32:20 +03:00
Hesen Peng
82552b7f54
build : fix line breaking error in build-info.sh (#2349)
* fix line breaking

* build number line break removal
2023-07-25 15:24:09 +03:00
Xiao-Yong Jin
0c06204fb3
main : add --in-prefix-bos to prefix BOS to user inputs; keep EOS (#2304)
* add `--in-prefix-bos` to prefix BOS to user inputs; keep EOS

The BOS precedes the string specified by `--in-prefix`.
Model generated EOS is now kept in the context.

It provides a way to strictly follow the prompt format used in
Llama-2-chat.

The EOS handling also benefits some existing finetunes that use
EOS to mark the end of a turn.

* examples/common: move input_prefix_bos to other bools
2023-07-25 15:19:11 +03:00
Eve
1fed755b1f
ci : add non-AVX scalar build/test (#2356)
* noavx build and test

* we don't need to remove f16c in windows
2023-07-25 15:16:13 +03:00
katsu560
be2301bcda
k_quants : add AVX support to dot functions with QK_K as 64 (#2339)
* add AVX to ggml_vec_dot_q2_K_q8_K()

* add AVX to ggml_vec_dot_q3_K_q8_K()

* add AVX to ggml_vec_dot_q4_K_q8_K()

* add AVX to ggml_vec_dot_q5_K_q8_K()

* add AVX to ggml_vec_dot_q6_K_q8_K()

* refactor AVX code in ggml_vec_dot_q6_K_q8_K()
2023-07-25 15:13:41 +03:00
Shouzheng Liu
1aa18ef994
metal : concurrently dispatch commands (#2358)
* metal: concurrently dispatch commands

When `ggml_metal_graph_compute` is called for the first time, the function
`ggml_metal_graph_find_concurrency` runs and writes the commands that can be
issued concurrently to the `concur_list` array of the metal context.

* metal: don't call find_concurrency automatically.

* metal : code style changes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-25 15:00:19 +03:00