Concedo
46682e5cb3
added mmq launch flag
2023-08-01 17:57:13 +08:00
ebraminio
86aeb27734
server : Support dark mode (#2414)
...
* server : Support dark mode
So it respects user system light / dark settings.
* Update index.html.hpp by running ./deps.sh
2023-08-01 10:56:23 +02:00
Matteo Boschini
1873ff586b
metal : add gqa8 kernel to allow llama-2-70B on metal (#2459)
...
* Added gqa8 kernel to allow llama-2-70B on metal
* Update ggml-metal.m
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
* Extend kernel_mul_mat_f16_f32 to handle gqa broadcast
* Added ne03==ne13 assertion
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-08-01 10:43:12 +03:00
Johannes Gäßler
49e7cb5bb1
CUDA: fixed LLAMA_FAST compilation option (#2473)
2023-07-31 21:02:19 +02:00
Johannes Gäßler
b772bba42e
CUDA: fixed cmake F16 option (#2471)
2023-07-31 19:52:22 +02:00
Concedo
e221843147
trying out mmq
...
Merge branch 'master' into concedo_experimental
# Conflicts:
# CMakeLists.txt
# README.md
2023-07-31 22:51:15 +08:00
Concedo
3e370f83ef
Warning: Very experimental merge, do not use until confirmed stable.
2023-07-31 22:33:43 +08:00
Johannes Gäßler
0728c5a8b9
CUDA: mmq CLI option, fixed mmq build issues (#2453)
2023-07-31 15:44:35 +02:00
Johannes Gäßler
1215ed7d5c
CUDA: Implemented row flattening for non-glm RoPE (#2468)
2023-07-31 14:32:30 +02:00
Johannes Gäßler
2dbf518911
CUDA: fewer memory bank conflicts for mul_mat_q (#2458)
2023-07-31 13:18:51 +02:00
Concedo
84ce184c4f
layout
2023-07-31 17:33:31 +08:00
slaren
9d2382b3e4
Fix Metal backend broken from the allocator changes (#2455)
...
* fix Metal backend broken from the allocator changes
2023-07-31 11:02:53 +02:00
YellowRoseCx
f27972777f
correct semantic error in import_vars (#355)
...
* Hide unavailable backends & Add tooltip over backend count
Hides unavailable backends from the user; if the program is launched without any backends built, it shows an error message stating that no backends were found and that they should be built using the 'make' command
Add tooltip when hovering over backend count label
Hovering over the new label that shows the backend count explains what the numbers mean and shows which backends are unavailable or not built
* add some code comments
* hide "missing" if all are built
move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available
" if len(runopts)==6 else + "
* small typo fix
* remove wrongly added leftover device choosing code
* fix labels
* move tooltip to function
* import vars logic fix
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-07-31 15:51:35 +08:00
Concedo
5ea5d19d6a
SSE emoji fix
2023-07-30 22:31:20 +08:00
slaren
a113689571
ggml : add graph tensor allocator (#2411)
...
* ggml : add graph tensor allocator
* ggml : don't calculate data pointer of unallocated tensors when creating a view with an offset
* ggml : refactor ggml_view_Nd into ggml_view_tensor_offset
2023-07-30 15:58:01 +02:00
Concedo
82d0695f0f
Merge commit '9baf9ef304' into concedo_experimental
2023-07-30 18:18:23 +08:00
Concedo
90a37d63d5
up ver, added warning for max context
2023-07-30 18:07:14 +08:00
YellowRoseCx
c8af65760f
Hide unavailable backends & Add tooltip over backend count (#352)
...
* Hide unavailable backends & Add tooltip over backend count
Hides unavailable backends from the user; if the program is launched without any backends built, it shows an error message stating that no backends were found and that they should be built using the 'make' command
Add tooltip when hovering over backend count label
Hovering over the new label that shows the backend count explains what the numbers mean and shows which backends are unavailable or not built
* add some code comments
* hide "missing" if all are built
move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available
" if len(runopts)==6 else + "
* small typo fix
* remove wrongly added leftover device choosing code
* fix labels
* move tooltip to function
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-07-30 17:50:55 +08:00
Concedo
45456fa6ca
switch noavx2 to not use openblas, as it has incompatible instructions
2023-07-30 16:47:33 +08:00
Concedo
23825abee1
fix wrong key
2023-07-30 14:30:46 +08:00
Johannes Gäßler
11f3ca06b8
CUDA: Quantized matrix matrix multiplication (#2160)
...
* mmq implementation for non k-quants
* q6_K
* q2_K
* q3_k
* q4_K
* vdr
* q5_K
* faster q8_1 loading
* loop unrolling
* add __restrict__
* q2_K sc_high
* GGML_CUDA_MMQ_Y
* Updated Makefile
* Update Makefile
* DMMV_F16 -> F16
* Updated README, CMakeLists
* Fix CMakeLists.txt
* Fix CMakeLists.txt
* Fix multi GPU out-of-bounds
2023-07-29 23:04:44 +02:00
Johannes Gäßler
9baf9ef304
CUDA: faster multi GPU synchronization (#2448)
2023-07-29 23:04:10 +02:00
Concedo
cde3760e52
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# Makefile
# README.md
# ggml.h
# llama.cpp
2023-07-29 17:47:00 +08:00
Concedo
9589d52079
added help link
2023-07-29 17:33:15 +08:00
Concedo
e4b42e5b15
fixed gui bugs
2023-07-29 11:15:57 +08:00
klosax
8a88e5855c
perplexity : add Hellaswag calculation (#2389)
...
* common.h : add hellaswag / remove perplexity-lines
* common.cpp : add hellaswag / remove perplexity-lines
* perplexity.cpp : add hellaswag scores / remove perplexity-lines
* perplexity.cpp : clean up
* common.h : change default param value
* common.cpp : Change default param
* perplexity.cpp : alter wording
* common.h : alter wording
* common.cpp : alter wording
2023-07-28 21:25:36 +03:00
Lee
a9559bf77b
ggml : workaround for missing _mm256_setr_m128i in GCC < 8 in k_quants.c (#2405)
2023-07-28 21:17:45 +03:00
eric8607242
ee1b497c98
llama : support more diverse tokenizers? (#2420)
...
* supporting more diverse tokenizers
* Update llama.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-28 21:10:05 +03:00
Georgi Gerganov
d73b8d48b4
examples : fix whitespace
2023-07-28 21:05:08 +03:00
nhamanasu
34ae1caf7f
examples : server chat mode with llama2 (#2400)
...
* add: server chat mode with llama2
* fix: remove the unnecessary last \n
2023-07-28 21:02:10 +03:00
Weird Constructor
d91f3f0c55
readme : fix the description of the Tail free sampling (TFS) method (#2431)
2023-07-28 11:44:43 +03:00
Rand Xie
65cdf34bdc
llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433)
2023-07-28 11:42:53 +03:00
Concedo
b40550cf1a
change wiki link
2023-07-28 13:01:12 +08:00
Concedo
31486ebc8d
updated readme
2023-07-28 11:32:55 +08:00
niansa/tuxifan
edcc7ae7d2
Obtaining LLaMA 2 instructions (#2308)
...
* Obtaining LLaMA 2 instructions
* Removed sharing warning for LLaMA 2
* Linked TheBloke's GGML repos
* Add LLaMA 2 to list of supported models
* Added LLaMA 2 usage instructions
* Added links to LLaMA 2 70B models
2023-07-28 03:14:11 +02:00
mj-shifu
7c529cede6
convert.py : Update to support 70B HF format model files (#2427)
...
* convert.py : fix llama 2 70b conversion from Huggingface
2023-07-27 14:39:17 -06:00
Georgi Gerganov
1a941869cb
metal : disable graph concurrency optimization due to bug (#2413)
2023-07-27 11:00:54 +03:00
slaren
b5472ea0ad
ggml : fix assert in ggml_set_unary_op (#2410)
2023-07-26 23:57:23 +02:00
Cebtenzzre
6df1f5940f
make : build with -Wmissing-prototypes (#2394)
2023-07-26 21:00:04 +03:00
slaren
5488fb789e
ggml : allocate graphs in a context (#2392)
...
* ggml : graph allocation in contexts
* allocate work buffer as a ggml_object in ggml_graph_compute_with_ctx
* llama.cpp : allocate graph in the context
* add GGML_PAD
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-26 15:56:53 +02:00
Concedo
94e0a06daf
updated lite, up ver (+1 squashed commit)
...
Squashed commits:
[7d6520f] updated lite, up ver
2023-07-26 11:03:17 +08:00
Concedo
b184380aae
Revert "a better default rms_norm_eps"
...
This reverts commit 0c26799e77.
2023-07-26 10:23:45 +08:00
Concedo
f53d2aabb4
Merge branch 'master' into concedo_experimental
2023-07-26 10:19:59 +08:00
Kawrakow
eb542d3932
Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384)
...
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-25 18:35:53 +03:00
Concedo
6a054b80b0
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# scripts/build-info.sh
2023-07-25 22:55:55 +08:00
Concedo
0c26799e77
a better default rms_norm_eps
2023-07-25 22:51:01 +08:00
slaren
07aaa0f63f
ggml : fix ggml_flash_attn to use op_params (#2387)
...
* ggml : fix ggml_flash_attn to use op_params
2023-07-25 16:20:12 +02:00
ldwang
fce48caf9a
convert.py : support bpe tokenizer (#2228)
...
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <ftgreat@gmail.com>
---------
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-07-25 16:22:09 +03:00
Jiahao Li
875086bdb9
ggml : relax contiguous constraints in activation function (#2371)
2023-07-25 15:58:32 +03:00
slaren
da1889834a
ggml : improve graph build time via hash table lookup (#2329)
...
* improve graph build time
* ggml_tensor : use 1 bit per flag
* use a hash table instead
2023-07-25 15:32:20 +03:00