Concedo
90a37d63d5
up ver, added warning for max context
2023-07-30 18:07:14 +08:00
YellowRoseCx
c8af65760f
Hide unavailable backends & Add tooltip over backend count ( #352 )
...
* Hide unavailable backends & Add tooltip over backend count
Hides unavailable backends from the user. If the program is launched without any backends built, it shows an error message stating that no backends were found and that they can be built using the 'make' command
Add tooltip when hovering over backend count label
Hovering over the new label that shows the backend count explains what the numbers mean and shows the user which backends are unavailable or not built
* add some code comments
* hide "missing" if all are built
Move tooltip functions to the helper functions section. Hide the string "Missing: ..." if all backends are available
" if len(runopts)==6 else + "
* small typo fix
* remove wrongly added leftover device choosing code
* fix labels
* move tooltip to function
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-07-30 17:50:55 +08:00
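Note on the commit above: the behaviour it describes (hide backends that are not built, show a count label whose tooltip lists the missing ones, and report an error when none exist) can be sketched roughly as follows. This is a minimal illustration only; `split_backends`, `backend_tooltip`, `startup_message` and the `is_built` callback are hypothetical names, not taken from the actual launcher code.

```python
# Hypothetical sketch: all names here are illustrative assumptions,
# not the launcher's real functions.

def split_backends(all_backends, is_built):
    """Return (available, missing) backend names, preserving order."""
    available = [name for name in all_backends if is_built(name)]
    missing = [name for name in all_backends if name not in available]
    return available, missing


def backend_tooltip(all_backends, available, missing):
    """Build the tooltip text for the backend count label."""
    text = f"{len(available)}/{len(all_backends)} backends available"
    if missing:  # leave out the "Missing: ..." part when everything is built
        text += "\nMissing: " + ", ".join(missing)
    return text


def startup_message(available):
    """Return an error message when no backend is built, else None."""
    if not available:
        return "No backends found. Build them with the 'make' command."
    return None
```

Only the `available` list would be offered in the backend selector, so the user never sees an option that cannot run.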
Concedo
45456fa6ca
switch noavx2 to not use openblas, as it has incompatible instructions
2023-07-30 16:47:33 +08:00
Concedo
23825abee1
fix wrong key
2023-07-30 14:30:46 +08:00
Johannes Gäßler
11f3ca06b8
CUDA: Quantized matrix matrix multiplication ( #2160 )
...
* mmq implementation for non k-quants
* q6_K
* q2_K
* q3_k
* q4_K
* vdr
* q5_K
* faster q8_1 loading
* loop unrolling
* add __restrict__
* q2_K sc_high
* GGML_CUDA_MMQ_Y
* Updated Makefile
* Update Makefile
* DMMV_F16 -> F16
* Updated README, CMakeLists
* Fix CMakeLists.txt
* Fix CMakeLists.txt
* Fix multi GPU out-of-bounds
2023-07-29 23:04:44 +02:00
Johannes Gäßler
9baf9ef304
CUDA: faster multi GPU synchronization ( #2448 )
2023-07-29 23:04:10 +02:00
Concedo
cde3760e52
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# Makefile
# README.md
# ggml.h
# llama.cpp
2023-07-29 17:47:00 +08:00
Concedo
9589d52079
added help link
2023-07-29 17:33:15 +08:00
Concedo
e4b42e5b15
fixed gui bugs
2023-07-29 11:15:57 +08:00
klosax
8a88e5855c
perplexity : add Hellaswag calculation ( #2389 )
...
* common.h : add hellaswag / remove perplexity-lines
* common.cpp : add hellaswag / remove perplexity-lines
* perplexity.cpp : add hellaswag scores / remove perplexity-lines
* perplexity.cpp : clean up
* common.h : change default param value
* common.cpp : Change default param
* perplexity.cpp : alter wording
* common.h : alter wording
* common.cpp : alter wording
2023-07-28 21:25:36 +03:00
Lee
a9559bf77b
ggml : workaround for missing _mm256_setr_m128i in GCC < 8 in k_quants.c ( #2405 )
2023-07-28 21:17:45 +03:00
eric8607242
ee1b497c98
llama : support more diverse tokenizers? ( #2420 )
...
* supporting more diverse tokenizers
* Update llama.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-28 21:10:05 +03:00
Georgi Gerganov
d73b8d48b4
examples : fix whitespace
2023-07-28 21:05:08 +03:00
nhamanasu
34ae1caf7f
examples : server chat mode with llama2 ( #2400 )
...
* add: server chat mode with llama2
* fix: remove the unnecessary last \n
2023-07-28 21:02:10 +03:00
Weird Constructor
d91f3f0c55
readme : fix the description of the Tail free sampling (TFS) method ( #2431 )
2023-07-28 11:44:43 +03:00
Rand Xie
65cdf34bdc
llama : use n_embd_gqa instead of n_embd to handle llama-2 70B ( #2433 )
2023-07-28 11:42:53 +03:00
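Note on the commit above: with grouped-query attention, several query heads share one key/value head, so the key/value projections are narrower than `n_embd`. The arithmetic can be sketched as below. The function is an illustrative assumption and not the code from llama.cpp; the 70B figures used in the usage line (8192 embedding width, 64 heads, 8 key/value heads) are the published Llama 2 70B hyperparameters.

```python
def n_embd_gqa(n_embd: int, n_head: int, n_head_kv: int) -> int:
    """Width of the K/V projections under grouped-query attention."""
    # Number of query heads that share a single key/value head.
    n_gqa = n_head // n_head_kv
    return n_embd // n_gqa


# Llama 2 70B: 64 query heads share 8 K/V heads.
kv_width_70b = n_embd_gqa(8192, 64, 8)
# A model without grouping (n_head == n_head_kv) keeps the full width.
kv_width_plain = n_embd_gqa(4096, 32, 32)
```

Sizing K/V tensors with `n_embd` instead of this reduced width is the kind of mismatch the commit title refers to.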
Concedo
b40550cf1a
change wiki link
2023-07-28 13:01:12 +08:00
Concedo
31486ebc8d
updated readme
2023-07-28 11:32:55 +08:00
niansa/tuxifan
edcc7ae7d2
Obtaining LLaMA 2 instructions ( #2308 )
...
* Obtaining LLaMA 2 instructions
* Removed sharing warning for LLaMA 2
* Linked TheBloke's GGML repos
* Add LLaMA 2 to list of supported models
* Added LLaMA 2 usage instructions
* Added links to LLaMA 2 70B models
2023-07-28 03:14:11 +02:00
mj-shifu
7c529cede6
convert.py : Update to support 70B HF format model files ( #2427 )
...
* convert.py : fix llama 2 70b conversion from Huggingface
2023-07-27 14:39:17 -06:00
Georgi Gerganov
1a941869cb
metal : disable graph concurrency optimization due to bug ( #2413 )
2023-07-27 11:00:54 +03:00
slaren
b5472ea0ad
ggml : fix assert in ggml_set_unary_op ( #2410 )
2023-07-26 23:57:23 +02:00
Cebtenzzre
6df1f5940f
make : build with -Wmissing-prototypes ( #2394 )
2023-07-26 21:00:04 +03:00
slaren
5488fb789e
ggml : allocate graphs in a context ( #2392 )
...
* ggml : graph allocation in contexts
* allocate work buffer as a ggml_object in ggml_graph_compute_with_ctx
* llama.cpp : allocate graph in the context
* add GGML_PAD
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-26 15:56:53 +02:00
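Note on the `GGML_PAD` bullet above: a padding helper of this kind typically rounds a size up to the next multiple of an alignment. The sketch below assumes the alignment is a power of two and uses the usual mask trick; it illustrates the technique and is not a copy of the macro in ggml.h.

```python
def pad(x: int, n: int) -> int:
    """Round x up to the next multiple of n (n must be a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "alignment must be a power of two"
    # Adding n - 1 pushes x past the next boundary unless it is already
    # aligned; masking off the low bits then snaps it back onto the boundary.
    return (x + n - 1) & ~(n - 1)
```

For example, `pad(13, 8)` yields 16, while an already aligned value such as `pad(16, 8)` is returned unchanged.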
Concedo
94e0a06daf
updated lite, up ver (+1 squashed commits)
...
Squashed commits:
[7d6520f] updated lite, up ver
2023-07-26 11:03:17 +08:00
Concedo
b184380aae
Revert "a better default rms_norm_eps"
...
This reverts commit 0c26799e77.
2023-07-26 10:23:45 +08:00
Concedo
f53d2aabb4
Merge branch 'master' into concedo_experimental
2023-07-26 10:19:59 +08:00
Kawrakow
eb542d3932
Add LLAMA_DEFAULT_RMS_EPS so we can change the default ( #2384 )
...
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-25 18:35:53 +03:00
Concedo
6a054b80b0
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# scripts/build-info.sh
2023-07-25 22:55:55 +08:00
Concedo
0c26799e77
a better default rms_norm_eps
2023-07-25 22:51:01 +08:00
slaren
07aaa0f63f
ggml : fix ggml_flash_attn to use op_params ( #2387 )
...
* ggml : fix ggml_flash_attn to use op_params
2023-07-25 16:20:12 +02:00
ldwang
fce48caf9a
convert.py : support bpe tokenizer ( #2228 )
...
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <ftgreat@gmail.com>
---------
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-07-25 16:22:09 +03:00
Jiahao Li
875086bdb9
ggml : relax contiguous constraints in activation function ( #2371 )
2023-07-25 15:58:32 +03:00
slaren
da1889834a
ggml : improve graph build time via hash table lookup ( #2329 )
...
* improve graph build time
* ggml_tensor : use 1 bit per flag
* use a hash table instead
2023-07-25 15:32:20 +03:00
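Note on the commit above: when a compute graph is built by walking back from the result tensor, each node has to be checked against the set of nodes already visited. Doing that check with a hash-based set rather than a linear scan of the node list is the general idea named in the commit title. The sketch below is a generic illustration with hypothetical names (`build_graph_order`, `parents_of`), not ggml's implementation.

```python
def build_graph_order(root, parents_of):
    """Return nodes in dependency order, visiting each node once."""
    visited = set()   # hash-based membership test, O(1) on average
    order = []

    def visit(node):
        if node in visited:          # already scheduled, skip
            return
        visited.add(node)
        for parent in parents_of(node):
            visit(parent)            # inputs come before the node itself
        order.append(node)

    visit(root)
    return order


# Diamond-shaped graph: d depends on b and c, which both depend on a.
deps = {"d": ["b", "c"], "b": ["a"], "c": ["a"], "a": []}
order = build_graph_order("d", lambda n: deps[n])
```

With a list instead of a set, the `node in visited` test would scan every node seen so far, which makes graph construction quadratic in the number of nodes.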
Hesen Peng
82552b7f54
build : fix line breaking error in build-info.sh ( #2349 )
...
* fix line breaking
* build number line break removal
2023-07-25 15:24:09 +03:00
Xiao-Yong Jin
0c06204fb3
main : add `--in-prefix-bos` to prefix BOS to user inputs; keep EOS ( #2304 )
...
* add `--in-prefix-bos` to prefix BOS to user inputs; keep EOS
The BOS precedes the string specified by `--in-prefix`.
Model generated EOS is now kept in the context.
It provides a way to strictly follow the prompt format used in
Llama-2-chat.
The EOS handling also benefits some existing finetunes that use
EOS to mark the end of turn.
* examples/common: move input_prefix_bos to other bools
2023-07-25 15:19:11 +03:00
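Note on the commit above: the token layout it describes (BOS first, then the `--in-prefix` string, then the user input, with any model-generated EOS kept in the context) can be sketched as follows. The token ids, the toy tokenizer and the function names are assumptions made for illustration; this is not the code from `main`.

```python
BOS, EOS = 1, 2  # assumed special-token ids, for illustration only


def toy_tokenize(text):
    """Stand-in tokenizer: one fake id per character, above the special ids."""
    return [ord(ch) + 3 for ch in text]


def build_turn(user_input, in_prefix="", in_prefix_bos=False):
    """Tokens for one user turn: [BOS] + in_prefix + user_input."""
    tokens = []
    if in_prefix_bos:
        tokens.append(BOS)  # BOS precedes the --in-prefix string
    tokens += toy_tokenize(in_prefix)
    tokens += toy_tokenize(user_input)
    return tokens


def append_generated(context, generated):
    """Keep a model-generated EOS in the context instead of dropping it."""
    return context + generated
```

A multi-turn context then reads as BOS, prefix, input, reply, EOS, BOS, prefix, input, and so on, which matches the turn structure the commit message attributes to Llama-2-chat.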
Eve
1fed755b1f
ci : add non-AVX scalar build/test ( #2356 )
...
* noavx build and test
* we don't need to remove f16c in windows
2023-07-25 15:16:13 +03:00
katsu560
be2301bcda
k_quants : add AVX support to dot functions with QK_K as 64 ( #2339 )
...
* add AVX to ggml_vec_dot_q2_K_q8_K()
* add AVX to ggml_vec_dot_q3_K_q8_K()
* add AVX to ggml_vec_dot_q4_K_q8_K()
* add AVX to ggml_vec_dot_q5_K_q8_K()
* add AVX to ggml_vec_dot_q6_K_q8_K()
* refactor AVX code in ggml_vec_dot_q6_K_q8_K()
2023-07-25 15:13:41 +03:00
Shouzheng Liu
1aa18ef994
metal : concurrently dispatch commands ( #2358 )
...
* metal: concurrently dispatch commands
When `ggml_metal_graph_compute` is called for the first time, function
`ggml_metal_graph_find_concurrency` runs and writes the commands that can
be issued concurrently to the metal context's `concur_list` array.
* metal: don't call find_concurrency automatically.
* metal : code style changes
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-25 15:00:19 +03:00
Concedo
3e68cdd26a
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# Makefile
# tests/test-grad0.c
2023-07-25 18:52:48 +08:00
Kawrakow
9a08eaf3c4
Another speed gain for Q4_0 and Q4_1 on Metal ( #2375 )
...
* Another speed gain for Q4_0 and Q4_1 on Metal
* Have N_DST, etc., be template parameters
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-25 13:48:29 +03:00
Kawrakow
129d844c87
Fix Q4_K and Q5_K for QK_K = 64 on CUDA ( #2359 )
...
* Fix Q4_K and Q5_K for QK_K = 64
* Very slightly better Q5_K bit fiddling
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-25 13:48:04 +03:00
Concedo
66e4b5141e
fix horde worker host and client agent
2023-07-25 18:18:41 +08:00
slaren
d5512b782b
server: add rms_norm_eps parameter ( #2380 )
2023-07-25 12:36:17 +03:00
Henri Vasserman
c798308e3a
[Server] Escape HTML in webchat ( #2368 )
...
* escape HTML in webchat
* add amp
2023-07-25 10:27:34 +03:00
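Note on the commit above: escaping HTML before inserting model or user text into a page generally means replacing the markup-significant characters with entities, with `&` handled first. The Python sketch below only illustrates that ordering; the webchat itself is not written in Python, and `escape_html` is a hypothetical name.

```python
def escape_html(text: str) -> str:
    """Replace markup-significant characters with HTML entities."""
    # '&' must be replaced first, otherwise the '&' inside the entities
    # produced below would be escaped a second time.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))
```

For instance, `escape_html("<b>a & b</b>")` gives `&lt;b&gt;a &amp; b&lt;/b&gt;`, which a browser renders as literal text rather than as a bold element.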
Concedo
48c27a9ce1
hotfix for 70b broadcast issues
2023-07-25 01:32:47 +08:00
Александр Герман
9731682ad6
Update Makefile ( #345 )
...
fix requirements for idiotic source file concatenation (lol)
2023-07-25 00:21:32 +08:00
slaren
41c674161f
make rms_norm_eps a parameter ( #2374 )
...
* make rms_norm_eps a parameter
* add rms_norm_eps to command line
* fix baby llama, test-grad0
* use scientific notation for eps param in the help
ggml-ci
2023-07-24 17:57:12 +02:00
Concedo
d8d2449bfb
better label (+1 squashed commits)
...
Squashed commits:
[f573b2c] cuda 3 target arch
2023-07-24 23:07:31 +08:00
Aarni Koskela
b3f138d058
Chat UI extras ( #2366 )
...
* makefile: correct deps for server
* server: tighten settings layout a little
* server: expose all currently configured generation params in UI
* server: expose remaining generation params, for the adventurous
* server: embetter mirostat fields
2023-07-24 17:54:22 +03:00