Commit graph

1781 commits

Author SHA1 Message Date
Eve
1fed755b1f
ci : add non-AVX scalar build/test (#2356)
* noavx build and test

* we don't need to remove f16c in windows
2023-07-25 15:16:13 +03:00
katsu560
be2301bcda
k_quants : add AVX support to dot functions with QK_K as 64 (#2339)
* add AVX to ggml_vec_dot_q2_K_q8_K()

* add AVX to ggml_vec_dot_q3_K_q8_K()

* add AVX to ggml_vec_dot_q4_K_q8_K()

* add AVX to ggml_vec_dot_q5_K_q8_K()

* add AVX to ggml_vec_dot_q6_K_q8_K()

* refactor AVX code in ggml_vec_dot_q6_K_q8_K()
2023-07-25 15:13:41 +03:00
Shouzheng Liu
1aa18ef994
metal : concurrently dispatch commands (#2358)
* metal: concurrently dispatch commands

Function `ggml_metal_graph_find_concurrency` will run and write
commands that can be issued concurrently to metal context `concur_list`
array, when `ggml_metal_graph_compute` is called for the first time.

* metal: don't call find_concurrency automatically.

* metal : code style changes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-25 15:00:19 +03:00
Concedo
3e68cdd26a Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
#	tests/test-grad0.c
2023-07-25 18:52:48 +08:00
Kawrakow
9a08eaf3c4
Another speed gain for Q4_0 and Q4_1 on Metal (#2375)
* Another speed gain for Q4_0 and Q4_1 on Metal

* Have N_DST, etc., be template parameters

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-25 13:48:29 +03:00
Kawrakow
129d844c87
Fix Q4_K and Q5_K for QK_K = 64 on CUDA (#2359)
* Fix Q4_K and Q5_K for QK_K = 64

* Very slightly better Q5_K bit fiddling

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-25 13:48:04 +03:00
Concedo
66e4b5141e fix horde worker host and client agent 2023-07-25 18:18:41 +08:00
slaren
d5512b782b
server: add rms_norm_eps parameter (#2380) 2023-07-25 12:36:17 +03:00
Henri Vasserman
c798308e3a
[Server] Escape HTML in webchat (#2368)
* escape HTML in webchat
* add amp
2023-07-25 10:27:34 +03:00
Concedo
48c27a9ce1 hotfix for 70b broadcast issues 2023-07-25 01:32:47 +08:00
Александр Герман
9731682ad6
Update Makefile (#345)
fix requirements for idiotic source file concatenation (lol)
2023-07-25 00:21:32 +08:00
slaren
41c674161f
make rms_norm_eps a parameter (#2374)
* make rms_norm_eps a parameter

* add rms_norm_eps to command line

* fix baby llama, test-grad0

* use scientific notation for eps param in the help

ggml-ci
2023-07-24 17:57:12 +02:00
Concedo
d8d2449bfb better label (+1 squashed commits)
Squashed commits:

[f573b2c] cuda 3 target arch
2023-07-24 23:07:31 +08:00
Aarni Koskela
b3f138d058
Chat UI extras (#2366)
* makefile: correct deps for server

* server: tighten settings layout a little

* server: expose all currently configured generation params in UI

* server: expose remaining generation params, for the adventurous

* server: embetter mirostat fields
2023-07-24 17:54:22 +03:00
Concedo
7555dae4cc ditch advanced subparsers 2023-07-24 22:40:36 +08:00
Concedo
8a9b40840b Merge branch 'master' into concedo_experimental
# Conflicts:
#	tests/test-grad0.c
#	tests/test-opt.c
2023-07-24 20:51:28 +08:00
Concedo
6d71e100fe buff buffers 2023-07-24 20:33:17 +08:00
Georgi Gerganov
5b2b2dc6ae
ggml : sync (unary ops refactor, static-correctness) (#2370)
* ggml : sync (unary ops, tests)

ggml-ci

* tests : remove unnecessary funcs
2023-07-24 14:46:21 +03:00
Concedo
825e34baa3 default horde name and better handling for horde (+3 squashed commit)
Squashed commit:

[fadfa60] better idle handling for horde worker

[a3971e6] updated lite

[2ca2b79] seems to not generate rubbish
2023-07-24 18:41:41 +08:00
Kawrakow
42f70cb2f6
Fix scalar version of Q5_K when QK_K = 64 (#2362)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-24 12:55:02 +03:00
Concedo
c7136f03d9 added support for tensor_split parameter as an advanced parameter. 2023-07-24 17:16:19 +08:00
Concedo
66328fcd80 Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
2023-07-24 15:44:26 +08:00
Concedo
94499dba25 added support for 70b llama 2 2023-07-24 15:20:18 +08:00
Concedo
993ba3b026 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
2023-07-24 11:59:00 +08:00
Evan Jones
84e09a7d8b
llama : add grammar-based sampling (#1773)
* llama, main : constrain sampling to grammar

* allow loading grammar from file

* fix whitespace errors

* handle & print parser errors

* add comments to grammar syntax and allow newlines where unambiguous

* add missing include

* support alternates in root rule

* fix bugs with empty token and EOS

* adjust JSON grammar

* remove swp file

* rewrite ternary expressions

Co-authored-by: Henri Vasserman <henv@hot.ee>

* use struct for grammar elements and add Unicode support

* add unicode escapes

* add inverse char ranges

* only sample full tokens (no peeking or truncation)

* llama : minor style changes

blindly applied in online editor - hopefully I didn't break something

* update help text

* add warning message if EOS is disabled

---------

Co-authored-by: Henri Vasserman <henv@hot.ee>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-23 23:58:10 -04:00
Concedo
280abaf029 added stop reason in the perf endpoint 2023-07-24 11:55:35 +08:00
Kawrakow
2f9cf974a0
Some more Q4_K and Q5_K speedup on CUDA (#2346)
* Faster Q5_K on CUDA

* Small Q5_K improvement on older GPUs

* Spped up Q4_K on CUDA

GTX1660: 29.5 ms/t -> 25.6 ms/t
RTX4080: 8.40 ms/t -> 8.25 ms/t

* Spped up Q4_K on CUDA

GTX1660: 36.7 ms/t -> 35.6 ms/t
RTX4080:  9.8 ms/t ->  9.5 ms/t

* Address PR comments

* Add some comments to satisfy PR reviewer

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-24 00:19:47 +03:00
IgnacioFDM
4f06592cc6
Add gqa parameter support to the server (#2351)
* Add gqa parameter support to the server
* Change help from stderr to stdout
2023-07-23 23:31:17 +03:00
Johannes Gäßler
70d26ac388
Fix __dp4a documentation (#2348) 2023-07-23 17:49:06 +02:00
Concedo
910744e2c0 Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
#	README.md
#	flake.nix
#	llama.cpp
2023-07-23 22:37:38 +08:00
Concedo
c28ab4e1b7 update lite, try support k80 2023-07-23 21:50:35 +08:00
wzy
57921ca6db
common : n_threads == -1 uses std:🧵:hardware_concurrency() (#2347)
* Fix #2345, fix incorrect n_threads

* Update examples/common.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-23 16:33:02 +03:00
slaren
3602ac4255
fix n_tasks (#2342)
ggml-ci
2023-07-23 15:19:39 +02:00
slaren
95a6c595e7
ggml: move op parameters from tensors to ggml_tensor::op_params (#2333)
* ggml: move op parameters from tensors to ggml_tensor::op_params

* alibi: use memcpy for float params

* remove `src[1] = NULL` in ops
2023-07-23 14:36:02 +02:00
Georgi Gerganov
e76d630df1
llama : grouped-query attention + LLaMAv2 70B support (#2276)
* CUDA: GQA implementation

* llama : support for GQA and LLaMAv2 70B

ggml-ci

* py : fix hparams parsing (if-else blocks)

ggml-ci

* py : oh boy ..

ggml-ci

* help : fix gqa value for 70B

ggml-ci

---------

Co-authored-by: JohannesGaessler <johannesg@5d6.de>
2023-07-23 15:09:47 +03:00
maddes8cht
1d0824b247
llama : print help to stdout (#2338) 2023-07-23 14:59:48 +03:00
wzy
bc3ec2cdc9
flake : support nix build '.#opencl' (#2337) 2023-07-23 14:57:02 +03:00
Christian Demsar
a940458e48
llama : print max tensor size to stderr (#2336) 2023-07-23 14:56:34 +03:00
Jose Maldonado
91171b8072
make : fix CLBLAST compile support in FreeBSD (#2331)
* Fix Makefile for CLBLAST compile support and instructions for compile llama.cpp FreeBSD

* More general use-case for CLBLAST support (Linux and FreeBSD)
2023-07-23 14:52:08 +03:00
AustinMroz
355c80f49e
examples : simplify vim plugin (#2327)
Uses builtin json_encode and json_decode functions to simplify escaping
Removes the need for temp files
2023-07-23 14:16:48 +03:00
Jiahao Li
83a00ce69b
metal : support bcast add & dup & cont op (#2323) 2023-07-23 14:00:37 +03:00
Concedo
2e84eac7f6 Merge branch 'master' into concedo_experimental 2023-07-23 16:23:00 +08:00
Concedo
aa05eadb6f Merge branch 'master' into concedo_experimental
# Conflicts:
#	llama.cpp
2023-07-23 16:22:44 +08:00
Kawrakow
d2a43664f9
Speed up Q4_K (#2322)
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-23 08:49:20 +03:00
Concedo
1108232e30 Merge branch 'concedo' into concedo_experimental 2023-07-23 09:59:58 +08:00
Concedo
0cca0726fe reduce number of retries, fixed maxlength > maxctx bug 2023-07-23 09:59:34 +08:00
Ycros
56995caa48
Fix mirostatv2. (#338) 2023-07-23 09:52:03 +08:00
Johannes Gäßler
b9b7d94fc1
CUDA: Fixed 7b q3_K_S with mul_mat_vec_q (#2313) 2023-07-22 21:27:34 +02:00
Georgi Gerganov
b47b8a9cfe
llama : optimize memory buffers (#2325) 2023-07-22 21:17:57 +03:00
Concedo
fa0270df7c added some checks to skip generation if busy 2023-07-22 23:10:04 +08:00