Commit graph

1129 commits

Author SHA1 Message Date
Concedo
54dc75ce73 Merge branch 'concedo-opencl-dev' into concedo_experimental 2023-06-05 13:31:53 +08:00
Concedo
f6431ded5d removed flags from the CL pool malloc, apply code tidying suggestions. 2023-06-05 13:31:37 +08:00
Concedo
c27f250b6f bigger scratch buffer for 3B llama 2023-06-05 13:24:53 +08:00
Concedo
9270056269 fixed compile error in cmake VS 2023-06-05 11:48:04 +08:00
Georgi Gerganov
827f5eda91 readme : update hot topics 2023-06-04 23:38:19 +03:00
Georgi Gerganov
ecb217db4f llama : Metal inference (#1642)
* mtl : export the LLaMA computation graph

* ci : disable temporary

* mtl : adapt the MNIST example as starter

* mtl : no need for mtl-export tool, add cli arg for main instead

* mtl : export just a small part of the graph for now to make it easier

* mtl : move MSL code into separate file for easy editing

* mtl : initial get_rows_q4_0 kernel

* mtl : confirmed get_rows_q4_0 is working correctly

* mtl : add rms_norm kernel + confirm working

* mtl : add mul kernel + confirm working

* mtl : initial mul_mat Q4 kernel (wrong results)

* mtl : mul_mat fixes (still wrong)

* mtl : another mul_mat Q4 (still does not work)

* mtl : working mul_mat q4

* ggml : fix handling of "view" ops in ggml_graph_import()

* mtl : add rope kernel

* mtl : add reshape and transpose handling

* ggml : store offset as opt arg for ggml_view_xd() operators

* mtl : add cpy kernel + handle view ops

* mtl : confirm f16 x f32 attention mul mat

* mtl : add scale kernel

* mtl : add diag_mask_inf kernel

* mtl : fix soft_max kernel

* ggml : update ggml_nbytes() to handle non-contiguous tensors

* mtl : verify V tensor contents

* mtl : add f32 -> f32 cpy kernel

* mtl : add silu kernel

* mtl : add non-broadcast mul kernel

* mtl : full GPU inference of the computation graph

* mtl : optimize rms_norm and soft_max kernels

* mtl : add f16 mat x f32 vec multiplication kernel

* mtl : fix bug in f16 x f32 mul mat + speed-up computation

* mtl : faster mul_mat_q4_0_f32 kernel

* mtl : fix kernel signature + roll inner loop

* mtl : more threads for rms_norm + better timing

* mtl : remove printfs from inner loop

* mtl : simplify implementation

* mtl : add save/load vocab to ggml file

* mtl : plug Metal inference into llama.cpp (very quick-n-dirty)

* mtl : make it work with main example

Lots of hacks but at least now it generates text

* mtl : preparing for merge

* mtl : clean-up ggml mtl interface + support scratch / inplace

* mtl : remove temp / debug code

* metal : final refactoring and simplification

* Revert "ci : disable temporary"

This reverts commit 98c267fc77.

* metal : add comments

* metal : clean-up stuff, fix typos

* readme : add Metal instructions

* readme : add example for main
2023-06-04 23:34:30 +03:00
Concedo
b7fb1aa233 removed build info in cmake 2023-06-04 22:34:27 +08:00
Concedo
6f66e4c4a5 updated lite 2023-06-04 22:27:15 +08:00
Concedo
9aa2d8535b hide gpu input box when dropdown not selected, minor memory fix for neox and gptj 2023-06-04 21:47:17 +08:00
Concedo
1ddbb9acd9 Merge branch 'concedo-opencl-dev' into concedo_experimental
# Conflicts:
#	ggml-opencl.cpp
2023-06-04 18:07:27 +08:00
Concedo
64e3e74556 change max value size_t to use limits 2023-06-04 18:04:52 +08:00
LostRuins
2b700749e5 Merge branch 'master' into concedo-opencl-dev 2023-06-04 18:00:06 +08:00
Concedo
dd4b5c64b8 Merge branch 'master' into concedo_experimental
# Conflicts:
#	ggml-opencl.cpp
2023-06-04 17:38:22 +08:00
0cc4m
dcb2ed4826 OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653)
* Use events instead of clFinish, where possible

* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel

* Reduce queueing overhead for contiguous tensors by using single mul kernel call

* Adapt to #1612 cl_mem malloc changes

* Reduce code duplication between cuda and opencl branches

* Improve implementation
2023-06-04 08:12:05 +02:00
Concedo
88919095b5 edit readme 2023-06-04 12:09:49 +08:00
Concedo
c3c05fc33b further cleanup, refactor renamemode to hordeconfig 2023-06-04 11:57:46 +08:00
Concedo
2868fac676 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.devops/tools.sh
#	README.md
2023-06-04 11:07:07 +08:00
Concedo
20803c221e cleaning up some old junk 2023-06-04 11:05:46 +08:00
Concedo
b62279cb39 buf size for starcoder still not good 2023-06-04 00:41:08 +08:00
Henri Vasserman
d8bd0013e8 Add info about CUDA_VISIBLE_DEVICES (#1682) 2023-06-03 16:35:20 +03:00
Jiří Podivín
b5c85468a3 Docker: change to calling convert.py (#1641)
Deprecation disclaimer was added to convert-pth-to-ggml.py
2023-06-03 15:11:53 +03:00
Evan Jones
136476e898 Fix prompt cache saving and chat-persistent rollover (#1678)
* Fix prompt cache saving and chat-persistent rollover (fixes #1670)

* clang-tidy

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-06-03 07:28:45 -04:00
Concedo
c1b293d31a fixed MPT ooms 2023-06-03 18:37:13 +08:00
Concedo
8bd9a3a48b updated readme, improved simple launcher 2023-06-03 17:17:15 +08:00
Concedo
6f82e17b7a added MPT support 2023-06-03 16:14:08 +08:00
Concedo
9839259b63 allow specifying the horde limit as well 2023-06-03 00:55:44 +08:00
Concedo
96b0e536b7 Merge branch 'opencl-dev-concedo' into concedo_experimental 2023-06-02 22:12:14 +08:00
Concedo
59fe16877d Clblast fixes + enhancements to save VRAM:
1. Change all Clblast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them.
2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer
3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.
2023-06-02 22:10:49 +08:00
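The recycling strategy described in points 2 and 3 above (hand out the smallest free buffer that fits; if none fits, resize the largest free one) can be sketched as follows. This is a minimal illustrative model of the policy, not the actual ggml-opencl.cpp code; the `pool_acquire` helper and its data layout are assumptions:

```python
# Illustrative sketch of the pool-recycling policy: smallest-fit first,
# then resize the largest free buffer as a fallback. Buffer entries are
# modeled as dicts; the real pool holds cl_mem handles.

def pool_acquire(pool, req_size):
    """pool: list of {'size': int, 'in_use': bool}. Returns the index handed out."""
    best = None          # smallest free buffer that is large enough
    largest_free = None  # fallback: largest free buffer overall
    for i, buf in enumerate(pool):
        if buf["in_use"]:
            continue
        if buf["size"] >= req_size and (best is None or buf["size"] < pool[best]["size"]):
            best = i
        if largest_free is None or buf["size"] > pool[largest_free]["size"]:
            largest_free = i
    if best is None and largest_free is not None:
        pool[largest_free]["size"] = req_size  # all free buffers too small: resize the largest
        best = largest_free
    if best is None:
        pool.append({"size": req_size, "in_use": False})  # nothing free at all: allocate new
        best = len(pool) - 1
    pool[best]["in_use"] = True
    return best
```

Compared with first-fit, smallest-fit keeps large buffers available for large requests, which is the VRAM saving the commit describes.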
Concedo
8d0c81e7cc Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental 2023-06-02 12:19:59 +08:00
Concedo
144d8a8312 updated lite 2023-06-02 12:19:51 +08:00
0cc4m
24239f0df7 Improve implementation 2023-06-01 18:57:08 +02:00
Concedo
37659d2c4e allow blasbatchsize -1 which disables blas, but keeps benefits like gpu offloads. 2023-06-01 22:33:50 +08:00
Concedo
49272e3c53 adjusted defaults 2023-06-01 20:03:44 +08:00
0cc4m
457aaf5bad Reduce code duplication between cuda and opencl branches 2023-06-01 07:33:32 +02:00
Concedo
234270bd83 back to 32 block size, not better 2023-06-01 00:14:22 +08:00
Concedo
446e42a8c6 change dmmv block size 2023-05-31 21:40:12 +08:00
Concedo
077ee4e989 Revert "Revert "opencl : no need to allocate cl_mem on heap (#1612)""
This reverts commit 4afa38e744.
2023-05-31 18:00:52 +08:00
Concedo
50c85bea4c Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental 2023-05-31 17:53:14 +08:00
Concedo
32dada5e5f updated lite 2023-05-31 17:52:09 +08:00
0cc4m
5e1eecfe12 Adapt to #1612 cl_mem malloc changes 2023-05-31 07:07:47 +02:00
0cc4m
49aaf08387 Merge remote-tracking branch 'origin/master' into opencl-dev 2023-05-31 06:58:51 +02:00
Concedo
a5a85d68c6 Merge branch 'master' into concedo_experimental
# Conflicts:
#	llama.cpp
2023-05-31 10:51:54 +08:00
Concedo
85c9f7df41 Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental 2023-05-31 10:20:32 +08:00
Concedo
4afa38e744 Revert "opencl : no need to allocate cl_mem on heap (#1612)"
This reverts commit bb051d9723.
2023-05-31 10:20:23 +08:00
Henri Vasserman
ffb06a345e OpenLLaMA 3B support (#1588)
This adds support to llama.cpp to load the model.

Currently missing are changes that are required from convert.py to convert the model correctly. It needs some changes to start reading the JSON configuration for HF models instead of deriving the values by guessing.

Co-authored-by: FNsi <125447286+FNsi@users.noreply.github.com>
2023-05-30 21:24:22 +03:00
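Reading the HF JSON configuration instead of deriving values by guessing, as the commit above proposes, amounts to a direct field mapping from config.json onto the hyperparameters. A hypothetical sketch (the `read_hf_config` helper and the exact mapping are assumptions for illustration, not the actual convert.py code; field names follow the Hugging Face LLaMA config format):

```python
import json

def read_hf_config(path):
    """Map Hugging Face LLaMA config.json fields onto the hyperparameters
    that would otherwise be inferred from tensor shapes. Illustrative only."""
    with open(path) as f:
        cfg = json.load(f)
    return {
        "n_embd":  cfg["hidden_size"],
        "n_head":  cfg["num_attention_heads"],
        "n_layer": cfg["num_hidden_layers"],
        "n_ff":    cfg["intermediate_size"],
        "n_vocab": cfg["vocab_size"],
    }
```

Reading these fields directly is what makes a non-standard geometry like a 3B model safe to convert, since shape-guessing heuristics tuned for 7B+ checkpoints no longer apply.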
0cc4m
ac6b49ed45 Reduce queueing overhead for contiguous tensors by using single mul kernel call 2023-05-30 18:49:53 +02:00
Concedo
56456797f4 Merge branch 'master' into concedo_experimental 2023-05-30 22:15:58 +08:00
Georgi Gerganov
7552ac5863 ggml : sync cgraph import / export API 2023-05-29 19:31:44 +03:00
Georgi Gerganov
5d1830b99d ggml : fix bug in ggml_alibi 2023-05-29 19:30:49 +03:00
Concedo
ea336bfa33 rwkv eos 2023-05-29 22:40:27 +08:00