Concedo
c27f250b6f
bigger scratch buffer for 3B llama
2023-06-05 13:24:53 +08:00
Concedo
9270056269
fixed compile error in cmake VS
2023-06-05 11:48:04 +08:00
Concedo
b7fb1aa233
removed build info in cmake
2023-06-04 22:34:27 +08:00
Concedo
6f66e4c4a5
updated lite
2023-06-04 22:27:15 +08:00
Concedo
9aa2d8535b
hide gpu input box when dropdown not selected, minor memory fix for neox and gptj
2023-06-04 21:47:17 +08:00
Concedo
1ddbb9acd9
Merge branch 'concedo-opencl-dev' into concedo_experimental
...
# Conflicts:
# ggml-opencl.cpp
2023-06-04 18:07:27 +08:00
Concedo
64e3e74556
change max value size_t to use limits
2023-06-04 18:04:52 +08:00
LostRuins
2b700749e5
Merge branch 'master' into concedo-opencl-dev
2023-06-04 18:00:06 +08:00
Concedo
dd4b5c64b8
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# ggml-opencl.cpp
2023-06-04 17:38:22 +08:00
0cc4m
dcb2ed4826
OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653)
...
* Use events instead of clFinish, where possible
* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel
* Reduce queueing overhead for contiguous tensors by using single mul kernel call
* Adapt to #1612 cl_mem malloc changes
* Reduce code duplication between cuda and opencl branches
* Improve implementation
2023-06-04 08:12:05 +02:00
Concedo
88919095b5
edit readme
2023-06-04 12:09:49 +08:00
Concedo
c3c05fc33b
further cleanup, refactor renamemode to hordeconfig
2023-06-04 11:57:46 +08:00
Concedo
2868fac676
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .devops/tools.sh
# README.md
2023-06-04 11:07:07 +08:00
Concedo
20803c221e
cleaning up some old junk
2023-06-04 11:05:46 +08:00
Concedo
b62279cb39
buf size for starcoder still not good
2023-06-04 00:41:08 +08:00
Henri Vasserman
d8bd0013e8
Add info about CUDA_VISIBLE_DEVICES (#1682)
2023-06-03 16:35:20 +03:00
Jiří Podivín
b5c85468a3
Docker: change to calling convert.py (#1641)
...
Deprecation disclaimer was added to convert-pth-to-ggml.py
2023-06-03 15:11:53 +03:00
Evan Jones
136476e898
Fix prompt cache saving and chat-persistent rollover (#1678)
...
* Fix prompt cache saving and chat-persistent rollover (fixes #1670)
* clang-tidy
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-06-03 07:28:45 -04:00
Concedo
c1b293d31a
fixed MPT ooms
2023-06-03 18:37:13 +08:00
Concedo
8bd9a3a48b
updated readme, improved simple launcher
2023-06-03 17:17:15 +08:00
Concedo
6f82e17b7a
added MPT support
2023-06-03 16:14:08 +08:00
Concedo
9839259b63
allow specifying the horde limit as well
2023-06-03 00:55:44 +08:00
Concedo
96b0e536b7
Merge branch 'opencl-dev-concedo' into concedo_experimental
2023-06-02 22:12:14 +08:00
Concedo
59fe16877d
CLBlast fixes + enhancements to save VRAM:
...
1. Change all CLBlast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them.
2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer.
3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.
2023-06-02 22:10:49 +08:00
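The recycling policy described in points 2 and 3 of the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from ggml-opencl.cpp: `PoolBuffer`, `BufferPool`, `acquire`, and `release` are hypothetical names, and a buffer is modeled only by its capacity rather than by a real `cl_mem` handle. In an actual OpenCL pool the "resize" step would presumably mean releasing the old device buffer and creating a larger one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a device buffer: only the capacity is tracked.
struct PoolBuffer {
    std::size_t capacity;
    bool in_use;
};

// Hypothetical pool illustrating best-fit recycling with a resize fallback.
class BufferPool {
public:
    // Hands out a buffer of at least `size` bytes and returns its index.
    std::size_t acquire(std::size_t size) {
        assert(size > 0);
        const std::size_t none = buffers_.size();
        std::size_t best = none;     // smallest free buffer that still fits
        std::size_t largest = none;  // largest free buffer of any size
        for (std::size_t i = 0; i < buffers_.size(); ++i) {
            const PoolBuffer& b = buffers_[i];
            if (b.in_use) continue;
            if (b.capacity >= size &&
                (best == none || b.capacity < buffers_[best].capacity)) {
                best = i;
            }
            if (largest == none || b.capacity > buffers_[largest].capacity) {
                largest = i;
            }
        }
        if (best != none) {
            // Point 2: reuse the smallest buffer that fits, not the first one.
            buffers_[best].in_use = true;
            return best;
        }
        if (largest != none) {
            // Point 3: every free buffer is too small, so grow the largest one.
            buffers_[largest].capacity = size;
            buffers_[largest].in_use = true;
            return largest;
        }
        // No free buffer at all: allocate a new one.
        buffers_.push_back(PoolBuffer{size, true});
        return buffers_.size() - 1;
    }

    // Marks a buffer as free so a later acquire() can recycle it.
    void release(std::size_t index) { buffers_.at(index).in_use = false; }

    std::size_t capacity(std::size_t index) const { return buffers_.at(index).capacity; }
    std::size_t count() const { return buffers_.size(); }

private:
    std::vector<PoolBuffer> buffers_;
};
```

Choosing the smallest fitting buffer keeps large buffers available for large requests, and growing the largest free buffer avoids adding yet another allocation to the pool, which is consistent with the commit's stated goal of saving VRAM.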
Concedo
8d0c81e7cc
Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental
2023-06-02 12:19:59 +08:00
Concedo
144d8a8312
updated lite
2023-06-02 12:19:51 +08:00
0cc4m
24239f0df7
Improve implementation
2023-06-01 18:57:08 +02:00
Concedo
37659d2c4e
allow blasbatchsize -1 which disables blas, but keeps benefits like gpu offloads.
2023-06-01 22:33:50 +08:00
Concedo
49272e3c53
adjusted defaults
2023-06-01 20:03:44 +08:00
0cc4m
457aaf5bad
Reduce code duplication between cuda and opencl branches
2023-06-01 07:33:32 +02:00
Concedo
234270bd83
back to 32 block size, not better
2023-06-01 00:14:22 +08:00
Concedo
446e42a8c6
change dmmv block size
2023-05-31 21:40:12 +08:00
Concedo
077ee4e989
Revert "Revert "opencl : no need to allocate cl_mem on heap (#1612)""
...
This reverts commit 4afa38e744.
2023-05-31 18:00:52 +08:00
Concedo
50c85bea4c
Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental
2023-05-31 17:53:14 +08:00
Concedo
32dada5e5f
updated lite
2023-05-31 17:52:09 +08:00
0cc4m
5e1eecfe12
Adapt to #1612 cl_mem malloc changes
2023-05-31 07:07:47 +02:00
0cc4m
49aaf08387
Merge remote-tracking branch 'origin/master' into opencl-dev
2023-05-31 06:58:51 +02:00
Concedo
a5a85d68c6
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# llama.cpp
2023-05-31 10:51:54 +08:00
Concedo
85c9f7df41
Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental
2023-05-31 10:20:32 +08:00
Concedo
4afa38e744
Revert "opencl : no need to allocate cl_mem on heap (#1612)"
...
This reverts commit bb051d9723.
2023-05-31 10:20:23 +08:00
Henri Vasserman
ffb06a345e
OpenLLaMA 3B support (#1588)
...
This adds support to llama.cpp to load the model.
Currently missing are changes that are required from convert.py to convert the model correctly. It needs some changes to start reading the JSON configuration for HF models instead of deriving the values by guessing.
Co-authored-by: FNsi <125447286+FNsi@users.noreply.github.com>
2023-05-30 21:24:22 +03:00
0cc4m
ac6b49ed45
Reduce queueing overhead for contiguous tensors by using single mul kernel call
2023-05-30 18:49:53 +02:00
Concedo
56456797f4
Merge branch 'master' into concedo_experimental
2023-05-30 22:15:58 +08:00
Georgi Gerganov
7552ac5863
ggml : sync cgraph import / export API
2023-05-29 19:31:44 +03:00
Georgi Gerganov
5d1830b99d
ggml : fix bug in ggml_alibi
2023-05-29 19:30:49 +03:00
Concedo
ea336bfa33
rwkv eos
2023-05-29 22:40:27 +08:00
Concedo
6b3373cb81
revert bad fix
2023-05-29 22:06:12 +08:00
DannyDaemonic
248367605e
Workaround for recalculating logits in cached prompts (Fixes #1585) (#1609)
...
* Workaround for recalculating logits in cached prompts
2023-05-29 05:13:40 -07:00
Concedo
ef16d09a51
fix for older gcc, updated lite
2023-05-29 18:54:15 +08:00
Concedo
3a73ebe8d2
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .devops/full.Dockerfile
# .devops/main.Dockerfile
# Makefile
2023-05-29 16:47:32 +08:00