llama.cpp

Author	SHA1	Message	Date
Concedo	54dc75ce73	Merge branch 'concedo-opencl-dev' into concedo_experimental	2023-06-05 13:31:53 +08:00
Concedo	f6431ded5d	removed flags from the CL pool malloc, apply code tidying suggestions.	2023-06-05 13:31:37 +08:00
Concedo	c27f250b6f	bigger scratch buffer for 3B llama	2023-06-05 13:24:53 +08:00
Concedo	9270056269	fixed compile error in cmake VS	2023-06-05 11:48:04 +08:00
Concedo	b7fb1aa233	removed build info in cmake	2023-06-04 22:34:27 +08:00
Concedo	6f66e4c4a5	updated lite	2023-06-04 22:27:15 +08:00
Concedo	9aa2d8535b	hide gpu input box when dropdown not selected, minor memory fix for neox and gptj	2023-06-04 21:47:17 +08:00
Concedo	1ddbb9acd9	Merge branch 'concedo-opencl-dev' into concedo_experimental # Conflicts: # ggml-opencl.cpp	2023-06-04 18:07:27 +08:00
Concedo	64e3e74556	change max value size_t to use limits	2023-06-04 18:04:52 +08:00
LostRuins	2b700749e5	Merge branch 'master' into concedo-opencl-dev	2023-06-04 18:00:06 +08:00
Concedo	dd4b5c64b8	Merge branch 'master' into concedo_experimental # Conflicts: # ggml-opencl.cpp	2023-06-04 17:38:22 +08:00
0cc4m	dcb2ed4826	OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653) * Use events instead of clFinish, where possible * OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel * Reduce queueing overhead for contiguous tensors by using single mul kernel call * Adapt to #1612 cl_mem malloc changes * Reduce code duplication between cuda and opencl branches * Improve implementation	2023-06-04 08:12:05 +02:00
Concedo	88919095b5	edit readme	2023-06-04 12:09:49 +08:00
Concedo	c3c05fc33b	further cleanup, refactor renamemode to hordeconfig	2023-06-04 11:57:46 +08:00
Concedo	2868fac676	Merge branch 'master' into concedo_experimental # Conflicts: # .devops/tools.sh # README.md	2023-06-04 11:07:07 +08:00
Concedo	20803c221e	cleaning up some old junk	2023-06-04 11:05:46 +08:00
Concedo	b62279cb39	buf size for starcoder still not good	2023-06-04 00:41:08 +08:00
Henri Vasserman	d8bd0013e8	Add info about CUDA_VISIBLE_DEVICES (#1682)	2023-06-03 16:35:20 +03:00
Jiří Podivín	b5c85468a3	Docker: change to calling convert.py (#1641) Deprecation disclaimer was added to convert-pth-to-ggml.py	2023-06-03 15:11:53 +03:00
Evan Jones	136476e898	Fix prompt cache saving and chat-persistent rollover (#1678) * Fix prompt cache saving and chat-persistent rollover (fixes #1670) * clang-tidy Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2023-06-03 07:28:45 -04:00
Concedo	c1b293d31a	fixed MPT ooms	2023-06-03 18:37:13 +08:00
Concedo	8bd9a3a48b	updated readme, improved simple launcher	2023-06-03 17:17:15 +08:00
Concedo	6f82e17b7a	added MPT support	2023-06-03 16:14:08 +08:00
Concedo	9839259b63	allow specifying the horde limit as well	2023-06-03 00:55:44 +08:00
Concedo	96b0e536b7	Merge branch 'opencl-dev-concedo' into concedo_experimental	2023-06-02 22:12:14 +08:00
Concedo	59fe16877d	Clblast fixes + enhancements to save VRAM: 1. Change all Clblast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them. 2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer 3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.	2023-06-02 22:10:49 +08:00
Concedo	8d0c81e7cc	Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental	2023-06-02 12:19:59 +08:00
Concedo	144d8a8312	updated lite	2023-06-02 12:19:51 +08:00
0cc4m	24239f0df7	Improve implementation	2023-06-01 18:57:08 +02:00
Concedo	37659d2c4e	allow blasbatchsize -1 which disables blas, but keeps benefits like gpu offloads.	2023-06-01 22:33:50 +08:00
Concedo	49272e3c53	adjusted defaults	2023-06-01 20:03:44 +08:00
0cc4m	457aaf5bad	Reduce code duplication between cuda and opencl branches	2023-06-01 07:33:32 +02:00
Concedo	234270bd83	back to 32 block size, not better	2023-06-01 00:14:22 +08:00
Concedo	446e42a8c6	change dmmv block size	2023-05-31 21:40:12 +08:00
Concedo	077ee4e989	Revert "Revert "opencl : no need to allocate cl_mem on heap (#1612)"" This reverts commit `4afa38e744`.	2023-05-31 18:00:52 +08:00
Concedo	50c85bea4c	Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental	2023-05-31 17:53:14 +08:00
Concedo	32dada5e5f	updated lite	2023-05-31 17:52:09 +08:00
0cc4m	5e1eecfe12	Adapt to #1612 cl_mem malloc changes	2023-05-31 07:07:47 +02:00
0cc4m	49aaf08387	Merge remote-tracking branch 'origin/master' into opencl-dev	2023-05-31 06:58:51 +02:00
Concedo	a5a85d68c6	Merge branch 'master' into concedo_experimental # Conflicts: # llama.cpp	2023-05-31 10:51:54 +08:00
Concedo	85c9f7df41	Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental	2023-05-31 10:20:32 +08:00
Concedo	4afa38e744	Revert "opencl : no need to allocate cl_mem on heap (#1612)" This reverts commit `bb051d9723`.	2023-05-31 10:20:23 +08:00
Henri Vasserman	ffb06a345e	OpenLLaMA 3B support (#1588) This adds support to llama.cpp to load the model. Currently missing are changes that are required from convert.py to convert the model correctly. It needs some changes to start reading the JSON configuration for HF models instead of deriving the values by guessing. Co-authored-by: FNsi <125447286+FNsi@users.noreply.github.com>	2023-05-30 21:24:22 +03:00
0cc4m	ac6b49ed45	Reduce queueing overhead for contiguous tensors by using single mul kernel call	2023-05-30 18:49:53 +02:00
Concedo	56456797f4	Merge branch 'master' into concedo_experimental	2023-05-30 22:15:58 +08:00
Georgi Gerganov	7552ac5863	ggml : sync cgraph import / export API	2023-05-29 19:31:44 +03:00
Georgi Gerganov	5d1830b99d	ggml : fix bug in ggml_alibi	2023-05-29 19:30:49 +03:00
Concedo	ea336bfa33	rwkv eos	2023-05-29 22:40:27 +08:00
Concedo	6b3373cb81	revert bad fix	2023-05-29 22:06:12 +08:00
DannyDaemonic	248367605e	Work around for recalculating logits in cached prompts (Fixes #1585) (#1609) * Work around for recalculating logits in cached prompts	2023-05-29 05:13:40 -07:00
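
Commit 59fe16877d above describes two changes to how the pool malloc recycles buffers: pick the smallest free buffer that fits rather than the first one, and, when every free buffer is too small, resize the largest free buffer rather than leave it idle. The following is only a rough illustration of that selection policy, not the code from ggml-opencl.cpp. The names `Buffer`, `BufferPool`, `acquire`, and `release` are hypothetical, sizes are plain integers standing in for real device allocations, and the branch that creates a new buffer when nothing is free is an assumption the commit message does not spell out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # identity comparison: two buffers of equal size are distinct
class Buffer:
    size: int
    in_use: bool = False


@dataclass
class BufferPool:
    """Hypothetical pool illustrating the best-fit policy from the commit message."""
    buffers: List[Buffer] = field(default_factory=list)

    def acquire(self, size: int) -> Buffer:
        free = [b for b in self.buffers if not b.in_use]
        fitting = [b for b in free if b.size >= size]
        if fitting:
            # Point 2: take the SMALLEST free buffer that fits, not the first.
            buf = min(fitting, key=lambda b: b.size)
        elif free:
            # Point 3: all free buffers are too small, so recycle the largest
            # one by resizing it (a real pool would free and reallocate here).
            buf = max(free, key=lambda b: b.size)
            buf.size = size
        else:
            # Assumption: with no free buffer at all, allocate a new one.
            buf = Buffer(size)
            self.buffers.append(buf)
        buf.in_use = True
        return buf

    def release(self, buf: Buffer) -> None:
        buf.in_use = False
```

Compared with first-fit, this avoids handing a large buffer to a small request, so the large one remains available for a later large request. That is presumably the VRAM saving the commit message refers to.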

1077 commits