llama.cpp

Author	SHA1	Message	Date
Concedo	dd4b5c64b8	Merge branch 'master' into concedo_experimental # Conflicts: # ggml-opencl.cpp	2023-06-04 17:38:22 +08:00
0cc4m	dcb2ed4826	OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653) * Use events instead of clFinish, where possible * OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel * Reduce queueing overhead for contiguous tensors by using single mul kernel call * Adapt to #1612 cl_mem malloc changes * Reduce code duplication between cuda and opencl branches * Improve implementation	2023-06-04 08:12:05 +02:00
Concedo	88919095b5	edit readme	2023-06-04 12:09:49 +08:00
Concedo	c3c05fc33b	further cleanup, refactor renamemode to hordeconfig	2023-06-04 11:57:46 +08:00
Concedo	2868fac676	Merge branch 'master' into concedo_experimental # Conflicts: # .devops/tools.sh # README.md	2023-06-04 11:07:07 +08:00
Concedo	20803c221e	cleaning up some old junk	2023-06-04 11:05:46 +08:00
Concedo	b62279cb39	buf size for starcoder still not good	2023-06-04 00:41:08 +08:00
Henri Vasserman	d8bd0013e8	Add info about CUDA_VISIBLE_DEVICES (#1682)	2023-06-03 16:35:20 +03:00
Jiří Podivín	b5c85468a3	Docker: change to calling convert.py (#1641) Deprecation disclaimer was added to convert-pth-to-ggml.py	2023-06-03 15:11:53 +03:00
Evan Jones	136476e898	Fix prompt cache saving and chat-persistent rollover (#1678) * Fix prompt cache saving and chat-persistent rollover (fixes #1670) * clang-tidy Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2023-06-03 07:28:45 -04:00
Concedo	c1b293d31a	fixed MPT ooms	2023-06-03 18:37:13 +08:00
Concedo	8bd9a3a48b	updated readme, improved simple launcher	2023-06-03 17:17:15 +08:00
Concedo	6f82e17b7a	added MPT support	2023-06-03 16:14:08 +08:00
Concedo	9839259b63	allow specifying the horde limit as well	2023-06-03 00:55:44 +08:00
Concedo	96b0e536b7	Merge branch 'opencl-dev-concedo' into concedo_experimental	2023-06-02 22:12:14 +08:00
Concedo	59fe16877d	Clblast fixes + enhancements to save VRAM: 1. Change all Clblast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them. 2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer 3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.	2023-06-02 22:10:49 +08:00
Concedo	8d0c81e7cc	Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental	2023-06-02 12:19:59 +08:00
Concedo	144d8a8312	updated lite	2023-06-02 12:19:51 +08:00
0cc4m	24239f0df7	Improve implementation	2023-06-01 18:57:08 +02:00
Concedo	37659d2c4e	allow blasbatchsize -1 which disables blas, but keeps benefits like gpu offloads.	2023-06-01 22:33:50 +08:00
Concedo	49272e3c53	adjusted defaults	2023-06-01 20:03:44 +08:00
0cc4m	457aaf5bad	Reduce code duplication between cuda and opencl branches	2023-06-01 07:33:32 +02:00
Concedo	234270bd83	back to 32 block size, not better	2023-06-01 00:14:22 +08:00
Concedo	446e42a8c6	change dmmv block size	2023-05-31 21:40:12 +08:00
Concedo	077ee4e989	Revert "Revert "opencl : no need to allocate cl_mem on heap (#1612)"" This reverts commit `4afa38e744`.	2023-05-31 18:00:52 +08:00
Concedo	50c85bea4c	Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental	2023-05-31 17:53:14 +08:00
Concedo	32dada5e5f	updated lite	2023-05-31 17:52:09 +08:00
0cc4m	5e1eecfe12	Adapt to #1612 cl_mem malloc changes	2023-05-31 07:07:47 +02:00
0cc4m	49aaf08387	Merge remote-tracking branch 'origin/master' into opencl-dev	2023-05-31 06:58:51 +02:00
Concedo	a5a85d68c6	Merge branch 'master' into concedo_experimental # Conflicts: # llama.cpp	2023-05-31 10:51:54 +08:00
Concedo	85c9f7df41	Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental	2023-05-31 10:20:32 +08:00
Concedo	4afa38e744	Revert "opencl : no need to allocate cl_mem on heap (#1612)" This reverts commit `bb051d9723`.	2023-05-31 10:20:23 +08:00
Henri Vasserman	ffb06a345e	OpenLLaMA 3B support (#1588) This adds support to llama.cpp to load the model. Currently missing are changes that are required from convert.py to convert the model correctly. It needs some changes to start reading the JSON configuration for HF models instead of deriving the values by guessing. Co-authored-by: FNsi <125447286+FNsi@users.noreply.github.com>	2023-05-30 21:24:22 +03:00
0cc4m	ac6b49ed45	Reduce queueing overhead for contiguous tensors by using single mul kernel call	2023-05-30 18:49:53 +02:00
Concedo	56456797f4	Merge branch 'master' into concedo_experimental	2023-05-30 22:15:58 +08:00
Georgi Gerganov	7552ac5863	ggml : sync cgraph import / export API	2023-05-29 19:31:44 +03:00
Georgi Gerganov	5d1830b99d	ggml : fix bug in ggml_alibi	2023-05-29 19:30:49 +03:00
Concedo	ea336bfa33	rwkv eos	2023-05-29 22:40:27 +08:00
Concedo	6b3373cb81	revert bad fix	2023-05-29 22:06:12 +08:00
DannyDaemonic	248367605e	Work around for recalculating logits in cached prompts (Fixes #1585) (#1609) * Work around for recalculating logits in cached prompts	2023-05-29 05:13:40 -07:00
Concedo	ef16d09a51	fix for older gcc, updated lite	2023-05-29 18:54:15 +08:00
Concedo	3a73ebe8d2	Merge branch 'master' into concedo_experimental # Conflicts: # .devops/full.Dockerfile # .devops/main.Dockerfile # Makefile	2023-05-29 16:47:32 +08:00
Concedo	254a9ff12c	Merge commit '`ebc5d0651a`' into concedo_experimental # Conflicts: # ggml-opencl.cpp	2023-05-29 16:26:24 +08:00
Concedo	30ff1133f5	allow users to rename models for use in horde	2023-05-29 16:01:05 +08:00
Concedo	97b39f875c	fixed fstat64 build error on mac	2023-05-29 15:50:07 +08:00
Jiří Podivín	0e730dd23b	Adding git in container package dependencies (#1621) Git added to build packages for version information in docker image Signed-off-by: Jiri Podivin <jpodivin@gmail.com>	2023-05-28 21:45:50 -07:00
Johannes Gäßler	3b126f654f	LLAMA_DEBUG adds debug symbols (#1617)	2023-05-28 21:01:02 +02:00
Kerfuffle	1b78ed2081	Only show -ngl option when relevant + other doc/arg handling updates (#1625) 1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS) 2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible. 3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation 4. Update `main` and `server` examples documentation to use the new style dash separator argument format 5. Update the `server` example to use dash separators for its arguments and adds `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility. 6. Add a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356	2023-05-28 11:48:57 -06:00
Vladimir Zorin	337aea1139	examples : add --alias option to gpt_params to set user friendly model name (#1614)	2023-05-28 20:14:24 +03:00
Howard Su	bb051d9723	opencl : no need to allocate cl_mem on heap (#1612)	2023-05-28 20:13:36 +03:00


1067 commits