llama.cpp

Author	SHA1	Message	Date
Concedo	8bd9a3a48b	updated readme, improved simple launcher	2023-06-03 17:17:15 +08:00
Concedo	6f82e17b7a	added MPT support	2023-06-03 16:14:08 +08:00
Concedo	9839259b63	allow specifying the horde limit as well	2023-06-03 00:55:44 +08:00
Concedo	96b0e536b7	Merge branch 'opencl-dev-concedo' into concedo_experimental	2023-06-02 22:12:14 +08:00
Concedo	59fe16877d	CLBlast fixes + enhancements to save VRAM: 1. Change all CLBlast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them. 2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer. 3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.	2023-06-02 22:10:49 +08:00
Concedo	8d0c81e7cc	Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental	2023-06-02 12:19:59 +08:00
Concedo	144d8a8312	updated lite	2023-06-02 12:19:51 +08:00
0cc4m	24239f0df7	Improve implementation	2023-06-01 18:57:08 +02:00
Concedo	37659d2c4e	allow blasbatchsize -1 which disables blas, but keeps benefits like gpu offloads.	2023-06-01 22:33:50 +08:00
Concedo	49272e3c53	adjusted defaults	2023-06-01 20:03:44 +08:00
0cc4m	457aaf5bad	Reduce code duplication between cuda and opencl branches	2023-06-01 07:33:32 +02:00
Concedo	234270bd83	back to 32 block size, not better	2023-06-01 00:14:22 +08:00
Concedo	446e42a8c6	change dmmv block size	2023-05-31 21:40:12 +08:00
Concedo	077ee4e989	Revert "Revert "opencl : no need to allocate cl_mem on heap (#1612)"" This reverts commit `4afa38e744`.	2023-05-31 18:00:52 +08:00
Concedo	50c85bea4c	Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental	2023-05-31 17:53:14 +08:00
Concedo	32dada5e5f	updated lite	2023-05-31 17:52:09 +08:00
0cc4m	5e1eecfe12	Adapt to #1612 cl_mem malloc changes	2023-05-31 07:07:47 +02:00
0cc4m	49aaf08387	Merge remote-tracking branch 'origin/master' into opencl-dev	2023-05-31 06:58:51 +02:00
Concedo	a5a85d68c6	Merge branch 'master' into concedo_experimental # Conflicts: # llama.cpp	2023-05-31 10:51:54 +08:00
Concedo	85c9f7df41	Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental	2023-05-31 10:20:32 +08:00
Concedo	4afa38e744	Revert "opencl : no need to allocate cl_mem on heap (#1612)" This reverts commit `bb051d9723`.	2023-05-31 10:20:23 +08:00
Henri Vasserman	ffb06a345e	OpenLLaMA 3B support (#1588) This adds support to llama.cpp to load the model. Currently missing are changes that are required from convert.py to convert the model correctly. It needs some changes to start reading the JSON configuration for HF models instead of deriving the values by guessing. Co-authored-by: FNsi <125447286+FNsi@users.noreply.github.com>	2023-05-30 21:24:22 +03:00
0cc4m	ac6b49ed45	Reduce queueing overhead for contiguous tensors by using single mul kernel call	2023-05-30 18:49:53 +02:00
Concedo	56456797f4	Merge branch 'master' into concedo_experimental	2023-05-30 22:15:58 +08:00
Georgi Gerganov	7552ac5863	ggml : sync cgraph import / export API	2023-05-29 19:31:44 +03:00
Georgi Gerganov	5d1830b99d	ggml : fix bug in ggml_alibi	2023-05-29 19:30:49 +03:00
Concedo	ea336bfa33	rwkv eos	2023-05-29 22:40:27 +08:00
Concedo	6b3373cb81	revert bad fix	2023-05-29 22:06:12 +08:00
DannyDaemonic	248367605e	Work around for recalculating logits in cached prompts (Fixes #1585) (#1609)	2023-05-29 05:13:40 -07:00
Concedo	ef16d09a51	fix for older gcc, updated lite	2023-05-29 18:54:15 +08:00
Concedo	3a73ebe8d2	Merge branch 'master' into concedo_experimental # Conflicts: # .devops/full.Dockerfile # .devops/main.Dockerfile # Makefile	2023-05-29 16:47:32 +08:00
Concedo	254a9ff12c	Merge commit '`ebc5d0651a`' into concedo_experimental # Conflicts: # ggml-opencl.cpp	2023-05-29 16:26:24 +08:00
Concedo	30ff1133f5	allow users to rename models for use in horde	2023-05-29 16:01:05 +08:00
Concedo	97b39f875c	fixed fstat64 build error on mac	2023-05-29 15:50:07 +08:00
Jiří Podivín	0e730dd23b	Adding git in container package dependencies (#1621) Git added to build packages for version information in docker image Signed-off-by: Jiri Podivin <jpodivin@gmail.com>	2023-05-28 21:45:50 -07:00
Johannes Gäßler	3b126f654f	LLAMA_DEBUG adds debug symbols (#1617)	2023-05-28 21:01:02 +02:00
Kerfuffle	1b78ed2081	Only show -ngl option when relevant + other doc/arg handling updates (#1625) 1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS). 2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible. 3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation. 4. Update `main` and `server` examples documentation to use the new style dash separator argument format. 5. Update the `server` example to use dash separators for its arguments and add `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility. 6. Add a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356	2023-05-28 11:48:57 -06:00
Vladimir Zorin	337aea1139	examples : add --alias option to gpt_params to set user friendly model name (#1614)	2023-05-28 20:14:24 +03:00
Howard Su	bb051d9723	opencl : no need to allocate cl_mem on heap (#1612)	2023-05-28 20:13:36 +03:00
Howard Su	ca74884f66	opencl : use strstr to check if fp16 supported (#1611) * Use strstr to check if fp16 supported * Ensure ext_buffer is null terminated	2023-05-28 20:09:56 +03:00
Concedo	28f1196f65	adjust default rep pen range	2023-05-28 19:36:21 +08:00
Concedo	7d159bacd7	updated kobold lite	2023-05-28 11:23:20 +08:00
apcameron	a6704643b6	ggml : add support for the RISCV architecture (#1616)	2023-05-27 23:03:25 +03:00
Concedo	dcc426e2de	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # Makefile # README.md	2023-05-28 01:08:39 +08:00
Kerfuffle	0df7d63e5b	Include server in releases + other build system cleanups (#1610) Set `LLAMA_BUILD_SERVER` in workflow so the `server` example gets built. This currently only applies to Windows builds because it seems like only Windows binary artifacts are included in releases. Add `server` example target to `Makefile` (still uses `LLAMA_BUILD_SERVER` define and does not build by default). Fix issue where `vdot` binary wasn't removed when running `make clean`. Fix compile warnings in `server` example. Add `.hpp` files to trigger workflow (the server example has one).	2023-05-27 11:04:14 -06:00
Concedo	5d9f5b28a6	rwkv integration completed	2023-05-28 00:48:56 +08:00
Henri Vasserman	97c9b77c4f	Add documentation about CLBlast (#1604) Installing, compiling and using.	2023-05-27 18:47:55 +03:00
Concedo	55e0fbf024	wip integrating new rwkv	2023-05-27 22:45:28 +08:00
Henri Vasserman	0ecb1bbbeb	[CI] Fix openblas (#1613) * Fix OpenBLAS build * Fix `LLAMA_BLAS_VENDOR` CMake variable that should be a string and not a boolean.	2023-05-27 17:24:06 +03:00
Georgi Gerganov	93618031c7	ggml : add ggml_tensor_overhead()	2023-05-27 16:19:56 +03:00
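
Note on commit 59fe16877d: the buffer recycling policy described in its message (hand out the smallest free buffer that fits; if none fits, resize the largest free buffer) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from `ggml-opencl.cpp`: `PoolBuffer` and `pick_buffer` are hypothetical names, and a plain size field stands in for a real `cl_mem` allocation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pooled device buffer.
// A real pool would also hold a cl_mem handle here.
struct PoolBuffer {
    std::size_t size;  // capacity in bytes
    bool in_use;       // true while handed out to a caller
};

// Picks a buffer for a request of `wanted` bytes.
// Returns the index of the chosen buffer, or -1 if no buffer is free.
int pick_buffer(std::vector<PoolBuffer>& pool, std::size_t wanted) {
    int best = -1;     // smallest free buffer with size >= wanted
    int largest = -1;  // largest free buffer overall (fallback)
    for (int i = 0; i < static_cast<int>(pool.size()); ++i) {
        const PoolBuffer& b = pool[i];
        if (b.in_use) continue;
        if (b.size >= wanted && (best < 0 || b.size < pool[best].size)) {
            best = i;
        }
        if (largest < 0 || b.size > pool[largest].size) {
            largest = i;
        }
    }
    int chosen = best;
    if (chosen < 0) {
        if (largest < 0) return -1;  // nothing free at all
        chosen = largest;
        // Assumption: stands in for releasing and reallocating the device buffer.
        pool[chosen].size = wanted;
    }
    pool[chosen].in_use = true;
    return chosen;
}
```

Compared with taking the first free buffer that fits, the best-fit choice keeps large buffers available for large requests, which is presumably where the VRAM saving mentioned in the commit message comes from.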


1056 commits