Commit graph

1039 commits

Concedo
32dada5e5f updated lite 2023-05-31 17:52:09 +08:00
Concedo
a5a85d68c6 Merge branch 'master' into concedo_experimental
# Conflicts:
#	llama.cpp
2023-05-31 10:51:54 +08:00
Concedo
85c9f7df41 Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental 2023-05-31 10:20:32 +08:00
Concedo
4afa38e744 Revert "opencl : no need to allocate cl_mem on heap (#1612)"
This reverts commit bb051d9723.
2023-05-31 10:20:23 +08:00
Henri Vasserman
ffb06a345e
OpenLLaMA 3B support (#1588)
This adds support to llama.cpp for loading the model.

Still missing are the changes required in convert.py to convert the model correctly; it needs to start reading the JSON configuration for HF models instead of deriving the values by guessing.

Co-authored-by: FNsi <125447286+FNsi@users.noreply.github.com>
2023-05-30 21:24:22 +03:00
0cc4m
ac6b49ed45 Reduce queueing overhead for contiguous tensors by using a single mul kernel call 2023-05-30 18:49:53 +02:00
Concedo
56456797f4 Merge branch 'master' into concedo_experimental 2023-05-30 22:15:58 +08:00
Georgi Gerganov
7552ac5863
ggml : sync cgraph import / export API 2023-05-29 19:31:44 +03:00
Georgi Gerganov
5d1830b99d
ggml : fix bug in ggml_alibi 2023-05-29 19:30:49 +03:00
Concedo
ea336bfa33 rwkv eos 2023-05-29 22:40:27 +08:00
Concedo
6b3373cb81 revert bad fix 2023-05-29 22:06:12 +08:00
DannyDaemonic
248367605e
Workaround for recalculating logits in cached prompts (Fixes #1585) (#1609)
* Workaround for recalculating logits in cached prompts
2023-05-29 05:13:40 -07:00
Concedo
ef16d09a51 fix for older gcc, updated lite 2023-05-29 18:54:15 +08:00
Concedo
3a73ebe8d2 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.devops/full.Dockerfile
#	.devops/main.Dockerfile
#	Makefile
2023-05-29 16:47:32 +08:00
Concedo
254a9ff12c Merge commit 'ebc5d0651a' into concedo_experimental
# Conflicts:
#	ggml-opencl.cpp
2023-05-29 16:26:24 +08:00
Concedo
30ff1133f5 allow users to rename models for use in horde 2023-05-29 16:01:05 +08:00
Concedo
97b39f875c fixed fstat64 build error on mac 2023-05-29 15:50:07 +08:00
Jiří Podivín
0e730dd23b
Adding git in container package dependencies (#1621)
Git is added to the build packages so that version information is available in the Docker image.

Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
2023-05-28 21:45:50 -07:00
Johannes Gäßler
3b126f654f
LLAMA_DEBUG adds debug symbols (#1617) 2023-05-28 21:01:02 +02:00
Kerfuffle
1b78ed2081
Only show -ngl option when relevant + other doc/arg handling updates (#1625)
1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS)
2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible.
3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples' documentation
4. Update the `main` and `server` examples' documentation to use the new-style dash-separator argument format
5. Update the `server` example to use dash separators for its arguments and add `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility.
6. Add a warning discouraging use of `--memory-f32` to the `main` and `server` examples' `--help` text as well as the documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356
2023-05-28 11:48:57 -06:00
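
A minimal sketch of the gating described in the entry above, assuming the new define is driven by the existing cuBLAS/CLBlast build macros; the macro wiring and help wording are assumptions, not the exact llama.h and common argument-handling code:

```cpp
// Hypothetical sketch: a LLAMA_SUPPORTS_GPU_OFFLOAD define gates the
// -ngl / --n-gpu-layers help text. How the define is derived is an
// assumption based on the commit description, not the exact source.
#include <cstdio>

#if defined(GGML_USE_CUBLAS) || defined(GGML_USE_CLBLAST)
#define LLAMA_SUPPORTS_GPU_OFFLOAD
#endif

static void print_gpu_offload_help() {
#ifdef LLAMA_SUPPORTS_GPU_OFFLOAD
    // only advertised when GPU offload is actually possible in this build
    fprintf(stderr, "  -ngl N, --n-gpu-layers N  number of layers to store in VRAM\n");
#endif
}

int main() {
    print_gpu_offload_help();
    return 0;
}
```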
Vladimir Zorin
337aea1139
examples : add --alias option to gpt_params to set a user-friendly model name (#1614) 2023-05-28 20:14:24 +03:00
Howard Su
bb051d9723
opencl : no need to allocate cl_mem on heap (#1612) 2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported (#1611)
* Use strstr to check if fp16 supported

* Ensure ext_buffer is null terminated
2023-05-28 20:09:56 +03:00
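
The check described above boils down to querying `CL_DEVICE_EXTENSIONS` and searching the result; a sketch assuming the buffer handling mirrors the commit notes (variable names are illustrative, not the exact ggml-opencl.cpp code):

```cpp
// Illustrative fp16-support probe via the device extension string.
#include <CL/cl.h>
#include <cstring>
#include <vector>

static bool device_supports_fp16(cl_device_id device) {
    size_t ext_size = 0;
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 0, NULL, &ext_size);
    // extra zeroed byte keeps the buffer null-terminated for strstr
    std::vector<char> ext_buffer(ext_size + 1, '\0');
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ext_size, ext_buffer.data(), NULL);
    return strstr(ext_buffer.data(), "cl_khr_fp16") != NULL;
}
```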
Concedo
28f1196f65 adjust default rep pen range 2023-05-28 19:36:21 +08:00
Concedo
7d159bacd7 updated kobold lite 2023-05-28 11:23:20 +08:00
apcameron
a6704643b6
ggml : add support for the RISC-V architecture (#1616) 2023-05-27 23:03:25 +03:00
Concedo
dcc426e2de Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	README.md
2023-05-28 01:08:39 +08:00
Kerfuffle
0df7d63e5b
Include server in releases + other build system cleanups (#1610)
Set `LLAMA_BUILD_SERVER` in the workflow so the `server` example gets built. This currently applies only to Windows builds, since it seems like only Windows binary artifacts are included in releases.

Add a `server` example target to the `Makefile` (still gated by the `LLAMA_BUILD_SERVER` define and not built by default).

Fix an issue where the `vdot` binary wasn't removed when running `make clean`.

Fix compile warnings in the `server` example.

Add `.hpp` files to the workflow triggers (the server example has one).
2023-05-27 11:04:14 -06:00
Concedo
5d9f5b28a6 rwkv integration completed 2023-05-28 00:48:56 +08:00
Henri Vasserman
97c9b77c4f
Add documentation about CLBlast (#1604)
Covers installing, compiling, and using it.
2023-05-27 18:47:55 +03:00
Concedo
55e0fbf024 wip integrating new rwkv 2023-05-27 22:45:28 +08:00
Henri Vasserman
0ecb1bbbeb
[CI] Fix openblas (#1613)
* Fix OpenBLAS build

* Fix the `LLAMA_BLAS_VENDOR` CMake variable, which should be a string rather than a boolean.
2023-05-27 17:24:06 +03:00
Georgi Gerganov
93618031c7
ggml : add ggml_tensor_overhead() 2023-05-27 16:19:56 +03:00
Henri Vasserman
83c54e6da5
[CI] CLBlast: Fix directory name (#1606) 2023-05-27 14:18:25 +02:00
Concedo
fe63bfdb0f Revert "allow 2048 blasbatchsize"
This reverts commit 94dc5c2324.
2023-05-27 18:13:27 +08:00
0cc4m
97c5cca4e5 OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel 2023-05-27 12:00:56 +02:00
Concedo
94dc5c2324 allow 2048 blasbatchsize 2023-05-27 17:47:18 +08:00
Concedo
92a0d77712 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
2023-05-27 17:44:14 +08:00
Concedo
abfdfb702e added top_a sampler 2023-05-27 17:32:37 +08:00
Georgi Gerganov
bdbda1b17a
ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name()) 2023-05-27 12:23:16 +03:00
0cc4m
ebc5d0651a Use events instead of clFinish, where possible 2023-05-27 10:03:35 +02:00
Concedo
01a0f206df added support for starcoder, which is basically gpt2 2023-05-27 13:35:40 +08:00
Concedo
6d7749c98f no difference 2023-05-27 12:42:19 +08:00
Concedo
bd4fe936f5 cleanup sampling code 2023-05-27 11:58:39 +08:00
Concedo
3c8f404243 integrated token probability viewer in debugmode 2023-05-26 16:40:26 +08:00
Kerfuffle
66874d4fbc
Some improvements to loading the session with --prompt-cache (#1550)
Improvements to loading the session with `--prompt-cache` in the `main` example.

1. Fix an issue where the `--seed` parameter was ignored when loading a cached prompt.
2. When loading a cached prompt, you previously had to specify the saved prompt (or a prefix of it) again. This PR changes that behavior to default to the cached prompt when the user doesn't specify one.
2023-05-25 20:18:01 -06:00
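
A hedged sketch of the second change above: when no prompt is given, the input defaults to the tokens restored from the cache. The names below are stand-ins, not the actual `main` example identifiers:

```cpp
// Hypothetical sketch of the --prompt-cache fallback; the real main
// example uses llama.cpp types, these names are illustrative.
#include <string>
#include <vector>

using token = int; // stand-in for llama_token

static std::vector<token> pick_input(const std::string & user_prompt,
                                     const std::vector<token> & session_tokens,
                                     std::vector<token> (*tokenize)(const std::string &)) {
    if (!user_prompt.empty()) {
        return tokenize(user_prompt); // an explicit prompt still wins, as before
    }
    // new behavior: no prompt given, so default to the cached session tokens
    return session_tokens;
}
```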
Johannes Gäßler
1fcdcc28b1
cuda : performance optimizations (#1530)
* xor hack

* block y dim

* loop unrolling

* Fixed cmake LLAMA_CUDA_BY option

* Removed hipblas compatibility code

* Define GGML_CUDA_DMMV_BLOCK_Y if not defined

* Fewer iters, more ops per iter

* Renamed DMMV X/Y compilation options
2023-05-26 00:07:29 +03:00
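
One of the bullets above adds a fallback definition for `GGML_CUDA_DMMV_BLOCK_Y`; a sketch of that define-if-not-defined pattern (the default value is a placeholder, not the one used in ggml-cuda.cu):

```cpp
// Hedged sketch: a fallback definition so builds that do not pass the
// option still compile; the default value here is assumed.
#ifndef GGML_CUDA_DMMV_BLOCK_Y
#define GGML_CUDA_DMMV_BLOCK_Y 1
#endif

// A block-y knob like this typically feeds the kernel launch shape,
// trading fewer loop iterations for more parallel rows per block.
```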
Concedo
8b8f2f4cf5 up ver to 1.25.1 2023-05-25 14:49:30 +08:00
Concedo
e6eeb234f1 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	README.md
2023-05-25 10:34:43 +08:00
Concedo
d2da155661 upgraded clblast 2023-05-25 10:18:12 +08:00