llama.cpp

Author	SHA1	Message	Date
Georgi Gerganov	f0196a7e7a	mtl : optimize rms_norm and soft_max kernels	2023-06-01 22:51:42 +03:00
Georgi Gerganov	9665429e94	mtl : full GPU inference of the computation graph	2023-06-01 21:50:01 +03:00
Georgi Gerganov	fbd3f6258d	mtl : add non-broadcast mul kernel	2023-06-01 21:40:53 +03:00
Georgi Gerganov	42dca4004c	mtl : add silu kernel	2023-06-01 21:35:11 +03:00
Georgi Gerganov	a0cc3de59a	mtl : add f32 -> f32 cpy kernel	2023-06-01 21:30:33 +03:00
Georgi Gerganov	a266c26de2	mtl : verify V tensor contents	2023-06-01 21:27:24 +03:00
Georgi Gerganov	f67c2d8cab	ggml : update ggml_nbytes() to handle non-contiguous tensors	2023-06-01 21:27:03 +03:00
Georgi Gerganov	17930fbcb7	mtl : fix soft_max kernel	2023-06-01 20:48:24 +03:00
Georgi Gerganov	17a70362a6	mtl : add diag_mask_inf kernel	2023-06-01 20:41:54 +03:00
Georgi Gerganov	0f1c580860	mtl : add scale kernel	2023-06-01 19:52:32 +03:00
Georgi Gerganov	51efb59437	mtl : confirm f16 x f32 attention mul mat	2023-06-01 19:45:36 +03:00
Georgi Gerganov	948fcfde7e	mtl : add cpy kernel + handle view ops	2023-06-01 19:21:28 +03:00
Georgi Gerganov	94ea9e7bfe	ggml : store offset as opt arg for ggml_view_xd() operators	2023-06-01 19:21:08 +03:00
Georgi Gerganov	7ca81e9e65	mtl : add reshape and transpose handling	2023-05-31 23:01:37 +03:00
Georgi Gerganov	1213af76ce	mtl : add rope kernel	2023-05-31 22:28:59 +03:00
Georgi Gerganov	6af6a05663	ggml : fix handling of "view" ops in ggml_graph_import()	2023-05-31 22:28:15 +03:00
Georgi Gerganov	b2fd06c6aa	mtl : working mul_mat q4	2023-05-30 23:06:49 +03:00
Georgi Gerganov	29bec00ba0	mtl : another mul_mat Q4 (still does not work)	2023-05-30 22:31:07 +03:00
Georgi Gerganov	96d005225f	mtl : mul_mat fixes (still wrong)	2023-05-30 22:20:17 +03:00
Georgi Gerganov	2a24994bad	mtl : initial mul_mat Q4 kernel (wrong results)	2023-05-30 22:02:54 +03:00
Georgi Gerganov	64afc0b53a	mtl : add mul kernel + confirm working	2023-05-30 19:15:38 +03:00
Georgi Gerganov	72256ebd2b	mtl : add rms_norm kernel + confirm working	2023-05-30 19:03:04 +03:00
Georgi Gerganov	794704e409	mtl : confirmed get_rows_q4_0 is working correctly	2023-05-30 18:41:45 +03:00
Georgi Gerganov	a8fd9dc128	mtl : initial get_rows_q4_0 kernel	2023-05-29 23:12:19 +03:00
Georgi Gerganov	248a8c3379	mtl : move MSL code into separate file for easy editing	2023-05-29 22:26:40 +03:00
Georgi Gerganov	897d6d8e8f	mtl : export just a small part of the graph for now to make it easier	2023-05-29 21:40:05 +03:00
Georgi Gerganov	a792cbd0fc	mtl : no need for mtl-export tool, add cli arg for main instead	2023-05-29 21:28:59 +03:00
Georgi Gerganov	b23fe8c9c7	mtl : adapt the MNIST example as starter	2023-05-29 21:20:56 +03:00
Georgi Gerganov	98c267fc77	ci : disable temporary	2023-05-29 20:57:24 +03:00
Georgi Gerganov	f85020b19a	mtl : export the LLaMA computation graph	2023-05-29 20:49:24 +03:00
Georgi Gerganov	7552ac5863	ggml : sync cgraph import / export API	2023-05-29 19:31:44 +03:00
Georgi Gerganov	5d1830b99d	ggml : fix bug in ggml_alibi	2023-05-29 19:30:49 +03:00
DannyDaemonic	248367605e	Work around for recalculating logits in cached prompts (Fixes #1585) (#1609)	2023-05-29 05:13:40 -07:00
Jiří Podivín	0e730dd23b	Adding git in container package dependencies (#1621) Git added to build packages for version information in docker image Signed-off-by: Jiri Podivin <jpodivin@gmail.com>	2023-05-28 21:45:50 -07:00
Johannes Gäßler	3b126f654f	LLAMA_DEBUG adds debug symbols (#1617)	2023-05-28 21:01:02 +02:00
Kerfuffle	1b78ed2081	Only show -ngl option when relevant + other doc/arg handling updates (#1625) 1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS) 2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible. 3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation 4. Update `main` and `server` examples documentation to use the new style dash separator argument format 5. Update the `server` example to use dash separators for its arguments and add `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility. 6. Add a warning discouraging use of `--memory-f32` to the `--help` text and documentation of the `main` and `server` examples. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356	2023-05-28 11:48:57 -06:00
Vladimir Zorin	337aea1139	examples : add --alias option to gpt_params to set user friendly model name (#1614)	2023-05-28 20:14:24 +03:00
Howard Su	bb051d9723	opencl : no need to allocate cl_mem on heap (#1612)	2023-05-28 20:13:36 +03:00
Howard Su	ca74884f66	opencl : use strstr to check if fp16 supported (#1611) * Use strstr to check if fp16 supported * Ensure ext_buffer is null terminated	2023-05-28 20:09:56 +03:00
apcameron	a6704643b6	ggml : add support for the RISCV architecture (#1616)	2023-05-27 23:03:25 +03:00
Kerfuffle	0df7d63e5b	Include server in releases + other build system cleanups (#1610) Set `LLAMA_BUILD_SERVER` in workflow so the `server` example gets built. This currently only applies to Windows builds because it seems like only Windows binary artifacts are included in releases. Add `server` example target to `Makefile` (still uses `LLAMA_BUILD_SERVER` define and does not build by default). Fix issue where `vdot` binary wasn't removed when running `make clean`. Fix compile warnings in `server` example. Add `.hpp` files to trigger workflow (the server example has one).	2023-05-27 11:04:14 -06:00
Henri Vasserman	97c9b77c4f	Add documentation about CLBlast (#1604) Installing, compiling and using.	2023-05-27 18:47:55 +03:00
Henri Vasserman	0ecb1bbbeb	[CI] Fix openblas (#1613) * Fix OpenBLAS build * Fix `LLAMA_BLAS_VENDOR` CMake variable that should be a string and not a boolean.	2023-05-27 17:24:06 +03:00
Georgi Gerganov	93618031c7	ggml : add ggml_tensor_overhead()	2023-05-27 16:19:56 +03:00
Henri Vasserman	83c54e6da5	[CI] CLBlast: Fix directory name (#1606)	2023-05-27 14:18:25 +02:00
Georgi Gerganov	bdbda1b17a	ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name())	2023-05-27 12:23:16 +03:00
Kerfuffle	66874d4fbc	Some improvements to loading the session with --prompt-cache (#1550) Improvements to loading the session with `--prompt-cache` in the `main` example. 1. Fix an issue where the `--seed` parameter was ignored when loading a cached prompt. 2. When loading a cached prompt, you previously had to specify the saved prompt (or a prefix of it) again. This pull changes that behavior to default to the prompt that was cached if a prompt wasn't specified by the user.	2023-05-25 20:18:01 -06:00
Johannes Gäßler	1fcdcc28b1	cuda : performance optimizations (#1530) * xor hack * block y dim * loop unrolling * Fixed cmake LLAMA_CUDA_BY option * Removed hipblas compatibility code * Define GGML_CUDA_DMMV_BLOCK_Y if not defined * Fewer iters, more ops per iter * Renamed DMMV X/Y compilation options	2023-05-26 00:07:29 +03:00
Henri Vasserman	ac7876ac20	Update CLBlast to 1.6.0 (#1580)	2023-05-24 10:30:09 +03:00
Evan Jones	c31bbe934b	readme : add docs for chat-persistent.sh (#1568) * readme : add docs for chat-persistent.sh * Update README.md	2023-05-24 09:24:01 +03:00

636 commits