Georgi Gerganov
18e482a89c
mtl : preparing for merge
2023-06-04 09:27:27 +03:00
Georgi Gerganov
4df2ef3161
mtl : make it work with main example
Lots of hacks but at least now it generates text
2023-06-03 09:31:33 +03:00
Georgi Gerganov
2f4e9d19cc
mtl : plug Metal inference into llama.cpp (very quick-n-dirty)
2023-06-02 22:45:34 +03:00
Georgi Gerganov
640a889632
mtl : add save/load vocab to ggml file
2023-06-02 21:00:30 +03:00
Georgi Gerganov
03c2d72867
mtl : simplify implementation
2023-06-02 20:36:26 +03:00
Georgi Gerganov
627605732c
mtl : remove printfs from inner loop
2023-06-02 19:58:08 +03:00
Georgi Gerganov
b088e14a7e
mtl : more threads for rms_norm + better timing
2023-06-02 19:26:58 +03:00
Georgi Gerganov
70c3387726
mtl : fix kernel signature + roll inner loop
2023-06-02 19:11:39 +03:00
Georgi Gerganov
847bbfe9e6
mtl : faster mul_mat_q4_0_f32 kernel
2023-06-02 18:40:25 +03:00
Georgi Gerganov
33671460b0
mtl : fix bug in f16 x f32 mul mat + speed-up computation
2023-06-02 18:23:51 +03:00
Georgi Gerganov
e55f7b0bdb
mtl : add f16 mat x f32 vec multiplication kernel
2023-06-01 23:37:49 +03:00
Georgi Gerganov
f0196a7e7a
mtl : optimize rms_norm and soft_max kernels
2023-06-01 22:51:42 +03:00
Georgi Gerganov
9665429e94
mtl : full GPU inference of the computation graph
2023-06-01 21:50:01 +03:00
Georgi Gerganov
fbd3f6258d
mtl : add non-broadcast mul kernel
2023-06-01 21:40:53 +03:00
Georgi Gerganov
42dca4004c
mtl : add silu kernel
2023-06-01 21:35:11 +03:00
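The SiLU activation this commit brings to Metal can be summarized by a tiny CPU reference (a sketch for illustration only, not the actual MSL kernel): silu(x) = x * sigmoid(x) = x / (1 + e^-x).

```cpp
#include <cassert>
#include <cmath>

// CPU reference for the SiLU activation: silu(x) = x * sigmoid(x).
// Illustrative sketch only -- the commit adds a Metal (MSL) kernel, not this code.
float silu(float x) {
    return x / (1.0f + std::exp(-x));
}
```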
Georgi Gerganov
a0cc3de59a
mtl : add f32 -> f32 cpy kernel
2023-06-01 21:30:33 +03:00
Georgi Gerganov
a266c26de2
mtl : verify V tensor contents
2023-06-01 21:27:24 +03:00
Georgi Gerganov
f67c2d8cab
ggml : update ggml_nbytes() to handle non-contiguous tensors
2023-06-01 21:27:03 +03:00
Georgi Gerganov
17930fbcb7
mtl : fix soft_max kernel
2023-06-01 20:48:24 +03:00
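The computation performed by the soft_max kernel fixed above is the standard numerically stable row softmax; a CPU reference sketch (illustrative only, not the Metal code):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// CPU reference for a numerically stable softmax over one row:
// subtract the row max before exponentiating, then normalize.
void soft_max(std::vector<float> & x) {
    float max_val = x[0];
    for (float v : x) max_val = std::max(max_val, v);
    float sum = 0.0f;
    for (float & v : x) { v = std::exp(v - max_val); sum += v; }
    for (float & v : x) v /= sum;
}
```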
Georgi Gerganov
17a70362a6
mtl : add diag_mask_inf kernel
2023-06-01 20:41:54 +03:00
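The diag_mask_inf operation implements the causal attention mask: entries of the score matrix that look at future positions are set to -inf so softmax gives them zero weight. A CPU reference sketch (illustrative; the names and layout here are assumptions, not the kernel source):

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// CPU reference for a causal mask over a row-major [n_rows x n_cols] score
// matrix: in row i, columns j > n_past + i are set to -inf.
void diag_mask_inf(std::vector<float> & x, int n_rows, int n_cols, int n_past) {
    const float neg_inf = -std::numeric_limits<float>::infinity();
    for (int i = 0; i < n_rows; ++i) {
        for (int j = n_past + i + 1; j < n_cols; ++j) {
            x[i*n_cols + j] = neg_inf;
        }
    }
}
```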
Georgi Gerganov
0f1c580860
mtl : add scale kernel
2023-06-01 19:52:32 +03:00
Georgi Gerganov
51efb59437
mtl : confirm f16 x f32 attention mul mat
2023-06-01 19:45:36 +03:00
Georgi Gerganov
948fcfde7e
mtl : add cpy kernel + handle view ops
2023-06-01 19:21:28 +03:00
Georgi Gerganov
94ea9e7bfe
ggml : store offset as opt arg for ggml_view_xd() operators
2023-06-01 19:21:08 +03:00
Georgi Gerganov
7ca81e9e65
mtl : add reshape and transpose handling
2023-05-31 23:01:37 +03:00
Georgi Gerganov
1213af76ce
mtl : add rope kernel
2023-05-31 22:28:59 +03:00
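The rope kernel applies rotary position embedding: consecutive pairs of a head vector are rotated by an angle that depends on the token position and the pair index. A CPU sketch of the math (assumed base 10000 and pair layout; not the actual kernel):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// CPU reference for RoPE on one head vector of dimension d:
// each pair (x[i], x[i+1]) is rotated by theta = pos * base^(-i/d).
void rope(std::vector<float> & x, int pos, float base = 10000.0f) {
    const int d = (int) x.size();
    for (int i = 0; i < d; i += 2) {
        const float theta = pos * std::pow(base, -((float) i) / d);
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        const float x0 = x[i], x1 = x[i + 1];
        x[i]     = x0*c - x1*s;  // rotation preserves the pair's norm
        x[i + 1] = x0*s + x1*c;
    }
}
```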
Georgi Gerganov
6af6a05663
ggml : fix handling of "view" ops in ggml_graph_import()
2023-05-31 22:28:15 +03:00
Georgi Gerganov
b2fd06c6aa
mtl : working mul_mat q4
2023-05-30 23:06:49 +03:00
Georgi Gerganov
29bec00ba0
mtl : another mul_mat Q4 (still does not work)
2023-05-30 22:31:07 +03:00
Georgi Gerganov
96d005225f
mtl : mul_mat fixes (still wrong)
2023-05-30 22:20:17 +03:00
Georgi Gerganov
2a24994bad
mtl : initial mul_mat Q4 kernel (wrong results)
2023-05-30 22:02:54 +03:00
Georgi Gerganov
64afc0b53a
mtl : add mul kernel + confirm working
2023-05-30 19:15:38 +03:00
Georgi Gerganov
72256ebd2b
mtl : add rms_norm kernel + confirm working
2023-05-30 19:03:04 +03:00
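RMS normalization, which the kernel above computes on the GPU, divides each element by the row's root mean square (with a small eps for stability). A CPU reference sketch, illustrative only:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// CPU reference for RMS norm: x_i <- x_i / sqrt(mean(x^2) + eps).
void rms_norm(std::vector<float> & x, float eps = 1e-6f) {
    float ss = 0.0f;
    for (float v : x) ss += v*v;
    const float scale = 1.0f / std::sqrt(ss / x.size() + eps);
    for (float & v : x) v *= scale;
}
```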
Georgi Gerganov
794704e409
mtl : confirmed get_rows_q4_0 is working correctly
2023-05-30 18:41:45 +03:00
Georgi Gerganov
a8fd9dc128
mtl : initial get_rows_q4_0 kernel
2023-05-29 23:12:19 +03:00
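get_rows_q4_0 gathers rows of a Q4_0-quantized matrix, which means dequantizing Q4_0 blocks: 32 weights stored as 16 bytes of 4-bit quants plus one per-block scale d (fp16 in ggml; a plain float here for simplicity). Each weight is (q - 8) * d. The nibble layout below (low nibbles hold the first 16 elements, high nibbles the next 16) is an assumption in this sketch:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// CPU sketch of dequantizing one Q4_0 block of 32 weights.
// d is the per-block scale (fp16 in ggml, float here); qs holds 4-bit quants.
std::vector<float> dequantize_q4_0(float d, const uint8_t (&qs)[16]) {
    std::vector<float> out(32);
    for (int i = 0; i < 16; ++i) {
        out[i]      = ((qs[i] & 0x0F) - 8) * d;  // low nibble
        out[i + 16] = ((qs[i] >>   4) - 8) * d;  // high nibble
    }
    return out;
}
```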
Georgi Gerganov
248a8c3379
mtl : move MSL code into separate file for easy editing
2023-05-29 22:26:40 +03:00
Georgi Gerganov
897d6d8e8f
mtl : export just a small part of the graph for now to make it easier
2023-05-29 21:40:05 +03:00
Georgi Gerganov
a792cbd0fc
mtl : no need for mtl-export tool, add cli arg for main instead
2023-05-29 21:28:59 +03:00
Georgi Gerganov
b23fe8c9c7
mtl : adapt the MNIST example as starter
2023-05-29 21:20:56 +03:00
Georgi Gerganov
98c267fc77
ci : disable temporarily
2023-05-29 20:57:24 +03:00
Georgi Gerganov
f85020b19a
mtl : export the LLaMA computation graph
2023-05-29 20:49:24 +03:00
Georgi Gerganov
7552ac5863
ggml : sync cgraph import / export API
2023-05-29 19:31:44 +03:00
Georgi Gerganov
5d1830b99d
ggml : fix bug in ggml_alibi
2023-05-29 19:30:49 +03:00
DannyDaemonic
248367605e
Workaround for recalculating logits in cached prompts ( Fixes #1585 ) ( #1609 )
* Work around for recalculating logits in cached prompts
2023-05-29 05:13:40 -07:00
Jiří Podivín
0e730dd23b
Adding git in container package dependencies ( #1621 )
Git added to build packages for version information in docker image
Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
2023-05-28 21:45:50 -07:00
Johannes Gäßler
3b126f654f
LLAMA_DEBUG adds debug symbols ( #1617 )
2023-05-28 21:01:02 +02:00
Kerfuffle
1b78ed2081
Only show -ngl option when relevant + other doc/arg handling updates ( #1625 )
1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS)
2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible.
3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation
4. Update `main` and `server` examples documentation to use the new style dash separator argument format
5. Update the `server` example to use dash separators for its arguments and adds `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility.
6. Add a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356
2023-05-28 11:48:57 -06:00
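The gating pattern described in points 1-2 above can be sketched as follows. This is an illustrative reconstruction, not the actual llama.cpp source; the helper function and its text are made up, while the macro names come from the commit description:

```cpp
#include <cassert>
#include <string>

// LLAMA_SUPPORTS_GPU_OFFLOAD is defined only when a GPU backend is built in,
// so GPU-only options can be hidden from --help otherwise.
#if defined(GGML_USE_CUBLAS) || defined(GGML_USE_CLBLAST)
#define LLAMA_SUPPORTS_GPU_OFFLOAD
#endif

std::string gpu_help_text() {
#ifdef LLAMA_SUPPORTS_GPU_OFFLOAD
    return "  -ngl N, --n-gpu-layers N  number of layers to offload to the GPU\n";
#else
    return "";  // option hidden when no GPU backend is compiled in
#endif
}
```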
Vladimir Zorin
337aea1139
examples : add --alias option to gpt_params to set a user-friendly model name ( #1614 )
2023-05-28 20:14:24 +03:00
Howard Su
bb051d9723
opencl : no need to allocate cl_mem on heap ( #1612 )
2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported ( #1611 )
* Use strstr to check if fp16 supported
* Ensure ext_buffer is null terminated
2023-05-28 20:09:56 +03:00
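The check described above works because OpenCL reports supported extensions as a single space-separated string, so a substring search is enough to detect cl_khr_fp16. A minimal sketch (the helper name is made up; the extension string would come from clGetDeviceInfo with CL_DEVICE_EXTENSIONS):

```cpp
#include <cassert>
#include <cstring>

// Returns true if the device's extension string mentions cl_khr_fp16.
bool device_supports_fp16(const char * ext_string) {
    return ext_string != nullptr && std::strstr(ext_string, "cl_khr_fp16") != nullptr;
}
```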