Georgi Gerganov
18e482a89c
mtl : preparing for merge
2023-06-04 09:27:27 +03:00
Georgi Gerganov
4df2ef3161
mtl : make it work with main example
Lots of hacks but at least now it generates text
2023-06-03 09:31:33 +03:00
Georgi Gerganov
2f4e9d19cc
mtl : plug Metal inference into llama.cpp (very quick-n-dirty)
2023-06-02 22:45:34 +03:00
Georgi Gerganov
640a889632
mtl : add save/load vocab to ggml file
2023-06-02 21:00:30 +03:00
Georgi Gerganov
03c2d72867
mtl : simplify implementation
2023-06-02 20:36:26 +03:00
Georgi Gerganov
627605732c
mtl : remove printfs from inner loop
2023-06-02 19:58:08 +03:00
Georgi Gerganov
b088e14a7e
mtl : more threads for rms_norm + better timing
2023-06-02 19:26:58 +03:00
Georgi Gerganov
70c3387726
mtl : fix kernel signature + roll inner loop
2023-06-02 19:11:39 +03:00
Georgi Gerganov
847bbfe9e6
mtl : faster mul_mat_q4_0_f32 kernel
2023-06-02 18:40:25 +03:00
Georgi Gerganov
33671460b0
mtl : fix bug in f16 x f32 mul mat + speed-up computation
2023-06-02 18:23:51 +03:00
Georgi Gerganov
e55f7b0bdb
mtl : add f16 mat x f32 vec multiplication kernel
2023-06-01 23:37:49 +03:00
Georgi Gerganov
f0196a7e7a
mtl : optimize rms_norm and soft_max kernels
2023-06-01 22:51:42 +03:00
Georgi Gerganov
9665429e94
mtl : full GPU inference of the computation graph
2023-06-01 21:50:01 +03:00
Georgi Gerganov
fbd3f6258d
mtl : add non-broadcast mul kernel
2023-06-01 21:40:53 +03:00
Georgi Gerganov
42dca4004c
mtl : add silu kernel
2023-06-01 21:35:11 +03:00
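The SiLU activation this commit brings to Metal can be summarized by a tiny CPU reference (a sketch for illustration only, not the actual MSL kernel): silu(x) = x * sigmoid(x) = x / (1 + e^-x).

```cpp
#include <cassert>
#include <cmath>

// CPU reference for the SiLU activation: silu(x) = x * sigmoid(x).
// Illustrative sketch only -- the commit adds a Metal (MSL) kernel, not this code.
float silu(float x) {
    return x / (1.0f + std::exp(-x));
}
```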
Georgi Gerganov
a0cc3de59a
mtl : add f32 -> f32 cpy kernel
2023-06-01 21:30:33 +03:00
Georgi Gerganov
a266c26de2
mtl : verify V tensor contents
2023-06-01 21:27:24 +03:00
Georgi Gerganov
f67c2d8cab
ggml : update ggml_nbytes() to handle non-contiguous tensors
2023-06-01 21:27:03 +03:00
Georgi Gerganov
17930fbcb7
mtl : fix soft_max kernel
2023-06-01 20:48:24 +03:00
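The computation performed by the soft_max kernel fixed above is the standard numerically stable row softmax; a CPU reference sketch (illustrative only, not the Metal code):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// CPU reference for a numerically stable softmax over one row:
// subtract the row max before exponentiating, then normalize.
void soft_max(std::vector<float> & x) {
    float max_val = x[0];
    for (float v : x) max_val = std::max(max_val, v);
    float sum = 0.0f;
    for (float & v : x) { v = std::exp(v - max_val); sum += v; }
    for (float & v : x) v /= sum;
}
```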
Georgi Gerganov
17a70362a6
mtl : add diag_mask_inf kernel
2023-06-01 20:41:54 +03:00
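The diag_mask_inf operation implements the causal attention mask: entries of the score matrix that look at future positions are set to -inf so softmax gives them zero weight. A CPU reference sketch (illustrative; the names and layout here are assumptions, not the kernel source):

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// CPU reference for a causal mask over a row-major [n_rows x n_cols] score
// matrix: in row i, columns j > n_past + i are set to -inf.
void diag_mask_inf(std::vector<float> & x, int n_rows, int n_cols, int n_past) {
    const float neg_inf = -std::numeric_limits<float>::infinity();
    for (int i = 0; i < n_rows; ++i) {
        for (int j = n_past + i + 1; j < n_cols; ++j) {
            x[i*n_cols + j] = neg_inf;
        }
    }
}
```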
Georgi Gerganov
0f1c580860
mtl : add scale kernel
2023-06-01 19:52:32 +03:00
Georgi Gerganov
51efb59437
mtl : confirm f16 x f32 attention mul mat
2023-06-01 19:45:36 +03:00
Georgi Gerganov
948fcfde7e
mtl : add cpy kernel + handle view ops
2023-06-01 19:21:28 +03:00
Georgi Gerganov
94ea9e7bfe
ggml : store offset as opt arg for ggml_view_xd() operators
2023-06-01 19:21:08 +03:00
Georgi Gerganov
7ca81e9e65
mtl : add reshape and transpose handling
2023-05-31 23:01:37 +03:00
Georgi Gerganov
1213af76ce
mtl : add rope kernel
2023-05-31 22:28:59 +03:00
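The rope kernel applies rotary position embedding: consecutive pairs of a head vector are rotated by an angle that depends on the token position and the pair index. A CPU sketch of the math (assumed base 10000 and pair layout; not the actual kernel):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// CPU reference for RoPE on one head vector of dimension d:
// each pair (x[i], x[i+1]) is rotated by theta = pos * base^(-i/d).
void rope(std::vector<float> & x, int pos, float base = 10000.0f) {
    const int d = (int) x.size();
    for (int i = 0; i < d; i += 2) {
        const float theta = pos * std::pow(base, -((float) i) / d);
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        const float x0 = x[i], x1 = x[i + 1];
        x[i]     = x0*c - x1*s;  // rotation preserves the pair's norm
        x[i + 1] = x0*s + x1*c;
    }
}
```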
Georgi Gerganov
6af6a05663
ggml : fix handling of "view" ops in ggml_graph_import()
2023-05-31 22:28:15 +03:00
Georgi Gerganov
b2fd06c6aa
mtl : working mul_mat q4
2023-05-30 23:06:49 +03:00
Georgi Gerganov
29bec00ba0
mtl : another mul_mat Q4 (still does not work)
2023-05-30 22:31:07 +03:00
Georgi Gerganov
96d005225f
mtl : mul_mat fixes (still wrong)
2023-05-30 22:20:17 +03:00
Georgi Gerganov
2a24994bad
mtl : initial mul_mat Q4 kernel (wrong results)
2023-05-30 22:02:54 +03:00
Georgi Gerganov
64afc0b53a
mtl : add mul kernel + confirm working
2023-05-30 19:15:38 +03:00
Georgi Gerganov
72256ebd2b
mtl : add rms_norm kernel + confirm working
2023-05-30 19:03:04 +03:00
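RMS normalization, which the kernel above computes on the GPU, divides each element by the row's root mean square (with a small eps for stability). A CPU reference sketch, illustrative only:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// CPU reference for RMS norm: x_i <- x_i / sqrt(mean(x^2) + eps).
void rms_norm(std::vector<float> & x, float eps = 1e-6f) {
    float ss = 0.0f;
    for (float v : x) ss += v*v;
    const float scale = 1.0f / std::sqrt(ss / x.size() + eps);
    for (float & v : x) v *= scale;
}
```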
Georgi Gerganov
794704e409
mtl : confirmed get_rows_q4_0 is working correctly
2023-05-30 18:41:45 +03:00
Georgi Gerganov
a8fd9dc128
mtl : initial get_rows_q4_0 kernel
2023-05-29 23:12:19 +03:00
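get_rows_q4_0 gathers rows of a Q4_0-quantized matrix, which means dequantizing Q4_0 blocks: 32 weights stored as 16 bytes of 4-bit quants plus one per-block scale d (fp16 in ggml; a plain float here for simplicity). Each weight is (q - 8) * d. The nibble layout below (low nibbles hold the first 16 elements, high nibbles the next 16) is an assumption in this sketch:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// CPU sketch of dequantizing one Q4_0 block of 32 weights.
// d is the per-block scale (fp16 in ggml, float here); qs holds 4-bit quants.
std::vector<float> dequantize_q4_0(float d, const uint8_t (&qs)[16]) {
    std::vector<float> out(32);
    for (int i = 0; i < 16; ++i) {
        out[i]      = ((qs[i] & 0x0F) - 8) * d;  // low nibble
        out[i + 16] = ((qs[i] >>   4) - 8) * d;  // high nibble
    }
    return out;
}
```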
Georgi Gerganov
248a8c3379
mtl : move MSL code into separate file for easy editing
2023-05-29 22:26:40 +03:00
Georgi Gerganov
897d6d8e8f
mtl : export just a small part of the graph for now to make it easier
2023-05-29 21:40:05 +03:00
Georgi Gerganov
a792cbd0fc
mtl : no need for mtl-export tool, add cli arg for main instead
2023-05-29 21:28:59 +03:00
Georgi Gerganov
b23fe8c9c7
mtl : adapt the MNIST example as starter
2023-05-29 21:20:56 +03:00
Georgi Gerganov
98c267fc77
ci : disable temporarily
2023-05-29 20:57:24 +03:00
Georgi Gerganov
f85020b19a
mtl : export the LLaMA computation graph
2023-05-29 20:49:24 +03:00
Georgi Gerganov
7552ac5863
ggml : sync cgraph import / export API
2023-05-29 19:31:44 +03:00
Georgi Gerganov
5d1830b99d
ggml : fix bug in ggml_alibi
2023-05-29 19:30:49 +03:00
DannyDaemonic
248367605e
Workaround for recalculating logits in cached prompts ( Fixes #1585 ) ( #1609 )
* Work around for recalculating logits in cached prompts
2023-05-29 05:13:40 -07:00
Jiří Podivín
0e730dd23b
Adding git in container package dependencies ( #1621 )
Git added to build packages for version information in docker image
Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
2023-05-28 21:45:50 -07:00
Johannes Gäßler
3b126f654f
LLAMA_DEBUG adds debug symbols ( #1617 )
2023-05-28 21:01:02 +02:00
Kerfuffle
1b78ed2081
Only show -ngl option when relevant + other doc/arg handling updates ( #1625 )
1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS)
2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible.
3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation
4. Update `main` and `server` examples documentation to use the new style dash separator argument format
5. Update the `server` example to use dash separators for its arguments and adds `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility.
6. Add a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356
2023-05-28 11:48:57 -06:00
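The gating pattern described in points 1-2 above can be sketched as follows. This is an illustrative reconstruction, not the actual llama.cpp source; the helper function and its text are made up, while the macro names come from the commit description:

```cpp
#include <cassert>
#include <string>

// LLAMA_SUPPORTS_GPU_OFFLOAD is defined only when a GPU backend is built in,
// so GPU-only options can be hidden from --help otherwise.
#if defined(GGML_USE_CUBLAS) || defined(GGML_USE_CLBLAST)
#define LLAMA_SUPPORTS_GPU_OFFLOAD
#endif

std::string gpu_help_text() {
#ifdef LLAMA_SUPPORTS_GPU_OFFLOAD
    return "  -ngl N, --n-gpu-layers N  number of layers to offload to the GPU\n";
#else
    return "";  // option hidden when no GPU backend is compiled in
#endif
}
```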
Vladimir Zorin
337aea1139
examples : add --alias option to gpt_params to set a user-friendly model name ( #1614 )
2023-05-28 20:14:24 +03:00
Howard Su
bb051d9723
opencl : no need to allocate cl_mem on heap ( #1612 )
2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported ( #1611 )
* Use strstr to check if fp16 supported
* Ensure ext_buffer is null terminated
2023-05-28 20:09:56 +03:00
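The check described above works because OpenCL reports supported extensions as a single space-separated string, so a substring search is enough to detect cl_khr_fp16. A minimal sketch (the helper name is made up; the extension string would come from clGetDeviceInfo with CL_DEVICE_EXTENSIONS):

```cpp
#include <cassert>
#include <cstring>

// Returns true if the device's extension string mentions cl_khr_fp16.
bool device_supports_fp16(const char * ext_string) {
    return ext_string != nullptr && std::strstr(ext_string, "cl_khr_fp16") != nullptr;
}
```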