Commit graph

633 commits

Author SHA1 Message Date
Georgi Gerganov
42dca4004c
mtl : add silu kernel 2023-06-01 21:35:11 +03:00
Georgi Gerganov
a0cc3de59a
mtl : add f32 -> f32 cpy kernel 2023-06-01 21:30:33 +03:00
Georgi Gerganov
a266c26de2
mtl : verify V tensor contents 2023-06-01 21:27:24 +03:00
Georgi Gerganov
f67c2d8cab
ggml : update ggml_nbytes() to handle non-contiguous tensors 2023-06-01 21:27:03 +03:00
Georgi Gerganov
17930fbcb7
mtl : fix soft_max kernel 2023-06-01 20:48:24 +03:00
Georgi Gerganov
17a70362a6
mtl : add diag_mask_inf kernel 2023-06-01 20:41:54 +03:00
Georgi Gerganov
0f1c580860
mtl : add scale kernel 2023-06-01 19:52:32 +03:00
Georgi Gerganov
51efb59437
mtl : confirm f16 x f32 attention mul mat 2023-06-01 19:45:36 +03:00
Georgi Gerganov
948fcfde7e
mtl : add cpy kernel + handle view ops 2023-06-01 19:21:28 +03:00
Georgi Gerganov
94ea9e7bfe
ggml : store offset as opt arg for ggml_view_xd() operators 2023-06-01 19:21:08 +03:00
Georgi Gerganov
7ca81e9e65
mtl : add reshape and transpose handling 2023-05-31 23:01:37 +03:00
Georgi Gerganov
1213af76ce
mtl : add rope kernel 2023-05-31 22:28:59 +03:00
Georgi Gerganov
6af6a05663
ggml : fix handling of "view" ops in ggml_graph_import() 2023-05-31 22:28:15 +03:00
Georgi Gerganov
b2fd06c6aa
mtl : working mul_mat q4 2023-05-30 23:06:49 +03:00
Georgi Gerganov
29bec00ba0
mtl : another mul_mat Q4 (still does not work) 2023-05-30 22:31:07 +03:00
Georgi Gerganov
96d005225f
mtl : mul_mat fixes (still wrong) 2023-05-30 22:20:17 +03:00
Georgi Gerganov
2a24994bad
mtl : initial mul_mat Q4 kernel (wrong results) 2023-05-30 22:02:54 +03:00
Georgi Gerganov
64afc0b53a
mtl : add mul kernel + confirm working 2023-05-30 19:15:38 +03:00
Georgi Gerganov
72256ebd2b
mtl : add rms_norm kernel + confirm working 2023-05-30 19:03:04 +03:00
Georgi Gerganov
794704e409
mtl : confirmed get_rows_q4_0 is working correctly 2023-05-30 18:41:45 +03:00
Georgi Gerganov
a8fd9dc128
mtl : initial get_rows_q4_0 kernel 2023-05-29 23:12:19 +03:00
Georgi Gerganov
248a8c3379
mtl : move MSL code into separate file for easy editing 2023-05-29 22:26:40 +03:00
Georgi Gerganov
897d6d8e8f
mtl : export just a small part of the graph for now to make it easier 2023-05-29 21:40:05 +03:00
Georgi Gerganov
a792cbd0fc
mtl : no need for mtl-export tool, add cli arg for main instead 2023-05-29 21:28:59 +03:00
Georgi Gerganov
b23fe8c9c7
mtl : adapt the MNIST example as starter 2023-05-29 21:20:56 +03:00
Georgi Gerganov
98c267fc77
ci : disable temporarily 2023-05-29 20:57:24 +03:00
Georgi Gerganov
f85020b19a
mtl : export the LLaMA computation graph 2023-05-29 20:49:24 +03:00
Georgi Gerganov
7552ac5863
ggml : sync cgraph import / export API 2023-05-29 19:31:44 +03:00
Georgi Gerganov
5d1830b99d
ggml : fix bug in ggml_alibi 2023-05-29 19:30:49 +03:00
DannyDaemonic
248367605e
Work around for recalculating logits in cached prompts (Fixes #1585) (#1609)
* Work around for recalculating logits in cached prompts
2023-05-29 05:13:40 -07:00
Jiří Podivín
0e730dd23b
Adding git in container package dependencies (#1621)
Git added to build packages for version information in docker image

Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
2023-05-28 21:45:50 -07:00
Johannes Gäßler
3b126f654f
LLAMA_DEBUG adds debug symbols (#1617) 2023-05-28 21:01:02 +02:00
Kerfuffle
1b78ed2081
Only show -ngl option when relevant + other doc/arg handling updates (#1625)
1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS)
2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible.
3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation
4. Update `main` and `server` examples documentation to use the new style dash separator argument format
5. Update the `server` example to use dash separators for its arguments and add `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility.
6. Add a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356
2023-05-28 11:48:57 -06:00
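Items 1 and 2 of the commit above describe compile-time gating of a help entry. A minimal sketch of that idea in C follows. The function name `format_help` and the help wording are illustrative assumptions, not the project's actual code; only the macro name `LLAMA_SUPPORTS_GPU_OFFLOAD` and the option names come from the commit message.

```c
#include <stdio.h>

/* Assumption: per the commit message, llama.h defines this macro when the
 * build uses CLBlast or cuBLAS. Uncomment to simulate such a build. */
/* #define LLAMA_SUPPORTS_GPU_OFFLOAD */

/* Hypothetical helper: writes the help text into buf and returns the
 * number of characters written (excluding the trailing '\0'). */
static int format_help(char *buf, size_t size) {
    /* Illustrative wording for an option that is always shown. */
    int n = snprintf(buf, size,
                     "  -c N, --ctx-size N         size of the prompt context\n");
#ifdef LLAMA_SUPPORTS_GPU_OFFLOAD
    /* Only compiled in when GPU offload is available. */
    if (n >= 0 && (size_t)n < size) {
        n += snprintf(buf + n, size - (size_t)n,
                      "  -ngl N, --n-gpu-layers N   number of layers to offload to the GPU\n");
    }
#endif
    return n;
}
```

Calling `format_help(buf, sizeof buf)` on a build without the macro yields only the `--ctx-size` line, so users never see an option that would have no effect.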
Vladimir Zorin
337aea1139
examples : add --alias option to gpt_params to set user-friendly model name (#1614) 2023-05-28 20:14:24 +03:00
Howard Su
bb051d9723
opencl : no need to allocate cl_mem on heap (#1612) 2023-05-28 20:13:36 +03:00
Howard Su
ca74884f66
opencl : use strstr to check if fp16 supported (#1611)
* Use strstr to check if fp16 supported

* Ensure ext_buffer is null terminated
2023-05-28 20:09:56 +03:00
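The two bullets in the commit above describe a common pattern: make sure the extension string is null terminated, then search it with `strstr`. A minimal sketch in plain C follows. The helper names `copy_terminated` and `device_supports_fp16` are hypothetical and not taken from the commit. In a real OpenCL program the string would typically come from `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS`; here it is passed in directly so the sketch compiles without an OpenCL SDK.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most dst_size - 1 bytes from a raw,
 * possibly unterminated buffer and always write a trailing '\0',
 * so the result is safe to pass to strstr(). */
static void copy_terminated(char *dst, size_t dst_size,
                            const char *src, size_t src_len) {
    if (dst_size == 0) {
        return;
    }
    size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Hypothetical helper: returns 1 if the extension list mentions
 * cl_khr_fp16, 0 otherwise. strstr() is a plain substring search. */
static int device_supports_fp16(const char *extensions) {
    return strstr(extensions, "cl_khr_fp16") != NULL;
}
```

Because `strstr` matches substrings, this check would also accept a longer extension name that merely begins with `cl_khr_fp16`; a stricter variant could split the list on spaces and compare whole tokens.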
apcameron
a6704643b6
ggml : add support for the RISCV architecture (#1616) 2023-05-27 23:03:25 +03:00
Kerfuffle
0df7d63e5b
Include server in releases + other build system cleanups (#1610)
Set `LLAMA_BUILD_SERVER` in workflow so the `server` example gets built. This currently only applies to Windows builds because it seems like only Windows binary artifacts are included in releases.

Add `server` example target to `Makefile` (still uses `LLAMA_BUILD_SERVER` define and does not build by default)

Fix issue where `vdot` binary wasn't removed when running `make clean`.

Fix compile warnings in `server` example.

Add `.hpp` files to trigger workflow (the server example has one).
2023-05-27 11:04:14 -06:00
Henri Vasserman
97c9b77c4f
Add documentation about CLBlast (#1604)
Installing, compiling and using.
2023-05-27 18:47:55 +03:00
Henri Vasserman
0ecb1bbbeb
[CI] Fix openblas (#1613)
* Fix OpenBLAS build

* Fix `LLAMA_BLAS_VENDOR` CMake variable that should be a string and not a boolean.
2023-05-27 17:24:06 +03:00
Georgi Gerganov
93618031c7
ggml : add ggml_tensor_overhead() 2023-05-27 16:19:56 +03:00
Henri Vasserman
83c54e6da5
[CI] CLBlast: Fix directory name (#1606) 2023-05-27 14:18:25 +02:00
Georgi Gerganov
bdbda1b17a
ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name()) 2023-05-27 12:23:16 +03:00
Kerfuffle
66874d4fbc
Some improvements to loading the session with --prompt-cache (#1550)
Improvements to loading the session with `--prompt-cache` in the `main` example.

1. Fix an issue where the `--seed` parameter was ignored when loading a cached prompt.
2. When loading a cached prompt, you previously had to specify the saved prompt (or a prefix of it) again. This pull request changes that behavior: if the user does not specify a prompt, the cached prompt is used by default.
2023-05-25 20:18:01 -06:00
Johannes Gäßler
1fcdcc28b1
cuda : performance optimizations (#1530)
* xor hack

* block y dim

* loop unrolling

* Fixed cmake LLAMA_CUDA_BY option

* Removed hipblas compatibility code

* Define GGML_CUDA_DMMV_BLOCK_Y if not defined

* Fewer iters, more ops per iter

* Renamed DMMV X/Y compilation options
2023-05-26 00:07:29 +03:00
Henri Vasserman
ac7876ac20
Update CLBlast to 1.6.0 (#1580)
* Update CLBlast to 1.6.0
2023-05-24 10:30:09 +03:00
Evan Jones
c31bbe934b
readme : add docs for chat-persistent.sh (#1568)
* readme : add docs for chat-persistent.sh

* Update README.md
2023-05-24 09:24:01 +03:00
Senemu
1359b6aba5
chat-persistent.sh : use bracket expressions in grep (#1564) 2023-05-24 09:16:22 +03:00
Maarten ter Huurne
7d873811f3
Fix handling of "invalid property" when creating OpenCL command queue (#1565)
The `clCreateCommandQueue()` function will return the code
`CL_INVALID_QUEUE_PROPERTIES` when passed unsupported properties,
not `CL_INVALID_PROPERTY` as the original code was checking for.
2023-05-23 19:01:15 +03:00
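One way such an error check is typically used is to retry queue creation without the optional properties. The sketch below illustrates that pattern in plain C under stated assumptions: it does not call OpenCL, the function-pointer type `create_queue_fn` and the fake driver are hypothetical stand-ins for `clCreateCommandQueue()`, and the `SKETCH_` constants only mirror the values that the Khronos `cl.h` header assigns to the corresponding `CL_` names.

```c
#include <stddef.h>

/* Values mirror cl.h; redefined so the sketch builds without an OpenCL SDK. */
#define SKETCH_CL_SUCCESS                  0
#define SKETCH_CL_INVALID_QUEUE_PROPERTIES (-35)
#define SKETCH_CL_INVALID_PROPERTY         (-64)
#define SKETCH_QUEUE_OUT_OF_ORDER          (1u << 0)

/* Stand-in for clCreateCommandQueue(): returns a nonzero handle on
 * success, 0 on failure, and stores the status code in *err. */
typedef int (*create_queue_fn)(unsigned props, int *err);

/* Try the wanted properties first; if the driver reports that the
 * properties are unsupported, retry with none. Other errors are
 * passed through to the caller unchanged. */
static int create_queue_with_fallback(create_queue_fn create,
                                      unsigned wanted, int *err) {
    int queue = create(wanted, err);
    if (queue == 0 && *err == SKETCH_CL_INVALID_QUEUE_PROPERTIES) {
        queue = create(0u, err);
    }
    return queue;
}

/* Hypothetical fake driver that only supports in-order queues. */
static int fake_in_order_only(unsigned props, int *err) {
    if (props & SKETCH_QUEUE_OUT_OF_ORDER) {
        *err = SKETCH_CL_INVALID_QUEUE_PROPERTIES;
        return 0;
    }
    *err = SKETCH_CL_SUCCESS;
    return 1;
}
```

If the check compared against `SKETCH_CL_INVALID_PROPERTY` instead, the retry would never trigger for this fake driver, which is the kind of mismatch the commit message describes.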
0cc4m
2e6cd4b025
OpenCL Token Generation Acceleration (#1459)
* Move back to C++ for OpenCL

* Refactor OpenCL code to work more like the CUDA code, add missing functions

* Deduplicate dequant kernels

* Add OpenCL compile options

* Use compile args for preprocessing constants

* Restore default platform + device selection by id behavior

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-05-23 00:33:24 +03:00