Commit graph

1976 commits

Author SHA1 Message Date
Jared Van Bortel
e6ce5f21a1 llama : revert unintended whitespace change 2024-01-26 13:10:49 -05:00
Jared Van Bortel
61a5cf88dc kompute : remove unnecessary use_mmap=false 2024-01-26 12:58:50 -05:00
Jared Van Bortel
91654ff042 kompute : fix a -Wstrict-aliasing warning 2024-01-25 17:03:06 -05:00
Jared Van Bortel
bc287047fb kompute : remove unused immintrin.h #include 2024-01-25 16:07:46 -05:00
Jared Van Bortel
3915194232 test-backend-ops : make Falcon test faster with a smaller model 2024-01-25 15:56:42 -05:00
Jared Van Bortel
3fbf0529ef kompute : mark last few failing ops as unsupported 2024-01-25 15:47:43 -05:00
Jared Van Bortel
445a3734b7 kompute : fix basic Q6_K get_rows, 26 -> 24 failures 2024-01-25 15:38:39 -05:00
Jared Van Bortel
de9fba0d39 kompute : fix basic f16 get_rows, 28 -> 26 failures 2024-01-25 15:22:26 -05:00
Jared Van Bortel
11b305082b test-backend-ops : restore softmax tests 2024-01-25 15:05:55 -05:00
Jared Van Bortel
38d1f0c7a0 kompute : fix op_gelu -> Falcon is working on AMDVLK 2024-01-25 15:01:46 -05:00
Jared Van Bortel
6fc99a6e66 test-backend-ops : test larger GELU range 2024-01-25 15:01:46 -05:00
Jared Van Bortel
1849b85473 test-backend-ops : add Falcon test 2024-01-25 13:55:49 -05:00
Jared Van Bortel
f5ac635473 kompute : fix q8_0 mmv, 41 -> 28 failures 2024-01-25 11:27:11 -05:00
Jared Van Bortel
987335ea0a kompute : fix algorithm names 2024-01-25 11:09:18 -05:00
Jared Van Bortel
ec68a9657f test-backend-ops : increase max_nmse_err so Llama passes 2024-01-24 17:31:34 -05:00
Jared Van Bortel
ebb5f7e968 test-backend-ops : test llama with different batch sizes 2024-01-24 16:55:44 -05:00
Jared Van Bortel
df687b10ab kompute : support mask parameter of softmax 2024-01-24 16:51:27 -05:00
Jared Van Bortel
8bd38fe32d test-backend-ops : test mask parameter of ggml_soft_max_ext 2024-01-24 16:28:41 -05:00
Jared Van Bortel
308f279622 kompute : support scale parameter of softmax 2024-01-24 16:17:37 -05:00
Jared Van Bortel
1450966071 test-backend-ops : test scale parameter of ggml_soft_max_ext 2024-01-24 16:17:37 -05:00
Jared Van Bortel
2852902eda test-backend-ops : add llama test 2024-01-24 16:17:29 -05:00
Jared Van Bortel
2b0f642fec fix f16 mmv, 49 -> 41 failures 2024-01-24 13:43:49 -05:00
Jared Van Bortel
1a14099c43 fix q4_0/q4_1 mmv, 65 -> 49 failures 2024-01-24 13:43:48 -05:00
Jared Van Bortel
0787b80db8 kompute : remove broken mulrow kernel -> 1 less test failure 2024-01-24 13:43:48 -05:00
Jared Van Bortel
2755ae3d10 kompute : fix more dispatch ambiguity -> 12 less failures 2024-01-24 13:43:47 -05:00
Jared Van Bortel
08e23fd78c kompute : fix op_mul kernel -> 13 less test failures 2024-01-24 13:43:47 -05:00
Jared Van Bortel
0899adf86e kompute : fix get_rows dispatch -> 4 less failures 2024-01-24 13:43:47 -05:00
Jared Van Bortel
cb9ceff966 minor cleanup 2024-01-24 13:43:46 -05:00
Georgi Gerganov
33e8d6abe1 kompute : fix ggml_add kernel (#5027) 2024-01-24 13:43:46 -05:00
Jared Van Bortel
2f6a279e29 fix supported ops for kompute backend 2024-01-24 13:43:45 -05:00
Jared Van Bortel
07530731ba never try to evaluate an empty command buffer
This fixes the immediate crashes with test-backend-ops - when
evaluating individual no-ops like OP_VIEW, it tries to submit an empty
command buffer, which crashes RADV and hangs AMDVLK.
2024-01-24 13:43:45 -05:00
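The fix described in 07530731ba amounts to guarding the submit path: if a graph records no dispatches, skip queue submission entirely. A minimal sketch of that guard, with all names (`op_sequence`, `submit_if_nonempty`, the counter standing in for `vkQueueSubmit`) being illustrative assumptions rather than the real ggml-kompute API:

```cpp
#include <cassert>
#include <vector>

// Illustrative sketch only: submitting an empty Vulkan command buffer
// crashes RADV and hangs AMDVLK, so the backend must skip submission
// when a graph records nothing (e.g. it contains only no-ops like OP_VIEW).
struct op_sequence {
    std::vector<int> recorded_ops; // placeholder for recorded shader dispatches
};

// Returns true only if a command buffer was actually submitted.
bool submit_if_nonempty(const op_sequence & seq, int & submit_count) {
    if (seq.recorded_ops.empty()) {
        return false; // nothing recorded: never hand an empty buffer to the queue
    }
    submit_count++; // stand-in for the real vkQueueSubmit call
    return true;
}
```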
Jared Van Bortel
729e1a4cc1 sync op_rope_f16 with recent op_rope_f32 changes 2024-01-24 13:43:45 -05:00
Jared Van Bortel
e9d5223da3 actually fix this assertion 2024-01-24 13:43:44 -05:00
Jared Van Bortel
9431026a84 clean up old backend code 2024-01-24 13:43:44 -05:00
Georgi Gerganov
d6bd471693 kompute : fix rope_f32 and scale ops (#5008) 2024-01-24 13:43:44 -05:00
Jared Van Bortel
76474a7c0d kompute : ignore exceptions in ggml_vk_available_devices (#12)
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
2024-01-24 13:43:43 -05:00
Jared Van Bortel
cad72e1252 add sanity check and fix kompute teardown order 2024-01-24 13:43:43 -05:00
Jared Van Bortel
070919dbf7 attempt to get test-backend-ops working 2024-01-24 13:43:43 -05:00
Jared Van Bortel
5f660dada8 fix assertion failure 2024-01-24 13:43:42 -05:00
Jared Van Bortel
298d6eec09 kompute : initial attempt at ggml-backend v2 support 2024-01-24 13:43:40 -05:00
Jared Van Bortel
7c527eb568 Merge commit 'e7e4df031b' into HEAD 2024-01-24 13:39:17 -05:00
slaren
e7e4df031b llama : ggml-backend integration (#4766)
* llama : ggml-backend integration

* ggml-backend : add names to buffers

* fix unmap after loading

* batched-bench : add tensor_split param

* llama : check for null tensor_split

* ggml-backend : increase GGML_MAX_BACKENDS

* improve graph splitting, partial fix for --no-kv-offload

* cuda : add ggml-backend split buffer support

* cuda : do not create buffer types for devices that don't exist (fixes usage without CUDA devices available)

* ggml : fix null backend dereference (#4807)

* ggml : fix null backend dereference

* ggml : also check ggml_backend_is_cpu

* test-backend-ops : check buffer allocation failures

* llama : add cparam (split_mode) and command line argument (--split-mode, -sm) to configure the split mode (none, layer or row)

* ggml : fix mul_mat_id work size

* llama : rewrite session kv load/set without graphs

* minor

* llama : only initialize used backends, free backends on context free

* llama : abort ctx if cuda backend init fails

* llama : rewrite lora with ggml-backend and compute on CPU

ggml-ci

* llama : only map to a backend buffer the region of the file mapping containing the tensors used in the buffer

* opencl : add ggml-backend buffer type

* cuda : only use batched_cublas with batched mat muls (fixes fp16 tg perf)

* llama : on Metal, by default offload the full model

ggml-ci

* metal : page align the data ptr (#4854)

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* cuda : fix split buffer free

* address review comments

* llama-bench : add split-mode parameter

* fix whitespace

* opencl : fix double initialization

* server : add --split-mode parameter

* use async copy and compute to improve multi-gpu performance

ggml-ci

* use async memcpys to copy the graph outputs to the CPU

* fix opencl

* use a host buffer for the cpu compute buffer for faster copies to the gpu

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-01-12 20:07:38 +01:00
Georgi Gerganov
584d674be6 llama : remove redundant assert for StableLM (#4901) 2024-01-12 20:54:12 +02:00
Daniel Bevenius
930f907d3e export-lora : use LLAMA_FILE_MAGIC_GGLA (#4894)
This commit replaces the magic number used in export-lora.cpp with
the one defined in llama.h, which is indirectly included via common.h.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-01-12 19:54:53 +02:00
Zay
e790eef21c llama.swiftui : update models layout (#4826)
* Updated Models Layout

- Added a models drawer
- Added downloading directly from Hugging Face
- Load custom models from local folder
- Delete models by swiping left

* trimmed trailing white space

* Updated Models Layout
2024-01-12 14:48:00 +02:00
Georgi Gerganov
5537d9d36b gitignore : imatrix 2024-01-12 14:33:21 +02:00
Johannes Gäßler
1b280c9fff CUDA: fix softmax compile for old CUDA versions (#4862) 2024-01-12 12:30:41 +01:00
Georgi Gerganov
3cabe80630 llama : fix typo "imp_embd" -> "inp_embd" 2024-01-12 13:11:15 +02:00
howlger
4315a94366 common : streamline the formatting of help (#4890)
* common : streamline the formatting of help

- Separate alternative parameters by a comma

- Do not indent `--version` differently

* Update common/common.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-12 13:05:32 +02:00
Georgi Gerganov
2d00741e12 py : fix lint (#4889) 2024-01-12 13:03:38 +02:00