Commit graph

4134 commits

Author SHA1 Message Date
FirstTimeEZ
9d1ab28aed
Merge branch 'vulkan-assertion' of https://github.com/FirstTimeEZ/llama.cpp into vulkan-assertion 2024-11-18 11:56:39 +13:00
FirstTimeEZ
b75785d5b4
vulkan: change an assertion and minify others 2024-11-18 11:56:33 +13:00
FirstTimeEZ
c100e21a78
Merge branch 'ggerganov:master' into vulkan-assertion 2024-11-18 11:24:17 +13:00
Johannes Gäßler
76e9e58b78
CUDA: fix MMV kernel being used for FP16 src1 (#10357) 2024-11-17 23:20:42 +01:00
FirstTimeEZ
2ed70d8c8d
vulkan: change an assertion and minify others 2024-11-18 11:19:19 +13:00
FirstTimeEZ
b5d5af4cdb
Merge branch 'vulkan-assertion' of https://github.com/FirstTimeEZ/llama.cpp into vulkan-assertion 2024-11-18 11:08:28 +13:00
FirstTimeEZ
4629b76d75
vulkan: change an assertion and minify others 2024-11-18 11:06:38 +13:00
FirstTimeEZ
a1e88f0bcc
Merge branch 'ggerganov:master' into vulkan-assertion 2024-11-18 01:04:05 +13:00
Johannes Gäßler
ce2e59ba10
CMake: fix typo in comment [no ci] (#10360) 2024-11-17 12:59:38 +01:00
Diego Devesa
be5caccef9
llama : only use default buffer types for the KV cache (#10358) 2024-11-17 12:25:45 +01:00
Georgi Gerganov
20a780c7b6
gitignore : ignore local run scripts [no ci] 2024-11-17 13:12:22 +02:00
FirstTimeEZ
b7904dd728
Merge branch 'vulkan-assertion' of https://github.com/FirstTimeEZ/llama.cpp into vulkan-assertion 2024-11-17 23:56:45 +13:00
FirstTimeEZ
855a685cc0
vulkan-assertions 2024-11-17 23:56:38 +13:00
FirstTimeEZ
7345c2cccb
Merge branch 'ggerganov:master' into vulkan-assertion 2024-11-17 23:54:25 +13:00
FirstTimeEZ
3db18a765f
Merge branch 'vulkan-assertion' of https://github.com/FirstTimeEZ/llama.cpp into vulkan-assertion 2024-11-17 23:52:36 +13:00
FirstTimeEZ
281d629380
vulkan: less assertions 2024-11-17 23:52:30 +13:00
Georgi Gerganov
cf32a9b93a
metal : refactor kernel args into structs (#10238)
* metal : add kernel arg structs (wip)

* metal : fattn args

ggml-ci

* metal : cont + avoid potential int overflow [no ci]

* metal : mul mat struct (wip)

* cont : mul mat vec

* cont : pass by reference

* cont : args is first argument

* cont : use char ptr

* cont : shmem style

* cont : thread counters style

* cont : mul mm id

ggml-ci

* cont : int safety + register optimizations

ggml-ci

* metal : GGML_OP_CONCAT

ggml-ci

* metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV

* metal : GGML_OP_REPEAT

* metal : GGML_OP_CPY

* metal : GGML_OP_RMS_NORM

* metal : GGML_OP_NORM

* metal : add TODOs for rest of ops

* ggml : add ggml-metal-impl.h

ggml-ci
2024-11-17 11:23:01 +02:00
FirstTimeEZ
a43178299c
ggml : fix undefined reference to 'getcpu' (#10354)
https://github.com/ggerganov/llama.cpp/issues/10352
2024-11-17 10:39:22 +02:00
Johannes Gäßler
c3ea58aca4
CUDA: remove DMMV, consolidate F16 mult mat vec (#10318) 2024-11-17 09:09:55 +01:00
Johannes Gäßler
467576b6cc
CMake: default to -arch=native for CUDA build (#10320) 2024-11-17 09:06:34 +01:00
Diego Devesa
eda7e1d4f5
ggml : fix possible buffer use after free in sched reserve (#9930) 2024-11-17 08:31:17 +02:00
Georgi Gerganov
24203e9dd7 ggml : inttypes.h -> cinttypes (#0)
ggml-ci
2024-11-17 08:30:29 +02:00
Georgi Gerganov
5d9e59979c ggml : adapt AMX to tensor->grad removal (#0)
ggml-ci
2024-11-17 08:30:29 +02:00
Georgi Gerganov
a4200cafad make : add ggml-opt (#0)
ggml-ci
2024-11-17 08:30:29 +02:00
Georgi Gerganov
84274a10c3 tests : remove test-grad0 2024-11-17 08:30:29 +02:00
Georgi Gerganov
68fcb4759c ggml : fix compile warnings (#0)
ggml-ci
2024-11-17 08:30:29 +02:00
Johannes Gäßler
8a43e940ab ggml: new optimization interface (ggml/988) 2024-11-17 08:30:29 +02:00
Georgi Gerganov
5c9a8b22b1 scripts : update sync 2024-11-17 08:30:29 +02:00
FirstTimeEZ
0fff7fd798
docs : vulkan build instructions to use git bash mingw64 (#10303) 2024-11-17 00:29:18 +01:00
Johannes Gäßler
4e54be0ec6
llama/ex: remove --logdir argument (#10339) 2024-11-16 23:00:41 +01:00
FirstTimeEZ
7a4ac544f6
Merge branch 'ggerganov:master' into vulkan-assertion 2024-11-17 10:50:46 +13:00
Georgi Gerganov
db4cfd5dbc llamafile : fix include path (#0)
ggml-ci
2024-11-16 20:36:26 +02:00
Georgi Gerganov
8ee0d09ae6 make : auto-determine dependencies (#0) 2024-11-16 20:36:26 +02:00
MaggotHATE
bcdb7a2386
server: (web UI) Add samplers sequence customization (#10255)
* Samplers sequence: simplified and input field.

* Removed unused function

* Modify and use `settings-modal-short-input`

* rename "name" --> "label"

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-11-16 14:26:54 +01:00
Georgi Gerganov
f245cc28d4
scripts : fix missing key in compare-llama-bench.py (#10332) 2024-11-16 10:32:50 +02:00
Jeff Bolz
772703c8ff
vulkan: Optimize some mat-vec mul quant shaders (#10296)
Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses
the B loads across the rows and also reuses some addressing calculations.
This required manually partially unrolling the loop, since the compiler
is less willing to unroll outer loops.

Add bounds-checking on the last iteration of the loop. I think this was at
least partly broken before.

Optimize the Q4_K shader to vectorize most loads and reduce the number of
bit twiddling instructions.
2024-11-16 07:26:57 +01:00
FirstTimeEZ
d283dbcd37 vulkan: change an assertion
src1 can go down the first pipeline as nullptr and src0 only needs to be checked once

this means the assertion is only required to check if the type is GGML_TYPE_F16 and can usually be skipped
2024-11-16 17:17:10 +13:00
FirstTimeEZ
dd3a6ce9f8
vulkan : add cmake preset debug/release (#10306) 2024-11-16 02:59:33 +01:00
Dan Johansson
1e58ee1318
ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324) 2024-11-16 01:53:37 +01:00
FirstTimeEZ
89e4caaaf0
llama : save number of parameters and the size in llama_model (#10286)
fixes #10285
2024-11-16 01:42:13 +01:00
Srihari-mcw
74d73dc85c
Make updates to fix issues with clang-cl builds while using AVX512 flags (#10314) 2024-11-15 22:27:00 +01:00
Johannes Gäßler
4047be74da
scripts: update compare-llama-bench.py (#10319) 2024-11-15 21:19:03 +01:00
slaren
883d206fbd ggml : fix some build issues 2024-11-15 21:45:32 +02:00
Georgi Gerganov
09ecbcb596 cmake : fix ppc64 check (whisper/0)
ggml-ci
2024-11-15 15:44:06 +02:00
thewh1teagle
3225008973 ggml : vulkan logs (whisper/2547) 2024-11-15 15:44:06 +02:00
Georgi Gerganov
cbf5541a82 sync : ggml 2024-11-15 15:44:06 +02:00
Eve
18429220bd
AVX BF16 and single scale quant optimizations (#10212)
* use 128 bit loads (i've tried 256->128 to death and its slower)

* double accumulator

* avx bf16 vec dot

* +3% q4_0 inference

* +7% tg +5% pp compared to master

* slower f16c version, kep for reference

* 256b version, also slow. i tried :)

* revert f16

* faster with madd

* split to functions

* Q8_0 and IQ4_NL, 5-7% faster

* fix potential overflow (performance reduced)

* 16 bit add for q4_0 only

* merge
2024-11-15 12:47:58 +01:00
R0CKSTAR
f0204a0ec7
ci: build test musa with cmake (#10298)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-11-15 12:47:25 +01:00
Romain Biessy
57f8355b29
sycl: Update Intel docker images to use DPC++ 2025.0 (#10305) 2024-11-15 13:10:45 +02:00
Xuan Son Nguyen
9901068ac7
server : (web UI) add copy button for code block, fix api key (#10242)
* server : (web ui) add copy btn for code blocks

* fix problem with api key

* use settings-modal-short-input component

* always show copy btn for code snippet
2024-11-15 10:48:49 +01:00