Georgi Gerganov
b438ff7e7c
metal : GGML_OP_RMS_NORM
2024-11-17 09:51:06 +02:00
Georgi Gerganov
2b86f84839
metal : GGML_OP_CPY
2024-11-17 09:51:06 +02:00
Georgi Gerganov
d7488ba09c
metal : GGML_OP_REPEAT
2024-11-17 09:51:06 +02:00
Georgi Gerganov
281fa05e83
metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV
2024-11-17 09:51:06 +02:00
Georgi Gerganov
4c1c7213e2
metal : GGML_OP_CONCAT
...
ggml-ci
2024-11-17 09:51:06 +02:00
Georgi Gerganov
1a8f8df35d
cont : int safety + register optimizations
...
ggml-ci
2024-11-17 09:51:06 +02:00
Georgi Gerganov
ec18f96891
cont : mul mm id
...
ggml-ci
2024-11-17 09:51:06 +02:00
Georgi Gerganov
cd89d1a877
cont : thread counters style
2024-11-17 09:51:05 +02:00
Georgi Gerganov
f759814c66
cont : shmem style
2024-11-17 09:51:05 +02:00
Georgi Gerganov
d2a055059e
cont : use char ptr
2024-11-17 09:51:05 +02:00
Georgi Gerganov
481b05df22
cont : args is first argument
2024-11-17 09:51:05 +02:00
Georgi Gerganov
4af3a87962
cont : pass by reference
2024-11-17 09:51:05 +02:00
Georgi Gerganov
07bc7610ad
cont : mul mat vec
2024-11-17 09:51:05 +02:00
Georgi Gerganov
0d0c54fc5a
metal : mul mat struct (wip)
2024-11-17 09:51:05 +02:00
Georgi Gerganov
cbae088721
metal : cont + avoid potential int overflow [no ci]
2024-11-17 09:51:05 +02:00
Georgi Gerganov
362a3f3433
metal : fattn args
...
ggml-ci
2024-11-17 09:51:04 +02:00
Georgi Gerganov
051ff11140
metal : add kernel arg structs (wip)
2024-11-17 09:51:02 +02:00
Diego Devesa
eda7e1d4f5
ggml : fix possible buffer use after free in sched reserve ( #9930 )
2024-11-17 08:31:17 +02:00
Georgi Gerganov
24203e9dd7
ggml : inttypes.h -> cinttypes ( #0 )
...
ggml-ci
2024-11-17 08:30:29 +02:00
Georgi Gerganov
5d9e59979c
ggml : adapt AMX to tensor->grad removal ( #0 )
...
ggml-ci
2024-11-17 08:30:29 +02:00
Georgi Gerganov
a4200cafad
make : add ggml-opt ( #0 )
...
ggml-ci
2024-11-17 08:30:29 +02:00
Georgi Gerganov
84274a10c3
tests : remove test-grad0
2024-11-17 08:30:29 +02:00
Georgi Gerganov
68fcb4759c
ggml : fix compile warnings ( #0 )
...
ggml-ci
2024-11-17 08:30:29 +02:00
Johannes Gäßler
8a43e940ab
ggml: new optimization interface (ggml/988)
2024-11-17 08:30:29 +02:00
Georgi Gerganov
5c9a8b22b1
scripts : update sync
2024-11-17 08:30:29 +02:00
FirstTimeEZ
0fff7fd798
docs : vulkan build instructions to use git bash mingw64 ( #10303 )
2024-11-17 00:29:18 +01:00
Johannes Gäßler
4e54be0ec6
llama/ex: remove --logdir argument ( #10339 )
2024-11-16 23:00:41 +01:00
Georgi Gerganov
db4cfd5dbc
llamafile : fix include path ( #0 )
...
ggml-ci
2024-11-16 20:36:26 +02:00
Georgi Gerganov
8ee0d09ae6
make : auto-determine dependencies ( #0 )
2024-11-16 20:36:26 +02:00
MaggotHATE
bcdb7a2386
server: (web UI) Add samplers sequence customization ( #10255 )
...
* Samplers sequence: simplified and input field.
* Removed unused function
* Modify and use `settings-modal-short-input`
* rename "name" --> "label"
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-11-16 14:26:54 +01:00
Georgi Gerganov
f245cc28d4
scripts : fix missing key in compare-llama-bench.py ( #10332 )
2024-11-16 10:32:50 +02:00
Jeff Bolz
772703c8ff
vulkan: Optimize some mat-vec mul quant shaders ( #10296 )
...
Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses
the B loads across the rows and also reuses some addressing calculations.
This required manually partially unrolling the loop, since the compiler
is less willing to unroll outer loops.
Add bounds-checking on the last iteration of the loop. I think this was at
least partly broken before.
Optimize the Q4_K shader to vectorize most loads and reduce the number of
bit twiddling instructions.
2024-11-16 07:26:57 +01:00
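The row-pairing idea in the vulkan mat-vec commit above can be illustrated on the CPU. This is only a minimal sketch in plain C++, not the shader code from that commit: `matvec_two_rows` and its parameters are hypothetical names, and it works on unquantized `float` data, whereas the commit targets Q{4,5}_{0,1} quantized blocks. It shows each element of the vector being loaded once and used for two output rows, with a bounds check for the final pass when the row count is odd.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference: y = A * x for a row-major A (rows x cols).
// Two output rows are produced per outer pass so that each x[c] is
// loaded once and reused for both rows.
std::vector<float> matvec_two_rows(const std::vector<float> & a,
                                   const std::vector<float> & x,
                                   std::size_t rows, std::size_t cols) {
    std::vector<float> y(rows, 0.0f);
    for (std::size_t r = 0; r < rows; r += 2) {
        // Bounds check: with an odd row count the last pass has one valid row.
        const bool has_second = r + 1 < rows;
        const float * row0 = a.data() + r * cols;
        // Fall back to row0 so the inner loop never reads past the matrix.
        const float * row1 = has_second ? row0 + cols : row0;
        float acc0 = 0.0f;
        float acc1 = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            const float b = x[c]; // single load shared by both rows
            acc0 += row0[c] * b;
            acc1 += row1[c] * b;
        }
        y[r] = acc0;
        if (has_second) {
            y[r + 1] = acc1; // acc1 is discarded when there is no second row
        }
    }
    return y;
}
```

Pairing the rows by hand in the outer loop mirrors the manual partial unrolling the commit message describes. Whether it helps on a given target depends on the compiler and hardware.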
FirstTimeEZ
dd3a6ce9f8
vulkan : add cmake preset debug/release ( #10306 )
2024-11-16 02:59:33 +01:00
Dan Johansson
1e58ee1318
ggml : optimize Q4_0 into Q4_0_X_Y repack ( #10324 )
2024-11-16 01:53:37 +01:00
FirstTimeEZ
89e4caaaf0
llama : save number of parameters and the size in llama_model ( #10286 )
...
fixes #10285
2024-11-16 01:42:13 +01:00
Srihari-mcw
74d73dc85c
Make updates to fix issues with clang-cl builds while using AVX512 flags ( #10314 )
2024-11-15 22:27:00 +01:00
Johannes Gäßler
4047be74da
scripts: update compare-llama-bench.py ( #10319 )
2024-11-15 21:19:03 +01:00
slaren
883d206fbd
ggml : fix some build issues
2024-11-15 21:45:32 +02:00
Georgi Gerganov
09ecbcb596
cmake : fix ppc64 check (whisper/0)
...
ggml-ci
2024-11-15 15:44:06 +02:00
thewh1teagle
3225008973
ggml : vulkan logs (whisper/2547)
2024-11-15 15:44:06 +02:00
Georgi Gerganov
cbf5541a82
sync : ggml
2024-11-15 15:44:06 +02:00
Eve
18429220bd
AVX BF16 and single scale quant optimizations ( #10212 )
...
* use 128 bit loads (i've tried 256->128 to death and it's slower)
* double accumulator
* avx bf16 vec dot
* +3% q4_0 inference
* +7% tg +5% pp compared to master
* slower f16c version, kept for reference
* 256b version, also slow. i tried :)
* revert f16
* faster with madd
* split to functions
* Q8_0 and IQ4_NL, 5-7% faster
* fix potential overflow (performance reduced)
* 16 bit add for q4_0 only
* merge
2024-11-15 12:47:58 +01:00
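For readers unfamiliar with BF16, the "bf16 vec dot" item in the commit above relies on the fact that a bfloat16 value is the upper 16 bits of an IEEE-754 binary32 float. The following is a scalar reference sketch under that assumption only. The names `bf16_to_f32` and `bf16_dot_ref` are hypothetical, and this is not the AVX implementation from the commit.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Widen a raw bfloat16 bit pattern to float by placing it in the
// upper 16 bits of a 32-bit word (the lower mantissa bits become zero).
inline float bf16_to_f32(std::uint16_t bits) {
    const std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
    float out;
    std::memcpy(&out, &wide, sizeof(out)); // bit copy, avoids aliasing issues
    return out;
}

// Scalar dot product over two arrays of raw bfloat16 bit patterns,
// accumulating in float.
inline float bf16_dot_ref(const std::uint16_t * x, const std::uint16_t * y, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += bf16_to_f32(x[i]) * bf16_to_f32(y[i]);
    }
    return sum;
}
```

A vectorized version would perform the same widening on several lanes at once. A scalar form like this is mainly useful as a correctness baseline.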
R0CKSTAR
f0204a0ec7
ci: build test musa with cmake ( #10298 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-11-15 12:47:25 +01:00
Romain Biessy
57f8355b29
sycl: Update Intel docker images to use DPC++ 2025.0 ( #10305 )
2024-11-15 13:10:45 +02:00
Xuan Son Nguyen
9901068ac7
server : (web UI) add copy button for code block, fix api key ( #10242 )
...
* server : (web ui) add copy btn for code blocks
* fix problem with api key
* use settings-modal-short-input component
* always show copy btn for code snippet
2024-11-15 10:48:49 +01:00
Chenguang Li
231f9360d9
cann: dockerfile and doc adjustment ( #10302 )
...
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-11-15 15:09:35 +08:00
Georgi Gerganov
4802ad350b
scripts : fix regex in sync [no ci]
2024-11-15 08:38:43 +02:00
Romain Biessy
5a54af4d4f
sycl: Use syclcompat::dp4a ( #10267 )
...
* sycl: Use syclcompat::dp4a
* Using the syclcompat version allows the compiler to optimize the
operation with a native function
* Update news section
* Update CI Windows oneAPI version to 2025.0
* Reword doc
* Call syclcompat::dp4a inside dpct::dp4a
This reverts commit 90cb61d692.
2024-11-15 11:09:12 +08:00
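As background for the dp4a commit above: a dp4a-style operation treats two 32-bit words as four packed 8-bit lanes each, multiplies the lanes pairwise, and adds the four products to an accumulator. The sketch below is a portable reference of that behavior, assuming signed 8-bit lanes. `dp4a_ref` is a hypothetical helper and does not reproduce the `syclcompat::dp4a` or `dpct::dp4a` signatures.

```cpp
#include <cstdint>

// Portable reference for a dp4a-style operation (hypothetical helper):
// interpret a and b as four signed 8-bit lanes each, multiply lane by lane,
// and add the four products to the accumulator c.
inline std::int32_t dp4a_ref(std::int32_t a, std::int32_t b, std::int32_t c) {
    const std::uint32_t ua = static_cast<std::uint32_t>(a);
    const std::uint32_t ub = static_cast<std::uint32_t>(b);
    for (int i = 0; i < 4; ++i) {
        // Extract lane i and reinterpret the byte as a signed value.
        const std::int8_t ai = static_cast<std::int8_t>((ua >> (8 * i)) & 0xFFu);
        const std::int8_t bi = static_cast<std::int8_t>((ub >> (8 * i)) & 0xFFu);
        c += static_cast<std::int32_t>(ai) * static_cast<std::int32_t>(bi);
    }
    return c;
}
```

A loop like this leaves it to the compiler to recognize the pattern, which is one reason a library-provided function that can map to a native instruction is attractive.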
Charles Xu
1607a5e5b0
backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels ( #9921 )
...
* backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2024-11-15 01:28:50 +01:00
Diego Devesa
ae8de6d50a
ggml : build backends as libraries ( #10256 )
...
* ggml : build backends as libraries
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-14 18:04:35 +01:00