CUDA: Quantized matrix matrix multiplication (#2160)

* mmq implementation for non k-quants

* q6_K

* q2_K

* q3_K

* q4_K

* vdr

* q5_K

* faster q8_1 loading

* loop unrolling

* add __restrict__

* q2_K sc_high

* GGML_CUDA_MMQ_Y

* Updated Makefile

* Update Makefile

* DMMV_F16 -> F16

* Updated README, CMakeLists

* Fix CMakeLists.txt

* Fix CMakeLists.txt

* Fix multi GPU out-of-bounds

Johannes Gäßler, 2023-07-29 23:04:44 +02:00 • committed by GitHub

parent 9baf9ef304

commit 11f3ca06b8

No known key found for this signature in database

GPG key ID: 4AEE18F83AFDEB23

4 changed files with 1295 additions and 324 deletions

1590 ggml-cuda.cu (file diff suppressed because it is too large)
