CUDA: Quantized matrix matrix multiplication (#2160)

* mmq implementation for non k-quants

* q6_K

* q2_K

* q3_k

* q4_K

* vdr

* q5_K

* faster q8_1 loading

* loop unrolling

* add __restrict__

* q2_K sc_high

* GGML_CUDA_MMQ_Y

* Updated Makefile

* Update Makefile

* DMMV_F16 -> F16

* Updated README, CMakeLists

* Fix CMakeLists.txt

* Fix CMakeLists.txt

* Fix multi GPU out-of-bounds
Johannes Gäßler 2023-07-29 23:04:44 +02:00 committed by GitHub
parent 9baf9ef304
commit 11f3ca06b8
GPG key ID: 4AEE18F83AFDEB23
4 changed files with 1295 additions and 324 deletions
