CUDA: Quantized matrix matrix multiplication (#2160)
* mmq implementation for non k-quants * q6_K * q2_K * q3_k * q4_K * vdr * q5_K * faster q8_1 loading * loop unrolling * add __restrict__ * q2_K sc_high * GGML_CUDA_MMQ_Y * Updated Makefile * Update Makefile * DMMV_F16 -> F16 * Updated README, CMakeLists * Fix CMakeLists.txt * Fix CMakeLists.txt * Fix multi GPU out-of-bounds
This commit is contained in:
parent
9baf9ef304
commit
11f3ca06b8
4 changed files with 1295 additions and 324 deletions
1590
ggml-cuda.cu
1590
ggml-cuda.cu
File diff suppressed because it is too large
Load diff
Loading…
Add table
Add a link
Reference in a new issue