CUDA: optimize MMQ int8 tensor core performance (#8062)
* CUDA: optimize MMQ int8 tensor core performance
* only a single get_mma_tile_x_k function
* simplify code, make functions constexpr
parent 52fc8705a0
commit 9a590c8226
3 changed files with 902 additions and 570 deletions
1412 ggml-cuda/mmq.cuh
File diff suppressed because it is too large