CUDA: optimize MMQ int8 tensor core performance (#8062)

* CUDA: optimize MMQ int8 tensor core performance

* only a single get_mma_tile_x_k function

* simplify code, make functions constexpr
Johannes Gäßler 2024-06-24 12:41:23 +02:00 committed by GitHub
parent 52fc8705a0
commit 9a590c8226
3 changed files with 902 additions and 570 deletions
