vulkan: Optimize binary ops (#10270)

Reuse the index calculations across all of src0/src1/dst. Add a shader
variant for when src0 and src1 have the same dimensions, so the additional
modulus operations for src1 aren't needed. Div/mod are slow, so add "fast"
div/mod helpers that take a fast path when the calculation isn't needed or
can be done more cheaply.
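
To illustrate the kind of fast path described above, here is a minimal sketch in C rather than GLSL. It is not the code from this commit: the names fast_div, fast_mod and src1_coord, and the particular shortcuts chosen (a power-of-two mask for the modulus, an early-out when the dividend is smaller than the divisor), are assumptions made for illustration only.

```c
#include <stdint.h>

/* Hypothetical helper: modulus with a cheap path for power-of-two divisors.
 * Assumes b > 0. When b is a power of two, a % b equals a & (b - 1). */
static uint32_t fast_mod(uint32_t a, uint32_t b) {
    if ((b & (b - 1u)) == 0u) {
        return a & (b - 1u);
    }
    return a % b;
}

/* Hypothetical helper: division that skips the divide when the result is
 * known to be zero. Assumes b > 0. */
static uint32_t fast_div(uint32_t a, uint32_t b) {
    return (a < b) ? 0u : (a / b);
}

/* Hypothetical illustration of the "same dimensions" variant: when src1 is
 * not broadcast along a dimension, its coordinate equals the dst coordinate
 * and the modulus can be skipped entirely. */
static uint32_t src1_coord(uint32_t dst_coord, uint32_t src1_extent, int same_dims) {
    return same_dims ? dst_coord : fast_mod(dst_coord, src1_extent);
}
```

Both helpers return the same results as the plain `/` and `%` operators for any nonzero divisor; the branches only change how the result is computed. Whether they pay off depends on how often the cheap cases occur for real tensor shapes.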
Jeff Bolz 2024-11-13 23:22:55 -06:00 committed by GitHub
parent 66798e42fb
commit af148c9386
GPG key ID: B5690EEEBB952194
9 changed files with 117 additions and 52 deletions

@@ -3,6 +3,8 @@
#include "types.comp"
#include "generic_binary_head.comp"
layout(local_size_x = 512, local_size_y = 1, local_size_z = 1) in;
void main() {
const uint idx = gl_GlobalInvocationID.z * 262144 + gl_GlobalInvocationID.y * 512 + gl_GlobalInvocationID.x;
const int dim = p.param3;