* Add some minimal optimizations for CDNA
* ggml_cuda: set launch bounds also for GCN as it helps there too