HIP: force max threads per block to be 1024

Some older compilers still default to 256 threads per block. Explicitly set
the limit to 1024 to get correct results from ops such as ARGMAX and GROUP_NORM.

Related: #10610, #11619
Signed-off-by: fxzjshm <fxzjshm@163.com>
fxzjshm 2025-02-03 22:33:38 +08:00
parent d92cb67e37
commit 59ad593a95
GPG key ID: 7638FA33A7259C00

@@ -46,6 +46,9 @@ endif()
message(STATUS "HIP and hipBLAS found")
# Workaround old compilers
set(CMAKE_HIP_FLAGS "${CMAKE_HIP_FLAGS} --gpu-max-threads-per-block=1024")
file(GLOB GGML_HEADERS_ROCM "../ggml-cuda/*.cuh")
list(APPEND GGML_HEADERS_ROCM "../../include/ggml-cuda.h")
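
The change above appends the flag to the global CMAKE_HIP_FLAGS string. As a possible alternative, the sketch below probes whether the HIP compiler accepts the flag and then attaches it to a single target. It is not part of this commit: it assumes CMake 3.21 or newer with the HIP language enabled, and both the target name `my_hip_target` and the result variable `HIP_HAS_MAX_THREADS_FLAG` are hypothetical.

```cmake
# Sketch only, not part of this commit.
# Assumes CMake >= 3.21 with the HIP language enabled.
# "my_hip_target" and "HIP_HAS_MAX_THREADS_FLAG" are hypothetical names.
include(CheckCompilerFlag)

# Probe whether the HIP compiler accepts the flag before using it.
check_compiler_flag(HIP "--gpu-max-threads-per-block=1024" HIP_HAS_MAX_THREADS_FLAG)

if (HIP_HAS_MAX_THREADS_FLAG)
    # Apply the flag only to HIP sources of this one target,
    # instead of modifying the global CMAKE_HIP_FLAGS string.
    target_compile_options(my_hip_target PRIVATE
        $<$<COMPILE_LANGUAGE:HIP>:--gpu-max-threads-per-block=1024>)
endif()
```

Scoping the flag to one target keeps it out of unrelated HIP targets in the same build. The global CMAKE_HIP_FLAGS approach used in the commit is simpler and covers every HIP source at once.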