cuda : fix vmm pool with multi GPU (#4620)

* cuda : fix vmm pool with multi GPU

* hip

* use recommended granularity instead of minimum

* better error checking

* fix mixtral

* use cudaMemcpy3DPeerAsync

* use cuda_pool_alloc in ggml_cuda_op_mul_mat

* consolidate error checking in ggml_cuda_set_device

* remove unnecessary inlines

ggml-ci

* style fixes

* only use vmm for the main device

* fix scratch buffer size, re-enable vmm pool for all devices

* remove unnecessary check id != g_main_device
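
The "use recommended granularity instead of minimum" item concerns the size multiple a virtual-memory (vmm) pool rounds to when it grows. The diff is suppressed on this page, so the following is only a minimal host-side sketch of that rounding arithmetic, not the code from ggml-cuda.cu. The helper names `round_up_to_granularity` and `bytes_to_reserve` are hypothetical. In a real CUDA pool the granularity value would presumably come from the driver call `cuMemGetAllocationGranularity` with the `CU_MEM_ALLOC_GRANULARITY_RECOMMENDED` option; here it is passed in as a plain number so the sketch compiles without CUDA.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not taken from ggml-cuda.cu): round `size` up to the
// next multiple of `granularity`. Physical memory mapped into a VMM pool has
// to be a multiple of the allocation granularity reported by the driver.
static std::size_t round_up_to_granularity(std::size_t size, std::size_t granularity) {
    assert(granularity > 0);
    return ((size + granularity - 1) / granularity) * granularity;
}

// Hypothetical helper: number of additional bytes a pool must map so that a
// request of `request` bytes fits, given `pool_size` bytes already mapped and
// `pool_used` bytes already handed out. Returns 0 when the request fits as is.
static std::size_t bytes_to_reserve(std::size_t pool_size, std::size_t pool_used,
                                    std::size_t request, std::size_t granularity) {
    assert(pool_used <= pool_size);
    const std::size_t avail = pool_size - pool_used;
    if (request <= avail) {
        return 0;
    }
    // Only the shortfall is mapped, rounded up to a whole granule.
    return round_up_to_granularity(request - avail, granularity);
}
```

With a granularity of 2 MiB (2097152 bytes), a 1-byte shortfall would map one full 2 MiB granule. A larger granularity means fewer, bigger mappings, at the cost of more rounding slack per growth step.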

Author: slaren
Date: 2023-12-26 21:23:59 +01:00
Committed by: GitHub

parent de8e496437

commit dc68f0054c

GPG key ID: 4AEE18F83AFDEB23

3 changed files with 243 additions and 246 deletions

ggml-cuda.cu: 483 changed lines (diff suppressed because it is too large)
