Basic Vulkan Multi-GPU implementation (#5321)

* Initial Vulkan multi-GPU implementation

Move most global variables into the backend context
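
A minimal sketch of the idea, with hypothetical type names (`vk_backend_ctx` and `vk_device_ctx` are illustrative, not the actual llama.cpp symbols): per-device state that previously lived in file-scope globals is gathered into a context object, so several GPUs can be tracked independently.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One instance per GPU; previously this state was file-scope global,
// which limited the backend to a single device.
struct vk_device_ctx {
    int         device_index = 0;  // index into the enumerated Vulkan devices
    std::string name;              // human-readable device name
    uint64_t    total_memory = 0;  // device-local memory, in bytes
};

// Owns all per-device contexts for the lifetime of the backend.
struct vk_backend_ctx {
    std::vector<vk_device_ctx> devices;
};
```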

* Add names to backend device functions

* Add further missing cleanup code

* Reduce code duplication in tensor split layer assignment

* Generalize LLAMA_SPLIT_LAYER for all backends; do not expose device count and memory in llama.h
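
A minimal sketch of proportional layer-to-device assignment under LLAMA_SPLIT_LAYER; the function and variable names are illustrative, not the actual llama.cpp API. Each layer goes to the device whose cumulative share of the split weights covers that layer's position:

```cpp
#include <cstdio>
#include <vector>

// Return the device that should own layer `il` out of `n_layer` layers,
// given per-device split weights (e.g. relative VRAM sizes).
static int layer_device(int il, int n_layer, const std::vector<float> & split) {
    float sum = 0.0f;
    for (float w : split) sum += w;
    float target = (il + 0.5f) / n_layer;  // midpoint of this layer's share
    float acc = 0.0f;
    for (size_t d = 0; d < split.size(); ++d) {
        acc += split[d] / sum;
        if (target <= acc) return (int) d;
    }
    return (int) split.size() - 1;
}

int main() {
    std::vector<float> split = {1.0f, 1.0f};  // two equal GPUs
    for (int il = 0; il < 8; ++il) {
        printf("layer %d -> device %d\n", il, layer_device(il, 8, split));
    }
}
```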

* Print device info only once at startup and initialize a single backend for CPU assist

Add missing cleanup code
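
A minimal sketch of the one-time device info print, assuming a simple static guard (the real code may structure this differently):

```cpp
#include <cstdio>

// Print device info only on the first call, instead of on every
// backend initialization.
void print_device_info_once() {
    static bool printed = false;  // set after the first call
    if (printed) return;
    printed = true;
    printf("Vulkan device info: ...\n");  // enumerate and print devices here
}

int main() {
    print_device_info_once();  // prints
    print_device_info_once();  // silent on subsequent calls
}
```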

* Rework backend memory management so that devices and buffers are properly allocated and freed
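
A minimal sketch of the ownership model, with hypothetical types (`vk_device`, `vk_buffer`, `vk_backend` are illustrative): buffers are tied to the backend that created them and are released before the devices they live on, so nothing is leaked or freed out of order.

```cpp
#include <memory>
#include <vector>

struct vk_device { /* handle to one GPU */ };

struct vk_buffer {
    vk_device * dev = nullptr;  // allocation lives on this device
};

// The backend owns both; on destruction, buffers are released first,
// while the devices they were allocated on still exist.
struct vk_backend {
    std::vector<std::unique_ptr<vk_device>> devices;
    std::vector<std::unique_ptr<vk_buffer>> buffers;  // declared last => destroyed first

    ~vk_backend() {
        buffers.clear();  // make the teardown order explicit
        devices.clear();
    }
};

int main() {
    vk_backend backend;
    backend.devices.push_back(std::make_unique<vk_device>());
    backend.buffers.push_back(std::make_unique<vk_buffer>());
    backend.buffers.back()->dev = backend.devices.back().get();
}  // ~vk_backend frees the buffer, then the device
```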

* Rename the CPU-assist free function

---------

Co-authored-by: slaren <slarengh@gmail.com>
commit ee1628bdfe
parent ed0bf32290
Author: 0cc4m
Date: 2024-02-07 07:54:50 +01:00
Committed-by: GitHub