ggml : add RPC backend (#6829)

* ggml : add RPC backend

The RPC backend proxies all operations to a remote server which runs a
regular backend (CPU, CUDA, Metal, etc).

* set TCP_NODELAY

* add CI workflows

* Address review comments

* fix warning

* implement llama_max_devices() for RPC

* Address review comments

* Address review comments

* wrap sockfd into a struct

* implement get_alignment and get_max_size

* add get_device_memory

* fix warning

* win32 support

* add README

* readme : trim trailing whitespace

* Address review comments

* win32 fix

* Address review comments

* fix compile warnings on macos
This commit is contained in:
Radoslav Gerganov 2024-05-14 14:27:19 +03:00 committed by GitHub
parent 541600201e
commit 5e31828d3e
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
12 changed files with 1395 additions and 98 deletions

View file

@ -242,6 +242,9 @@ extern "C" {
// proportion of the model (layers or rows) to offload to each GPU, size: llama_max_devices()
const float * tensor_split;
// comma separated list of RPC servers to use for offloading
const char * rpc_servers;
// Called with a progress value between 0.0 and 1.0. Pass NULL to disable.
// If the provided progress_callback returns true, model loading continues.
// If it returns false, model loading is immediately aborted.