ggml : add RPC backend (#6829)

* ggml : add RPC backend

The RPC backend proxies all operations to a remote server which runs a
regular backend (CPU, CUDA, Metal, etc).

* set TCP_NODELAY

* add CI workflows

* Address review comments

* fix warning

* implement llama_max_devices() for RPC

* Address review comments

* Address review comments

* wrap sockfd into a struct

* implement get_alignment and get_max_size

* add get_device_memory

* fix warning

* win32 support

* add README

* readme : trim trailing whitespace

* Address review comments

* win32 fix

* Address review comments

* fix compile warnings on macos
This commit is contained in:
Radoslav Gerganov 2024-05-14 14:27:19 +03:00 committed by GitHub
parent 541600201e
commit 5e31828d3e
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
12 changed files with 1395 additions and 98 deletions

View file

@ -340,6 +340,36 @@ jobs:
cd build
ctest -L main --verbose
ubuntu-latest-cmake-rpc:
runs-on: ubuntu-latest
continue-on-error: true
steps:
- name: Clone
id: checkout
uses: actions/checkout@v4
- name: Dependencies
id: depends
run: |
sudo apt-get update
sudo apt-get install build-essential
- name: Build
id: cmake_build
run: |
mkdir build
cd build
cmake -DLLAMA_RPC=ON ..
cmake --build . --config Release -j $(nproc)
- name: Test
id: cmake_test
run: |
cd build
ctest -L main --verbose
ubuntu-22-cmake-vulkan:
runs-on: ubuntu-22.04
@ -663,6 +693,8 @@ jobs:
strategy:
matrix:
include:
- build: 'rpc'
defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DLLAMA_RPC=ON -DBUILD_SHARED_LIBS=ON'
- build: 'noavx'
defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DBUILD_SHARED_LIBS=ON'
- build: 'avx2'