rpc : add command line arg for specifying backend memory

ref: #7293
Radoslav Gerganov 2024-05-15 15:29:07 +03:00
parent dda64fc17c
commit 3b3963c55c
3 changed files with 60 additions and 14 deletions

@@ -42,7 +42,7 @@ cmake --build . --config Release
Then, start the `rpc-server` with the backend:
```bash
-$ bin/rpc-server 0.0.0.0 50052
+$ bin/rpc-server -p 50052
create_backend: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
@@ -53,7 +53,7 @@ Starting RPC server on 0.0.0.0:50052
When using the CUDA backend, you can specify the device with the `CUDA_VISIBLE_DEVICES` environment variable, e.g.:
```bash
-$ CUDA_VISIBLE_DEVICES=0 bin/rpc-server 0.0.0.0 50052
+$ CUDA_VISIBLE_DEVICES=0 bin/rpc-server -p 50052
```
This way you can run multiple `rpc-server` instances on the same host, each with a different CUDA device.
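As a rough sketch of the multi-instance setup described above, the snippet below prints one launch command per CUDA device. It uses only the `-p` flag and the `CUDA_VISIBLE_DEVICES` variable shown in this README. The helper name `rpc_server_cmd`, the `BASE_PORT` variable, and the "base port plus device index" numbering are assumptions made for illustration, not part of `rpc-server` itself. The script is a dry run: it echoes the commands rather than starting any server.

```bash
#!/usr/bin/env bash
# Dry-run sketch: build one rpc-server command line per CUDA device.
# BASE_PORT and the port numbering scheme are illustrative assumptions.

BASE_PORT=50052

# Print the launch command for the given CUDA device index.
# Each device gets its own port: BASE_PORT + device index.
rpc_server_cmd() {
    local device="$1"
    local port=$((BASE_PORT + device))
    echo "CUDA_VISIBLE_DEVICES=${device} bin/rpc-server -p ${port}"
}

# Print the commands for devices 0 and 1 (adjust the list to your host).
for device in 0 1; do
    rpc_server_cmd "$device"
done
```

To launch the servers for real, each printed line could be run in its own terminal or as a background job, provided every instance is given a distinct port.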