From 82fadbd792f94ea1a85e6d3f792115b82583d3a0 Mon Sep 17 00:00:00 2001
From: matteo serva
Date: Sat, 6 Jul 2024 09:39:41 +0200
Subject: [PATCH] adding again the documentation about unified memory

---
 docs/build.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/build.md b/docs/build.md
index cfe42ebbf..b5fb5d012 100644
--- a/docs/build.md
+++ b/docs/build.md
@@ -178,7 +178,11 @@ For Jetson user, if you have Jetson Orin, you can try this: [Offical Support](ht
 cmake --build build --config Release
 ```

-The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) can be used to specify which GPU(s) will be used. The following compilation options are also available to tweak performance:
+The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) can be used to specify which GPU(s) will be used.
+
+The environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` can be used to enable unified memory on Linux. This allows falling back to system RAM when the GPU VRAM is exhausted, which is useful when the model barely fits in VRAM and inference would otherwise cause OOM errors. It should be enabled together with `-ngl 99` to avoid sharing memory bandwidth with the CPU. On Windows, this setting is available in the NVIDIA Control Panel as `System Memory Fallback`.
+
+The following compilation options are also available to tweak performance:

 | Option | Legal values | Default | Description |
 |-------------------------------|------------------------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
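
For readers of this patch, a minimal usage sketch of the setting documented above; the CMake flag, binary name (`llama-cli`), and model path are illustrative assumptions and are not part of the patch itself:

```sh
# Build llama.cpp with CUDA enabled (flag name assumed from recent llama.cpp versions).
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

# Run inference with unified memory enabled so system RAM can absorb VRAM overflow.
# -ngl 99 offloads all layers to the GPU, as the added documentation recommends.
# Binary name and model path are placeholders.
GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 \
  ./build/bin/llama-cli -m ./models/model.gguf -ngl 99 -p "Hello"
```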