Add information for Podman as well as Docker

We believe Podman is a viable alternative to Docker. Lots of people have moved to Podman, and the project should make sure those users are supported as well.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>

parent 106045e7bb
commit df11fb7033

3 changed files with 51 additions and 37 deletions
README.md

@@ -242,7 +242,7 @@ The project also includes many example programs and tools using the `llama` libr
 - Clone this repository and build locally, see [how to build](docs/build.md)
 - On MacOS or Linux, install `llama.cpp` via [brew, flox or nix](docs/install.md)
-- Use a Docker image, see [documentation for Docker](docs/docker.md)
+- Use a container image (Docker/Podman), see [documentation for containers](docs/container.md)
 - Download pre-built binaries from [releases](https://github.com/ggerganov/llama.cpp/releases)

 ## Obtaining and quantizing models
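As a quick illustration of the container option introduced above — a minimal sketch, assuming the published `ghcr.io/ggerganov/llama.cpp:light` image and a local `models/` directory containing a GGUF file — the same image works with either engine:

```bash
# Pull the prebuilt image (substitute `podman` for `docker` to use Podman)
docker pull ghcr.io/ggerganov/llama.cpp:light

# Run a one-off prompt against a locally downloaded model
docker run --rm -v "$(pwd)/models:/models" ghcr.io/ggerganov/llama.cpp:light \
    -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 128
```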
@@ -500,7 +500,7 @@ To learn more about model quantization, [read this documentation](examples/quant
 #### Development documentation

 - [How to build](docs/build.md)
-- [Running on Docker](docs/docker.md)
+- [Running in a container](docs/container.md)
 - [Build on Android](docs/android.md)
 - [Performance troubleshooting](docs/development/token_generation_performance_tips.md)
 - [GGML tips & tricks](https://github.com/ggerganov/llama.cpp/wiki/GGML-Tips-&-Tricks)
docs/build.md

@@ -94,13 +94,13 @@ Building through oneAPI compilers will make avx_vnni instruction set available f
 - Using manual oneAPI installation:
 By default, `GGML_BLAS_VENDOR` is set to `Generic`, so if you have already sourced the Intel environment script and pass `-DGGML_BLAS=ON` to cmake, the MKL version of BLAS will be selected automatically. Otherwise, please install oneAPI and follow the steps below:
 ```bash
-source /opt/intel/oneapi/setvars.sh # You can skip this step if in oneapi-basekit docker image, only required for manual installation
+source /opt/intel/oneapi/setvars.sh # You can skip this step if in oneapi-basekit container image, only required for manual installation
 cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_NATIVE=ON
 cmake --build build --config Release
 ```

-- Using oneAPI docker image:
+- Using oneAPI container image:
-If you do not want to source the environment vars and install oneAPI manually, you can also build the code using intel docker container: [oneAPI-basekit](https://hub.docker.com/r/intel/oneapi-basekit). Then, you can use the commands given above.
+If you do not want to source the environment vars and install oneAPI manually, you can also build the code using the Intel container [oneAPI-basekit](https://hub.docker.com/r/intel/oneapi-basekit). Then, you can use the commands given above.

 Check [Optimizing and Running LLaMA2 on Intel® CPU](https://www.intel.com/content/www/us/en/content-details/791610/optimizing-and-running-llama2-on-intel-cpu.html) for more information.
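As a sketch of the container route described above — assuming the `intel/oneapi-basekit` image from Docker Hub and that the repository is checked out in the current directory — the whole build can run inside the container with either engine:

```bash
# Build inside the oneAPI base kit container (use `podman` instead of `docker` for Podman)
docker run --rm -it -v "$(pwd):/src" -w /src intel/oneapi-basekit \
    bash -c "cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64lp \
                   -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_NATIVE=ON \
             && cmake --build build --config Release"
```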
@@ -280,19 +280,21 @@ cmake -B build -DGGML_VULKAN=ON
 cmake --build build --config Release
 ```

-**With docker**:
+**With containers**:

 You don't need to install Vulkan SDK. It will be installed inside the container.

-```sh
 # Build the image
-docker build -t llama-cpp-vulkan --target light -f .devops/vulkan.Dockerfile .
+<details><summary>Docker example</summary>docker build -t llama-cpp-vulkan --target light -f .devops/vulkan.Dockerfile .</details>
+<details><summary>Podman example</summary>podman build -t llama-cpp-vulkan --target light -f .devops/vulkan.Dockerfile .</details>

 # Then, use it:
-docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 llama-cpp-vulkan -m "/app/models/YOUR_MODEL_FILE" -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
-```
+<details><summary>Docker example</summary>docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 llama-cpp-vulkan -m "/app/models/YOUR_MODEL_FILE" -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33</details>
+<details><summary>Podman example</summary>podman run --security-opt label=disable -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 llama-cpp-vulkan -m "/app/models/YOUR_MODEL_FILE" -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33</details>

-**Without docker**:
+**Without a container**:

 Firstly, you need to make sure you have installed [Vulkan SDK](https://vulkan.lunarg.com/doc/view/latest/linux/getting_started_ubuntu.html)
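One practical note on the `--device /dev/dri/...` flags in the run examples above: the card and render-node numbers differ between machines, so it is worth checking what actually exists on the host before copying the command. A minimal check, assuming a typical Linux DRI setup:

```bash
# List the GPU device nodes on the host, then adjust the --device flags to match
ls -l /dev/dri/
```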
docs/docker.md → docs/container.md

@@ -1,11 +1,11 @@
-# Docker
+# Container

 ## Prerequisites
-* Docker must be installed and running on your system.
+* A container engine such as Docker or Podman must be installed and running on your system.
 * Create a folder to store big models & intermediate files (ex. /llama/models)

 ## Images

-We have three Docker images available for this project:
+We have three container images available for this project:

 1. `ghcr.io/ggerganov/llama.cpp:full`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and convert into 4-bit quantization. (platforms: `linux/amd64`, `linux/arm64`)
 2. `ghcr.io/ggerganov/llama.cpp:light`: This image only includes the main executable file. (platforms: `linux/amd64`, `linux/arm64`)
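A quick sketch of checking those prerequisites before continuing — only one of the two version commands needs to succeed, depending on which engine is installed:

```bash
# Verify that a container engine is available
docker --version || podman --version

# Create a directory for model files and intermediate output
mkdir -p /llama/models
```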
@@ -27,43 +27,45 @@ The GPU enabled images are not currently tested by CI beyond being built. They a
 ## Usage

-The easiest way to download the models, convert them to ggml and optimize them is with the --all-in-one command which includes the full docker image.
+The easiest way to download the models, convert them to ggml and optimize them is with the --all-in-one command which includes the full container image.

 Replace `/path/to/models` below with the actual path where you downloaded the models.

-```bash
-docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B
-```
+<details><summary>Docker example</summary>docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B</details>
+<details><summary>Podman example</summary>podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B</details>
 On completion, you are ready to play!

-```bash
-docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
-```
+<details><summary>Docker example</summary>docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512</details>
+<details><summary>Podman example</summary>podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512</details>

 or with a light image:

-```bash
-docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
-```
+<details><summary>Docker example</summary>docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512</details>
+<details><summary>Podman example</summary>podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512</details>

 or with a server image:

-```bash
-docker run -v /path/to/models:/models -p 8000:8000 ghcr.io/ggerganov/llama.cpp:server -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512
-```
+<details><summary>Docker example</summary>docker run -v /path/to/models:/models -p 8000:8000 ghcr.io/ggerganov/llama.cpp:server -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512</details>
+<details><summary>Podman example</summary>podman run --security-opt label=disable -v /path/to/models:/models -p 8000:8000 ghcr.io/ggerganov/llama.cpp:server -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512</details>
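Once the server image is running with port 8000 published as above, it can be exercised from the host in the same way for both engines — a minimal sketch, assuming the llama.cpp server's `/completion` endpoint:

```bash
# Send a prompt to the running server (engine-agnostic; it is just HTTP)
curl --request POST http://localhost:8000/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 64}'
```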
-## Docker With CUDA
+## Container engines With CUDA

 Assuming one has the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) properly installed on Linux, or is using a GPU enabled cloud, `cuBLAS` should be accessible inside the container.
-## Building Docker locally
+## Building Container images locally

-```bash
+<details><summary>Docker example</summary>
 docker build -t local/llama.cpp:full-cuda --target full -f .devops/cuda.Dockerfile .
 docker build -t local/llama.cpp:light-cuda --target light -f .devops/cuda.Dockerfile .
 docker build -t local/llama.cpp:server-cuda --target server -f .devops/cuda.Dockerfile .
-```
+</details>
+
+<details><summary>Podman example</summary>
+podman build -t local/llama.cpp:full-cuda --target full -f .devops/cuda.Dockerfile .
+podman build -t local/llama.cpp:light-cuda --target light -f .devops/cuda.Dockerfile .
+podman build -t local/llama.cpp:server-cuda --target server -f .devops/cuda.Dockerfile .
+</details>

 You may want to pass in some different `ARGS`, depending on the CUDA environment supported by your container host, as well as the GPU architecture.
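As an illustration of passing build `ARGS`, both engines accept the same `--build-arg` syntax. The `CUDA_VERSION` name below is only an assumed example — check `.devops/cuda.Dockerfile` for the ARG names it actually defines:

```bash
# Hypothetical example: pin the CUDA base image version via a build ARG
docker build --build-arg CUDA_VERSION=12.4.0 -t local/llama.cpp:light-cuda --target light -f .devops/cuda.Dockerfile .

# The same flag works with Podman
podman build --build-arg CUDA_VERSION=12.4.0 -t local/llama.cpp:light-cuda --target light -f .devops/cuda.Dockerfile .
```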
@@ -82,23 +84,33 @@ The resulting images, are essentially the same as the non-CUDA images:

 After building locally, Usage is similar to the non-CUDA examples, but you'll need to add the `--gpus` flag. You will also want to use the `--n-gpu-layers` flag.

-```bash
+<details><summary>Docker example</summary>
 docker run --gpus all -v /path/to/models:/models local/llama.cpp:full-cuda --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
 docker run --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
 docker run --gpus all -v /path/to/models:/models local/llama.cpp:server-cuda -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 1
-```
+</details>
+
+<details><summary>Podman example</summary>
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:full-cuda --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:server-cuda -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 1
+</details>
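A note on the Podman examples above: depending on the Podman version and host setup, the Docker-style `--gpus` flag may not reach the NVIDIA runtime, and the nvidia-container-toolkit's documented path for Podman is CDI. A sketch of that route, assuming `nvidia-ctk` is installed and the default CDI spec location is used:

```bash
# One-time: generate a CDI spec describing the installed NVIDIA devices
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Then request the GPUs via CDI instead of --gpus
podman run --rm --security-opt label=disable --device nvidia.com/gpu=all \
    -v /path/to/models:/models local/llama.cpp:light-cuda \
    -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
```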
-## Docker With MUSA
+## Container engines With MUSA

 Assuming one has the [mt-container-toolkit](https://developer.mthreads.com/musa/native) properly installed on Linux, `muBLAS` should be accessible inside the container.
-## Building Docker locally
+## Building Container images locally

-```bash
+<details><summary>Docker example</summary>
 docker build -t local/llama.cpp:full-musa --target full -f .devops/musa.Dockerfile .
 docker build -t local/llama.cpp:light-musa --target light -f .devops/musa.Dockerfile .
 docker build -t local/llama.cpp:server-musa --target server -f .devops/musa.Dockerfile .
-```
+</details>
+
+<details><summary>Podman example</summary>
+podman build -t local/llama.cpp:full-musa --target full -f .devops/musa.Dockerfile .
+podman build -t local/llama.cpp:light-musa --target light -f .devops/musa.Dockerfile .
+podman build -t local/llama.cpp:server-musa --target server -f .devops/musa.Dockerfile .
+</details>

 You may want to pass in some different `ARGS`, depending on the MUSA environment supported by your container host, as well as the GPU architecture.