update build.md

slaren 2024-12-02 18:09:23 +01:00
parent 886c153c53
commit bbff53ae5b


@@ -7,20 +7,22 @@ git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```
The following sections describe how to build with different backends and options.

## CPU-only Build

Build llama.cpp using `CMake`:

```bash
cmake -B build
cmake --build build --config Release
```

**Notes**:

- For faster compilation, add the `-j` argument to run multiple jobs in parallel. For example, `cmake --build build --config Release -j 8` will run 8 jobs in parallel.
- For faster repeated compilation, install [ccache](https://ccache.dev/).
- For debug builds, there are two cases:
1. Single-config generators (e.g. default = `Unix Makefiles`; note that they just ignore the `--config` flag):
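   A minimal sketch for this case, using the standard `CMAKE_BUILD_TYPE` cache variable (an assumption about the exact invocation, not a quote from the lines elided by this hunk):

   ```bash
   # Debug build with a single-config generator (e.g. Unix Makefiles, Ninja)
   cmake -B build -DCMAKE_BUILD_TYPE=Debug
   cmake --build build
   ```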
@@ -35,7 +37,7 @@ In order to build llama.cpp you have four different options.
cmake -B build -G "Xcode"
cmake --build build --config Debug
```
- Building for Windows (x86, x64 and arm64) with MSVC or clang as compilers:
  - Install Visual Studio 2022, e.g. via the [Community Edition](https://visualstudio.microsoft.com/de/vs/community/). In the installer, select at least the following options (this also automatically installs the required additional tools like CMake, ...):
    - Tab Workload: Desktop development with C++
    - Tab Components (select quickly via search): C++ _CMake_ Tools for Windows, _Git_ for Windows, C++ _Clang_ Compiler for Windows, MS-Build Support for LLVM-Toolset (clang)
@@ -47,22 +49,10 @@ In order to build llama.cpp you have four different options.
```
Note: Building for arm64 can also be done with just MSVC (with the build-arm64-windows-MSVC preset, or the standard CMake build instructions), but MSVC does not support the inline ARM assembly code used e.g. for the accelerated Q4_0_4_8 CPU kernels.
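For the clang route on x64, one possible configure step is sketched below. The generator and toolset flags are standard CMake, but treat this as an illustration only; the file's own preset-based instructions (e.g. the arm64 preset mentioned in the note above) remain the reference.

```bash
# Sketch: Visual Studio 2022 generator with the ClangCL toolset (installed via the components listed above)
cmake -B build -G "Visual Studio 17 2022" -A x64 -T ClangCL
cmake --build build --config Release
```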
## Metal Build

On MacOS, Metal is enabled by default. Using Metal makes the computation run on the GPU.
To disable the Metal build at compile time use the `-DGGML_METAL=OFF` cmake option.

When built with Metal support, you can explicitly disable GPU inference with the `--n-gpu-layers|-ngl 0` command-line argument.
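As an illustration (the model path below is a placeholder):

```bash
# Build with the Metal backend disabled
cmake -B build -DGGML_METAL=OFF
cmake --build build --config Release

# Or keep Metal in the build but force CPU-only inference at run time
./build/bin/llama-cli -m model.gguf -ngl 0 -p "Hello"
```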
@@ -159,9 +149,9 @@ The environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` can be used to enab
Most of the compilation options available for CUDA should also be available for MUSA, though they haven't been thoroughly tested yet.
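The unified-memory switch named in the hunk header above is an environment variable, so it can be toggled per run; a hypothetical invocation (model path and layer count are placeholders):

```bash
# Assumption: allow CUDA to fall back to system RAM via unified memory instead of failing on VRAM exhaustion
GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 ./build/bin/llama-cli -m model.gguf -ngl 99
```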
### HIP

This provides GPU acceleration on HIP-supported AMD GPUs.
Make sure to have ROCm installed.
You can download it from your Linux distro's package manager or from here: [ROCm Quick Start (Linux)](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html#rocm-install-quick).
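A hedged configure/build sketch follows. The `GGML_HIP` option name and the `gfx1030` target are assumptions made for illustration; check the rest of this section for the exact flags your GPU and ROCm version need.

```bash
# Sketch only: assumes the GGML_HIP CMake option and an RDNA2 (gfx1030) GPU
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j 16
```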
@@ -227,6 +217,12 @@ EOF
```
Switch into the `llama.cpp` directory and build using CMake.
```sh
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release
```
#### Git Bash MINGW64

Download and install [`Git-SCM`](https://git-scm.com/downloads/win) with the default settings
@@ -246,20 +242,21 @@ cmake --build build --config Release
Now you can load the model in conversation mode using `Vulkan`

```sh
build/bin/Release/llama-cli -m "[PATH TO MODEL]" -ngl 100 -c 16384 -t 10 -n -2 -cnv
```
#### MSYS2

Install [MSYS2](https://www.msys2.org/) and then run the following commands in a UCRT terminal to install dependencies.

```sh
pacman -S git \
    mingw-w64-ucrt-x86_64-gcc \
    mingw-w64-ucrt-x86_64-cmake \
    mingw-w64-ucrt-x86_64-vulkan-devel \
    mingw-w64-ucrt-x86_64-shaderc
```
Switch into the `llama.cpp` directory and build using CMake.
```sh
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release
@@ -323,11 +320,13 @@ cmake --build build --config release
You can test with:
```bash
./build/bin/llama-cli -m PATH_TO_MODEL -p "Building a website can be done in 10 steps:" -ngl 32
```

If the following info is output on screen, you are using `llama.cpp` with the CANN backend:

```bash
llm_load_tensors: CANN model buffer size = 13313.00 MiB
llama_new_context_with_model: CANN compute buffer size = 1260.81 MiB
```