update build.md

This commit is contained in:
slaren 2024-12-02 18:09:23 +01:00
parent 886c153c53
commit bbff53ae5b

View file

@ -7,9 +7,11 @@ git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp cd llama.cpp
``` ```
In order to build llama.cpp you have four different options. The following sections describe how to build with different backends and options.
- Using `CMake`: ## CPU-only Build
Build llama.cpp using `CMake`:
```bash ```bash
cmake -B build cmake -B build
@ -47,22 +49,10 @@ In order to build llama.cpp you have four different options.
``` ```
Note: Building for arm64 could also be done just with MSVC (with the build-arm64-windows-MSVC preset, or the standard CMake build instructions). But MSVC does not support inline ARM assembly-code, used e.g. for the accelerated Q4_0_4_8 CPU kernels. Note: Building for arm64 could also be done just with MSVC (with the build-arm64-windows-MSVC preset, or the standard CMake build instructions). But MSVC does not support inline ARM assembly-code, used e.g. for the accelerated Q4_0_4_8 CPU kernels.
- Using `gmake` (FreeBSD):
1. Install and activate [DRM in FreeBSD](https://wiki.freebsd.org/Graphics)
2. Add your user to **video** group
3. Install compilation dependencies.
```bash
sudo pkg install gmake automake autoconf pkgconf llvm15 openblas
gmake CC=/usr/local/bin/clang15 CXX=/usr/local/bin/clang++15 -j4
```
## Metal Build ## Metal Build
On MacOS, Metal is enabled by default. Using Metal makes the computation run on the GPU. On MacOS, Metal is enabled by default. Using Metal makes the computation run on the GPU.
To disable the Metal build at compile time use the `GGML_NO_METAL=1` flag or the `GGML_METAL=OFF` cmake option. To disable the Metal build at compile time use the `-DGGML_METAL=OFF` cmake option.
When built with Metal support, you can explicitly disable GPU inference with the `--n-gpu-layers|-ngl 0` command-line When built with Metal support, you can explicitly disable GPU inference with the `--n-gpu-layers|-ngl 0` command-line
argument. argument.
@ -159,9 +149,9 @@ The environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` can be used to enab
Most of the compilation options available for CUDA should also be available for MUSA, though they haven't been thoroughly tested yet. Most of the compilation options available for CUDA should also be available for MUSA, though they haven't been thoroughly tested yet.
### hipBLAS ### HIP
This provides BLAS acceleration on HIP-supported AMD GPUs. This provides GPU acceleration on HIP-supported AMD GPUs.
Make sure to have ROCm installed. Make sure to have ROCm installed.
You can download it from your Linux distro's package manager or from here: [ROCm Quick Start (Linux)](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html#rocm-install-quick). You can download it from your Linux distro's package manager or from here: [ROCm Quick Start (Linux)](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html#rocm-install-quick).
@ -227,6 +217,12 @@ EOF
``` ```
Switch into the `llama.cpp` directory and build using CMake.
```sh
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release
```
#### Git Bash MINGW64 #### Git Bash MINGW64
Download and install [`Git-SCM`](https://git-scm.com/downloads/win) with the default settings Download and install [`Git-SCM`](https://git-scm.com/downloads/win) with the default settings
@ -246,8 +242,8 @@ cmake --build build --config Release
Now you can load the model in conversation mode using `Vulkan` Now you can load the model in conversation mode using `Vulkan`
``` ```sh
build/bin/release/llama-cli -m "[PATH TO MODEL]" -ngl 100 -c 16384 -t 10 -n -2 -cnv build/bin/Release/llama-cli -m "[PATH TO MODEL]" -ngl 100 -c 16384 -t 10 -n -2 -cnv
``` ```
#### MSYS2 #### MSYS2
@ -259,7 +255,8 @@ Install [MSYS2](https://www.msys2.org/) and then run the following commands in a
mingw-w64-ucrt-x86_64-vulkan-devel \ mingw-w64-ucrt-x86_64-vulkan-devel \
mingw-w64-ucrt-x86_64-shaderc mingw-w64-ucrt-x86_64-shaderc
``` ```
Switch into `llama.cpp` directory and build using CMake.
Switch into the `llama.cpp` directory and build using CMake.
```sh ```sh
cmake -B build -DGGML_VULKAN=ON cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release cmake --build build --config Release
@ -323,11 +320,13 @@ cmake --build build --config release
You can test with: You can test with:
`./build/bin/llama-cli -m PATH_TO_MODEL -p "Building a website can be done in 10 steps:" -ngl 32`
If the fllowing info is output on screen, you are using `llama.cpp by CANN backend`:
```bash ```bash
llm_load_tensors: CANN buffer size = 13313.00 MiB ./build/bin/llama-cli -m PATH_TO_MODEL -p "Building a website can be done in 10 steps:" -ngl 32
```
If the following info is output on screen, you are using `llama.cpp` with the CANN backend:
```bash
llm_load_tensors: CANN model buffer size = 13313.00 MiB
llama_new_context_with_model: CANN compute buffer size = 1260.81 MiB llama_new_context_with_model: CANN compute buffer size = 1260.81 MiB
``` ```