docs: sycl: add docker section
parent 380eabd828
commit ddfbc23ca9

2 changed files with 47 additions and 51 deletions
README-sycl.md
@@ -1,22 +1,16 @@
 # llama.cpp for SYCL
 
-[Background](#background)
-
-[OS](#os)
-
-[Intel GPU](#intel-gpu)
-
-[Linux](#linux)
-
-[Windows](#windows)
-
-[Environment Variable](#environment-variable)
-
-[Known Issue](#known-issue)
-
-[Q&A](#q&a)
-
-[Todo](#todo)
+- [Background](#background)
+- [OS](#os)
+- [Intel GPU](#intel-gpu)
+- Build and run:
+  - [Docker](#docker)
+  - [Linux](#linux)
+  - [Windows](#windows)
+- [Environment Variable](#environment-variable)
+- [Known Issue](#known-issue)
+- [Q&A](#q&a)
+- [Todo](#todo)
 
 ## Background
@@ -36,7 +30,7 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
 
 |OS|Status|Verified|
 |-|-|-|
-|Linux|Support|Ubuntu 22.04|
+|Linux|Support|Ubuntu 22.04, Fedora Silverblue 39|
 |Windows|Support|Windows 11|
@@ -51,6 +45,38 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
 |Intel iGPU| Support| iGPU in i5-1250P, i7-1260P, i7-1165G7|
 
 
+## Docker
+
+Note:
+- Only Docker on Linux is tested. Docker on WSL may not work.
+- You may need to install the Intel GPU driver on the host machine (see the [Linux](#linux) section for instructions).
+
+### Build the image
+
+You can choose between **F16** and **F32** builds. F16 is faster for long-prompt inference.
+
+```sh
+# For F16:
+#docker build -t llama-cpp-sycl --build-arg="LLAMA_SYCL_F16=ON" -f .devops/main-intel.Dockerfile .
+
+# Or, for F32:
+docker build -t llama-cpp-sycl -f .devops/main-intel.Dockerfile .
+
+# Note: you can also use the ".devops/main-server.Dockerfile", which compiles the "server" example
+```
+
+### Run
+
+```sh
+# First, list all the DRI cards:
+ls -la /dev/dri
+# Then pick the card that you want to use.
+
+# For example, with "/dev/dri/card1":
+docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 llama-cpp-sycl -m "/app/models/YOUR_MODEL_FILE" -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
+```
+
 ## Linux
 
 ### Setup Environment
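If you would rather not pick individual nodes, `--device` also accepts the whole `/dev/dri` directory. A minimal sketch of the same run, assuming a Docker version that expands a directory passed to `--device` into all the device nodes beneath it (the model path is a placeholder, as above):

```sh
# Map every DRI node (cards and render nodes) into the container in one go;
# the SYCL runtime inside will then enumerate all visible GPUs:
docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri \
    llama-cpp-sycl -m "/app/models/YOUR_MODEL_FILE" \
    -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
```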
@@ -90,8 +116,6 @@ Platform #0: Intel(R) OpenCL HD Graphics
 
 2. Install Intel® oneAPI Base toolkit.
 
-Note: You can skip step this if you want to build inside docker container
-
 a. Please follow the procedure in [Get the Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html).
 
 It is recommended to install to the default folder: **/opt/intel/oneapi**.
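A quick way to confirm that both the toolkit and the GPU driver are visible is the `sycl-ls` tool that ships with the Base Toolkit (a sketch assuming the default install path):

```sh
# Load the oneAPI environment, then list the devices the SYCL runtime can see;
# your Intel GPU should appear as a Level-Zero and/or OpenCL GPU entry:
source /opt/intel/oneapi/setvars.sh
sycl-ls
```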
@@ -123,20 +147,6 @@ Note:
 - You can choose between **F16** and **F32** builds. F16 is faster for long-prompt inference.
 - By default, it builds all binaries, which takes more time. To reduce build time, we recommend building **example/main** only.
 
-Method using **docker**:
-
-```sh
-# For F16:
-#docker build -t llama-cpp-sycl:latest --build-arg="LLAMA_SYCL_F16=ON" -f .devops/main-intel.Dockerfile .
-
-# Or, for F32:
-docker build -t llama-cpp-sycl -f .devops/main-intel.Dockerfile .
-
-# Note: you can also use the ".devops/main-server.Dockerfile", which compiles the "server" example
-```
-
-or, without docker:
-
 ```sh
 mkdir -p build
 cd build
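The configure step that typically follows is sketched below. The `LLAMA_SYCL` flag and the `icx`/`icpx` compiler choice are assumptions for SYCL builds of this vintage; `LLAMA_SYCL_F16=ON` is the same switch the Docker build above uses:

```sh
source /opt/intel/oneapi/setvars.sh

# For F32 (add -DLLAMA_SYCL_F16=ON for the F16 build, mirroring the Docker arg):
cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx

# Build only the main example to save time, as recommended above:
cmake --build . --config Release --target main
```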
@@ -175,8 +185,6 @@ source /opt/intel/oneapi/setvars.sh
 
 3. List the device IDs
 
-(Skip this step if you're using docker)
-
 Run without parameters:
 
 ```sh
@@ -211,18 +219,6 @@ found 4 SYCL devices:
 
 Set device ID = 0 with **GGML_SYCL_DEVICE=0**
 
-Using docker image built from step 2:
-
-```sh
-# Firstly, find all the DRI cards:
-ls -la /dev/dri
-
-# Then, pick the card that you want to use. For example "/dev/dri/card1"
-docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 llama-cpp-sycl -m "/app/models/YOUR_MODEL_FILE" -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
-```
-
-or, without docker:
-
 ```sh
 GGML_SYCL_DEVICE=0 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
 ```
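The same selector applies to the other binaries. A sketch for the "server" example mentioned above; the binary path and flags are assumptions for builds of this era:

```sh
# Hypothetical: run the HTTP server example on the same device;
# -ngl offloads layers to the GPU exactly as with "main":
GGML_SYCL_DEVICE=0 ./build/bin/server -m models/llama-2-7b.Q4_0.gguf -ngl 33
```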
README.md
@@ -392,15 +392,15 @@ Building the program with BLAS support may lead to some performance improvements
 
 Check [BLIS.md](docs/BLIS.md) for more information.
 
-- #### SYCL (using Intel DPCPP)
+- #### SYCL
 
 SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators.
 
 llama.cpp based on SYCL is used to **support Intel GPU** (Data Center Max series, Flex series, Arc series, Built-in GPU and iGPU).
 
-For detailed info, please refer to [llama.cpp for SYCL](README_sycl.md).
+For detailed info, please refer to [llama.cpp for SYCL](README-sycl.md).
 
 - #### Intel oneMKL
 
-Building through oneAPI compilers will make avx_vnni instruction set available for intel processors that do not support avx512 and avx512_vnni. Please note that this build config **does not support Intel GPU**. For Intel GPU support, please refer to [llama.cpp for SYCL](./README_sycl.md).
+Building through oneAPI compilers will make avx_vnni instruction set available for intel processors that do not support avx512 and avx512_vnni. Please note that this build config **does not support Intel GPU**. For Intel GPU support, please refer to [llama.cpp for SYCL](./README-sycl.md).
 
 - Using manual oneAPI installation:
 By default, `LLAMA_BLAS_VENDOR` is set to `Generic`, so if you already sourced the intel environment script and assign `-DLLAMA_BLAS=ON` in cmake, the mkl version of Blas will automatically be selected. Otherwise please install oneAPI and follow the below steps:
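For reference, a sketch of that manual oneAPI route; `Intel10_64lp` is CMake FindBLAS's vendor name for MKL's LP64 layout, and the install path assumes the default location:

```sh
source /opt/intel/oneapi/setvars.sh
mkdir -p build && cd build
# Select the MKL BLAS explicitly instead of the Generic default:
cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp \
    -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build . --config Release
```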