support SYCL backend windows build

commit 838f8ea131 (parent d2f650cb5b)

3 changed files with 191 additions and 14 deletions
CMakeLists.txt:

```diff
@@ -501,7 +501,11 @@ if (LLAMA_SYCL)
     set(GGML_HEADERS_SYCL ggml.h ggml-sycl.h)
     set(GGML_SOURCES_SYCL ggml-sycl.cpp)

-    set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} sycl OpenCL mkl_core pthread m dl mkl_sycl_blas mkl_intel_ilp64 mkl_tbb_thread)
+    if (WIN32)
+        set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} -fsycl sycl7 OpenCL mkl_sycl_blas_dll.lib mkl_intel_ilp64_dll.lib mkl_sequential_dll.lib mkl_core_dll.lib)
+    else()
+        set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} -fsycl OpenCL mkl_core pthread m dl mkl_sycl_blas mkl_intel_ilp64 mkl_tbb_thread)
+    endif()
 endif()

 function(get_flags CCID CCVER)
```
README-sycl.md:

```diff
@@ -8,10 +8,14 @@
 - [Linux](#linux)

+- [Windows](#windows)
+
 - [Environment Variable](#environment-variable)

 - [Known Issue](#known-issue)

+- [Q&A](#q&a)
+
 - [Todo](#todo)

 ## Background
```
```diff
@@ -33,7 +37,7 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
 |OS|Status|Verified|
 |-|-|-|
 |Linux|Support|Ubuntu 22.04|
-|Windows|Ongoing| |
+|Windows|Support|Windows 11|


 ## Intel GPU
```
```diff
@@ -42,7 +46,7 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
 |-|-|-|
 |Intel Data Center Max Series| Support| Max 1550|
 |Intel Data Center Flex Series| Support| Flex 170|
-|Intel Arc Series| Support| Arc 770|
+|Intel Arc Series| Support| Arc 770, 730M|
 |Intel built-in Arc GPU| Support| built-in Arc GPU in Meteor Lake|
 |Intel iGPU| Support| iGPU in i5-1250P, i7-1165G7|

```
````diff
@@ -131,6 +135,7 @@ cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
 #build all binary
 cmake --build . --config Release -v

+cd ..
 ```

 or
````
````diff
@@ -195,7 +200,7 @@ GGML_SYCL_DEVICE=0 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "Building
 or run by script:

 ```
-./examples/sycl/run_llama2.sh
+./examples/sycl/run-llama2.sh
 ```

 Note:
````
````diff
@@ -205,11 +210,173 @@ Note:

 5. Check the device ID in output

 Like:
 ```
 Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
 ```

+## Windows
+
+### Setup Environment
+
+1. Install Intel GPU driver.
+
+Please install Intel GPU driver by official guide: [Install GPU Drivers](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/arc/software/drivers.html).
+
+2. Install Intel® oneAPI Base toolkit.
+
+a. Please follow the procedure in [Get the Intel® oneAPI Base Toolkit ](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html).
+
+Recommend to install to default folder: **/opt/intel/oneapi**.
+
+Following guide use the default folder as example. If you use other folder, please modify the following guide info with your folder.
+
+b. Enable oneAPI running environment:
+
+- In Search, input 'oneAPI'.
+
+Search & open "Intel oneAPI command prompt for Intel 64 for Visual Studio 2022"
+
+- In Run:
+
+In CMD:
+```
+"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
+```
+
+c. Check GPU
+
+In oneAPI command line:
+
+```
+sycl-ls
+```
+
+There should be one or more level-zero devices. Like **[ext_oneapi_level_zero:gpu:0]**.
+
+Output (example):
+```
+[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
+[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
+[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO [31.0.101.5186]
+[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.28044]
+
+```
+
+3. Install cmake & make
+
+a. Download & install cmake for windows: https://cmake.org/download/
+
+b. Download & install make for windows provided by mingw-w64: https://www.mingw-w64.org/downloads/
+
+
+### Build locally:
+
+In oneAPI command line window:
+
+```
+mkdir build
+cd build
+
+#for FP16
+#cmake -G "MinGW Makefiles" . -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release -DLLAMA_SYCL_F16=ON
+
+#for FP32
+cmake -G "MinGW Makefiles" . -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release
+
+#build example/main only
+#make main
+
+#build all binary
+make
+
+cd ..
+```
+
+or
+
+```
+.\examples\sycl\win-build.bat
+```
+
+Note:
+
+- By default, it will build for all binary files. It will take more time. To reduce the time, we recommend to build for **example/main** only.
+
+### Run
+
+1. Put model file to folder **models**
+
+2. Enable oneAPI running environment
+
+- In Search, input 'oneAPI'.
+
+Search & open "Intel oneAPI command prompt for Intel 64 for Visual Studio 2022"
+
+- In Run:
+
+In CMD:
+```
+"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
+```
+
+3. List device ID
+
+Run without parameter:
+
+```
+build\bin\ls-sycl-device.exe
+
+or
+
+build\bin\main.exe
+```
+
+Check the ID in startup log, like:
+
+```
+found 4 SYCL devices:
+Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3,
+max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
+Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2,
+max compute_units 24, max work group size 67108864, max sub group size 64, global mem size 67065057280
+Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0,
+max compute_units 24, max work group size 8192, max sub group size 64, global mem size 67065057280
+Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0,
+max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
+
+```
+
+|Attribute|Note|
+|-|-|
+|compute capability 1.3|Level-zero running time, recommended |
+|compute capability 3.0|OpenCL running time, slower than level-zero in most cases|
+
+4. Set device ID and execute llama.cpp
+
+Set device ID = 0 by **set GGML_SYCL_DEVICE=0**
+
+```
+set GGML_SYCL_DEVICE=0
+build\bin\main.exe -m models\llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 400 -e -ngl 33 -s 0
+```
+or run by script:
+
+```
+.\examples\sycl\win-run-llama2.bat
+```
+
+Note:
+
+- By default, mmap is used to read model file. In some cases, it leads to the hang issue. Recommend to use parameter **--no-mmap** to disable mmap() to skip this issue.
+
+
+5. Check the device ID in output
+
+Like:
+```
+Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
+```

 ## Environment Variable
````
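The "c. Check GPU" step above only needs the `[ext_oneapi_level_zero:gpu:N]` entries from `sycl-ls`. As a minimal POSIX shell sketch (the sample text below is the example output quoted in the guide; on a real machine you would pipe `sycl-ls` itself into `grep`):

```shell
# Write the guide's sample sycl-ls output to a file for illustration,
# then filter out just the level-zero GPU identifiers.
cat > sycl_ls.txt <<'EOF'
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO [31.0.101.5186]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.28044]
EOF

# Prints: ext_oneapi_level_zero:gpu:0
grep -o 'ext_oneapi_level_zero:gpu:[0-9]*' sycl_ls.txt
```

If this prints nothing, no level-zero device is visible and the GPU driver or oneAPI environment setup should be rechecked.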
```diff
@@ -220,7 +387,7 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
 |LLAMA_SYCL|ON (mandatory)|Enable build with SYCL code path. <br>For FP32/FP16, LLAMA_SYCL=ON is mandatory.|
 |LLAMA_SYCL_F16|ON (optional)|Enable FP16 build with SYCL code path. Faster for long-prompt inference. <br>For FP32, not set it.|
 |CMAKE_C_COMPILER|icx|Use icx compiler for SYCL code path|
-|CMAKE_CXX_COMPILER|icpx|use icpx for SYCL code path|
+|CMAKE_CXX_COMPILER|icpx (Linux), icx (Windows)|use icpx/icx for SYCL code path|


 #### Running
```
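The changed table row means the C++ compiler name now differs by platform. A hypothetical shell helper (the variable name `SYCL_CXX` is ours, not from the project) sketches the selection rule, assuming MinGW/MSYS shells on Windows report `MINGW*`/`MSYS*` from `uname -s`:

```shell
# Pick the SYCL C++ compiler named in the table above:
# icx on Windows (MinGW/MSYS/Cygwin shells), icpx elsewhere.
case "$(uname -s)" in
  MINGW*|MSYS*|CYGWIN*) SYCL_CXX=icx ;;
  *)                    SYCL_CXX=icpx ;;
esac
echo "$SYCL_CXX"
```

The result would then be passed as `-DCMAKE_CXX_COMPILER=$SYCL_CXX`; the C compiler stays `icx` on both platforms.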
```diff
@@ -232,19 +399,24 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device

 ## Known Issue

-- Error: `error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory`.
-
-  Miss to enable oneAPI running environment.
-
-  Install oneAPI base toolkit and enable it by: `source /opt/intel/oneapi/setvars.sh`.
-
 - Hang during startup

   llama.cpp use mmap as default way to read model file and copy to GPU. In some system, memcpy will be abnormal and block.

   Solution: add **--no-mmap**.

+## Q&A
+
+- Error: `error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory`.
+
+  Miss to enable oneAPI running environment.
+
+  Install oneAPI base toolkit and enable it by: `source /opt/intel/oneapi/setvars.sh`.
+
+- In Windows, no result, not error.
+
+  Miss to enable oneAPI running environment.
+
 ## Todo

 - Support to build in Windows.
```
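For the `libsycl.so.7` Q&A entry, one quick diagnostic on Linux is to look for unresolved libraries in `ldd` output before sourcing `setvars.sh`. A sketch, using a canned `ldd` excerpt rather than a real binary (the file `ldd_out.txt` and its contents are illustrative assumptions):

```shell
# Simulated excerpt of `ldd ./build/bin/main` before the oneAPI
# environment is enabled; libsycl.so.7 is not on the loader path yet.
cat > ldd_out.txt <<'EOF'
libsycl.so.7 => not found
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
EOF

# Any "not found" line points at a library that setvars.sh must provide.
grep 'not found' ldd_out.txt
```

After `source /opt/intel/oneapi/setvars.sh`, the same check should report no unresolved entries.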
README.md:

```diff
@@ -11,6 +11,7 @@ Inference of [LLaMA](https://arxiv.org/abs/2302.13971) model in pure C/C++
 ### Hot topics

 - ⚠️ Incoming backends: https://github.com/ggerganov/llama.cpp/discussions/5138
+- [SYCL backend](README-sycl.md) is ready (1/28/2024), support Linux/Windows in Intel GPUs (iGPU, Arc/Flex/Max series)
 - New SOTA quantized models, including pure 2-bits: https://huggingface.co/ikawrakow
 - Collecting Apple Silicon performance stats:
   - M-series: https://github.com/ggerganov/llama.cpp/discussions/4167
```
```diff
@@ -605,7 +606,7 @@ Building the program with BLAS support may lead to some performance improvements

 llama.cpp based on SYCL is used to support Intel GPU (Data Center Max series, Flex series, Arc series, Built-in GPU and iGPU).

-For detailed info, please refer to [llama.cpp for SYCL](README_sycl.md).
+For detailed info, please refer to [llama.cpp for SYCL](README-sycl.md).


 ### Prepare Data & Run
```