update readme, refactor build script

This commit is contained in:
jianyuzh 2024-01-15 13:42:24 +08:00 committed by Meng, Hengyu
parent c3c5b20ac5
commit ca2cb6982a
2 changed files with 126 additions and 124 deletions


@@ -8,40 +8,98 @@ oneAPI is a specification that is open and standards-based, supporting multiple
Intel uses SYCL as the direct programming language to support CPUs, GPUs, and FPGAs.
## llama.cpp for SYCL
We migrated the CUDA code to SYCL, so the SYCL code replaces the CUDA functions in llama.cpp without changing function names. To avoid reinventing the wheel, this code refers to other code paths in llama.cpp (such as OpenBLAS, cuBLAS, and CLBlast). We used the open-source tool [SYCLomatic](https://github.com/oneapi-src/SYCLomatic) (commercial release: [Intel® DPC++ Compatibility Tool](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compatibility-tool.html)) for the migration; that is why the code macros and logs include cuBLAS flags.
llama.cpp for SYCL is used to support Intel GPUs. For Intel CPUs, we recommend llama.cpp for x86 (the Intel MKL build).
## OS
|OS|Status|Verified|
|-|-|-|
|Linux|Support|Ubuntu 22.04|
|Windows|Ongoing| |
## Intel GPU
|Intel GPU|Status|Verified Model|
|-|-|-|
|Intel Data Center Max Series|Support|Max 1550|
|Intel Data Center Flex Series|Support|Flex 170|
|Intel Arc Series|Support|Arc 770|
|Intel built-in Arc GPU|Support|built-in Arc GPU in Meteor Lake|
|Intel iGPU|Support|iGPU in i5-1250P, i7-1165G7|
## Linux
### Setup Environment
1. Install the Intel GPU driver.
a. Install the Intel GPU driver following the official guide: [Install GPU Drivers](https://dgpu-docs.intel.com/driver/installation.html).
Note: for an iGPU, install the client GPU driver.
b. Add your user to the **video** and **render** groups:
```
sudo usermod -aG render username
sudo usermod -aG video username
```
Note: log out and log back in for the group change to take effect.
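Once you are back in, you can confirm the membership took effect using only the standard `id` utility (the group names are the ones added above):

```shell
# Report whether the current user is in the render and video groups.
for grp in render video; do
  if id -nG | tr ' ' '\n' | grep -qx "$grp"; then
    echo "in group: $grp"
  else
    echo "missing group: $grp (re-login or re-run usermod)"
  fi
done
```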
c. Verify the driver with clinfo:
```
sudo apt install clinfo
sudo clinfo -l
```
Output (example):
```
Platform #0: Intel(R) OpenCL Graphics
`-- Device #0: Intel(R) Arc(TM) A770 Graphics
Platform #0: Intel(R) OpenCL HD Graphics
`-- Device #0: Intel(R) Iris(R) Xe Graphics [0x9a49]
```
2. Install the Intel® oneAPI Base Toolkit.
a. Follow the procedure in [Get the Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html).
We recommend installing to the default folder, **/opt/intel/oneapi**. The guide below uses the default folder as an example; if you installed elsewhere, adjust the paths accordingly.
b. Verify the installation:
```
source /opt/intel/oneapi/setvars.sh
sycl-ls
```
There should be one or more Level-Zero devices listed, such as **[ext_oneapi_level_zero:gpu:0]**.
Output (example):
```
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
[opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
```
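If `sycl-ls` reports devices on several backends, recent oneAPI DPC++ runtimes let you restrict which ones SYCL applications see through the `ONEAPI_DEVICE_SELECTOR` environment variable. This is optional and not required by this guide; treat the exact selector syntax as an assumption to verify against your oneAPI release:

```shell
# Optional: expose only the first Level-Zero GPU to SYCL programs.
export ONEAPI_DEVICE_SELECTOR=level_zero:0
# Re-running sycl-ls afterwards should list only the matching device(s).
```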
3. Build locally:
@@ -49,137 +107,78 @@ If you want to get more binary files, please change the build project.
```
mkdir -p build
cd build
source /opt/intel/oneapi/setvars.sh
#for FP16
#cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON
cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
#build example/main only
#cmake --build . --config Release --target main
#build all binary
cmake --build . --config Release -v
```
or
```
./sycl_build.sh
```
### Run
1. List the device IDs.
Run without parameters:
```
./build/bin/main
```
Check the device IDs in the startup log, for example:
```
found 4 SYCL devices:
Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3,
max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2,
max compute_units 24, max work group size 67108864, max sub group size 64, global mem size 67065057280
Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0,
max compute_units 24, max work group size 8192, max sub group size 64, global mem size 67065057280
Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0,
max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
```
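When several devices are reported, the compute-capability field is the quickest way to spot the Level-Zero entries. A small sketch that filters a saved copy of the startup log (the log path and its contents are illustrative, mirroring the listing above):

```shell
# Sample log mirroring the device listing above.
cat > /tmp/sycl_devices.log <<'EOF'
Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3,
Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2,
Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0,
Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0,
EOF
# Print the IDs of devices with compute capability 1.3 (Level-Zero runtime).
grep 'compute capability 1\.3' /tmp/sycl_devices.log | sed 's/^Device \([0-9]*\):.*/\1/'
```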
|Attribute|Note|
|-|-|
|compute capability 1.3|Level-Zero runtime, recommended|
|compute capability 3.0|OpenCL runtime, slower than Level-Zero in most cases|
2. Put the model file into the **models** folder.
3. Enable the oneAPI runtime environment:
```
source /opt/intel/oneapi/setvars.sh
```
4. Set the device ID and run llama.cpp.
Set device ID 0 via **GGML_SYCL_DEVICE=0**:
```
GGML_SYCL_DEVICE=0 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
```
5. Check the device ID in the output, for example:
```
Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
```
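A note on shell syntax when setting the device ID: `VAR=value command` scopes the variable to that one command, while `VAR=value && command` only sets an unexported shell variable that child processes never see. The difference can be demonstrated with shell built-ins alone (no llama.cpp binary needed):

```shell
# Correct: the assignment is exported to the child process.
GGML_SYCL_DEVICE=0 sh -c 'echo "device=${GGML_SYCL_DEVICE:-unset}"'   # prints device=0
# Incorrect: with &&, the assignment stays a local, unexported
# shell variable, so the child process sees it as unset.
GGML_SYCL_DEVICE=0 && sh -c 'echo "device=${GGML_SYCL_DEVICE:-unset}"'   # prints device=unset
```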
### Environment Variable
@@ -198,7 +197,5 @@ set GGML_SYCL_DEVICE=0
|Name|Value|Function|
|-|-|-|
|GGML_SYCL_DEVICE|0 (default) or 1|Set the device ID used. Check the device IDs in the default run output|
|GGML_SYCL_DEBUG|0 (default) or 1|Enable the log function via the GGML_SYCL_DEBUG macro|


@@ -2,8 +2,13 @@ mkdir -p build
cd build
source /opt/intel/oneapi/setvars.sh
#for FP16
#cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON
cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
#build example/main only
#cmake --build . --config Release --target main
#build all binary
cmake --build . --config Release -v