diff --git a/README-sycl.md b/README-sycl.md
index de3641293..de4f23ca5 100644
--- a/README-sycl.md
+++ b/README-sycl.md
@@ -1,22 +1,16 @@
 # llama.cpp for SYCL

-[Background](#background)
-
-[OS](#os)
-
-[Intel GPU](#intel-gpu)
-
-[Linux](#linux)
-
-[Windows](#windows)
-
-[Environment Variable](#environment-variable)
-
-[Known Issue](#known-issue)
-
-[Q&A](#q&a)
-
-[Todo](#todo)
+- [Background](#background)
+- [OS](#os)
+- [Intel GPU](#intel-gpu)
+- Build and run:
+  - [Docker](#docker)
+  - [Linux](#linux)
+  - [Windows](#windows)
+- [Environment Variable](#environment-variable)
+- [Known Issue](#known-issue)
+- [Q&A](#qa)
+- [Todo](#todo)

 ## Background

@@ -36,7 +30,7 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).

 |OS|Status|Verified|
 |-|-|-|
-|Linux|Support|Ubuntu 22.04|
+|Linux|Support|Ubuntu 22.04, Fedora Silverblue 39|
 |Windows|Support|Windows 11|


@@ -51,6 +45,38 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
 |Intel iGPU| Support| iGPU in i5-1250P, i7-1260P, i7-1165G7|


+## Docker
+
+Note:
+- Only Docker on Linux has been tested; Docker on WSL may not work.
+- You may need to install the Intel GPU driver on the host machine (see the [Linux](#linux) section for instructions).
+
+### Build the image
+
+You can choose between **F16** and **F32** builds. F16 is faster for long-prompt inference.
+
+
+```sh
+# For F16:
+#docker build -t llama-cpp-sycl --build-arg="LLAMA_SYCL_F16=ON" -f .devops/main-intel.Dockerfile .
+
+# Or, for F32:
+docker build -t llama-cpp-sycl -f .devops/main-intel.Dockerfile .
+
+# Note: you can also use the ".devops/main-server.Dockerfile", which compiles the "server" example
+```
+
+### Run
+
+```sh
+# First, find all the DRI cards:
+ls -la /dev/dri
+# Then, pick the card that you want to use.
+
+# For example, with "/dev/dri/card1":
+docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 llama-cpp-sycl -m "/app/models/YOUR_MODEL_FILE" -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
+```
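+
+To try the "server" example instead, a hypothetical workflow could look like the following sketch (the `llama-cpp-sycl-server` tag and the 8080 port mapping are illustrative choices, not names defined by the repository):
+
+```sh
+# Build an image from the server Dockerfile, under a hypothetical tag:
+docker build -t llama-cpp-sycl-server -f .devops/main-server.Dockerfile .
+
+# Run it with the same GPU devices passed through, publishing the HTTP port:
+docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 -p 8080:8080 llama-cpp-sycl-server -m "/app/models/YOUR_MODEL_FILE" --host 0.0.0.0 --port 8080 -ngl 33
+```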
+
 ## Linux

 ### Setup Environment
@@ -90,8 +116,6 @@ Platform #0: Intel(R) OpenCL HD Graphics

 2. Install Intel® oneAPI Base toolkit.

-Note: You can skip step this if you want to build inside docker container
-
 a. Please follow the procedure in [Get the Intel® oneAPI Base Toolkit ](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html).

 Recommend to install to default folder: **/opt/intel/oneapi**.
@@ -123,20 +147,6 @@ Note:
 - You can choose between **F16** and **F32** build. F16 is faster for long-prompt inference.
 - By default, it will build for all binary files. It will take more time. To reduce the time, we recommend to build for **example/main** only.

-Method using **docker**:
-
-```sh
-# For F16:
-#docker build -t llama-cpp-sycl:latest --build-arg="LLAMA_SYCL_F16=ON" -f .devops/main-intel.Dockerfile .
-
-# Or, for F32:
-docker build -t llama-cpp-sycl -f .devops/main-intel.Dockerfile .
-
-# Note: you can also use the ".devops/main-server.Dockerfile", which compiles the "server" example
-```
-
-or, without docker:
-
 ```sh
 mkdir -p build
 cd build
@@ -175,8 +185,6 @@ source /opt/intel/oneapi/setvars.sh

 3. List device ID

-(Skip this step if you're using docker)
-
 Run without parameter:

 ```sh
@@ -211,18 +219,6 @@ found 4 SYCL devices:

 Set device ID = 0 by **GGML_SYCL_DEVICE=0**

-Using docker image built from step 2:
-
-```sh
-# Firstly, find all the DRI cards:
-ls -la /dev/dri
-
-# Then, pick the card that you want to use. For example "/dev/dri/card1"
-docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 llama-cpp-sycl -m "/app/models/YOUR_MODEL_FILE" -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
-```
-
-or, without docker:
-
 ```sh
 GGML_SYCL_DEVICE=0 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
 ```
diff --git a/README.md b/README.md
index f26f8e91a..039f267e4 100644
--- a/README.md
+++ b/README.md
@@ -392,15 +392,15 @@ Building the program with BLAS support may lead to some performance improvements

 Check [BLIS.md](docs/BLIS.md) for more information.

-- #### SYCL (using Intel DPCPP)
+- #### SYCL

   SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators.

   llama.cpp based on SYCL is used to **support Intel GPU** (Data Center Max series, Flex series, Arc series, Built-in GPU and iGPU).

-  For detailed info, please refer to [llama.cpp for SYCL](README_sycl.md).
+  For detailed info, please refer to [llama.cpp for SYCL](README-sycl.md).

 - #### Intel oneMKL

-  Building through oneAPI compilers will make avx_vnni instruction set available for intel processors that do not support avx512 and avx512_vnni. Please note that this build config **does not support Intel GPU**. For Intel GPU support, please refer to [llama.cpp for SYCL](./README_sycl.md).
+  Building through oneAPI compilers will make avx_vnni instruction set available for intel processors that do not support avx512 and avx512_vnni. Please note that this build config **does not support Intel GPU**. For Intel GPU support, please refer to [llama.cpp for SYCL](./README-sycl.md).

 - Using manual oneAPI installation:
   By default, `LLAMA_BLAS_VENDOR` is set to `Generic`, so if you already sourced intel environment script and assign `-DLLAMA_BLAS=ON` in cmake, the mkl version of Blas will automatically been selected. Otherwise please install oneAPI and follow the below steps:
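
For reference, a typical manual-oneAPI build follows this sketch (the exact cmake flags here are assumptions based on contemporaneous llama.cpp build docs and may differ by version):

```sh
mkdir -p build && cd build
# Load the oneAPI environment first (only required for a manual installation):
source /opt/intel/oneapi/setvars.sh
# Configure with MKL as the BLAS backend, using the oneAPI compilers:
cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
# Build:
cmake --build . --config Release
```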