feature: add blis support
This commit is contained in:
parent
2d5db48371
commit
0926278434
4 changed files with 95 additions and 0 deletions
67
BLIS.md
Normal file
67
BLIS.md
Normal file
|
@ -0,0 +1,67 @@
|
|||
BLIS Installation Manual
|
||||
------------------------
|
||||
|
||||
BLIS is a portable software framework for high-performance BLAS-like dense linear algebra libraries. It has received awards and recognition, including the 2023 James H. Wilkinson Prize for Numerical Software and the 2020 SIAM Activity Group on Supercomputing Best Paper Prize. BLIS provides a new BLAS-like API and a compatibility layer for traditional BLAS routine calls. It offers features such as object-based API, typed API, BLAS and CBLAS compatibility layers.
|
||||
|
||||
Project URL: https://github.com/flame/blis
|
||||
|
||||
### Prepare:
|
||||
|
||||
Compile BLIS:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/flame/blis
|
||||
cd blis
|
||||
./configure --enable-cblas -t openmp,pthreads auto
|
||||
# will install to /usr/local/ by default.
|
||||
make -j
|
||||
```
|
||||
|
||||
Install BLIS:
|
||||
|
||||
```bash
|
||||
sudo make install
|
||||
```
|
||||
|
||||
We recommend using openmp since it's easier to modify the cores been used.
|
||||
|
||||
### llama.cpp compilation
|
||||
|
||||
Makefile:
|
||||
|
||||
```bash
|
||||
make LLAMA_BLIS=1 -j
|
||||
# make LLAMA_BLIS=1 benchmark-matmult
|
||||
```
|
||||
|
||||
CMake:
|
||||
|
||||
```bash
|
||||
mkdir build
|
||||
cd build
|
||||
cmake -DLLAMA_BLIS=ON ..
|
||||
make -j
|
||||
```
|
||||
|
||||
### llama.cpp execution
|
||||
|
||||
According to the BLIS documentation, we could set the following
|
||||
environment variables to modify the behavior of openmp:
|
||||
|
||||
```
|
||||
export GOMP_GPU_AFFINITY="0-19"
|
||||
export BLIS_NUM_THREADS=14
|
||||
```
|
||||
|
||||
And then run the binaries as normal.
|
||||
|
||||
|
||||
### Intel specific issue
|
||||
|
||||
Some might get the error message saying that `libimf.so` cannot be found.
|
||||
Please follow this [stackoverflow page](https://stackoverflow.com/questions/70687930/intel-oneapi-2022-libimf-so-no-such-file-or-directory-during-openmpi-compila).
|
||||
|
||||
### Reference:
|
||||
|
||||
1. https://github.com/flame/blis#getting-started
|
||||
2. https://github.com/flame/blis/blob/master/docs/Multithreading.md
|
|
@ -66,6 +66,7 @@ endif()
|
|||
# 3rd party libs
|
||||
option(LLAMA_ACCELERATE "llama: enable Accelerate framework" ON)
|
||||
option(LLAMA_OPENBLAS "llama: use OpenBLAS" OFF)
|
||||
option(LLAMA_BLIS "llama: use blis" OFF)
|
||||
option(LLAMA_CUBLAS "llama: use cuBLAS" OFF)
|
||||
option(LLAMA_CLBLAST "llama: use CLBlast" OFF)
|
||||
|
||||
|
@ -178,6 +179,24 @@ if (LLAMA_OPENBLAS)
|
|||
endif()
|
||||
endif()
|
||||
|
||||
if (LLAMA_BLIS)
|
||||
add_compile_definitions(GGML_USE_BLIS)
|
||||
# we don't directly call BLIS apis, use cblas wrapper instead
|
||||
add_compile_definitions(GGML_USE_OPENBLAS)
|
||||
set(BLIS_INCLUDE_SEARCH_PATHS
|
||||
/usr/include
|
||||
/usr/include/blis
|
||||
/usr/local/include
|
||||
/usr/local/include/blis
|
||||
$ENV{BLIS_HOME}
|
||||
$ENV{BLIS_HOME}/include
|
||||
)
|
||||
find_path(BLIS_INC NAMES blis.h PATHS ${BLIS_INCLUDE_SEARCH_PATHS})
|
||||
add_compile_definitions(BLIS_ENABLE_CBLAS)
|
||||
add_link_options(-lblis)
|
||||
add_compile_options(-I${BLIS_INC})
|
||||
endif()
|
||||
|
||||
if (LLAMA_CUBLAS)
|
||||
cmake_minimum_required(VERSION 3.17)
|
||||
|
||||
|
|
4
Makefile
4
Makefile
|
@ -122,6 +122,10 @@ ifdef LLAMA_OPENBLAS
|
|||
LDFLAGS += -lopenblas
|
||||
endif
|
||||
endif
|
||||
ifdef LLAMA_BLIS
|
||||
CFLAGS += -DGGML_USE_OPENBLAS -DGGML_USE_BLIS -I/usr/local/include/blis -I/usr/include/blis
|
||||
LDFLAGS += -lblis -L/usr/local/lib
|
||||
endif
|
||||
ifdef LLAMA_CUBLAS
|
||||
CFLAGS += -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/x86_64-linux/include
|
||||
CXXFLAGS += -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I$(CUDA_PATH)/targets/x86_64-linux/include
|
||||
|
|
|
@ -58,6 +58,7 @@ The main goal of `llama.cpp` is to run the LLaMA model using 4-bit integer quant
|
|||
- Runs on the CPU
|
||||
- OpenBLAS support
|
||||
- cuBLAS and CLBlast support
|
||||
- BLIS support (cblas wrapper)
|
||||
|
||||
The original implementation of `llama.cpp` was [hacked in an evening](https://github.com/ggerganov/llama.cpp/issues/33#issuecomment-1465108022).
|
||||
Since then, the project has improved significantly thanks to many contributions. This project is for educational purposes and serves
|
||||
|
@ -278,6 +279,10 @@ Building the program with BLAS support may lead to some performance improvements
|
|||
cmake --build . --config Release
|
||||
```
|
||||
|
||||
- BLIS
|
||||
|
||||
Check [BLIS.md](BLIS.md) for more information.
|
||||
|
||||
- cuBLAS
|
||||
|
||||
This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue