tune: update readme

mqy 2023-06-19 13:50:35 +08:00
parent 6609c229e8
commit 65fd65e0c1


@@ -19,21 +19,60 @@ run bench ahead of time (saving tens of seconds), but there are two shortcomings
 outdated format. So I integrated mulmat tune into `main` and `perplexity` as
 a complementary solution.
-## Build into main and perplexity
+The `load` mode validates at least the following fields:
+- version
+- model
+- ftype
+- n_threads
+- n_profiles
+- profiles
+`n_threads` is critical to performance, so it is worth selecting the best value.
+When running `main` or `perplexity`, `n_threads` is set automatically, and the
+default generally works well. Example:
+```
+system_info: n_threads = 4 / 12
+```
+This reads as: use 4 of the 12 total cores (with 6 physical cores).
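As a rough illustration of reading that line, the following shell sketch pulls the two numbers out of a `system_info` string. The sample line is copied from the example above; the variable names `used` and `total` are illustrative only, and the real log line may carry more fields or differ between versions.

```shell
# Sample line copied from the example above; real output may contain more fields.
line='system_info: n_threads = 4 / 12'

# Capture the number of threads in use (first group) and the total (second group).
used=$(printf '%s\n' "$line" | sed -n 's|.*n_threads = \([0-9]*\) / \([0-9]*\).*|\1|p')
total=$(printf '%s\n' "$line" | sed -n 's|.*n_threads = \([0-9]*\) / \([0-9]*\).*|\2|p')

echo "using $used of $total cores"
```

Comparing `used` against the physical core count of the machine is one way to judge whether the default is a sensible starting point before overriding it with `-t`.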
+## Build
+Compile options:
+- `LLAMA_TUNE` for CMake (default ON)
+- `LLAMA_NO_TUNE` for Make (default undefined)
+`GGML_USE_TUNE` and `GGML_TUNE_NDEBUG` are defined when llama tune is enabled.
+When `GGML_USE_TUNE` is defined, mulmat_tune functionality is compiled into
+`main` and `perplexity`:
+- the CLI args `--tune` and `--tune-file` are available.
+- the fastest task profile for mul_mat is selected according to the tune result, when possible.
+The standalone tool `mulmat-tune` is always built; it has no compile options.
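A minimal sketch of how these options might be passed on the command line. `LLAMA_TUNE` and `LLAMA_NO_TUNE` are the options listed above; the `=1` form for Make is an assumption based on how Makefile switches are commonly tested, and the `build` directory name is illustrative.

```shell
# CMake: LLAMA_TUNE defaults to ON, so it only needs to be passed to turn tune off.
cmake -B build -DLLAMA_TUNE=OFF
cmake --build build --config Release

# Make: LLAMA_NO_TUNE is undefined by default; defining it is assumed to disable tune.
make clean && make LLAMA_NO_TUNE=1
```

With tune disabled, `--tune` and `--tune-file` would not be expected to appear in the help output of `main` or `perplexity`.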
+**Makefile**
+To use tune, at least one of the following vendors has to be built:
+- BLAS (ACCELERATE, OpenBLAS, BLIS)
+- CLBlast
+- CUDA (may not run)
+To enable debug output, comment out `-DGGML_TUNE_NDEBUG` in the Makefile.
-Makefile:
 ```
 make clean && make
 ```
-CMake (with BLAS):
+**CMake**
 ```
-cmake --build . --target clean
-cmake .. -DLLAMA_BLAS=ON
+rm -rf build/*
+cd build
+cmake ..
 cmake --build . --config Release
 ```
-Run examples:
+## Run main or perplexity
 ```
 # bench and run:
@@ -48,21 +87,7 @@ Run examples:
 ./main -m ./models/3B/open-llama-3b-q4-0.bin -c 512 -b 1024 -n 256 --keep 48 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt -t 4 --tune-file <FILE>
 ```
-# Build the standalone `mulmat-tune`
-Makefile:
-```
-make clean && make
-```
-CMake (with BLAS)
-```
-cmake --build . --target clean
-cmake .. -DLLAMA_BLAS=ON
-cmake --build . --config Release
-```
-Run examples:
+## Run mulmat-tune tool
 ```
 ./mulmat-tune -h
@@ -82,8 +107,8 @@ Run examples:
 # customized n_pass: run 1 pass only instead of the default 3.
 ./mulmat-tune --n_pass 1
-# customized n_threads instead of the default 1.
-./mulmat-tune --n_threads 4
+# customized n_threads instead of the default 4.
+./mulmat-tune --n_threads 6
 # save to file
 ./mulmat-tune --file <FILE>
@@ -93,9 +118,7 @@ Run examples:
 ```
-# End to End Test
-## Compare With Master
+## Example: compare With Master
 You may want to run the following commands. Make sure the tune result file is
 setup properly.
@@ -103,7 +126,7 @@ setup properly.
 General steps:
 1. run `./mulmat-tune -h` to see how to build for misc vendors.
-To enable the debug, comment out `-DGGML_TUNE_NDEBUG` from makefile then run:
+then run:
 ```
 make clean; make