Commit graph

604 commits

Author SHA1 Message Date
Concedo
811989c2ad fixed pyinstaller 2023-04-22 16:31:42 +08:00
Concedo
1b7aa2b815 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
2023-04-22 16:22:08 +08:00
Howard Su
7e312f165c
cmake : fix build under Windows when enable BUILD_SHARED_LIBS (#1100)
* Fix build under Windows when enable BUILD_SHARED_LIBS

* Make AVX512 test on Windows to build the shared libs
2023-04-22 11:18:20 +03:00
Georgi Gerganov
872c365a91 ggml : fix AVX build + update to new Q8_0 format 2023-04-22 11:08:12 +03:00
Concedo
1ea0e15292 Merge branch 'master' into concedo
# Conflicts:
#	llama.cpp
2023-04-22 16:07:27 +08:00
Concedo
b53c5e7c80 update kobold lite 2023-04-22 16:04:43 +08:00
Georgi Gerganov
955ef9a5d5
ggml : alternative Q4_3 implementation using modified Q8_0 (#1109)
* ggml : prefer vzip to vuzp

This way we always use the same type of instruction across all quantizations

* ggml : alternative Q4_3 implementation using modified Q8_0

* ggml : fix Q4_3 scalar imlpementation

* ggml : slight improvement of Q4_3 - no need for loop unrolling

* ggml : fix AVX paths for Q8_0 quantization
2023-04-22 10:55:35 +03:00
Stephan Walter
c5aa5e5777
ggml : AVX2 optimization for vec_dot_q4_3_q8_0 and refactoring (#1099)
* AVX2 optimization for vec_dot_q4_3_q8_0 and refactoring

* finish AVX vectorization of quantize_row_q8_0

* Rename hsum_int_8 to hsum_i32_8
2023-04-22 10:37:05 +03:00
Clint Herron
e9a9cb0c54
examples : Improve Alpaca Default Repeat Penalty: Better Match Alpaca.cpp Experience (#1107)
* Moving parameters to separate lines for readability.

* Increasing repeate_penalty to 1.1 to make alpaca more usable by default.

* Adding trailing newline.
2023-04-22 09:54:33 +03:00
xaedes
b6e7f9b09e
llama : add api for getting/setting the complete state: rng, logits, embedding and kv_cache (#1105)
* reserve correct size for logits

* add functions to get and set the whole llama state:

including rng, logits, embedding and kv_cache

* remove unused variables

* remove trailing whitespace

* fix comment
2023-04-22 09:21:32 +03:00
Concedo
6e908c1792 added lora support 2023-04-22 12:29:38 +08:00
Concedo
c454f8b848 Gpt NeoX / Pythia integration completed 2023-04-22 11:23:25 +08:00
Concedo
7b3d04e5d4 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
2023-04-22 10:58:16 +08:00
Concedo
4fa3dfe8bc just doesn't work properly on windows. will leave it as a manual flag for others 2023-04-22 10:57:38 +08:00
slaren
50cb666b8a
Improve cuBLAS performance by using a memory pool (#1094)
* Improve cuBLAS performance by using a memory pool

* Move cuda specific definitions to ggml-cuda.h/cu

* Add CXX flags to nvcc

* Change memory pool synchronization mechanism to a spin lock
General code cleanup
2023-04-21 21:59:17 +02:00
apaz
25d7abbd1f
llama : fixed rlimit error message (#888) 2023-04-21 21:48:06 +03:00
源文雨
018f2279f5
cmake : link threads publicly to ggml (#1042)
* fix: ld link test-tokenizer-0 error

```
cmake3 --build . --config Release
[  5%] Built target ggml
[ 16%] Built target llama
[ 22%] Linking CXX executable ../bin/test-tokenizer-0
../libllama.a(ggml.c.o):在函数‘ggml_graph_compute’中:
ggml.c:(.text+0xf2db):对‘pthread_create’未定义的引用
ggml.c:(.text+0xf9d4):对‘pthread_join’未定义的引用
collect2: error: ld returned 1 exit status
gmake[2]: *** [bin/test-tokenizer-0] 错误 1
gmake[1]: *** [tests/CMakeFiles/test-tokenizer-0.dir/all] 错误 2
gmake: *** [all] 错误 2
```

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt
2023-04-21 21:27:06 +03:00
Alex Klinkhamer
9411288271
main : evaluate tokens in batches after swapping context (#1014)
* examples : evaluate tokens in batches after swapping context

* Update examples/main/main.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-21 21:18:09 +03:00
Concedo
ef13443047 wip pythia integration 2023-04-22 01:08:23 +08:00
Concedo
68898046c2 accidentally added the binaries onto repo again. 2023-04-22 00:41:19 +08:00
Concedo
cee018960e Merge branch 'master' into concedo_experimental 2023-04-22 00:19:50 +08:00
xaedes
8687c1f258
llama : remember and restore kv cache data pointers (#1104)
because their value is stored in buf and overwritten by memcpy
2023-04-21 18:25:21 +03:00
Concedo
f555db44ec adding the libraries for cublas first. but i cannot get the kernel to work yet 2023-04-21 23:24:09 +08:00
Kawrakow
1bfc153e2f
ggml : a faster version for Q4_1 x Q8_0 dot products (#1083)
* A faster version for Q4_1 x Q8_0 dot products

The idea nehind being that Q8_0 quantized
values get used many times in the matrix multiplications
where they are involved. In the current implementations,
when we are evaluating the dot products, we need to compute
the sum of the quants in the Q8_0 vector, so the same
operation is repeated many times. Here we pre-compute
the sum during Q8_0 quantization, store it in the
now modified block_q8_0 struct, and then reuse this
result in the subsequent dot products.

In a synthetic benchmark (just compute a bunch of dot
products), this change speeds up the Q4_1 * Q8_0 dot
product by 80%, making the performance identical to
Q4_0 * Q8_0.

In practical application, I see a ~15% gain in speed for
token prediction on M2, and ~5% gain on Ryzen 7950X.
The speed gain in the prompt evaluation is much bigger
(around 50%).

I have only done the change for the scalar version,
ARM_NEON, and AVX2, so we still need an AVX implementation.

* Cleaning up

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-04-21 18:18:26 +03:00
Concedo
794a38a2e8 Revert "cublas is not feasible at this time. removed for now"
This reverts commit 3687db7cf7.
2023-04-21 21:02:40 +08:00
slaren
3d59769c3b
Show perplexity ETA in hours and minutes (#1096) 2023-04-21 14:57:57 +02:00
Concedo
5160053e51 merged llama adapter into the rest of the gpt adapters 2023-04-21 17:47:48 +08:00
Concedo
82d74ca1a6 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
2023-04-21 16:24:30 +08:00
Concedo
3687db7cf7 cublas is not feasible at this time. removed for now 2023-04-21 16:14:23 +08:00
Georgi Gerganov
d40fded93e
llama : fix comment for "output.weight" tensor 2023-04-21 10:24:02 +03:00
Stephan Walter
2510c1831f
Add ggml-model-*.bin checksums for 7B, 13B, 30B, 65B (#1088)
* Add ggml-model-*.bin checksums for 7B, 13B, 30B
* Add ggml-model-*.bin checksums for 65B

---------

Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-04-20 23:56:44 +02:00
Georgi Gerganov
12b5900dbc
ggml : sync ggml (add GPT-NeoX RoPE implementation) 2023-04-20 23:32:59 +03:00
Georgi Gerganov
9ff334f3c9
ggml : fix bug in ggml_compute_forward_dup_f32() 2023-04-20 21:58:38 +03:00
slaren
2005469ea1
Add Q4_3 support to cuBLAS (#1086) 2023-04-20 20:49:53 +02:00
Georgi Gerganov
8a1756abdf
ggml : do not break cuBLAS build (Q4_3 is not yet implemented) 2023-04-20 21:43:50 +03:00
Georgi Gerganov
66aab46079
ggml : fix Q4_3 quantization
Broke it during conflict resolution in last PR
2023-04-20 20:44:05 +03:00
Kawrakow
38de86a711
llama : multi-threaded quantization (#1075)
* Multi-threading quantization.

Not much gain for simple quantizations, bit it will be important
for quantizations that require more CPU cycles.

* Multi-threading for quantize-stats

It now does the job in ~14 seconds on my Mac for
Q4_0, Q4_1 and Q4_2. Single-threaded it was taking
more than 2 minutes after adding the more elaborate
version of Q4_2.

* Reviewer comments

* Avoiding compiler confusion

After changing chunk_size to const int as suggested by
@ggerganov, clang and GCC starting to warn me that I don't
need to capture it in the lambda. So, I removed it from the
capture list. But that makes the MSVC build fail. So,
making it a constexpr to make every compiler happy.

* Still fighting with lambda captures in MSVC

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-20 20:42:27 +03:00
Georgi Gerganov
e0305ead3a
ggml : add Q4_3 quantization (#1082) 2023-04-20 20:35:53 +03:00
Concedo
07bb31b034 wip dont use 2023-04-21 00:35:54 +08:00
Ivan Komarov
6a9661ea5a
ci : remove the LLAMA_ACCELERATE matrix dimension from Ubuntu builds in the CI (#1074)
[Accelerate](https://developer.apple.com/documentation/accelerate) is an Apple framework which can only be used on macOS, and the CMake build [ignores](https://github.com/ggerganov/llama.cpp/blob/master/CMakeLists.txt#L102) the `LLAMA_ACCELERATE` variable when run on non-Apple platforms. This implies setting `LLAMA_ACCELERATE` is a no-op on Ubuntu and can be removed.

This will reduce visual noise in CI check results (in addition to reducing the number of checks we have to run for every PR). Right now every sanitized build is duplicated twice for no good reason (e.g., we have `CI / ubuntu-latest-cmake-sanitizer (ADDRESS, Debug, ON)` and `CI / ubuntu-latest-cmake-sanitizer (ADDRESS, Debug, OFF)`).
2023-04-20 18:15:18 +03:00
Concedo
7ba36c2c6c trying to put out penguin based fires. sorry for inconvenience 2023-04-20 23:15:07 +08:00
源文雨
5addcb120c
fix: LLAMA_CUBLAS=1 undefined reference 'shm_open' (#1080) 2023-04-20 15:28:43 +02:00
Concedo
49697d86d8 adjusted down the buf memory allocation now that realloc seems to work 2023-04-20 17:51:13 +08:00
Concedo
4605074245 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	README.md
#	ggml.c
2023-04-20 17:30:54 +08:00
Concedo
3e88616439 fixed WONKY CODE 2023-04-20 16:41:32 +08:00
Concedo
0b08ec7c5d forgot to remove this 2023-04-20 16:28:47 +08:00
Concedo
346cd68903 make linux and OSX build process equal to windows. Now it will build all applicable libraries, for a full build do make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 2023-04-20 15:53:55 +08:00
Stephan Walter
c8c2c52482
AVX2 optimization for vec_dot_q4_2_q8_0 (#1068) 2023-04-20 08:45:41 +02:00
Concedo
93761e7baf slightly clarified the library replacement steps - replacing the dll is necessary in addition to replacing the library imports 2023-04-20 12:23:54 +08:00
Gustavo Rocha Dias
5ca2d774cc
doc - explanation of how to use a custom version of the windows libraries at the lib folder. (#92)
the dynamic libraries also need to be updated if you replace the import libraries
2023-04-20 12:20:11 +08:00