Georgi Gerganov
4530d5c3c4
Merge branch 'master' into clblast-llama-cpp
2023-04-28 17:56:36 +03:00
Folko-Ven
78ec543733
Correcting link to w64devkit ( #1214 )
Correcting link to w64devkit (change seeto to skeeto).
2023-04-28 16:22:48 +02:00
Johannes Gäßler
92a6e13a31
Add Manjaro CUDA include and lib dirs to Makefile ( #1212 )
2023-04-28 15:40:32 +02:00
Yann Follet
04aaae1d79
add avx2 for dot_q8_0_q8_0, 2x faster than scalar ( #1211 )
2023-04-28 11:59:48 +00:00
0cc4m
bbfba5f740
Fix import cl file name
2023-04-27 15:30:30 +02:00
0cc4m
96346fb2a4
Rename dequant kernels file to conform with other file names
2023-04-27 15:27:22 +02:00
0cc4m
fafebff53c
Make globals static, fix indentation
2023-04-27 15:25:09 +02:00
Stephan Walter
0b2da20538
ggml : slightly faster AVX2 implementation for Q5 ( #1197 )
2023-04-26 23:26:42 +03:00
Georgi Gerganov
f9be42add0
readme : add quantization info
2023-04-26 23:24:42 +03:00
Georgi Gerganov
574406dc7e
ggml : add Q5_0 and Q5_1 quantization ( #1187 )
* ggml : add Q5_0 quantization (cuBLAS only)
* ggml : fix Q5_0 qh -> uint32_t
* ggml : fix q5_0 histogram stats
* ggml : q5_0 scalar dot product
* ggml : q5_0 ARM NEON dot
* ggml : q5_0 more efficient ARM NEON using uint64_t masks
* ggml : rename Q5_0 -> Q5_1
* ggml : adding Q5_0 mode
* quantize : add Q5_0 and Q5_1 to map
* ggml : AVX2 optimizations for Q5_0, Q5_1 (#1195 )
---------
Co-authored-by: Stephan Walter <stephan@walter.name>
2023-04-26 23:14:13 +03:00
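For reference, a minimal sketch of the two 5-bit block layouts this commit introduces, assuming the usual ggml conventions (32 weights per block, an fp16 scale, low nibbles in `qs`, the fifth bit of each weight packed into `qh`); see ggml.h/ggml.c at this commit for the authoritative definitions:

```cpp
#include <cstdint>

typedef uint16_t ggml_fp16_t;   // assumption: ggml's half-precision storage type

#define QK5_0 32
typedef struct {
    ggml_fp16_t d;              // scale: x = d * (q - 16)
    uint32_t    qh;             // 5th (high) bit of each of the 32 quants
    uint8_t     qs[QK5_0 / 2];  // low 4 bits, two quants per byte
} block_q5_0;

#define QK5_1 32
typedef struct {
    ggml_fp16_t d;              // scale: x = d * q + m
    ggml_fp16_t m;              // min
    uint32_t    qh;             // 5th bit of each quant
    uint8_t     qs[QK5_1 / 2];  // low 4 bits
} block_q5_1;
```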
Ásgeir Bjarni Ingvarsson
87a6f846d3
Allow setting the rng seed after initialization. ( #1184 )
The llama_set_state_data function restores the rng state to what it
was at the time llama_copy_state_data was called. But users may want
to restore the state and proceed with a different seed.
2023-04-26 22:08:43 +02:00
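A hedged usage sketch of the API this adds (`llama_set_rng_seed`, per llama.h of this vintage): restore a saved state, then override the rng so sampling diverges from the saved run:

```cpp
#include "llama.h"

// Restore a snapshot taken earlier with llama_copy_state_data, then
// proceed with a fresh seed instead of the saved rng state.
void restore_with_new_seed(llama_context * ctx, uint8_t * saved_state, int seed) {
    llama_set_state_data(ctx, saved_state);  // rng now matches the snapshot...
    llama_set_rng_seed(ctx, seed);           // ...until we override it here
}
```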
DaniAndTheWeb
ea3ad7eb60
Updating build instructions to include BLAS support ( #1183 )
* Updated build information
First update to the build instructions to include BLAS.
* Update README.md
* Update information about BLAS
* Better BLAS explanation
Adding a clearer BLAS explanation and adding a link to download the CUDA toolkit.
* Better BLAS explanation
* BLAS for Mac
Specifying that BLAS is already supported on Macs using the Accelerate Framework.
* Clarify the effect of BLAS
* Windows Make instructions
Added the instructions to build with Make on Windows
* Fixing typo
* Fix trailing whitespace
2023-04-26 22:03:03 +02:00
0cc4m
4a35ec9df5
First check error, then release event
2023-04-26 19:56:58 +02:00
Pavol Rusnak
859fee6dfb
quantize : use map
to assign quantization type from string
( #1191 )
instead of `int` (the `int` option is still supported).
This allows the following usage:
`./quantize ggml-model-f16.bin ggml-model-q4_0.bin q4_0`
instead of:
`./quantize ggml-model-f16.bin ggml-model-q4_0.bin 2`
2023-04-26 18:43:27 +02:00
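The change amounts to a small string-to-enum lookup with an integer fallback; a minimal sketch (the enum name, values, and helper here are illustrative stand-ins for the real `llama_ftype` handling):

```cpp
#include <map>
#include <string>

enum ftype { FTYPE_Q4_0 = 2, FTYPE_Q4_1 = 3 };  // hypothetical subset of llama_ftype

static const std::map<std::string, ftype> FTYPE_MAP = {
    {"q4_0", FTYPE_Q4_0},
    {"q4_1", FTYPE_Q4_1},
};

// Accept either a name ("q4_0") or the raw integer ("2"), as described above.
static bool parse_ftype(const std::string & arg, ftype & out) {
    auto it = FTYPE_MAP.find(arg);
    if (it != FTYPE_MAP.end()) { out = it->second; return true; }
    try { out = (ftype) std::stoi(arg); return true; } catch (...) { return false; }
}
```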
0cc4m
ce97a807cb
Simplify code, fix include
2023-04-26 18:39:04 +02:00
0cc4m
b746458281
Use C compiler for OpenCL files
2023-04-26 18:38:31 +02:00
0cc4m
2b0c6a56f9
Improve code quality
* Move internal stuff out of header
* Use internal enums instead of CLBlast enums
* Remove leftover C++ includes and defines
* Make event use easier to read
Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-04-26 07:48:04 +02:00
Stephan Walter
4afcc37869
Update SHA256SUMS after quantization change ( #1181 )
Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-04-25 23:41:56 +02:00
ostix360
667c501334
py : cast lora_alpha to int in convert-lora-to-ggml ( #1170 )
Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-04-25 23:33:08 +02:00
Pavol Rusnak
bb98e77be7
nix: use convert.py instead of legacy wrapper convert-pth-to-ggml.py ( #981 )
2023-04-25 23:19:57 +02:00
Georgi Gerganov
7a32fcb3b2
ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) ( #1179 )
* ggml : add Q8_0 quantization format (rename the old one to Q8_1)
* tests : fix test-quantize-fns
* ggml : finalize Q8_0 implementation
* ggml : use q4_0_q8_0 and q4_2_q8_0
* ggml : fix Q8_0 dot product bug (ARM)
* ggml : Q8_0 unroll x2
* ggml : fix bug - using wrong block type
* ggml : extend quantize_fns_t with "vec_dot_type"
* ggml : fix Q8_0 to use 255 values out of 256
* ggml : fix assert using wrong QK4_2 instead of QK4_3
2023-04-25 23:40:51 +03:00
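Q8_0 stores one full signed byte per weight plus a single scale per 32-weight block, and the "vec_dot_type" extension lets each quantized weight format name the 8-bit format activations should be quantized to before the dot product. An illustrative scalar reference for the q8_0 x q8_0 kernel (the one the AVX2 commit further up accelerates), not the ggml code verbatim:

```cpp
#include <cstdint>

#define QK8_0 32
typedef struct {
    float  d;          // scale: x = d * q
    int8_t qs[QK8_0];  // 32 signed 8-bit quants
} block_q8_0;

// Scalar reference dot product over n elements (n a multiple of QK8_0).
static float vec_dot_q8_0_q8_0(int n, const block_q8_0 * x, const block_q8_0 * y) {
    float sum = 0.0f;
    for (int i = 0; i < n / QK8_0; i++) {
        int s = 0;
        for (int j = 0; j < QK8_0; j++) {
            s += x[i].qs[j] * y[i].qs[j];  // integer multiply-accumulate
        }
        sum += x[i].d * y[i].d * s;        // apply both scales once per block
    }
    return sum;
}
```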
0cc4m
137071003c
Improve btype dequant kernel selection code, add error if type is unsupported
2023-04-25 19:40:54 +02:00
unbounded
dd0eabc049
ggml : use full range for Q4_0 and Q4_2 quantization ( #729 )
* Use full range for q4_0 quantization
By keeping the sign of the highest magnitude, we can make sure the
highest value maps to -8, which is currently unused.
This is a bit of a freebie since it is fully backwards compatible with
the current format.
* Update quantize_row_q4_0 for AVX/AVX2
* Update quantize_row_q4_0 for WASM
Untested
* Update quantize_row_q4_0 for Arm NEON
* Update quantize_row_q4_0 for PowerPC
Untested
* Use full range for q4_2 quantization
2023-04-25 20:20:46 +03:00
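The idea: instead of scaling by the absolute maximum (which leaves the code point -8 unused), divide by the signed value of largest magnitude so that value lands exactly on -8. A minimal scalar sketch of the quantizer, assuming signed quants in [-8, 7] (ggml actually stores them offset as unsigned nibbles):

```cpp
#include <cmath>
#include <cstdint>

#define QK4_0 32

// Quantize one block of 32 floats; illustrative, not the ggml source.
static void quantize_block_q4_0(const float * x, float * d, int8_t * q) {
    float max  = 0.0f;  // signed value with the largest magnitude
    float amax = 0.0f;
    for (int j = 0; j < QK4_0; j++) {
        float ax = std::fabs(x[j]);
        if (ax > amax) { amax = ax; max = x[j]; }
    }
    *d = max / -8.0f;   // the extreme value maps exactly to -8, whatever its sign
    const float id = (*d != 0.0f) ? 1.0f / *d : 0.0f;
    for (int j = 0; j < QK4_0; j++) {
        int v = (int) std::round(x[j] * id);
        q[j] = (int8_t) (v < -8 ? -8 : (v > 7 ? 7 : v));  // clamp to [-8, 7]
    }
}
```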
0cc4m
36bfb3c158
Fix typos, use GGML_TYPE defines, improve code
2023-04-25 18:43:31 +02:00
xaedes
54bb60e268
ggml : fix bug in ggml_compute_forward_sum_f32 ( #1162 )
The sum over all rows is now computed instead of just the last row
2023-04-24 23:02:02 +02:00
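The shape of the fix, sketched (illustrative; the real code walks ggml's tensor strides): accumulate across every row rather than keeping only the last row's total:

```cpp
// Sum all elements of an ne0 x ne1 row-major matrix. Before the fix, the
// result effectively reflected only the final row.
static float sum_rows_f32(const float * data, int ne0, int ne1) {
    float sum = 0.0f;
    for (int i1 = 0; i1 < ne1; i1++) {      // every row, not just the last
        for (int i0 = 0; i0 < ne0; i0++) {
            sum += data[i1 * ne0 + i0];
        }
    }
    return sum;
}
```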
0cc4m
daa5df51f7
Replace buffer pool with static buffers a, b, qb, c
Fix compile warnings
2023-04-24 22:12:02 +02:00
0cc4m
ae73887fb9
Add CLBlast to CMakeLists.txt
2023-04-24 22:10:31 +02:00
0cc4m
18cc05bde4
Fix cast in OpenCL kernels
2023-04-24 22:10:31 +02:00
0cc4m
8603c25e3c
Fix device selection env variable names
2023-04-24 22:10:31 +02:00
0cc4m
f469d9afa0
Double CLBlast speed by disabling OpenBLAS thread workaround
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
2023-04-24 22:10:29 +02:00
0cc4m
309af7fce9
Add q4_2 and q4_3 CLBlast support, improve code
2023-04-24 22:09:08 +02:00
0cc4m
1b16b8c90d
Move CLBlast implementation to separate file
Add buffer reuse code (adapted from slaren's cuda implementation)
2023-04-24 22:09:08 +02:00
0cc4m
6f66870726
Finish merge of CLBlast support
2023-04-24 22:09:08 +02:00
0cc4m
b7143c1a2e
Improve CLBlast implementation, avoid recreating buffers, remove redundant transfers
2023-04-24 22:09:08 +02:00
0cc4m
a908c37ce9
Allow use of OpenCL GPU-based BLAS using CLBlast instead of OpenBLAS for context processing
2023-04-24 22:09:08 +02:00
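For orientation, the heart of this change is routing the large mat-muls through CLBlast's GEMM. A hedged sketch using CLBlast's C API (`CLBlastSgemm`; buffer upload and error handling trimmed, row-major no-transpose case assumed, not the ggml-opencl code itself):

```cpp
#include <clblast_c.h>

// C = A * B with A: m x k, B: k x n, C: m x n, all row-major cl_mem buffers
// already resident on the device.
void sgemm_clblast(size_t m, size_t n, size_t k,
                   cl_mem a, cl_mem b, cl_mem c, cl_command_queue queue) {
    cl_event ev = NULL;
    CLBlastStatusCode st = CLBlastSgemm(
        CLBlastLayoutRowMajor, CLBlastTransposeNo, CLBlastTransposeNo,
        m, n, k,
        1.0f, a, 0, k,   // A: buffer, offset, leading dimension
              b, 0, n,   // B
        0.0f, c, 0, n,   // C
        &queue, &ev);
    if (st == CLBlastSuccess) {  // first check the error, then release the event
        clWaitForEvents(1, &ev);
        clReleaseEvent(ev);
    }
}
```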
Georgi Gerganov
8a0f8673ba
ggml : export symbols ( #1155 )
2023-04-24 22:18:25 +03:00
xaedes
0c5692345d
examples : add save_load_state example ( #1150 )
* add save_load_state example
* use <cstdio> instead of <iostream> and fprintf / printf instead of cout
* renamed save-load-state example files replacing underscores by dashes
2023-04-24 19:23:31 +03:00
Georgi Gerganov
957c8ae21d
llama : increase scratch buffer size for 65B (ref #1152 )
Temporary solution
2023-04-24 18:47:30 +03:00
mgroeber9110
9b0a4d4214
examples/main README improvements and some light refactoring ( #1131 )
2023-04-24 15:45:32 +00:00
Stephan Walter
2ec83428de
Fix build for gcc 8 and test in CI ( #1154 )
2023-04-24 15:38:26 +00:00
slaren
e4cf982e0d
Fix cuda compilation ( #1128 )
* Fix: Issue with CUBLAS compilation error due to missing -fPIC flag
---------
Co-authored-by: B1gM8c <89020353+B1gM8c@users.noreply.github.com>
2023-04-24 17:29:58 +02:00
Georgi Gerganov
c4fe84fb0d
llama : refactor get / set state + remove redundant kv cache API ( #1143 )
2023-04-24 07:40:02 +03:00
slaren
1d78fecdab
Fix LoRA acronym ( #1145 )
2023-04-23 23:03:44 +02:00
Georgi Gerganov
284685f169
scripts : add helper scripts to synch ggml repo
2023-04-23 19:57:09 +03:00
DannyDaemonic
edce63baa9
Added README.md for main with examples and explanations ( #1139 )
2023-04-23 15:37:02 +00:00
Georgi Gerganov
ec9cdb6752
ggml : do not print perf ops that have not been used at all
2023-04-23 18:32:52 +03:00
Georgi Gerganov
e4422e299c
ggml : better PERF prints + support "LLAMA_PERF=1 make"
2023-04-23 18:15:39 +03:00
Stephan Walter
53c8434398
Improve AVX2 for vec_dot_q4_3_q8_0 ( #1138 )
2023-04-23 11:01:03 +00:00
Pavol Rusnak
c6524f46eb
readme : update gpt4all instructions ( #980 )
2023-04-23 10:21:26 +02:00
Yishuo Wang
c9e2c26f41
A better packNibbles
and mul_sum_i8_pairs_float
implementation using AVX512 ( #1119 )
2023-04-23 07:57:05 +00:00