Commit graph

529 commits

Author SHA1 Message Date
xaedes
4e1f81d32f
implement backward pass for ggml_get_rows and for new operation ggml_get_rows_back 2023-05-01 02:41:29 +02:00
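A minimal sketch of what this pair of operations computes, on plain float buffers (illustrative reference code, not the ggml kernels): the forward op gathers rows by index, and get_rows_back scatters the incoming gradient back, accumulating into the gradient of the source rows.

```c
#include <string.h>

// forward: dst[i][:] = src[rows[i]][:]   (gather)
static void get_rows_f32(const float *src, const int *rows, int nrows_out,
                         int ncols, float *dst) {
    for (int i = 0; i < nrows_out; i++) {
        memcpy(dst + i*ncols, src + rows[i]*ncols, ncols*sizeof(float));
    }
}

// backward: grad_src[rows[i]][:] += grad_out[i][:]   (scatter-add)
// grad_src must be zero-initialized by the caller; note the +=, since
// the same source row may be gathered more than once.
static void get_rows_back_f32(const float *grad_out, const int *rows,
                              int nrows_out, int ncols, float *grad_src) {
    for (int i = 0; i < nrows_out; i++) {
        for (int j = 0; j < ncols; j++) {
            grad_src[rows[i]*ncols + j] += grad_out[i*ncols + j];
        }
    }
}
```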
xaedes
488decfdc5
implement backward pass of ggml_rope and ggml_rope_back 2023-05-01 02:41:28 +02:00
xaedes
36d8a051d4
remove already resolved TODO 2023-05-01 02:41:28 +02:00
xaedes
b908007471
norm & rms_norm cannot be threaded:
after investigating rms_norm for quite some time, I came to the conclusion that neither norm nor rms_norm can be threaded, because we need the mean over all items, not just over the slices each thread sees.
2023-05-01 02:41:28 +02:00
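A sketch of the data dependency behind this commit (reference scalar code, assuming the usual rms_norm definition): each output element is scaled by a statistic of the entire row, so a thread that sees only a slice cannot produce any output on its own.

```c
#include <math.h>

// reference rms_norm over one row: y[i] = x[i] / rms(x),
// where rms(x) depends on *every* element of the row.
static void rms_norm_row_f32(const float *x, float *y, int n, float eps) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += x[i]*x[i];            // needs all n items ...
    }
    const float scale = 1.0f / sqrtf(sum/n + eps);
    for (int i = 0; i < n; i++) {
        y[i] = x[i] * scale;         // ... before any output can be written
    }
}
```

Splitting the first loop across threads would only work with an extra reduction step over per-thread partial sums, which is presumably why the op is kept single-threaded here.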
xaedes
b164343529
implement 5 of 6 missing backward pass operations used by llama
- GGML_OP_DIAG_MASK_INF
- GGML_OP_GET_ROWS
- GGML_OP_RMS_NORM
- GGML_OP_SILU
- GGML_OP_SOFT_MAX

add necessary ggml operations GGML_OP_ADD1, GGML_OP_SILU_BACK, GGML_OP_RMS_NORM_BACK, GGML_OP_DIAG_MASK_ZERO, and GGML_OP_ROPE_BACK

GGML_OP_ADD1 is necessary to add a scalar value in the backward pass of GGML_OP_SOFT_MAX (see the sketch after this entry).
GGML_OP_ADD1 could also be replaced by a combination of GGML_OP_ADD and GGML_OP_REPEAT, but the performance would be worse. Additionally, GGML_OP_REPEAT returns an unexpected value when the input to GGML_OP_SOFT_MAX contains only a single scalar: in that case GGML_OP_REPEAT does not return the value that should be repeated (src1) but the value whose shape the result should take (src0). So in this case it cannot replace GGML_OP_ADD1.

GGML_OP_SILU_BACK, GGML_OP_RMS_NORM_BACK and GGML_OP_ROPE_BACK are necessary for the backward passes of GGML_OP_SILU, GGML_OP_RMS_NORM and GGML_OP_ROPE, which cannot easily be composed from existing operations. Since the backward pass itself builds a computation graph, we need forward-pass implementations of the required backward operations. Sounds a bit confusing at first, I know...

GGML_OP_DIAG_MASK_ZERO is necessary for the backward pass of GGML_OP_DIAG_MASK_INF.

Some operations were previously inplace-only; for the backward pass there need to be non-inplace variants as well.
Staying consistent with other operations that have both variants, the operations are changed to be non-inplace, and functions with an "_inplace" suffix are added for the inplace versions.
In llama we need to call the inplace variants so that the behavior stays as before.
For the llama backward pass we need to use the non-inplace variants.

backward passes for llama that are still not completely implemented:

- GGML_OP_ROPE: needs forward pass for GGML_OP_ROPE_BACK
- GGML_OP_GET_ROWS: only necessary for tokenizer
2023-05-01 02:41:28 +02:00
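To make the ADD1 motivation concrete, here is a plain-C sketch of the softmax backward for one row (the function name is illustrative, not a ggml kernel): dx = (dy - dot(y, dy)) * y. The dot product is a single scalar combined with every element, which is exactly the broadcast-a-scalar pattern that GGML_OP_ADD1 provides in graph form.

```c
// softmax backward for one row:
// given y = softmax(x) and upstream gradient dy,
// dx[i] = (dy[i] - dot(y, dy)) * y[i]
static void soft_max_back_row_f32(const float *y, const float *dy,
                                  float *dx, int n) {
    float dot = 0.0f;
    for (int i = 0; i < n; i++) {
        dot += y[i]*dy[i];            // scalar: depends on the whole row
    }
    for (int i = 0; i < n; i++) {
        dx[i] = (dy[i] - dot) * y[i]; // scalar broadcast into every element
    }
}
```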
xaedes
73ac18d856
implement 8 of 14 missing backward pass operations used by llama
- GGML_OP_ADD_AT
- GGML_OP_CPY
- GGML_OP_MUL_MAT (src0.grad)
- GGML_OP_PERMUTE
- GGML_OP_RESHAPE
- GGML_OP_SCALE
- GGML_OP_TRANSPOSE
- GGML_OP_VIEW

implement additional ggml operation GGML_OP_ADD_AT, which is necessary for the backward pass of GGML_OP_VIEW.

this operation adds src1 to src0 at a data offset, i.e. to view(src0, ..., offset).
the result is a tensor with the size of src0; values outside of [data+offset : data+offset+nbytes(src1)] are just the original values from src0 (see the sketch after this entry).

still missing backward passes for llama:

- GGML_OP_DIAG_MASK_INF
- GGML_OP_GET_ROWS
- GGML_OP_RMS_NORM
- GGML_OP_ROPE
- GGML_OP_SILU
- GGML_OP_SOFT_MAX
2023-05-01 02:41:28 +02:00
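A sketch of the GGML_OP_ADD_AT semantics described above, on flat float buffers (illustrative only; the real op works on strided tensor views): the result has the size of src0, and src1 is accumulated only into the region starting at the given offset.

```c
#include <string.h>

// dst has the same number of elements as src0;
// src1 (n1 elements) is added starting at element offset `offs`.
// everything outside [offs, offs + n1) is copied from src0 unchanged.
static void add_at_f32(const float *src0, int n0,
                       const float *src1, int n1,
                       int offs, float *dst) {
    memcpy(dst, src0, n0*sizeof(float));   // result takes src0's values
    for (int i = 0; i < n1; i++) {
        dst[offs + i] += src1[i];          // accumulate into the viewed region
    }
}
```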
Georgi Gerganov
7ff0dcd320
ggml : fix UB (int << 31) 2023-04-30 22:28:51 +03:00
Pavol Rusnak
6f79699286
build: add armv{6,7,8} support to cmake (#1251)
- flags copied from Makefile
- updated comments in both CMakeLists.txt and Makefile to match reality
2023-04-30 20:48:38 +02:00
jon-chuang
a5d30b1f53
common : better default number of threads (#934)
* commit

* fix

* try-catch

* apply code review

* improve

* improve

* add macos headers

* done

* remove color

* fix windows

* minor

* fix

* Apply suggestions from code review

Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>

* remove

* minor

* minor

---------

Co-authored-by: jon-chuang <jon-chuang@users.noreply.github.com>
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
2023-04-30 21:41:35 +03:00
0cc4m
76a884920a
ggml : add CLBlast q5_0, q5_1, q8_0 dequant kernels (#1225)
* Implement q5_0, q5_1 and q8_0

* Work around q5_0 OpenCL issue

* Fix q8_0 dequant kernel

* Move cl kernels into ggml-opencl.c

* Use two memcpy calls for q5_0 buffer transfer
2023-04-30 21:34:52 +03:00
Georgi Gerganov
6bc4400e67
ggml : add Q5 WASM SIMD + GGML_FTYPE 2023-04-30 19:07:43 +03:00
Stephan Walter
f0d70f147d
Various fixes to mat_mul benchmark (#1253) 2023-04-30 12:32:37 +00:00
Georgi Gerganov
3e5aa8a1c4
ggml : fix labels for GGML_OP_ALIBI 2023-04-30 10:25:46 +03:00
Georgi Gerganov
c3ca7a5f05
ggml : fix 32-bit ARM NEON 2023-04-29 21:34:23 +03:00
Georgi Gerganov
e8c051611a
ggml : use vzip instead of vuzp for consistency 2023-04-29 21:12:56 +03:00
Georgi Gerganov
0b5a935099
ggml : fix visibility and unused warnings 2023-04-29 19:28:36 +03:00
Georgi Gerganov
ec728e44d7
ggml : fix #if for f32_f32 mul_mat (CLBlast) (#1229) 2023-04-29 18:43:42 +03:00
Georgi Gerganov
214b6a3570
ggml : adjust mul_mat_f16 work memory (#1226)
* llama : minor - remove explicit int64_t cast

* ggml : reduce memory buffer for F16 mul_mat when not using cuBLAS

* ggml : add asserts to guard for incorrect wsize
2023-04-29 18:43:28 +03:00
Georgi Gerganov
305eb5afd5
build : fix reference to old llama_util.h 2023-04-29 13:53:12 +03:00
Georgi Gerganov
84ca9c2ecf
examples : fix save-load-state + rename llama-util.h 2023-04-29 13:48:11 +03:00
Georgi Gerganov
334637e43e
common : change default parameters to pre-#1126 (#1223) 2023-04-29 09:51:06 +03:00
Ivan Stepanov
dd7eff57d8
llama : new sampling algorithms (#1126)
* Sample interface, new samplers.

New samplers:
- locally typical sampling
- tail free sampling
- frequency and presence penalty
- mirostat

Ignore EOS fix: -inf should be used.

* mirostat

* Added --logit-bias and --no-penalize-nl, removed std::span

* Use C++11, clarify llama API documentation, rename Mirostat parameters to --mirostat_lr and --mirostat_ent, add temperature sampling for Mirostat, simplify Mirostat sampling API parameters (removed N and *k)

* Save and load example adjust

* Tests

* Windows build fix

* Windows test fix
2023-04-29 08:34:41 +03:00
slaren
7fc50c051a
cuBLAS: use host pinned memory and dequantize while copying (#1207)
* cuBLAS: dequantize simultaneously while copying memory

* cuBLAS: use host pinned memory

* cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory

* cuBLAS: also pin kv cache

* fix rebase
2023-04-29 02:04:18 +02:00
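A minimal sketch of the pattern this PR describes, using the CUDA runtime C API under stated assumptions (this is not the actual llama.cpp code, and process_chunk is a hypothetical stand-in for the dequantize step): pinned host memory makes cudaMemcpyAsync truly asynchronous, so the transfer of one chunk can overlap with work on another.

```c
#include <cuda_runtime.h>
#include <string.h>

// copy n floats host->device in chunks, overlapping transfer and compute
void copy_and_process(const float *src, float *dev, size_t n) {
    float *pinned;
    cudaMallocHost((void **)&pinned, n * sizeof(float)); // page-locked buffer
    memcpy(pinned, src, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    const size_t chunk = 1 << 20;
    for (size_t off = 0; off < n; off += chunk) {
        const size_t len = (n - off < chunk) ? (n - off) : chunk;
        // the async copy returns immediately because the source is pinned ...
        cudaMemcpyAsync(dev + off, pinned + off, len * sizeof(float),
                        cudaMemcpyHostToDevice, stream);
        // ... so dequantization of this chunk can be queued right behind it:
        // process_chunk(dev + off, len, stream);   // hypothetical
    }
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    cudaFreeHost(pinned);
}
```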
Henri Vasserman
b1ee8f59b4
cuBLAS: non-contiguous tensor support (#1215)
* Cuda: non-contiguous tensor support

* remove extra stuff

* rename

* fix error

* more fixes, now OpenBLAS and CLBlast build too

* now then?
2023-04-29 01:31:56 +02:00
Stephan Walter
36d19a603b
Remove Q4_3 which is no better than Q5 (#1218) 2023-04-28 23:10:43 +00:00
Georgi Gerganov
7f15c5c477
readme : update hot topics 2023-04-28 21:32:52 +03:00
Georgi Gerganov
55390bcaf2
ggml : sync ggml (ggml_alibi) 2023-04-28 20:51:05 +03:00
CRD716
5fba3c016b
examples : add Jeopardy example (#1168)
* Basic Setup

* Prevent Results.txt from coming up

* Prefixes, Line separators, etc

* editorcheck

* introduction to give more consistent results

* Basic graph thing

* Grading, ready for testing!

* Y'all ready to get funky?

* fix column removal stuff

* missed a few
2023-04-28 19:13:33 +03:00
Evan Jones
1481a9cf25
llama : add session file format and saved sessions in main (#1169) 2023-04-28 18:59:37 +03:00
Georgi Gerganov
11d902364b
ggml : add helper debug printf in soft_max 2023-04-28 17:59:08 +03:00
0cc4m
7296c961d9
ggml : add CLBlast support (#1164)
* Allow use of OpenCL GPU-based BLAS using ClBlast instead of OpenBLAS for context processing

* Improve ClBlast implementation, avoid recreating buffers, remove redundant transfers

* Finish merge of ClBlast support

* Move CLBlast implementation to separate file

Add buffer reuse code (adapted from slaren's cuda implementation)

* Add q4_2 and q4_3 CLBlast support, improve code

* Double CLBlast speed by disabling OpenBLAS thread workaround

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>

* Fix device selection env variable names

* Fix cast in opencl kernels

* Add CLBlast to CMakeLists.txt

* Replace buffer pool with static buffers a, b, qb, c

Fix compile warnings

* Fix typos, use GGML_TYPE defines, improve code

* Improve btype dequant kernel selection code, add error if type is unsupported

* Improve code quality

* Move internal stuff out of header
* Use internal enums instead of CLBlast enums
* Remove leftover C++ includes and defines
* Make event use easier to read

Co-authored-by: Henri Vasserman <henv@hot.ee>

* Use c compiler for opencl files

* Simplify code, fix include

* First check error, then release event

* Make globals static, fix indentation

* Rename dequant kernels file to conform with other file names

* Fix import cl file name

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
Co-authored-by: Henri Vasserman <henv@hot.ee>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-28 17:57:16 +03:00
Folko-Ven
78ec543733
Correcting link to w64devkit (#1214)
Correcting link to w64devkit (change seeto to skeeto).
2023-04-28 16:22:48 +02:00
Johannes Gäßler
92a6e13a31
Add Manjaro CUDA include and lib dirs to Makefile (#1212) 2023-04-28 15:40:32 +02:00
Yann Follet
04aaae1d79
add avx2 for dot_q8_0_q8_0, 2x faster than scalar (#1211) 2023-04-28 11:59:48 +00:00
Stephan Walter
0b2da20538
ggml : slightly faster AVX2 implementation for Q5 (#1197) 2023-04-26 23:26:42 +03:00
Georgi Gerganov
f9be42add0
readme : add quantization info 2023-04-26 23:24:42 +03:00
Georgi Gerganov
574406dc7e
ggml : add Q5_0 and Q5_1 quantization (#1187)
* ggml : add Q5_0 quantization (cuBLAS only)

* ggml : fix Q5_0 qh -> uint32_t

* ggml : fix q5_0 histogram stats

* ggml : q5_0 scalar dot product

* ggml : q5_0 ARM NEON dot

* ggml : q5_0 more efficient ARM NEON using uint64_t masks

* ggml : rename Q5_0 -> Q5_1

* ggml : adding Q5_0 mode

* quantize : add Q5_0 and Q5_1 to map

* ggml : AVX2 optimizations for Q5_0, Q5_1 (#1195)

---------

Co-authored-by: Stephan Walter <stephan@walter.name>
2023-04-26 23:14:13 +03:00
Ásgeir Bjarni Ingvarsson
87a6f846d3
Allow setting the rng seed after initialization. (#1184)
The llama_set_state_data function restores the rng state to what it
was at the time llama_copy_state_data was called. But users may want
to restore the state and proceed with a different seed.
2023-04-26 22:08:43 +02:00
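A minimal usage sketch of the API this enables, assuming the llama.h function names of the time (llama_set_rng_seed is the call added in #1184; signatures abbreviated): restore a saved state, then override the restored rng with a fresh seed before sampling.

```c
#include "llama.h"

// restore a saved session, then proceed with a different seed
void restore_with_new_seed(struct llama_context *ctx,
                           uint8_t *state_data, int new_seed) {
    llama_set_state_data(ctx, state_data); // restores rng as of the save
    llama_set_rng_seed(ctx, new_seed);     // override with a fresh seed
}
```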
DaniAndTheWeb
ea3ad7eb60
Updating build instructions to include BLAS support (#1183)
* Updated build information

First update to the build instructions to include BLAS.

* Update README.md

* Update information about BLAS

* Better BLAS explanation

Adding a clearer BLAS explanation and adding a link to download the CUDA toolkit.

* Better BLAS explanation

* BLAS for Mac

Specifying that BLAS is already supported on Macs using the Accelerate Framework.

* Clarify the effect of BLAS

* Windows Make instructions

Added the instructions to build with Make on Windows

* Fixing typo

* Fix trailing whitespace
2023-04-26 22:03:03 +02:00
Pavol Rusnak
859fee6dfb
quantize : use map to assign quantization type from string (#1191)
instead of `int` (while the `int` option is still supported)

This allows the following usage:

`./quantize ggml-model-f16.bin ggml-model-q4_0.bin q4_0`

instead of:

`./quantize ggml-model-f16.bin ggml-model-q4_0.bin 2`
2023-04-26 18:43:27 +02:00
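The actual change uses a C++ map; this is a plain-C sketch of the same idea with illustrative values (q4_0 = 2 and q4_1 = 3 match the usage shown above): resolve the name first, then fall back to parsing an integer for backwards compatibility.

```c
#include <stdlib.h>
#include <string.h>

// illustrative name -> quantization-type table
struct ftype_entry { const char *name; int ftype; };
static const struct ftype_entry FTYPES[] = {
    { "q4_0", 2 }, { "q4_1", 3 },
};

// accept either a name ("q4_0") or the legacy integer ("2"); -1 on failure
static int parse_ftype(const char *arg) {
    for (size_t i = 0; i < sizeof(FTYPES)/sizeof(FTYPES[0]); i++) {
        if (strcmp(arg, FTYPES[i].name) == 0) return FTYPES[i].ftype;
    }
    char *end = NULL;
    long v = strtol(arg, &end, 10);
    return (end && *end == '\0' && end != arg) ? (int)v : -1;
}
```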
Stephan Walter
4afcc37869
Update SHA256SUMS after quantization change (#1181)
Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-04-25 23:41:56 +02:00
ostix360
667c501334
py : cast lora_alpha to int in convert-lora-to-ggml (#1170)
Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-04-25 23:33:08 +02:00
Pavol Rusnak
bb98e77be7
nix: use convert.py instead of legacy wrapper convert-pth-to-ggml.py (#981) 2023-04-25 23:19:57 +02:00
Georgi Gerganov
7a32fcb3b2
ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (#1179)
* ggml : add Q8_0 quantization format (rename the old one to Q8_1)

* tests : fix test-quantize-fns

* ggml : finalize Q8_0 implementation

* ggml : use q4_0_q8_0 and q4_2_q8_0

* ggml : fix Q8_0 dot product bug (ARM)

* ggml : Q8_0 unroll x2

* ggml : fix bug - using wrong block type

* ggml : extend quantize_fns_t with "vec_dot_type"

* ggml : fix Q8_0 to use 255 values out of 256

* ggml : fix assert using wrong QK4_2 instead of QK4_3
2023-04-25 23:40:51 +03:00
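A reference-style sketch of Q8_0 quantization for one block (simplified scalar code, not the optimized kernels): the scale maps the maximum magnitude to ±127, which is what "255 values out of 256" refers to — the code -128 is never produced, keeping the range symmetric.

```c
#include <math.h>
#include <stdint.h>

#define QK8_0 32

// quantize one block of QK8_0 floats to int8 plus a single f32 scale
static void quantize_block_q8_0(const float *x, int8_t *q, float *d_out) {
    float amax = 0.0f;
    for (int i = 0; i < QK8_0; i++) {
        const float a = fabsf(x[i]);
        if (a > amax) amax = a;
    }
    const float d  = amax / 127.0f;           // max magnitude -> +-127
    const float id = d != 0.0f ? 1.0f/d : 0.0f;
    for (int i = 0; i < QK8_0; i++) {
        q[i] = (int8_t)roundf(x[i] * id);     // symmetric: -128 never produced
    }
    *d_out = d;
}
```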
unbounded
dd0eabc049
ggml : use full range for Q4_0 and Q4_2 quantization (#729)
* Use full range for q4_0 quantization

By keeping the sign of the value with the highest magnitude, we can make sure that value maps to -8, which was previously unused (see the sketch after this entry).
This is a bit of a freebie since it is fully backwards compatible with
the current format.

* Update quantize_row_q4_0 for AVX/AVX2

* Update quantize_row_q4_0 for WASM

Untested

* Update quantize_row_q4_0 for Arm NEON

* Update quantize_row_q4_0 for PowerPC

Untested

* Use full range for q4_2 quantization
2023-04-25 20:20:46 +03:00
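A sketch of the full-range idea for one q4_0 block (reference scalar code, simplified): pick the element with the largest magnitude *keeping its sign*, derive the scale as max / -8, and that extreme value then lands exactly on -8, the previously unused code point.

```c
#include <math.h>
#include <stdint.h>

#define QK4_0 32

// quantize one block to 4-bit codes 0..15 (representing -8..7) plus a scale
static void quantize_block_q4_0_full(const float *x, uint8_t *q, float *d_out) {
    float max = 0.0f;                     // signed value of largest magnitude
    for (int i = 0; i < QK4_0; i++) {
        if (fabsf(x[i]) > fabsf(max)) max = x[i];
    }
    const float d  = max / -8.0f;         // the extreme value maps to code -8
    const float id = d != 0.0f ? 1.0f/d : 0.0f;
    for (int i = 0; i < QK4_0; i++) {
        const int v = (int)(x[i]*id + 8.5f);  // round and shift to 0..16
        q[i] = (uint8_t)(v > 15 ? 15 : v);    // clamp: only the +8 side overflows
    }
    *d_out = d;
}
```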
xaedes
54bb60e268
ggml : fix bug in ggml_compute_forward_sum_f32 (#1162)
The sum over all rows is now computed instead of just the last row
2023-04-24 23:02:02 +02:00
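For reference, a sketch of the corrected behavior (plain C, illustrative): the accumulator must run over every row, where the bug effectively kept only the last row's contribution.

```c
// sum all elements of an nrows x ncols matrix, accumulating across all rows
static float sum_all_f32(const float *x, int nrows, int ncols) {
    float sum = 0.0f;
    for (int r = 0; r < nrows; r++) {      // every row, not just the last
        for (int c = 0; c < ncols; c++) {
            sum += x[r*ncols + c];
        }
    }
    return sum;
}
```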
Georgi Gerganov
8a0f8673ba
ggml : export symbols (#1155) 2023-04-24 22:18:25 +03:00
xaedes
0c5692345d
examples : add save_load_state example (#1150)
* add save_load_state example

* use <cstdio> instead of <iostream> and fprintf / printf instead of cout

* renamed save-load-state example files replacing underscores by dashes
2023-04-24 19:23:31 +03:00
Georgi Gerganov
957c8ae21d
llama : increase scratch buffer size for 65B (ref #1152)
Temporary solution
2023-04-24 18:47:30 +03:00
mgroeber9110
9b0a4d4214
examples/main README improvements and some light refactoring (#1131) 2023-04-24 15:45:32 +00:00