Commit graph

517 commits

Author SHA1 Message Date
xaedes
3dbd649cf9
fix diag_mask to work with non-inplace input 2023-05-01 14:43:48 +02:00
xaedes
b9920e5c3e
test-grad0 : fix test for div
nargs and ndims were swapped, corrupting the stack
2023-05-01 14:43:48 +02:00
xaedes
19f51592b5
successfully test diag_mask_inf and diag_mask_zero backward 2023-05-01 14:43:48 +02:00
xaedes
d42531fa56
fix comments 2023-05-01 14:43:48 +02:00
xaedes
1997152f7f
test-grad0.c add TODO for view_2d and view_3d
add_at (required for view backward pass) is a bit tricky for n_dims > 1.
2023-05-01 14:43:48 +02:00
xaedes
c601df973c
successfully test transpose backward and permute for all permutations
also test sub, mul and div up to max n_dims
2023-05-01 14:43:47 +02:00
xaedes
3d21f2646e
implement ggml_cont backward pass 2023-05-01 14:43:47 +02:00
xaedes
02d3fd0894
fix sub, mul and div functions to work correctly with transposed tensors
uses the same logic as in add
2023-05-01 14:43:47 +02:00
xaedes
b0555fce95
some minor test-grad0 fixes 2023-05-01 14:43:47 +02:00
xaedes
a7a837047c
successfully test permute backward 2023-05-01 14:43:47 +02:00
xaedes
86b44a02e4
test-grad0.c : add print_elements to help with debugging 2023-05-01 14:43:47 +02:00
xaedes
339b2adf48
fix ggml_forward_add1 functions to work correctly with transposed tensors
uses the same logic as in ggml_compute_forward_add1_q_f32, but makes it consistent across all ggml_compute_forward_add1_... functions.
this also slightly changes the memory access pattern of the different threads to work as in ggml_compute_forward_add1_q_f32.
2023-05-01 14:43:47 +02:00
xaedes
b9416d71f8
fix ggml_forward_add functions to work correctly with transposed tensors
uses the same logic as in ggml_compute_forward_add_q_f32, but makes it consistent across all ggml_compute_forward_add_... functions.
this also slightly changes the memory access pattern of the different threads to work as in ggml_compute_forward_add_q_f32.
2023-05-01 14:43:46 +02:00
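A hedged, generic sketch of what stride-aware element-wise addition looks like (illustrative only, not the actual ggml_compute_forward_add_* code; names and the 2D restriction are made up): walking each operand with its own per-dimension byte strides lets the same loop handle contiguous and transposed tensors alike.

```c
#include <stdint.h>
#include <stddef.h>

// Illustrative 2D element-wise add using per-operand byte strides.
// A transposed view only changes the strides, so the same code covers it.
static void add_f32_2d(char * dst, const char * a, const char * b,
                       int64_t ne0, int64_t ne1,
                       size_t dnb0, size_t dnb1,   // dst strides in bytes
                       size_t anb0, size_t anb1,   // src0 strides in bytes
                       size_t bnb0, size_t bnb1) { // src1 strides in bytes
    for (int64_t i1 = 0; i1 < ne1; ++i1) {
        for (int64_t i0 = 0; i0 < ne0; ++i0) {
            *(float *)       (dst + i1*dnb1 + i0*dnb0) =
            *(const float *) (a   + i1*anb1 + i0*anb0) +
            *(const float *) (b   + i1*bnb1 + i0*bnb0);
        }
    }
}
```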
xaedes
410a47a79e
minor code format improvement 2023-05-01 14:43:46 +02:00
xaedes
124fdca973
successfully test view backward 2023-05-01 14:43:46 +02:00
xaedes
cecd6c7665
bug fix for add_at forward
required for view backward pass

src0 values must be copied to dst, because during addition we don't touch all dst elements, in contrast to the normal add function.
2023-05-01 14:43:46 +02:00
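A hedged sketch of the copy-then-add behaviour described above (a simplified 1D float version with an element offset, not the actual ggml_compute_forward code):

```c
#include <string.h>
#include <stdint.h>

// dst first takes all of src0's values, because the addition only touches
// the region covered by src1; without the copy the remaining dst elements
// would stay uninitialized.
static void add_at_f32(float * dst, const float * src0, int64_t n0,
                       const float * src1, int64_t n1, int64_t offset) {
    memcpy(dst, src0, (size_t) n0 * sizeof(float)); // copy src0 -> dst
    for (int64_t i = 0; i < n1; ++i) {
        dst[offset + i] += src1[i];                 // add src1 into the viewed region
    }
}
```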
xaedes
83fa6b3bcb
fix ggml_compute_forward_dup_same_cont for when nelements < nthreads
when more threads are used than elements exist, ie1 was less than ie0, resulting in an invalid negative byte count argument to memcpy
2023-05-01 14:43:46 +02:00
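A hedged sketch of the guarded per-thread copy (variable names are illustrative, not the verbatim fix): thread ith of nth copies its slice of ne elements of ts bytes each; without the clamp and the ie0 < ie1 check, nth > ne yields ie1 < ie0 and a bogus memcpy size.

```c
#include <string.h>
#include <stdint.h>

static void copy_slice(void * dst, const void * src,
                       int64_t ne, size_t ts, int ith, int nth) {
    const int64_t dr  = (ne + nth - 1) / nth;            // elements per thread, rounded up
    const int64_t ie0 = dr * ith;                        // first element for this thread
    const int64_t ie1 = ie0 + dr < ne ? ie0 + dr : ne;   // end, clamped to ne
    if (ie0 < ie1) {                                     // threads past the end do nothing
        memcpy((char *) dst + ie0*ts, (const char *) src + ie0*ts,
               (size_t) (ie1 - ie0) * ts);
    }
}
```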
xaedes
c1a8893de3
de-duplicate ggml_forward_dup code taking care of contiguous tensors of the same type.
with this we can duplicate tensors of any type as long as they are contiguous.
2023-05-01 02:42:27 +02:00
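A minimal sketch of why one shared path suffices (assumes contiguous tensors of the same type; not the actual de-duplicated function, only its core idea):

```c
#include <string.h>
#include "ggml.h"

// For contiguous tensors of the same type the element layout is identical,
// so duplication reduces to a raw byte copy regardless of what the type is.
static void dup_same_cont_sketch(const struct ggml_tensor * src, struct ggml_tensor * dst) {
    memcpy(dst->data, src->data, ggml_nbytes(src));
}
```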
xaedes
38675e537c
add shape annotations for llama 2023-05-01 02:42:13 +02:00
xaedes
93106504fd
align shape annotations 2023-05-01 02:42:05 +02:00
xaedes
fea42be47a
successfully test soft_max backward 2023-05-01 02:41:58 +02:00
xaedes
1a80e9a0fa
correctly implement softmax backward pass using new operation ggml_diag
ggml_diag constructs diagonal matrices with the input values as entries.
ggml_diag(shape[a,1,c,d]) -> shape[a,a,c,d]
2023-05-01 02:41:30 +02:00
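A sketch of the math this relies on (standard softmax calculus, not copied from the patch): for y = softmax(x) the Jacobian is diag(y) − y yᵀ, which is where ggml_diag enters the backward graph.

```latex
\frac{\partial L}{\partial x}
  = \bigl(\operatorname{diag}(y) - y\,y^{\top}\bigr)\,\frac{\partial L}{\partial y}
  = y \odot \frac{\partial L}{\partial y} - y\,\Bigl(y^{\top}\frac{\partial L}{\partial y}\Bigr)
```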
xaedes
54ab300cc4
add test-opt.c
this uses ggml_opt to train a and b to minimize e=sum(sqr(c - a*b)) for random initial a, b and c
2023-05-01 02:41:30 +02:00
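A minimal sketch in the spirit of test-opt.c (element-wise ggml_mul and made-up sizes for brevity; the real test may use different shapes and optimizer settings):

```c
#include "ggml.h"
#include <stdlib.h>

// Minimize e = sum(sqr(c - a*b)) over the parameters a and b.
int main(void) {
    struct ggml_init_params ip = {
        /* .mem_size   = */ 128*1024*1024,
        /* .mem_buffer = */ NULL,
        /* .no_alloc   = */ false,
    };
    struct ggml_context * ctx = ggml_init(ip);

    const int n = 16;
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n);
    struct ggml_tensor * c = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n);
    for (int i = 0; i < n; ++i) {
        ((float *) a->data)[i] = (float) rand() / RAND_MAX;  // random initial values
        ((float *) b->data)[i] = (float) rand() / RAND_MAX;
        ((float *) c->data)[i] = (float) rand() / RAND_MAX;
    }
    ggml_set_param(ctx, a); // a and b are trainable
    ggml_set_param(ctx, b);

    // e = sum(sqr(c - a*b))
    struct ggml_tensor * e =
        ggml_sum(ctx, ggml_sqr(ctx, ggml_sub(ctx, c, ggml_mul(ctx, a, b))));

    enum ggml_opt_result res = ggml_opt(ctx, ggml_opt_default_params(GGML_OPT_ADAM), e);

    ggml_free(ctx);
    return res == GGML_OPT_OK ? 0 : 1;
}
```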
xaedes
ecf949b10f
successfully test reshape backward 2023-05-01 02:41:30 +02:00
xaedes
c483a7dac5
bug fix for reshape backward pass 2023-05-01 02:41:30 +02:00
xaedes
b2bd8222da
successfully test cpy backward 2023-05-01 02:41:30 +02:00
xaedes
0ea8201c86
bug fix for cpy backward pass 2023-05-01 02:41:30 +02:00
xaedes
7571147242
successfully test rope backward 2023-05-01 02:41:30 +02:00
xaedes
b583136cfa
improve performance of sqr backward pass
use scale(x,y) instead of mul(x,repeat(y,x))
2023-05-01 02:41:30 +02:00
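The relation being exploited, stated as a sketch (standard calculus): the factor 2 is a scalar, so a single scale suffices instead of building repeat(2, x) and multiplying element-wise.

```latex
z = x^2 \;\Rightarrow\; \frac{\partial L}{\partial x} = 2\,x \odot \frac{\partial L}{\partial z}
```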
xaedes
bfe507213c
improve performance of sum backward pass
use add1(x,y) instead of add(x,repeat(y,x))
2023-05-01 02:41:30 +02:00
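The underlying gradient, as a sketch of the standard identity: every element of x receives the same scalar gradient, so add1 can add it directly instead of materializing a repeated tensor.

```latex
s = \sum_i x_i \;\Rightarrow\; \frac{\partial L}{\partial x_i} = \frac{\partial L}{\partial s} \quad \text{for all } i
```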
xaedes
0197bcb0ff
successfully test scale backward 2023-05-01 02:41:29 +02:00
xaedes
a367eb9eda
bug fix for scale backward pass
use sum instead of mean for gradient of scalar scale parameter
2023-05-01 02:41:29 +02:00
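The corrected identity, as a sketch: the scalar scale parameter collects contributions from every element, i.e. a sum rather than a mean.

```latex
y = s \cdot x \;\Rightarrow\; \frac{\partial L}{\partial s} = \sum_i x_i \,\frac{\partial L}{\partial y_i}
```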
xaedes
671e5922e2
successfully test silu backward 2023-05-01 02:41:29 +02:00
xaedes
6fb08b4554
bug fixes for silu_back 2023-05-01 02:41:29 +02:00
xaedes
9d6fc28f18
disable graph dot export as it floods console 2023-05-01 02:41:29 +02:00
xaedes
9345f4c3a5
test both gradients of mul_mat 2023-05-01 02:41:29 +02:00
xaedes
20e3c1d2b4
use GGML_PRINT_DEBUG for debug messages which will otherwise flood the console 2023-05-01 02:41:29 +02:00
xaedes
0da26753fd
add test-grad0.c 2023-05-01 02:41:29 +02:00
xaedes
4e1f81d32f
implement backward pass for ggml_get_rows and for new operation ggml_get_rows_back 2023-05-01 02:41:29 +02:00
xaedes
488decfdc5
implement backward pass of ggml_rope and ggml_rope_back 2023-05-01 02:41:28 +02:00
xaedes
36d8a051d4
remove already resolved TODO 2023-05-01 02:41:28 +02:00
xaedes
b908007471
norm & rms_norm can not be threaded:
after investigation rms norm for quite some time I come to the conclusion that neither norm, nor rms_norm can be threaded, because we need mean over all items, not just of the slices each thread sees.
2023-05-01 02:41:28 +02:00
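For reference, a sketch of the rms_norm definition that makes the constraint clear: the denominator is a mean over the whole row, so no thread can compute its slice independently.

```latex
y_i = \frac{x_i}{\sqrt{\tfrac{1}{n}\sum_{j=1}^{n} x_j^{2} + \varepsilon}}
```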
xaedes
b164343529
implement 5 of 6 missing backward pass operations used by llama
- GGML_OP_DIAG_MASK_INF
- GGML_OP_GET_ROWS
- GGML_OP_RMS_NORM
- GGML_OP_SILU
- GGML_OP_SOFT_MAX

add necessary ggml operations GGML_OP_ADD1, GGML_OP_SILU_BACK, GGML_OP_RMS_NORM_BACK, GGML_OP_DIAG_MASK_ZERO, and GGML_OP_ROPE_BACK

GGML_OP_ADD1 is necessary to add a scalar value in the backward pass of GGML_OP_SOFT_MAX
GGML_OP_ADD1 could also be replaced by using GGML_OP_ADD and GGML_OP_REPEAT, but the performance would be worse. additionally GGML_OP_REPEAT will return an unexpected value when the input to GGML_OP_SOFT_MAX contains only a single scalar. in this case GGML_OP_REPEAT will not return the value that should be repeated (src1) but the value whose shape the result should take (src0). So in this case it cannot replace GGML_OP_ADD1.

GGML_OP_SILU_BACK, GGML_OP_RMS_NORM_BACK and GGML_OP_ROPE_BACK are necessary for the backward passes of GGML_OP_SILU, GGML_OP_RMS_NORM and GGML_OP_ROPE. The backward passes for these functions cannot easily be composed of existing operations. Since the backward pass itself builds a computation graph, we need forward-pass implementations of the required backward operations. Sounds a bit confusing at first, I know...

GGML_OP_DIAG_MASK_ZERO is necessary for backward pass of GGML_OP_DIAG_MASK_INF.

Some operations were previously inplace-only; for the backward pass there need to be non-inplace variants.
staying consistent with other operations that have non-inplace and inplace variants, the operations are changed to non-inplace, and functions with an "_inplace" suffix are added which operate inplace.
in llama we need to call the inplace variants so that it behaves as before.
for the llama backward pass we need to use the non-inplace variants (a usage sketch follows this entry).

still not completely implemented backward passes for llama:

- GGML_OP_ROPE: needs forward pass for GGML_OP_ROPE_BACK
- GGML_OP_GET_ROWS: only necessary for tokenizer
2023-05-01 02:41:28 +02:00
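A hedged usage sketch of the inplace/non-inplace split described above (the mask_scores helper and its need_grads flag are made up for illustration; ggml_diag_mask_inf and ggml_diag_mask_inf_inplace are the entry points the commit refers to):

```c
#include "ggml.h"
#include <stdbool.h>

// Forward-only graphs (as llama used before) keep the old behaviour via the
// *_inplace variant; graphs that will be differentiated use the non-inplace
// variant so the original values survive for the backward pass.
static struct ggml_tensor * mask_scores(struct ggml_context * ctx,
                                        struct ggml_tensor * scores,
                                        int n_past, bool need_grads) {
    return need_grads
        ? ggml_diag_mask_inf        (ctx, scores, n_past)   // non-inplace
        : ggml_diag_mask_inf_inplace(ctx, scores, n_past);  // in place, as before
}
```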
xaedes
73ac18d856
implement 8 of 14 missing backward pass operations used by llama
- GGML_OP_ADD_AT
- GGML_OP_CPY
- GGML_OP_MUL_MAT (src0.grad)
- GGML_OP_PERMUTE
- GGML_OP_RESHAPE
- GGML_OP_SCALE
- GGML_OP_TRANSPOSE
- GGML_OP_VIEW

implement additional ggml operation GGML_OP_ADD_AT, which is necessary for backward pass of GGML_OP_VIEW.

this operation adds src1 to src0 at a data offset, i.e. to view(src0, ..., offset).
the values are returned in a tensor the size of src0; values outside of [data+offset : data+offset+nbytes(src1)] are just the original values from src0 (the gradient relation this enables is sketched below).

still missing backward passes for llama:

- GGML_OP_DIAG_MASK_INF
- GGML_OP_GET_ROWS
- GGML_OP_RMS_NORM
- GGML_OP_ROPE
- GGML_OP_SILU
- GGML_OP_SOFT_MAX
2023-05-01 02:41:28 +02:00
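A sketch of the gradient relation that makes GGML_OP_ADD_AT necessary (informal notation, not code from the patch): the view backward pass scatters the incoming gradient back into the viewed slice of src0's gradient and leaves everything outside the slice at zero.

```latex
y = \operatorname{view}(x,\;\text{offset})
\;\Rightarrow\;
\frac{\partial L}{\partial x} = \operatorname{add\_at}\!\Bigl(0_{\operatorname{shape}(x)},\;\frac{\partial L}{\partial y},\;\text{offset}\Bigr)
```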
Georgi Gerganov
7ff0dcd320
ggml : fix UB (int << 31) 2023-04-30 22:28:51 +03:00
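The class of UB being fixed, as a small sketch (not the exact ggml expression): shifting a 1 into the sign bit of a signed int is undefined behaviour; the unsigned shift is well defined.

```c
#include <stdint.h>

static uint32_t sign_bit(void) {
    // int bad = 1 << 31;   // UB: 1 << 31 is not representable in int
    return 1u << 31;        // OK: unsigned shift, yields 0x80000000
}
```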
Pavol Rusnak
6f79699286
build: add armv{6,7,8} support to cmake (#1251)
- flags copied from Makefile
- updated comments in both CMakeLists.txt and Makefile to match reality
2023-04-30 20:48:38 +02:00
jon-chuang
a5d30b1f53
common : better default number of threads (#934)
* commit

* fix

* try-catch

* apply code review

* improve

* improve

* add macos headers

* done

* remove color

* fix windows

* minor

* fix

* Apply suggestions from code review

Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>

* remove

* minor

* minor

---------

Co-authored-by: jon-chuang <jon-chuang@users.noreply.github.com>
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
2023-04-30 21:41:35 +03:00
0cc4m
76a884920a
ggml : add CLBlast q5_0, q5_1, q8_0 dequant kernels (#1225)
* Implement q5_0, q5_1 and q8_0

* Work around q5_0 OpenCL issue

* Fix q8_0 dequant kernel

* Move cl kernels into ggml-opencl.c

* Use two memcpy calls for q5_0 buffer transfer
2023-04-30 21:34:52 +03:00
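A hedged plain-C sketch of what a q8_0 dequantization kernel computes (not the OpenCL code from the PR; it assumes the q8_0 block layout of the time, one float scale shared by 32 int8 quants):

```c
#include <stdint.h>

#define QK8_0 32

typedef struct {
    float  d;             // block scale (assumed layout)
    int8_t qs[QK8_0];     // quantized values
} block_q8_0;

// y[i] = d * q[i] for every quant in every block; nb is the number of blocks.
static void dequantize_q8_0(const block_q8_0 * x, float * y, int nb) {
    for (int i = 0; i < nb; ++i) {
        for (int j = 0; j < QK8_0; ++j) {
            y[i*QK8_0 + j] = x[i].d * (float) x[i].qs[j];
        }
    }
}
```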
Georgi Gerganov
6bc4400e67
ggml : add Q5 WASM SIMD + GGML_FTYPE 2023-04-30 19:07:43 +03:00
Stephan Walter
f0d70f147d
Various fixes to mat_mul benchmark (#1253) 2023-04-30 12:32:37 +00:00