2762 commits

Author SHA1 Message Date
Julia Longtin
270204e57b fix loop. 2024-06-09 18:03:01 +00:00
Julia Longtin
dda250f637 move sub earlier, and move the compare of iterations to outside, and at the end of the loop. 2024-06-09 18:03:01 +00:00
Julia Longtin
f555f9d075 spacing and comment changes. 2024-06-09 18:03:01 +00:00
Julia Longtin
204bc1ffdc remove useless prefetches. 2024-06-09 18:03:01 +00:00
Julia Longtin
d8d574c56f perform better prefetches, and invert the test of our clear flag for clarity. 2024-06-09 18:03:01 +00:00
Julia Longtin
a14fe02cf8 use vbroadcastss in place of vbroadcast32x4. 2024-06-09 18:03:01 +00:00
Julia Longtin
b1c9622d9e Use a vectorized assembly function to handle remaining chunks less than vector wide. 2024-06-09 18:03:01 +00:00
Julia Longtin
6e0258abac broadcast a single int8, instead of 4 of them. 2024-06-09 18:03:01 +00:00
Julia Longtin
664a6025a1 use different restrict syntax, to make g++ happy. 2024-06-09 18:03:01 +00:00
Julia Longtin
2cf193efc0 fix typo 2024-06-09 18:03:01 +00:00
Julia Longtin
c39fa8b6b8 remove a warning. 2024-06-09 18:03:01 +00:00
Julia Longtin
9fa06f4767 add batch fp16<->fp32 conversion functions. 2024-06-09 18:02:57 +00:00
Julia Longtin
1c2fdc3412 minor spacing and comment changes. 2024-06-09 18:01:49 +00:00
Julia Longtin
54f181d24a spacing and capitalization changes. Fix the register list of GGML_5bit_Unpacked_Unaligned. 2024-06-09 18:01:49 +00:00
Julia Longtin
9a799ebdae spacing and capitalization changes. 2024-06-09 18:01:49 +00:00
Julia Longtin
0124f7acd8 use or, instead of and. bug fix? 2024-06-09 18:01:49 +00:00
Julia Longtin
dc1f639bf0 comment and spacing fixes. 2024-06-09 18:01:49 +00:00
Julia Longtin
4fb1547ba6 fix an offset error, and get rid of tabs. 2024-06-09 18:01:49 +00:00
Julia Longtin
e37b7f8497 fix some small errors. 2024-06-09 18:01:49 +00:00
Julia Longtin
c3d438bce2 further optimizations. 0.99 tokens per second. 2024-06-09 18:01:49 +00:00
Julia Longtin
d966ac2ebe replace tabs with spaces. 2024-06-09 18:01:49 +00:00
Julia Longtin
fb83cd987d reformat, and label what these files are. 2024-06-09 18:01:49 +00:00
Julia Longtin
b8abefbec6 use GGML_F32_EPR, and remove some dead code. 2024-06-09 18:01:49 +00:00
Julia Longtin
f84859a926 whoops. missing tab. 2024-06-09 18:01:49 +00:00
Julia Longtin
ded4da4518 add Makefile rule for generation .s file, for manual inspection. 2024-06-09 18:01:49 +00:00
Julia Longtin
aeb5ae85ad formatting changes. 2024-06-09 18:01:49 +00:00
Julia Longtin
3ff09248ff indent headers consistently. 2024-06-09 18:01:49 +00:00
Julia Longtin
3cf6eb0cc0 formatting. 2024-06-09 18:01:49 +00:00
Julia Longtin
90498c1181 minor changes. 2024-06-09 18:01:49 +00:00
Julia Longtin
33cc1d8c8e massively rewrite assembly routines. 2024-06-09 18:01:49 +00:00
Julia Longtin
20c2bc53f9 fix vector sizes. 2024-06-09 18:01:49 +00:00
Julia Longtin
2a47e5f05f separate filling aux16 from consuming aux16 by making it an array of vectors. 2024-06-09 18:01:49 +00:00
Julia Longtin
e579af1e95 loosen alignment requirements for zeros, add missing function, and promote aux8 to an array of vectors. 2024-06-09 18:01:49 +00:00
Julia Longtin
1c182a3896 promote aux8 into a vector. 2024-06-09 18:01:49 +00:00
Julia Longtin
3fef54f5ce fix our reference to src in the second place, and use a more accurate comment. 2024-06-09 18:01:49 +00:00
Julia Longtin
3cdfc9c596 spacing changes, eliminate dead references to k1 or zero, and use the right type when referring to src. 2024-06-09 18:01:49 +00:00
Julia Longtin
98c9b6972a better comments, and fix some small errors. 2024-06-09 18:01:49 +00:00
Julia Longtin
0c01d07835 perform 16 operations at a time. 2024-06-09 18:01:49 +00:00
Julia Longtin
d34e0ff835 use proper mov operator, and pass addresses. 2024-06-09 18:01:49 +00:00
Julia Longtin
e3468e041b attempt our first FMA. 2024-06-09 18:01:49 +00:00
Julia Longtin
da69ed5b3a add I32 vector memory clearing. 2024-06-09 18:01:49 +00:00
Julia Longtin
10237df57a promote aux32 to a vector. 2024-06-09 18:01:49 +00:00
Julia Longtin
3c29fd57ce add missing address of operators. 2024-06-09 18:01:49 +00:00
Julia Longtin
45c94bd89d promote aux16 to a vector. 2024-06-09 18:01:49 +00:00
Julia Longtin
31b8a5afd7 use quotes properly. 2024-06-09 18:01:49 +00:00
Julia Longtin
ed639a6cf9 use better memory save operator. 2024-06-09 18:01:49 +00:00
Julia Longtin
5c010f761f expand mask, and align memory. 2024-06-09 18:01:49 +00:00
Julia Longtin
7a00422fa3 try to use vectorized zeroing function. 2024-06-09 18:01:49 +00:00
Julia Longtin
2870bfc6dd add missing variable. 2024-06-09 18:01:48 +00:00
Julia Longtin
656bf28c91 copy right block. 2024-06-09 18:01:48 +00:00