Commit graph

2976 commits

Author SHA1 Message Date
Julia Longtin
f7b062fce9 do 2 rounds of 4, instead of 4 rounds of 2. and properly offset unaligned reads across a 64 byte boundary. 2024-05-13 22:17:32 +00:00
Julia Longtin
464a74f0c0 make offset available in a register. 2024-05-13 22:17:32 +00:00
Julia Longtin
92bd588433 load from identical addresses for low and high side. 2024-05-13 22:17:32 +00:00
Julia Longtin
f0d4f513c6 minor comment fixes. 2024-05-13 22:17:32 +00:00
Julia Longtin
d52607486d make the offset of q4 available. 2024-05-13 22:17:32 +00:00
Julia Longtin
259da93e5a add missing vector. 2024-05-13 22:17:32 +00:00
Julia Longtin
5c76364410 fill and increment r12 and r13. 2024-05-13 22:17:32 +00:00
Julia Longtin
5e7d7ab70d relabel some other labels. 2024-05-13 22:17:32 +00:00
Julia Longtin
939606155a rename some labels. 2024-05-13 22:17:32 +00:00
Julia Longtin
25fc1d669c rename label 1 to 3. 2024-05-13 22:17:32 +00:00
Julia Longtin
8e854f4c3f introduce r10 and r11, for vloadunpackhd. 2024-05-13 22:17:32 +00:00
Julia Longtin
5615c8611f spacing changes. 2024-05-13 22:17:32 +00:00
Julia Longtin
6ed6f2f536 spacing changes. 2024-05-13 22:17:32 +00:00
Julia Longtin
9372048875 add missing jump. 2024-05-13 22:17:32 +00:00
Julia Longtin
41a9ed02f1 look at the right final memory location. 2024-05-13 22:17:32 +00:00
Julia Longtin
e8087c5d60 subtract the correct amount. 2024-05-13 22:17:32 +00:00
Julia Longtin
cfe47d048d change from handling three iterations per loop to four. 2024-05-13 22:17:32 +00:00
Julia Longtin
7819247614 comment clarification. 2024-05-13 22:17:32 +00:00
Julia Longtin
fc828b46df correct a comment, and use jz when comparing to zero. 2024-05-13 22:17:32 +00:00
Julia Longtin
f1af881a23 use values inside of the loop as soon as we have them. 2024-05-13 22:17:32 +00:00
Julia Longtin
6bd8dcb282 fix loop. 2024-05-13 22:17:32 +00:00
Julia Longtin
50800b91f0 move sub earlier, and move the compare of iterations to outside, and at the end of the loop. 2024-05-13 22:17:32 +00:00
Julia Longtin
1d74ddb15c spacing and comment changes. 2024-05-13 22:17:32 +00:00
Julia Longtin
156b9b676a remove useless prefetches. 2024-05-13 22:17:31 +00:00
Julia Longtin
cb96a48ed1 perform better prefetches, and invert the test of our clear flag for clarity. 2024-05-13 22:17:31 +00:00
Julia Longtin
14638be66c use vbroadcastss in place of vbroadcast32x4. 2024-05-13 22:17:31 +00:00
Julia Longtin
0261b3b8f8 Use a vectorized assembly function to handle remaining chunks less than vector wide. 2024-05-13 22:17:31 +00:00
Julia Longtin
7efdcf5b4f broadcast a single int8, instead of 4 of them. 2024-05-13 22:17:31 +00:00
Julia Longtin
201566c965 use different restrict syntax, to make g++ happy. 2024-05-13 22:17:31 +00:00
Julia Longtin
bf674be34f fix typo 2024-05-13 22:17:31 +00:00
Julia Longtin
30e8b37f33 remove a warning. 2024-05-13 22:17:31 +00:00
Julia Longtin
de44c6633e add batch fp16<->fp32 conversion functions. 2024-05-13 22:17:27 +00:00
Julia Longtin
b33cd8d614 minor spacing and comment changes. 2024-05-13 22:12:55 +00:00
Julia Longtin
e108564e2d spacing and capitalization changes. Fix the register list of GGML_5bit_Unpacked_Unaligned. 2024-05-13 22:12:55 +00:00
Julia Longtin
1ba6534846 spacing and capitalization changes. 2024-05-13 22:12:55 +00:00
Julia Longtin
93d0a0ae7a use or, instead of and. bug fix? 2024-05-13 22:12:55 +00:00
Julia Longtin
2cfc15b0a9 comment and spacing fixes. 2024-05-13 22:12:55 +00:00
Julia Longtin
d27cd93d11 fix an offset error, and get rid of tabs. 2024-05-13 22:12:55 +00:00
Julia Longtin
5b2023bb12 fix some small errors. 2024-05-13 22:12:55 +00:00
Julia Longtin
934f869a51 further optimizations. 0.99 tokens per second. 2024-05-13 22:12:55 +00:00
Julia Longtin
a33c82b6bb replace tabs with spaces. 2024-05-13 22:12:55 +00:00
Julia Longtin
039685d78c reformat, and label what these files are. 2024-05-13 22:12:55 +00:00
Julia Longtin
feb8bccfab use GGML_F32_EPR, and remove some dead code. 2024-05-13 22:12:55 +00:00
Julia Longtin
7214391ff7 whoops. missing tab. 2024-05-13 22:12:55 +00:00
Julia Longtin
10f06379d7 add Makefile rule for generation .s file, for manual inspection. 2024-05-13 22:12:55 +00:00
Julia Longtin
e544a3faa2 formatting changes. 2024-05-13 22:12:55 +00:00
Julia Longtin
481f1746c0 indent headers consistently. 2024-05-13 22:12:55 +00:00
Julia Longtin
aa33f281e3 formatting. 2024-05-13 22:12:55 +00:00
Julia Longtin
021ae03bd6 minor changes. 2024-05-13 22:12:55 +00:00
Julia Longtin
efcd202f0f massively rewrite assembly routines. 2024-05-13 22:12:55 +00:00