Commit graph

2971 commits

Author SHA1 Message Date
Julia Longtin
259da93e5a add missing vector. 2024-05-13 22:17:32 +00:00
Julia Longtin
5c76364410 fill and increment r12 and r13. 2024-05-13 22:17:32 +00:00
Julia Longtin
5e7d7ab70d relabel some other labels. 2024-05-13 22:17:32 +00:00
Julia Longtin
939606155a rename some labels. 2024-05-13 22:17:32 +00:00
Julia Longtin
25fc1d669c rename label 1 to 3. 2024-05-13 22:17:32 +00:00
Julia Longtin
8e854f4c3f introduce r10 and r11, for vloadunpackhd. 2024-05-13 22:17:32 +00:00
Julia Longtin
5615c8611f spacing changes. 2024-05-13 22:17:32 +00:00
Julia Longtin
6ed6f2f536 spacing changes. 2024-05-13 22:17:32 +00:00
Julia Longtin
9372048875 add missing jump. 2024-05-13 22:17:32 +00:00
Julia Longtin
41a9ed02f1 look at the right final memory location. 2024-05-13 22:17:32 +00:00
Julia Longtin
e8087c5d60 subtract the correct amount. 2024-05-13 22:17:32 +00:00
Julia Longtin
cfe47d048d change from handling three iterations per loop to four. 2024-05-13 22:17:32 +00:00
Julia Longtin
7819247614 comment clarification. 2024-05-13 22:17:32 +00:00
Julia Longtin
fc828b46df correct a comment, and use jz when comparing to zero. 2024-05-13 22:17:32 +00:00
Julia Longtin
f1af881a23 use values inside of the loop as soon as we have them. 2024-05-13 22:17:32 +00:00
Julia Longtin
6bd8dcb282 fix loop. 2024-05-13 22:17:32 +00:00
Julia Longtin
50800b91f0 move sub earlier, and move the compare of iterations to outside, and at the end of the loop. 2024-05-13 22:17:32 +00:00
Julia Longtin
1d74ddb15c spacing and comment changes. 2024-05-13 22:17:32 +00:00
Julia Longtin
156b9b676a remove useless prefetches. 2024-05-13 22:17:31 +00:00
Julia Longtin
cb96a48ed1 perform better prefetches, and invert the test of our clear flag for clarity. 2024-05-13 22:17:31 +00:00
Julia Longtin
14638be66c use vbroadcastss in place of vbroadcast32x4. 2024-05-13 22:17:31 +00:00
Julia Longtin
0261b3b8f8 Use a vectorized assembly function to handle remaining chunks less than vector wide. 2024-05-13 22:17:31 +00:00
Julia Longtin
7efdcf5b4f broadcast a single int8, instead of 4 of them. 2024-05-13 22:17:31 +00:00
Julia Longtin
201566c965 use different restrict syntax, to make g++ happy. 2024-05-13 22:17:31 +00:00
Julia Longtin
bf674be34f fix typo 2024-05-13 22:17:31 +00:00
Julia Longtin
30e8b37f33 remove a warning. 2024-05-13 22:17:31 +00:00
Julia Longtin
de44c6633e add batch fp16<->fp32 conversion functions. 2024-05-13 22:17:27 +00:00
Julia Longtin
b33cd8d614 minor spacing and comment changes. 2024-05-13 22:12:55 +00:00
Julia Longtin
e108564e2d spacing and capitalization changes. Fix the register list of GGML_5bit_Unpacked_Unaligned. 2024-05-13 22:12:55 +00:00
Julia Longtin
1ba6534846 spacing and capitalization changes. 2024-05-13 22:12:55 +00:00
Julia Longtin
93d0a0ae7a use or, instead of and. bug fix? 2024-05-13 22:12:55 +00:00
Julia Longtin
2cfc15b0a9 comment and spacing fixes. 2024-05-13 22:12:55 +00:00
Julia Longtin
d27cd93d11 fix an offset error, and get rid of tabs. 2024-05-13 22:12:55 +00:00
Julia Longtin
5b2023bb12 fix some small errors. 2024-05-13 22:12:55 +00:00
Julia Longtin
934f869a51 further optimizations. 0.99 tokens per second. 2024-05-13 22:12:55 +00:00
Julia Longtin
a33c82b6bb replace tabs with spaces. 2024-05-13 22:12:55 +00:00
Julia Longtin
039685d78c reformat, and label what these files are. 2024-05-13 22:12:55 +00:00
Julia Longtin
feb8bccfab use GGML_F32_EPR, and remove some dead code. 2024-05-13 22:12:55 +00:00
Julia Longtin
7214391ff7 whoops. missing tab. 2024-05-13 22:12:55 +00:00
Julia Longtin
10f06379d7 add Makefile rule for generation .s file, for manual inspection. 2024-05-13 22:12:55 +00:00
Julia Longtin
e544a3faa2 formatting changes. 2024-05-13 22:12:55 +00:00
Julia Longtin
481f1746c0 indent headers consistently. 2024-05-13 22:12:55 +00:00
Julia Longtin
aa33f281e3 formatting. 2024-05-13 22:12:55 +00:00
Julia Longtin
021ae03bd6 minor changes. 2024-05-13 22:12:55 +00:00
Julia Longtin
efcd202f0f massively rewrite assembly routines. 2024-05-13 22:12:55 +00:00
Julia Longtin
e66a97f765 fix vector sizes. 2024-05-13 22:12:55 +00:00
Julia Longtin
5a6024279f separate filling aux16 from consuming aux16 by making it an array of vectors. 2024-05-13 22:12:55 +00:00
Julia Longtin
d351d995b0 loosen alignment requirements for zeros, add missing function, and promote aux8 to an array of vectors. 2024-05-13 22:12:55 +00:00
Julia Longtin
185d4b8bf7 promote aux8 into a vector. 2024-05-13 22:12:55 +00:00
Julia Longtin
a95c7b0138 fix our reference to src in the second place, and use a more accurate comment. 2024-05-13 22:12:55 +00:00