Julia Longtin | f0d4f513c6 | minor comment fixes. | 2024-05-13 22:17:32 +00:00
Julia Longtin | d52607486d | make the offset of q4 available. | 2024-05-13 22:17:32 +00:00
Julia Longtin | 259da93e5a | add missing vector. | 2024-05-13 22:17:32 +00:00
Julia Longtin | 5c76364410 | fill and increment r12 and r13. | 2024-05-13 22:17:32 +00:00
Julia Longtin | 5e7d7ab70d | relabel some other labels. | 2024-05-13 22:17:32 +00:00
Julia Longtin | 939606155a | rename some labels. | 2024-05-13 22:17:32 +00:00
Julia Longtin | 25fc1d669c | rename label 1 to 3. | 2024-05-13 22:17:32 +00:00
Julia Longtin | 8e854f4c3f | introduce r10 and r11, for vloadunpackhd. | 2024-05-13 22:17:32 +00:00
Julia Longtin | 5615c8611f | spacing changes. | 2024-05-13 22:17:32 +00:00
Julia Longtin | 6ed6f2f536 | spacing changes. | 2024-05-13 22:17:32 +00:00
Julia Longtin | 9372048875 | add missing jump. | 2024-05-13 22:17:32 +00:00
Julia Longtin | 41a9ed02f1 | look at the right final memory location. | 2024-05-13 22:17:32 +00:00
Julia Longtin | e8087c5d60 | subtract the correct amount. | 2024-05-13 22:17:32 +00:00
Julia Longtin | cfe47d048d | change from handling three iterations per loop to four. | 2024-05-13 22:17:32 +00:00
Julia Longtin | 7819247614 | comment clarification. | 2024-05-13 22:17:32 +00:00
Julia Longtin | fc828b46df | correct a comment, and use jz when comparing to zero. | 2024-05-13 22:17:32 +00:00
Julia Longtin | f1af881a23 | use values inside of the loop as soon as we have them. | 2024-05-13 22:17:32 +00:00
Julia Longtin | 6bd8dcb282 | fix loop. | 2024-05-13 22:17:32 +00:00
Julia Longtin | 50800b91f0 | move sub earlier, and move the compare of iterations to outside, and at the end of the loop. | 2024-05-13 22:17:32 +00:00
Julia Longtin | 1d74ddb15c | spacing and comment changes. | 2024-05-13 22:17:32 +00:00
Julia Longtin | 156b9b676a | remove useless prefetches. | 2024-05-13 22:17:31 +00:00
Julia Longtin | cb96a48ed1 | perform better prefetches, and invert the test of our clear flag for clarity. | 2024-05-13 22:17:31 +00:00
Julia Longtin | 14638be66c | use vbroadcastss in place of vbroadcast32x4. | 2024-05-13 22:17:31 +00:00
Julia Longtin | 0261b3b8f8 | Use a vectorized assembly function to handle remaining chunks less than vector wide. | 2024-05-13 22:17:31 +00:00
Julia Longtin | 7efdcf5b4f | broadcast a single int8, instead of 4 of them. | 2024-05-13 22:17:31 +00:00
Julia Longtin | 201566c965 | use different restrict syntax, to make g++ happy. | 2024-05-13 22:17:31 +00:00
Julia Longtin | bf674be34f | fix typo | 2024-05-13 22:17:31 +00:00
Julia Longtin | 30e8b37f33 | remove a warning. | 2024-05-13 22:17:31 +00:00
Julia Longtin | de44c6633e | add batch fp16<->fp32 conversion functions. | 2024-05-13 22:17:27 +00:00
Julia Longtin | b33cd8d614 | minor spacing and comment changes. | 2024-05-13 22:12:55 +00:00
Julia Longtin | e108564e2d | spacing and capitalization changes. Fix the register list of GGML_5bit_Unpacked_Unaligned. | 2024-05-13 22:12:55 +00:00
Julia Longtin | 1ba6534846 | spacing and capitalization changes. | 2024-05-13 22:12:55 +00:00
Julia Longtin | 93d0a0ae7a | use or, instead of and. bug fix? | 2024-05-13 22:12:55 +00:00
Julia Longtin | 2cfc15b0a9 | comment and spacing fixes. | 2024-05-13 22:12:55 +00:00
Julia Longtin | d27cd93d11 | fix an offset error, and get rid of tabs. | 2024-05-13 22:12:55 +00:00
Julia Longtin | 5b2023bb12 | fix some small errors. | 2024-05-13 22:12:55 +00:00
Julia Longtin | 934f869a51 | further optimizations. 0.99 tokens per second. | 2024-05-13 22:12:55 +00:00
Julia Longtin | a33c82b6bb | replace tabs with spaces. | 2024-05-13 22:12:55 +00:00
Julia Longtin | 039685d78c | reformat, and label what these files are. | 2024-05-13 22:12:55 +00:00
Julia Longtin | feb8bccfab | use GGML_F32_EPR, and remove some dead code. | 2024-05-13 22:12:55 +00:00
Julia Longtin | 7214391ff7 | whoops. missing tab. | 2024-05-13 22:12:55 +00:00
Julia Longtin | 10f06379d7 | add Makefile rule for generation .s file, for manual inspection. | 2024-05-13 22:12:55 +00:00
Julia Longtin | e544a3faa2 | formatting changes. | 2024-05-13 22:12:55 +00:00
Julia Longtin | 481f1746c0 | indent headers consistently. | 2024-05-13 22:12:55 +00:00
Julia Longtin | aa33f281e3 | formatting. | 2024-05-13 22:12:55 +00:00
Julia Longtin | 021ae03bd6 | minor changes. | 2024-05-13 22:12:55 +00:00
Julia Longtin | efcd202f0f | massively rewrite assembly routines. | 2024-05-13 22:12:55 +00:00
Julia Longtin | e66a97f765 | fix vector sizes. | 2024-05-13 22:12:55 +00:00
Julia Longtin | 5a6024279f | separate filling aux16 from consuming aux16 by making it an array of vectors. | 2024-05-13 22:12:55 +00:00
Julia Longtin | d351d995b0 | loosen alignment requirements for zeros, add missing function, and promote aux8 to an array of vectors. | 2024-05-13 22:12:55 +00:00