Julia Longtin
|
bd22e9d28a
|
do 2 rounds of 4, instead of 4 rounds of 2. and properly offset unaligned reads across a 64 byte boundary.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
7925fb1f64
|
make offset available in a register.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
084e3683fb
|
load from identical addresses for low and high side.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
420e9dbd44
|
minor comment fixes.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
3d39d619da
|
make the offset of q4 available.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
257c06b73c
|
add missing vector.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
50887fc9fd
|
fill and increment r12 and r13.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
0c0137ef18
|
relabel some other labels.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
eefa650da0
|
rename some labels.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
9aa34c8884
|
rename label 1 to 3.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
9f3623fffc
|
introduce r10 and r11, for vloadunpackhd.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
a273a9ebf2
|
spacing changes.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
fc23c22fd2
|
spacing changes.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
4d948317c8
|
add missing jump.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
1b7ca0b413
|
look at the right final memory location.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
47ca67a062
|
subtract the correct amount.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
511ad8043f
|
change from handling three iterations per loop to four.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
4097cde569
|
comment clarification.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
f3b86eb792
|
correct a comment, and use jz when comparing to zero.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
9a1a53be8e
|
use values inside of the loop as soon as we have them.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
270204e57b
|
fix loop.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
dda250f637
|
move sub earlier, and move the compare of iterations to outside, and at the end of the loop.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
f555f9d075
|
spacing and comment changes.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
204bc1ffdc
|
remove useless prefetches.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
d8d574c56f
|
perform better prefetches, and invert the test of our clear flag for clarity.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
a14fe02cf8
|
use vbroadcastss in place of vbroadcast32x4.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
b1c9622d9e
|
Use a vectorized assembly function to handle remaining chunks smaller than a full vector.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
6e0258abac
|
broadcast a single int8, instead of 4 of them.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
664a6025a1
|
use different restrict syntax, to make g++ happy.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
2cf193efc0
|
fix typo
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
c39fa8b6b8
|
remove a warning.
|
2024-06-09 18:03:01 +00:00 |
|
Julia Longtin
|
9fa06f4767
|
add batch fp16<->fp32 conversion functions.
|
2024-06-09 18:02:57 +00:00 |
|
Julia Longtin
|
1c2fdc3412
|
minor spacing and comment changes.
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
54f181d24a
|
spacing and capitalization changes. Fix the register list of GGML_5bit_Unpacked_Unaligned.
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
9a799ebdae
|
spacing and capitalization changes.
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
0124f7acd8
|
use or, instead of and. bug fix?
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
dc1f639bf0
|
comment and spacing fixes.
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
4fb1547ba6
|
fix an offset error, and get rid of tabs.
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
e37b7f8497
|
fix some small errors.
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
c3d438bce2
|
further optimizations. 0.99 tokens per second.
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
d966ac2ebe
|
replace tabs with spaces.
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
fb83cd987d
|
reformat, and label what these files are.
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
b8abefbec6
|
use GGML_F32_EPR, and remove some dead code.
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
f84859a926
|
whoops. missing tab.
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
ded4da4518
|
add Makefile rule for generating .s file, for manual inspection.
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
aeb5ae85ad
|
formatting changes.
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
3ff09248ff
|
indent headers consistently.
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
3cf6eb0cc0
|
formatting.
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
90498c1181
|
minor changes.
|
2024-06-09 18:01:49 +00:00 |
|
Julia Longtin
|
33cc1d8c8e
|
massively rewrite assembly routines.
|
2024-06-09 18:01:49 +00:00 |
|