Julia Longtin | d8d574c56f | perform better prefetches, and invert the test of our clear flag for clarity. | 2024-06-09 18:03:01 +00:00
Julia Longtin | a14fe02cf8 | use vbroadcastss in place of vbroadcast32x4. | 2024-06-09 18:03:01 +00:00
Julia Longtin | b1c9622d9e | Use a vectorized assembly function to handle remaining chunks less than vector wide. | 2024-06-09 18:03:01 +00:00
Julia Longtin | 6e0258abac | broadcast a single int8, instead of 4 of them. | 2024-06-09 18:03:01 +00:00
Julia Longtin | 664a6025a1 | use different restrict syntax, to make g++ happy. | 2024-06-09 18:03:01 +00:00
Julia Longtin | 2cf193efc0 | fix typo | 2024-06-09 18:03:01 +00:00
Julia Longtin | c39fa8b6b8 | remove a warning. | 2024-06-09 18:03:01 +00:00
Julia Longtin | 9fa06f4767 | add batch fp16<->fp32 conversion functions. | 2024-06-09 18:02:57 +00:00
Julia Longtin | 1c2fdc3412 | minor spacing and comment changes. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 54f181d24a | spacing and capitalization changes. Fix the register list of GGML_5bit_Unpacked_Unaligned. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 9a799ebdae | spacing and capitalization changes. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 0124f7acd8 | use or, instead of and. bug fix? | 2024-06-09 18:01:49 +00:00
Julia Longtin | dc1f639bf0 | comment and spacing fixes. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 4fb1547ba6 | fix an offset error, and get rid of tabs. | 2024-06-09 18:01:49 +00:00
Julia Longtin | e37b7f8497 | fix some small errors. | 2024-06-09 18:01:49 +00:00
Julia Longtin | c3d438bce2 | further optimizations. 0.99 tokens per second. | 2024-06-09 18:01:49 +00:00
Julia Longtin | d966ac2ebe | replace tabs with spaces. | 2024-06-09 18:01:49 +00:00
Julia Longtin | fb83cd987d | reformat, and label what these files are. | 2024-06-09 18:01:49 +00:00
Julia Longtin | b8abefbec6 | use GGML_F32_EPR, and remove some dead code. | 2024-06-09 18:01:49 +00:00
Julia Longtin | f84859a926 | whoops. missing tab. | 2024-06-09 18:01:49 +00:00
Julia Longtin | ded4da4518 | add Makefile rule for generation .s file, for manual inspection. | 2024-06-09 18:01:49 +00:00
Julia Longtin | aeb5ae85ad | formatting changes. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 3ff09248ff | indent headers consistently. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 3cf6eb0cc0 | formatting. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 90498c1181 | minor changes. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 33cc1d8c8e | massively rewrite assembly routines. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 20c2bc53f9 | fix vector sizes. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 2a47e5f05f | separate filling aux16 from consuming aux16 by making it an array of vectors. | 2024-06-09 18:01:49 +00:00
Julia Longtin | e579af1e95 | loosen alignment requirements for zeros, add missing function, and promote aux8 to an array of vectors. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 1c182a3896 | promote aux8 into a vector. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 3fef54f5ce | fix our reference to src in the second place, and use a more accurate comment. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 3cdfc9c596 | spacing changes, eliminate dead references to k1 or zero, and use the right type when referring to src. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 98c9b6972a | better comments, and fix some small errors. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 0c01d07835 | perform 16 operations at a time. | 2024-06-09 18:01:49 +00:00
Julia Longtin | d34e0ff835 | use proper mov operator, and pass addresses. | 2024-06-09 18:01:49 +00:00
Julia Longtin | e3468e041b | attempt our first FMA. | 2024-06-09 18:01:49 +00:00
Julia Longtin | da69ed5b3a | add I32 vector memory clearing. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 10237df57a | promote aux32 to a vector. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 3c29fd57ce | add missing address of operators. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 45c94bd89d | promote aux16 to a vector. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 31b8a5afd7 | use quotes properly. | 2024-06-09 18:01:49 +00:00
Julia Longtin | ed639a6cf9 | use better memory save operator. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 5c010f761f | expand mask, and align memory. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 7a00422fa3 | try to use vectorized zeroing function. | 2024-06-09 18:01:49 +00:00
Julia Longtin | 2870bfc6dd | add missing variable. | 2024-06-09 18:01:48 +00:00
Julia Longtin | 656bf28c91 | copy right block. | 2024-06-09 18:01:48 +00:00
Julia Longtin | e99f3a9bf4 | fix typo. | 2024-06-09 18:01:48 +00:00
Julia Longtin | 84093a6be6 | promote aux16 into a vector. (part three) | 2024-06-09 18:01:48 +00:00
Julia Longtin | 66d26d4914 | promote aux16 into a vector. | 2024-06-09 18:01:48 +00:00
Julia Longtin | 2f0a949ae0 | promote aux16 into a vector. | 2024-06-09 18:01:48 +00:00