Julia Longtin
|
047defea41
|
rename some labels.
|
2024-05-11 17:56:10 +00:00 |
|
Julia Longtin
|
a1d0da669d
|
rename label 1 to 3.
|
2024-05-11 14:24:30 +00:00 |
|
Julia Longtin
|
0a0bb9b7db
|
introduce r10 and r11, for vloadunpackhd.
|
2024-05-11 14:02:36 +00:00 |
|
Julia Longtin
|
9d7f967e88
|
spacing changes.
|
2024-05-11 13:35:50 +00:00 |
|
Julia Longtin
|
6c4e687b85
|
spacing changes.
|
2024-05-11 13:26:00 +00:00 |
|
Julia Longtin
|
b34575b1f3
|
add missing jump.
|
2024-05-11 12:53:23 +00:00 |
|
Julia Longtin
|
fa0226c8df
|
look at the right final memory location.
|
2024-05-11 11:27:52 +00:00 |
|
Julia Longtin
|
fba57c125c
|
subtract the correct amount.
|
2024-05-11 11:11:15 +00:00 |
|
Julia Longtin
|
3156e639bf
|
change from handling three iterations per loop to four.
|
2024-05-11 11:07:16 +00:00 |
|
Julia Longtin
|
a82ada7dcd
|
comment clarification.
|
2024-05-10 21:57:16 +00:00 |
|
Julia Longtin
|
4a3c42c82c
|
correct a comment, and use jz when comparing to zero.
|
2024-05-10 20:30:56 +00:00 |
|
Julia Longtin
|
806472787d
|
use values inside of the loop as soon as we have them.
|
2024-05-10 19:33:58 +00:00 |
|
Julia Longtin
|
21a1e740c2
|
fix loop.
|
2024-05-10 17:07:27 +00:00 |
|
Julia Longtin
|
7e44eabe0f
|
move sub earlier, and move the compare of iterations to outside, and at the end of the loop.
|
2024-05-10 17:03:41 +00:00 |
|
Julia Longtin
|
7966c8e443
|
spacing and comment changes.
|
2024-05-10 16:50:39 +00:00 |
|
Julia Longtin
|
650094e17b
|
remove useless prefetches.
|
2024-05-10 16:28:53 +00:00 |
|
Julia Longtin
|
0ff7d5dd1a
|
perform better prefetches, and invert the test of our clear flag for clarity.
|
2024-05-10 16:14:28 +00:00 |
|
Julia Longtin
|
b00607d1ab
|
use vbroadcastss in place of vbroadcast32x4.
|
2024-05-10 15:52:35 +00:00 |
|
Julia Longtin
|
f6edcc4061
|
Use a vectorized assembly function to handle remaining chunks less than a vector wide.
|
2024-05-10 14:52:46 +00:00 |
|
Julia Longtin
|
2282ac4d9f
|
broadcast a single int8, instead of 4 of them.
|
2024-05-10 14:19:27 +00:00 |
|
Julia Longtin
|
867de5edce
|
use different restrict syntax, to make g++ happy.
|
2024-05-09 23:08:43 +00:00 |
|
Julia Longtin
|
e1fdfaae45
|
fix typo
|
2024-05-09 20:41:50 +00:00 |
|
Julia Longtin
|
a283551db0
|
remove a warning.
|
2024-05-09 20:40:50 +00:00 |
|
Julia Longtin
|
af4ee51fa7
|
add batch fp16<->fp32 conversion functions.
|
2024-05-09 19:31:28 +00:00 |
|
Julia Longtin
|
81ca166ecd
|
minor spacing and comment changes.
|
2024-05-09 16:57:59 +00:00 |
|
Julia Longtin
|
047291fb42
|
spacing and capitalization changes. Fix the register list of GGML_5bit_Unpacked_Unaligned.
|
2024-04-26 14:44:08 +00:00 |
|
Julia Longtin
|
77d4ca906b
|
spacing and capitalization changes.
|
2024-04-25 21:23:22 +00:00 |
|
Julia Longtin
|
d69cf87fce
|
use or, instead of and. bug fix?
|
2024-04-24 17:50:12 +00:00 |
|
Julia Longtin
|
8cae9a9ef6
|
comment and spacing fixes.
|
2024-04-24 17:38:42 +00:00 |
|
Julia Longtin
|
90e99eaf1c
|
fix an offset error, and get rid of tabs.
|
2024-04-22 18:29:31 +00:00 |
|
Julia Longtin
|
6d16090246
|
fix some small errors.
|
2024-04-22 18:22:22 +00:00 |
|
Julia Longtin
|
e298d9e65e
|
further optimizations. 0.99 tokens per second.
|
2024-04-22 18:16:28 +00:00 |
|
Julia Longtin
|
53773e0b4a
|
replace tabs with spaces.
|
2024-04-03 23:42:34 +00:00 |
|
Julia Longtin
|
9152143fe7
|
reformat, and label what these files are.
|
2024-04-03 23:21:24 +00:00 |
|
Julia Longtin
|
9ad5efafb0
|
use GGML_F32_EPR, and remove some dead code.
|
2024-04-03 22:04:45 +00:00 |
|
Julia Longtin
|
84df774d6a
|
whoops. missing tab.
|
2024-04-03 21:58:29 +00:00 |
|
Julia Longtin
|
9412572205
|
add Makefile rule for generating .s file, for manual inspection.
|
2024-04-03 20:30:25 +00:00 |
|
Julia Longtin
|
6f67ea886f
|
formatting changes.
|
2024-04-03 20:24:00 +00:00 |
|
Julia Longtin
|
96fdd214c8
|
indent headers consistently.
|
2024-04-03 19:01:18 +00:00 |
|
Julia Longtin
|
cb4422625a
|
Merge pull request #1 from julialongtin/k1om
K1om initial support. Round 1.
|
2024-04-02 17:07:46 +00:00 |
|
Julia Longtin
|
47190a7fe2
|
formatting.
|
2024-04-02 17:01:53 +00:00 |
|
Julia Longtin
|
8c17353717
|
minor changes.
|
2024-04-02 16:55:40 +00:00 |
|
Julia Longtin
|
9f569ca50b
|
massively rewrite assembly routines.
|
2024-04-02 15:41:56 +00:00 |
|
Julia Longtin
|
12c9576aec
|
fix vector sizes.
|
2024-03-25 19:43:37 +00:00 |
|
Julia Longtin
|
bc3d6db862
|
separate filling aux16 from consuming aux16 by making it an array of vectors.
|
2024-03-24 14:18:08 +00:00 |
|
Julia Longtin
|
ca0dc26704
|
loosen alignment requirements for zeros, add missing function, and promote aux8 to an array of vectors.
|
2024-03-24 13:35:05 +00:00 |
|
Julia Longtin
|
cf481cf901
|
promote aux8 into a vector.
|
2024-03-24 12:50:01 +00:00 |
|
Julia Longtin
|
169a145409
|
fix our reference to src in the second place, and use a more accurate comment.
|
2024-03-24 12:41:21 +00:00 |
|
Julia Longtin
|
c28bfe4552
|
spacing changes, eliminate dead references to k1 or zero, and use the right type when referring to src.
|
2024-03-24 12:37:47 +00:00 |
|
Julia Longtin
|
ba4f4129b3
|
better comments, and fix some small errors.
|
2024-03-24 12:17:06 +00:00 |
|