Julia Longtin | a82ada7dcd | comment clarification. | 2024-05-10 21:57:16 +00:00
Julia Longtin | 4a3c42c82c | correct a comment, and use jz when comparing to zero. | 2024-05-10 20:30:56 +00:00
Julia Longtin | 806472787d | use values inside of the loop as soon as we have them. | 2024-05-10 19:33:58 +00:00
Julia Longtin | 21a1e740c2 | fix loop. | 2024-05-10 17:07:27 +00:00
Julia Longtin | 7e44eabe0f | move sub earlier, and move the compare of iterations to outside, and at the end of the loop. | 2024-05-10 17:03:41 +00:00
Julia Longtin | 7966c8e443 | spacing and comment changes. | 2024-05-10 16:50:39 +00:00
Julia Longtin | 650094e17b | remove useless prefetches. | 2024-05-10 16:28:53 +00:00
Julia Longtin | 0ff7d5dd1a | perform better prefetches, and invert the test of our clear flag for clarity. | 2024-05-10 16:14:28 +00:00
Julia Longtin | b00607d1ab | use vbroadcastss in place of vbroadcast32x4. | 2024-05-10 15:52:35 +00:00
Julia Longtin | f6edcc4061 | Use a vectorized assembly function to handle remaining chunks less than vector wide. | 2024-05-10 14:52:46 +00:00
Julia Longtin | 2282ac4d9f | broadcast a single int8, instead of 4 of them. | 2024-05-10 14:19:27 +00:00
Julia Longtin | 867de5edce | use different restrict syntax, to make g++ happy. | 2024-05-09 23:08:43 +00:00
Julia Longtin | e1fdfaae45 | fix typo | 2024-05-09 20:41:50 +00:00
Julia Longtin | a283551db0 | remove a warning. | 2024-05-09 20:40:50 +00:00
Julia Longtin | af4ee51fa7 | add batch fp16<->fp32 conversion functions. | 2024-05-09 19:31:28 +00:00
Julia Longtin | 81ca166ecd | minor spacing and comment changes. | 2024-05-09 16:57:59 +00:00
Julia Longtin | 047291fb42 | spacing and capitalization changes. Fix the register list of GGML_5bit_Unpacked_Unaligned. | 2024-04-26 14:44:08 +00:00
Julia Longtin | 77d4ca906b | spacing and capitalization changes. | 2024-04-25 21:23:22 +00:00
Julia Longtin | d69cf87fce | use or, instead of and. bug fix? | 2024-04-24 17:50:12 +00:00
Julia Longtin | 8cae9a9ef6 | comment and spacing fixes. | 2024-04-24 17:38:42 +00:00
Julia Longtin | 90e99eaf1c | fix an offset error, and get rid of tabs. | 2024-04-22 18:29:31 +00:00
Julia Longtin | 6d16090246 | fix some small errors. | 2024-04-22 18:22:22 +00:00
Julia Longtin | e298d9e65e | further optimizations. 0.99 tokens per second. | 2024-04-22 18:16:28 +00:00
Julia Longtin | 53773e0b4a | replace tabs with spaces. | 2024-04-03 23:42:34 +00:00
Julia Longtin | 9152143fe7 | reformat, and label what these files are. | 2024-04-03 23:21:24 +00:00
Julia Longtin | 9ad5efafb0 | use GGML_F32_EPR, and remove some dead code. | 2024-04-03 22:04:45 +00:00
Julia Longtin | 84df774d6a | whoops. missing tab. | 2024-04-03 21:58:29 +00:00
Julia Longtin | 9412572205 | add Makefile rule for generation .s file, for manual inspection. | 2024-04-03 20:30:25 +00:00
Julia Longtin | 6f67ea886f | formatting changes. | 2024-04-03 20:24:00 +00:00
Julia Longtin | 96fdd214c8 | indent headers consistently. | 2024-04-03 19:01:18 +00:00
Julia Longtin | cb4422625a | Merge pull request #1 from julialongtin/k1om K1om initial support. Round 1. | 2024-04-02 17:07:46 +00:00
Julia Longtin | 47190a7fe2 | formatting. | 2024-04-02 17:01:53 +00:00
Julia Longtin | 8c17353717 | minor changes. | 2024-04-02 16:55:40 +00:00
Julia Longtin | 9f569ca50b | massively rewrite assembly routines. | 2024-04-02 15:41:56 +00:00
Julia Longtin | 12c9576aec | fix vector sizes. | 2024-03-25 19:43:37 +00:00
Julia Longtin | bc3d6db862 | separate filling aux16 from consuming aux16 by making it an array of vectors. | 2024-03-24 14:18:08 +00:00
Julia Longtin | ca0dc26704 | loosen alignment requirements for zeros, add missing function, and promote aux8 to an array of vectors. | 2024-03-24 13:35:05 +00:00
Julia Longtin | cf481cf901 | promote aux8 into a vector. | 2024-03-24 12:50:01 +00:00
Julia Longtin | 169a145409 | fix our reference to src in the second place, and use a more accurate comment. | 2024-03-24 12:41:21 +00:00
Julia Longtin | c28bfe4552 | spacing changes, eliminate dead references to k1 or zero, and use the right type when referring to src. | 2024-03-24 12:37:47 +00:00
Julia Longtin | ba4f4129b3 | better comments, and fix some small errors. | 2024-03-24 12:17:06 +00:00
Julia Longtin | 03a3e0eb7a | perform 16 operations at a time. | 2024-03-24 12:04:44 +00:00
Julia Longtin | 5935bb34f4 | use proper mov operator, and pass addresses. | 2024-03-23 23:46:36 +00:00
Julia Longtin | a5132a1507 | attempt our first FMA. | 2024-03-23 22:16:57 +00:00
Julia Longtin | 4477b8e123 | add I32 vector memory clearing. | 2024-03-23 21:16:23 +00:00
Julia Longtin | ea1edb0600 | promote aux32 to a vector. | 2024-03-23 21:12:35 +00:00
Julia Longtin | f967690a41 | add missing address of operators. | 2024-03-23 21:05:50 +00:00
Julia Longtin | 2fdd11fe3a | promote aux16 to a vector. | 2024-03-23 21:00:51 +00:00
Julia Longtin | f09b3ed79e | use quotes properly. | 2024-03-23 20:53:16 +00:00
Julia Longtin | bb5eb95816 | use better memory save operator. | 2024-03-23 20:49:11 +00:00