Commit graph

2513 commits

Author SHA1 Message Date
Julia Longtin
b23ab86eda make offset available in a register. 2024-05-11 19:57:45 +00:00
Julia Longtin
1072686dcf load from identical addresses for low and high side. 2024-05-11 19:48:53 +00:00
Julia Longtin
3449b0f359 minor comment fixes. 2024-05-11 19:47:20 +00:00
Julia Longtin
efdb4116d1 make the offset of q4 available. 2024-05-11 19:39:53 +00:00
Julia Longtin
9550ca516f add missing vector. 2024-05-11 19:29:09 +00:00
Julia Longtin
653a565a02 fill and increment r12 and r13. 2024-05-11 19:24:11 +00:00
Julia Longtin
7fa2d73b0a relabel some other labels. 2024-05-11 19:02:48 +00:00
Julia Longtin
047defea41 rename some labels. 2024-05-11 17:56:10 +00:00
Julia Longtin
a1d0da669d rename label 1 to 3. 2024-05-11 14:24:30 +00:00
Julia Longtin
0a0bb9b7db introduce r10 and r11, for vloadunpackhd. 2024-05-11 14:02:36 +00:00
Julia Longtin
9d7f967e88 spacing changes. 2024-05-11 13:35:50 +00:00
Julia Longtin
6c4e687b85 spacing changes. 2024-05-11 13:26:00 +00:00
Julia Longtin
b34575b1f3 add missing jump. 2024-05-11 12:53:23 +00:00
Julia Longtin
fa0226c8df look at the right final memory location. 2024-05-11 11:27:52 +00:00
Julia Longtin
fba57c125c subtract the correct amount. 2024-05-11 11:11:15 +00:00
Julia Longtin
3156e639bf change from handling three iterations per loop to four. 2024-05-11 11:07:16 +00:00
Julia Longtin
a82ada7dcd comment clarification. 2024-05-10 21:57:16 +00:00
Julia Longtin
4a3c42c82c correct a comment, and use jz when comparing to zero. 2024-05-10 20:30:56 +00:00
Julia Longtin
806472787d use values inside of the loop as soon as we have them. 2024-05-10 19:33:58 +00:00
Julia Longtin
21a1e740c2 fix loop. 2024-05-10 17:07:27 +00:00
Julia Longtin
7e44eabe0f move sub earlier, and move the compare of iterations to outside, and at the end of the loop. 2024-05-10 17:03:41 +00:00
Julia Longtin
7966c8e443 spacing and comment changes. 2024-05-10 16:50:39 +00:00
Julia Longtin
650094e17b remove useless prefetches. 2024-05-10 16:28:53 +00:00
Julia Longtin
0ff7d5dd1a perform better prefetches, and invert the test of our clear flag for clarity. 2024-05-10 16:14:28 +00:00
Julia Longtin
b00607d1ab use vbroadcastss in place of vbroadcast32x4. 2024-05-10 15:52:35 +00:00
Julia Longtin
f6edcc4061 Use a vectorized assembly function to handle remaining chunks less than vector wide. 2024-05-10 14:52:46 +00:00
Julia Longtin
2282ac4d9f broadcast a single int8, instead of 4 of them. 2024-05-10 14:19:27 +00:00
Julia Longtin
867de5edce use different restrict syntax, to make g++ happy. 2024-05-09 23:08:43 +00:00
Julia Longtin
e1fdfaae45 fix typo 2024-05-09 20:41:50 +00:00
Julia Longtin
a283551db0 remove a warning. 2024-05-09 20:40:50 +00:00
Julia Longtin
af4ee51fa7 add batch fp16<->fp32 conversion functions. 2024-05-09 19:31:28 +00:00
Julia Longtin
81ca166ecd minor spacing and comment changes. 2024-05-09 16:57:59 +00:00
Julia Longtin
047291fb42 spacing and capitalization changes. Fix the register list of GGML_5bit_Unpacked_Unaligned. 2024-04-26 14:44:08 +00:00
Julia Longtin
77d4ca906b spacing and capitalization changes. 2024-04-25 21:23:22 +00:00
Julia Longtin
d69cf87fce use or, instead of and. bug fix? 2024-04-24 17:50:12 +00:00
Julia Longtin
8cae9a9ef6 comment and spacing fixes. 2024-04-24 17:38:42 +00:00
Julia Longtin
90e99eaf1c fix an offset error, and get rid of tabs. 2024-04-22 18:29:31 +00:00
Julia Longtin
6d16090246 fix some small errors. 2024-04-22 18:22:22 +00:00
Julia Longtin
e298d9e65e further optimizations. 0.99 tokens per second. 2024-04-22 18:16:28 +00:00
Julia Longtin
53773e0b4a replace tabs with spaces. 2024-04-03 23:42:34 +00:00
Julia Longtin
9152143fe7 reformat, and label what these files are. 2024-04-03 23:21:24 +00:00
Julia Longtin
9ad5efafb0 use GGML_F32_EPR, and remove some dead code. 2024-04-03 22:04:45 +00:00
Julia Longtin
84df774d6a whoops. missing tab. 2024-04-03 21:58:29 +00:00
Julia Longtin
9412572205 add Makefile rule for generation .s file, for manual inspection. 2024-04-03 20:30:25 +00:00
Julia Longtin
6f67ea886f formatting changes. 2024-04-03 20:24:00 +00:00
Julia Longtin
96fdd214c8 indent headers consistently. 2024-04-03 19:01:18 +00:00
Julia Longtin
cb4422625a Merge pull request #1 from julialongtin/k1om: K1om initial support. Round 1. 2024-04-02 17:07:46 +00:00
Julia Longtin
47190a7fe2 formatting. 2024-04-02 17:01:53 +00:00
Julia Longtin
8c17353717 minor changes. 2024-04-02 16:55:40 +00:00
Julia Longtin
9f569ca50b massively rewrite assembly routines. 2024-04-02 15:41:56 +00:00