linux-stable/lib/raid6
WANG Xuerui f209132104 raid6: Add LoongArch SIMD recovery implementation
Similar to the syndrome calculation, the recovery algorithms also work
on 64 bytes at a time to align with the L1 cache line size of current
and future LoongArch cores (that we care about). This means
unrolled-by-4 LSX and unrolled-by-2 LASX code.
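
The unroll factors follow from dividing the 64-byte block by the vector
width: 16 bytes for 128-bit LSX, 32 bytes for 256-bit LASX. A minimal C
sketch of that arithmetic; the names are illustrative only and do not
come from the kernel source:

```c
#include <stddef.h>

/* Illustrative constants, not kernel identifiers. */
enum {
	BLOCK_BYTES    = 64,	/* bytes handled per loop iteration */
	LSX_VEC_BYTES  = 16,	/* 128-bit LSX vector register */
	LASX_VEC_BYTES = 32,	/* 256-bit LASX vector register */
};

/* Number of vector registers needed to cover one 64-byte block. */
static size_t unroll_factor(size_t vec_bytes)
{
	return BLOCK_BYTES / vec_bytes;
}
```

With these values, unroll_factor(LSX_VEC_BYTES) is 4 and
unroll_factor(LASX_VEC_BYTES) is 2, matching the unrolling described
above.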

The assembly is originally based on the x86 SSSE3/AVX2 ports, but
register allocation has been redone to take advantage of LSX/LASX's 32
vector registers, and the instruction sequences have been optimized to
suit (e.g. LoongArch can perform per-byte srl and andi on vectors, but
x86 cannot).
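
To show why per-byte shift and mask matter here: table-based GF(2^8)
multiplication by a constant splits each byte into its low and high
nibbles, looks each up in a 16-entry table, and XORs the results. The
following scalar C sketch shows the idea for one byte. It assumes the
RAID-6 field polynomial 0x11d; the function names and table layout are
hypothetical and are not the kernel's vector code:

```c
#include <stdint.h>

/* Bitwise multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
	uint8_t r = 0;

	while (b) {
		if (b & 1)
			r ^= a;
		/* Multiply a by x, reducing when the top bit falls off. */
		a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
		b >>= 1;
	}
	return r;
}

/* Precompute c * n for every low nibble n and every high nibble n << 4. */
static void build_nibble_tables(uint8_t c, uint8_t lo[16], uint8_t hi[16])
{
	for (int i = 0; i < 16; i++) {
		lo[i] = gf_mul(c, (uint8_t)i);
		hi[i] = gf_mul(c, (uint8_t)(i << 4));
	}
}

/*
 * c * x via two lookups: "x & 0x0f" plays the role of a per-byte andi,
 * "x >> 4" the role of a per-byte srl.
 */
static uint8_t gf_mul_by_tables(const uint8_t lo[16], const uint8_t hi[16],
				uint8_t x)
{
	return lo[x & 0x0f] ^ hi[x >> 4];
}
```

The split is valid because multiplication distributes over XOR in
GF(2^8), so c * x equals c * (x & 0x0f) XOR c * (x & 0xf0).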

Performance numbers measured by instrumenting the raid6test code, on a
3A5000 system clocked at 2.5GHz:

> lasx  2data: 354.987 MiB/s
> lasx  datap: 350.430 MiB/s
> lsx   2data: 340.026 MiB/s
> lsx   datap: 337.318 MiB/s
> intx1 2data: 164.280 MiB/s
> intx1 datap: 187.966 MiB/s

Because recovery algorithms are chosen solely based on priority and
availability, lasx is marked as priority 2 and lsx priority 1. At least
for the current generation of LoongArch micro-architectures, LASX should
always be faster than LSX whenever supported, and should have similar
power consumption characteristics (because the only known LASX-capable
uarch, the LA464, always computes the full 256-bit result for vector
ops).
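
Selection by priority and availability can be sketched as follows. This
is a simplified stand-alone model, not the kernel's selection code: the
struct, the stub availability checks, and priority 0 for intx1 are
assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified descriptor for a recovery implementation. */
struct recov_algo {
	const char *name;
	int priority;
	int (*valid)(void);	/* NULL means always usable */
};

/* Return the usable entry with the highest priority, or NULL if none. */
static const struct recov_algo *
choose_recov(const struct recov_algo *const *algos)
{
	const struct recov_algo *best = NULL;

	for (; *algos; algos++) {
		if ((*algos)->valid && !(*algos)->valid())
			continue;	/* not available on this CPU */
		if (!best || (*algos)->priority > best->priority)
			best = *algos;
	}
	return best;
}

/* Stubs modelling a CPU that has LSX but not LASX. */
static int has_lsx_stub(void)  { return 1; }
static int has_lasx_stub(void) { return 0; }

static const struct recov_algo algo_intx1 = { "intx1", 0, NULL };
static const struct recov_algo algo_lsx   = { "lsx",   1, has_lsx_stub };
static const struct recov_algo algo_lasx  = { "lasx",  2, has_lasx_stub };

static const struct recov_algo *const all_algos[] = {
	&algo_lasx, &algo_lsx, &algo_intx1, NULL,
};
```

On the modelled CPU, choose_recov(all_algos) returns the lsx entry. No
benchmarking is involved, which is why the priorities have to encode the
expected speed ordering.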

Acked-by: Song Liu <song@kernel.org>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-06 22:53:55 +08:00
test raid6: Add LoongArch SIMD recovery implementation 2023-09-06 22:53:55 +08:00
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
Makefile raid6: Add LoongArch SIMD recovery implementation 2023-09-06 22:53:55 +08:00
algos.c raid6: Add LoongArch SIMD recovery implementation 2023-09-06 22:53:55 +08:00
altivec.uc
avx2.c lib/raid6: Use strict priority ranking for pq gen() benchmarking 2022-01-06 08:37:03 -08:00
avx512.c lib/raid6: Use strict priority ranking for pq gen() benchmarking 2022-01-06 08:37:03 -08:00
int.uc
loongarch.h raid6: Add LoongArch SIMD syndrome calculation 2023-09-06 22:53:55 +08:00
loongarch_simd.c raid6: Add LoongArch SIMD syndrome calculation 2023-09-06 22:53:55 +08:00
mktables.c raid6: guard the tables.c include of <linux/export.h> with __KERNEL__ 2023-08-15 09:40:27 -07:00
mmx.c
neon.c
neon.h raid6: neon: add missing prototypes 2023-06-13 15:13:20 -07:00
neon.uc raid6: neon: add missing prototypes 2023-06-13 15:13:20 -07:00
recov.c raid6: remove the <linux/export.h> include from recov.c 2023-08-15 09:40:27 -07:00
recov_avx2.c x86: update AS_* macros to binutils >=2.23, supporting ADX and AVX2 2020-04-09 00:12:48 +09:00
recov_avx512.c
recov_loongarch_simd.c raid6: Add LoongArch SIMD recovery implementation 2023-09-06 22:53:55 +08:00
recov_neon.c raid6: neon: add missing prototypes 2023-06-13 15:13:20 -07:00
recov_neon_inner.c raid6: neon: add missing prototypes 2023-06-13 15:13:20 -07:00
recov_s390xc.c
recov_ssse3.c x86: remove always-defined CONFIG_AS_SSSE3 2020-04-09 00:01:59 +09:00
s390vx.uc s390/vx: add vx-insn.h wrapper include file 2022-12-06 16:18:23 +01:00
sse1.c
sse2.c
unroll.awk lib: raid6: fix awk build warnings 2019-12-09 18:55:03 +01:00
vpermxor.uc lib/raid6: Include <asm/ppc-opcode.h> for VPERMXOR 2022-03-08 15:20:21 -08:00
x86.h