linux-stable/arch
Ard Biesheuvel 86cd97ec4b crypto: arm/chacha-neon - optimize for non-block size multiples
The current NEON-based ChaCha implementation for ARM is optimized for
multiples of 4x the ChaCha block size (64 bytes). This makes sense for
block encryption, but given that ChaCha is also often used in the
context of networking, it is worth handling arbitrary length inputs
efficiently as well.

For example, WireGuard typically uses 1420 byte packets, and performing
ChaCha encryption involves 5 invocations of chacha_4block_xor_neon()
and 3 invocations of chacha_block_xor_neon(), where the last one also
involves a memcpy() using a buffer on the stack to process the final
chunk of 1420 % 64 == 12 bytes.
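
As a rough illustration, the following stand-alone sketch (ordinary
user-space C, not the kernel glue code; the variable names are made up
for this example) reproduces that split for a 1420-byte request:

    #include <stdio.h>

    int main(void)
    {
        unsigned int len    = 1420;
        unsigned int wide   = len / 256;                /* chacha_4block_xor_neon() calls: 5 */
        unsigned int rem    = len % 256;                /* 140 bytes left over               */
        unsigned int narrow = rem / 64 + !!(rem % 64);  /* chacha_block_xor_neon() calls:  3 */
        unsigned int tail   = rem % 64;                 /* memcpy()'d stack-buffer tail: 12  */

        printf("4block calls: %u  block calls: %u  tail: %u bytes\n",
               wide, narrow, tail);
        return 0;
    }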

Let's optimize for this case as well, by letting chacha_4block_xor_neon()
deal with any input size between 64 and 256 bytes, using NEON permutation
instructions and overlapping loads and stores. This way, the 140 byte
tail of a 1420 byte input buffer can simply be processed in one go.
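
For comparison, here is a hedged stand-alone sketch of the chunking
after this change (again not the actual kernel glue code; the fake_*
stubs below only record call granularity and perform no crypto):

    #include <stdio.h>

    #define CHACHA_BLOCK_SIZE 64U

    /* Stand-ins that only report how many bytes each call would handle. */
    static void fake_4block_xor(unsigned int nbytes)
    {
        printf("4block call: %3u bytes\n", nbytes);
    }

    static void fake_block_xor(unsigned int nbytes)
    {
        printf("block call:  %3u bytes\n", nbytes);
    }

    int main(void)
    {
        unsigned int bytes = 1420;

        /* Any chunk larger than one block, up to 256 bytes, now goes to
         * the 4-block routine in a single call. */
        while (bytes > CHACHA_BLOCK_SIZE) {
            unsigned int l = bytes < 4 * CHACHA_BLOCK_SIZE ?
                             bytes : 4 * CHACHA_BLOCK_SIZE;

            fake_4block_xor(l);     /* final iteration: 140 bytes */
            bytes -= l;
        }
        /* A leftover of at most one block still takes the single-block
         * path (with a bounce buffer if it is a partial block). */
        if (bytes)
            fake_block_xor(bytes);
        return 0;
    }

For the 1420 byte case the single-block path is never taken, since the
final 140 bytes fit within one call to the wide routine.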

This results in the following performance improvements for 1420 byte
blocks, without significant impact on power-of-2 input sizes. (Note
that Raspberry Pi is widely used in combination with a 32-bit kernel,
even though the core is 64-bit capable)

   Cortex-A8  (BeagleBone)       :   7%
   Cortex-A15 (Calxeda Midway)   :  21%
   Cortex-A53 (Raspberry Pi 3)   :   3%
   Cortex-A72 (Raspberry Pi 4)   :  19%

Cc: Eric Biggers <ebiggers@google.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-13 20:38:44 +11:00
alpha        arch-cleanup-2020-10-22  2020-10-23 10:06:38 -07:00
arc          treewide: Convert macro and uses of __section(foo) to __section("foo")  2020-10-25 14:51:49 -07:00
arm          crypto: arm/chacha-neon - optimize for non-block size multiples  2020-11-13 20:38:44 +11:00
arm64        crypto: arm64/poly1305-neon - reorder PAC authentication with SP update  2020-11-06 14:29:11 +11:00
c6x          arch-cleanup-2020-10-22  2020-10-23 10:06:38 -07:00
csky         treewide: Convert macro and uses of __section(foo) to __section("foo")  2020-10-25 14:51:49 -07:00
h8300        arch-cleanup-2020-10-22  2020-10-23 10:06:38 -07:00
hexagon      arch-cleanup-2020-10-22  2020-10-23 10:06:38 -07:00
ia64         treewide: Convert macro and uses of __section(foo) to __section("foo")  2020-10-25 14:51:49 -07:00
m68k         arch-cleanup-2020-10-22  2020-10-23 10:06:38 -07:00
microblaze   treewide: Convert macro and uses of __section(foo) to __section("foo")  2020-10-25 14:51:49 -07:00
mips         treewide: Convert macro and uses of __section(foo) to __section("foo")  2020-10-25 14:51:49 -07:00
nds32        arch-cleanup-2020-10-22  2020-10-23 10:06:38 -07:00
nios2        arch-cleanup-2020-10-22  2020-10-23 10:06:38 -07:00
openrisc     arch-cleanup-2020-10-22  2020-10-23 10:06:38 -07:00
parisc       treewide: Convert macro and uses of __section(foo) to __section("foo")  2020-10-25 14:51:49 -07:00
powerpc      treewide: Convert macro and uses of __section(foo) to __section("foo")  2020-10-25 14:51:49 -07:00
riscv        treewide: Convert macro and uses of __section(foo) to __section("foo")  2020-10-25 14:51:49 -07:00
s390         treewide: Convert macro and uses of __section(foo) to __section("foo")  2020-10-25 14:51:49 -07:00
sh           treewide: Convert macro and uses of __section(foo) to __section("foo")  2020-10-25 14:51:49 -07:00
sparc        treewide: Convert macro and uses of __section(foo) to __section("foo")  2020-10-25 14:51:49 -07:00
um           treewide: Convert macro and uses of __section(foo) to __section("foo")  2020-10-25 14:51:49 -07:00
x86          crypto: hash - Use memzero_explicit() for clearing state  2020-10-30 17:35:03 +11:00
xtensa       treewide: Convert macro and uses of __section(foo) to __section("foo")  2020-10-25 14:51:49 -07:00
.gitignore
Kconfig      Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  2020-10-22 09:59:21 -07:00