Commit graph

8 commits

Author SHA1 Message Date
Ard Biesheuvel
97b70180b7 crypto: aegis128/neon - move final tag check to SIMD domain
Instead of calculating the tag and returning it to the caller on
decryption, use a SIMD compare and min across vector to perform
the comparison. This is slightly more efficient, and removes the
need on the caller's part to wipe the tag from memory if the
decryption failed.

While at it, switch to unsigned int when passing cryptlen and
assoclen - we don't support input sizes where it matters anyway.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-27 17:13:40 +11:00
Ard Biesheuvel
ad00d41b47 crypto: aegis128/neon - optimize tail block handling
Avoid copying the tail block via a stack buffer if the total size
exceeds a single AEGIS block. In this case, we can use overlapping
loads and stores and NEON permutation instructions instead, which
leads to a modest performance improvement on some cores (< 5%),
and is slightly cleaner. Note that we still need to use a stack
buffer if the entire input is smaller than 16 bytes, given that
we cannot use 16 byte NEON loads and stores safely in this case.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-27 17:13:39 +11:00
Ard Biesheuvel
528282630c crypto: aegis128 - duplicate init() and final() hooks in SIMD code
In order to speed up aegis128 processing even more, duplicate the init()
and final() routines as SIMD versions in their entirety. This results
in a 2x speedup on ARM Cortex-A57 for ~1500 byte packets (using AES
instructions).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-10-26 02:06:05 +11:00
Ard Biesheuvel
389139b34f crypto: arm64/aegis128 - use explicit vector load for permute vectors
When building the new aegis128 NEON code in big endian mode, Clang
complains about the const uint8x16_t permute vectors in the following
way:

  crypto/aegis128-neon-inner.c:58:40: warning: vector initializers are not
      compatible with NEON intrinsics in big endian mode
      [-Wnonportable-vector-initialization]
                static const uint8x16_t shift_rows = {
                                                     ^
  crypto/aegis128-neon-inner.c:58:40: note: consider using vld1q_u8() to
      initialize a vector from memory, or vcombine_u8(vcreate_u8(), vcreate_u8())
      to initialize from integer constants

Since the same issue applies to the uint8x16x4_t loads of the AES Sbox,
update those references as well. However, since GCC does not implement
the vld1q_u8_x4() intrinsic, switch from IS_ENABLED() to a preprocessor
conditional to conditionally include this code.

Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-08-30 18:05:27 +10:00
Ard Biesheuvel
198429631a crypto: arm64/aegis128 - implement plain NEON version
Provide a version of the core AES transform to the aegis128 SIMD
code that does not rely on the special AES instructions, but uses
plain NEON instructions instead. This allows the SIMD version of
the aegis128 driver to be used on arm64 systems that do not
implement those instructions (which are not mandatory in the
architecture), such as the Raspberry Pi 3.

Since GCC makes a mess of this when using the tbl/tbx intrinsics
to perform the sbox substitution, preload the Sbox into v16..v31
in this case and use inline asm to emit the tbl/tbx instructions.
Clang does not support this approach, nor does it require it, since
it does a much better job at code generation, so there we use the
intrinsics as usual.

Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-08-15 21:52:15 +10:00
Ard Biesheuvel
a4397635af crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics
Provide an accelerated implementation of aegis128 by wiring up the
SIMD hooks in the generic driver to an implementation based on NEON
intrinsics, which can be compiled to both ARM and arm64 code.

This results in a performance of 2.2 cycles per byte on Cortex-A53,
which is a performance increase of ~11x compared to the generic
code.

Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-08-15 21:52:15 +10:00
Herbert Xu
c9f1fd4f2f Revert "crypto: aegis128 - add support for SIMD acceleration"
This reverts commit ecc8bc81f2
("crypto: aegis128 - provide a SIMD implementation based on NEON
intrinsics") and commit 7cdc0ddbf7
("crypto: aegis128 - add support for SIMD acceleration").

They cause compile errors on platforms other than ARM because
the mechanism to selectively compile the SIMD code is broken.

Repoted-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-08-02 13:31:35 +10:00
Ard Biesheuvel
ecc8bc81f2 crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics
Provide an accelerated implementation of aegis128 by wiring up the
SIMD hooks in the generic driver to an implementation based on NEON
intrinsics, which can be compiled to both ARM and arm64 code.

This results in a performance of 2.2 cycles per byte on Cortex-A53,
which is a performance increase of ~11x compared to the generic
code.

Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26 15:03:58 +10:00