Commit graph

309 commits

Author SHA1 Message Date
Wang, Rui Y
fd09967b83 crypto: sha-mb - Fix load failure
On  Monday, February 1, 2016 4:18 PM, Herbert Xu wrote:
>
> On Wed, Jan 27, 2016 at 05:08:35PM +0800, Rui Wang wrote:
>>
>> +static int sha1_mb_async_import(struct ahash_request *req, const void
>> +*in) {
>> +	struct ahash_request *mcryptd_req = ahash_request_ctx(req);
>> +	struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
>> +	struct sha1_mb_ctx *ctx = crypto_ahash_ctx(tfm);
>> +	struct mcryptd_ahash *mcryptd_tfm = ctx->mcryptd_tfm;
>> +	struct crypto_shash *child = mcryptd_ahash_child(mcryptd_tfm);
>> +	struct mcryptd_hash_request_ctx *rctx;
>> +	struct shash_desc *desc;
>> +	int err;
>> +
>> +	memcpy(mcryptd_req, req, sizeof(*req));
>> +	ahash_request_set_tfm(mcryptd_req, &mcryptd_tfm->base);
>> +	rctx = ahash_request_ctx(mcryptd_req);
>> +	desc = &rctx->desc;
>> +	desc->tfm = child;
>> +	desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP;
>> +
>> +	err = crypto_shash_init(desc);
>> +	if (err)
>> +		return err;
>
> What is this desc for?

Hi Herbert,

Yeah I just realized that the call to crypto_shash_init() isn't necessary
here. What it does is overwritten by crypto_ahash_import(). But this desc
still needs to be initialized here because it's newly allocated by
ahash_request_alloc(). We eventually calls the shash version of import()
which needs desc as an argument. The real context to be imported is then
derived from shash_desc_ctx(desc).

desc is a sub-field of struct mcryptd_hash_request_ctx, which is again a
sub-field of the bigger blob allocated by ahash_request_alloc(). The entire
blob's size is set in sha1_mb_async_init_tfm(). So a better version is as
follows:

(just removed the call to crypto_shash_init())

>From 4bcb73adbef99aada94c49f352063619aa24d43d Mon Sep 17 00:00:00 2001
From: Rui Wang <rui.y.wang@intel.com>
Date: Mon, 14 Dec 2015 17:22:13 +0800
Subject: [PATCH v2 1/4] crypto x86/sha1_mb: Fix load failure

modprobe sha1_mb fails with the following message:

modprobe: ERROR: could not insert 'sha1_mb': No such device

It is because it needs to set its statesize and implement its
import() and export() interface.

v2: remove redundant call to crypto_shash_init()

Signed-off-by: Rui Wang <rui.y.wang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-02-06 15:33:23 +08:00
Megha Dey
10cff58c67 crypto: sha1-mb - Add missing args_digest offset
The _args_digest is defined as _args+_digest, both of which are the first
members of 2 separate structures, effectively yielding _args_digest to have
a value of zero. Thus, no errors have spawned yet due to this. To ensure
sanity, adding the missing _args_digest offset to the sha1_mb_mgr_submit.S.

Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-01-27 20:36:19 +08:00
Eli Cooper
cbe09bd51b crypto: chacha20-ssse3 - Align stack pointer to 64 bytes
This aligns the stack pointer in chacha20_4block_xor_ssse3 to 64 bytes.
Fixes general protection faults and potential kernel panics.

Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-01-25 21:47:45 +08:00
Linus Torvalds
c597b6bcd5 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Algorithms:
   - Add RSA padding algorithm

  Drivers:
   - Add GCM mode support to atmel
   - Add atmel support for SAMA5D2 devices
   - Add cipher modes to talitos
   - Add rockchip driver for rk3288
   - Add qat support for C3XXX and C62X"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (103 commits)
  crypto: hifn_795x, picoxcell - use ablkcipher_request_cast
  crypto: qat - fix SKU definiftion for c3xxx dev
  crypto: qat - Fix random config build issue
  crypto: ccp - use to_pci_dev and to_platform_device
  crypto: qat - Rename dh895xcc mmp firmware
  crypto: 842 - remove WARN inside printk
  crypto: atmel-aes - add debug facilities to monitor register accesses.
  crypto: atmel-aes - add support to GCM mode
  crypto: atmel-aes - change the DMA threshold
  crypto: atmel-aes - fix the counter overflow in CTR mode
  crypto: atmel-aes - fix atmel-ctr-aes driver for RFC 3686
  crypto: atmel-aes - create sections to regroup functions by usage
  crypto: atmel-aes - fix typo and indentation
  crypto: atmel-aes - use SIZE_IN_WORDS() helper macro
  crypto: atmel-aes - improve performances of data transfer
  crypto: atmel-aes - fix atmel_aes_remove()
  crypto: atmel-aes - remove useless AES_FLAGS_DMA flag
  crypto: atmel-aes - reduce latency of DMA completion
  crypto: atmel-aes - remove unused 'err' member of struct atmel_aes_dev
  crypto: atmel-aes - rework crypto request completion
  ...
2016-01-12 18:51:14 -08:00
Borislav Petkov
362f924b64 x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros
Those are stupid and code should use static_cpu_has_safe() or
boot_cpu_has() instead. Kill the least used and unused ones.

The remaining ones need more careful inspection before a conversion can
happen. On the TODO.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1449481182-27541-4-git-send-email-bp@alien8.de
Cc: David Sterba <dsterba@suse.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:49:55 +01:00
Wang, Rui Y
3a020a723c crypto: ghash-clmulni - Fix load failure
ghash_clmulni_intel fails to load on Linux 4.3+ with the following message:
"modprobe: ERROR: could not insert 'ghash_clmulni_intel': Invalid argument"

After 8996eafdc ("crypto: ahash - ensure statesize is non-zero") all ahash
drivers are required to implement import()/export(), and must have a non-
zero statesize.

This patch has been tested with the algif_hash interface. The calculated
digest values, after several rounds of import()s and export()s, match those
calculated by tcrypt.

Signed-off-by: Rui Wang <rui.y.wang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-12-04 22:29:53 +08:00
Linus Torvalds
ccc9d4a6d6 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "API:

   - Add support for cipher output IVs in testmgr
   - Add missing crypto_ahash_blocksize helper
   - Mark authenc and des ciphers as not allowed under FIPS.

Algorithms:

   - Add CRC support to 842 compression
   - Add keywrap algorithm
   - A number of changes to the akcipher interface:
      + Separate functions for setting public/private keys.
      + Use SG lists.

Drivers:

   - Add Intel SHA Extension optimised SHA1 and SHA256
   - Use dma_map_sg instead of custom functions in crypto drivers
   - Add support for STM32 RNG
   - Add support for ST RNG
   - Add Device Tree support to exynos RNG driver
   - Add support for mxs-dcp crypto device on MX6SL
   - Add xts(aes) support to caam
   - Add ctr(aes) and xts(aes) support to qat
   - A large set of fixes from Russell King for the marvell/cesa driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (115 commits)
  crypto: asymmetric_keys - Fix unaligned access in x509_get_sig_params()
  crypto: akcipher - Don't #include crypto/public_key.h as the contents aren't used
  hwrng: exynos - Add Device Tree support
  hwrng: exynos - Fix missing configuration after suspend to RAM
  hwrng: exynos - Add timeout for waiting on init done
  dt-bindings: rng: Describe Exynos4 PRNG bindings
  crypto: marvell/cesa - use __le32 for hardware descriptors
  crypto: marvell/cesa - fix missing cpu_to_le32() in mv_cesa_dma_add_op()
  crypto: marvell/cesa - use memcpy_fromio()/memcpy_toio()
  crypto: marvell/cesa - use gfp_t for gfp flags
  crypto: marvell/cesa - use dma_addr_t for cur_dma
  crypto: marvell/cesa - use readl_relaxed()/writel_relaxed()
  crypto: caam - fix indentation of close braces
  crypto: caam - only export the state we really need to export
  crypto: caam - fix non-block aligned hash calculation
  crypto: caam - avoid needlessly saving and restoring caam_hash_ctx
  crypto: caam - print errno code when hash registration fails
  crypto: marvell/cesa - fix memory leak
  crypto: marvell/cesa - fix first-fragment handling in mv_cesa_ahash_dma_last_req()
  crypto: marvell/cesa - rearrange handling for sw padded hashes
  ...
2015-11-04 09:11:12 -08:00
Linus Torvalds
ce4d72fac1 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu changes from Ingo Molnar:
 "There are two main areas of changes:

   - Rework of the extended FPU state code to robustify the kernel's
     usage of cpuid provided xstate sizes - and related changes (Dave
     Hansen)"

   - math emulation enhancements: new modern FPU instructions support,
     with testcases, plus cleanups (Denys Vlasnko)"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/fpu: Fixup uninitialized feature_name warning
  x86/fpu/math-emu: Add support for FISTTP instructions
  x86/fpu/math-emu, selftests: Add test for FISTTP instructions
  x86/fpu/math-emu: Add support for FCMOVcc insns
  x86/fpu/math-emu: Add support for F[U]COMI[P] insns
  x86/fpu/math-emu: Remove define layer for undocumented opcodes
  x86/fpu/math-emu, selftests: Add tests for FCMOV and FCOMI insns
  x86/fpu/math-emu: Remove !NO_UNDOC_CODE
  x86/fpu: Check CPU-provided sizes against struct declarations
  x86/fpu: Check to ensure increasing-offset xstate offsets
  x86/fpu: Correct and check XSAVE xstate size calculations
  x86/fpu: Add C structures for AVX-512 state components
  x86/fpu: Rework YMM definition
  x86/fpu/mpx: Rework MPX 'xstate' types
  x86/fpu: Add xfeature_enabled() helper instead of test_bit()
  x86/fpu: Remove 'xfeature_nr'
  x86/fpu: Rework XSTATE_* macros to remove magic '2'
  x86/fpu: Rename XFEATURES_NR_MAX
  x86/fpu: Rename XSAVE macros
  x86/fpu: Remove partial LWP support definitions
  ...
2015-11-03 20:50:26 -08:00
Ben Hutchings
92b279070d crypto: camellia_aesni_avx - Fix CPU feature checks
We need to explicitly check the AVX and AES CPU features, as we can't
infer them from the related XSAVE feature flags.  For example, the
Core i3 2310M passes the XSAVE feature test but does not implement
AES-NI.

Reported-and-tested-by: Stéphane Glondu <glondu@debian.org>
References: https://bugs.debian.org/800934
Fixes: ce4f5f9b65 ("x86/fpu, crypto x86/camellia_aesni_avx: Simplify...")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable <stable@vger.kernel.org> # 4.2
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-10-08 21:36:49 +08:00
Borislav Petkov
158ecc3918 x86/fpu: Fixup uninitialized feature_name warning
Hand in &feature_name to cpu_has_xfeatures() as it is supposed
to. Fixes an uninitialized warning.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: brgerst@gmail.com
Cc: dvlasenk@redhat.com
Cc: fenghua.yu@intel.com
Cc: luto@amacapital.net
Cc: tim.c.chen@linux.intel.com
Fixes: d91cab7813 ("x86/fpu: Rename XSAVE macros")
Link: http://lkml.kernel.org/r/20150923104901.GA3538@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-24 09:21:20 +02:00
Nicolas Iooss
97bce7e0b5 crypto: crc32c-pclmul - use .rodata instead of .rotata
Module crc32c-intel uses a special read-only data section named .rotata.
This section is defined for K_table, and its name seems to be a spelling
mistake for .rodata.

Fixes: 473946e674 ("crypto: crc32c-pclmul - Shrink K_table to 32-bit words")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-09-21 23:05:57 +08:00
tim
be6ec98ddb crypto: x86/sha - Restructure x86 sha512 glue code to expose all the available sha512 transforms
Restructure the x86 sha512 glue code so we will expose sha512 transforms
based on SSSE3, AVX or AVX2 as separate individual drivers when cpu
provides support. This will make it easy for alternative algorithms to
be used if desired and makes the code cleaner and easier to maintain.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-09-21 22:01:11 +08:00
tim
5dda42fc89 crypto: x86/sha - Restructure x86 sha256 glue code to expose all the available sha256 transforms
Restructure the x86 sha256 glue code so we will expose sha256 transforms
based on SSSE3, AVX, AVX2 or SHA-NI extension as separate individual
drivers when cpu provides such support. This will make it easy for
alternative algorithms to be used if desired and makes the code cleaner
and easier to maintain.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-09-21 22:01:11 +08:00
tim
85c66ecd6f crypto: x86/sha - Restructure x86 sha1 glue code to expose all the available sha1 transforms
Restructure the x86 sha1 glue code so we will expose sha1 transforms based
on SSSE3, AVX, AVX2 or SHA-NI extension as separate individual drivers
when cpu provides such support. This will make it easy for alternative
algorithms to be used if desired and makes the code cleaner and easier
to maintain.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-09-21 22:01:11 +08:00
tim
e38b6b7fcf crypto: x86/sha - Add build support for Intel SHA Extensions optimized SHA1 and SHA256
This patch provides the configuration and build support to
include and build the optimized SHA1 and SHA256 update transforms
for the kernel's crypto library.

Originally-by: Chandramouli Narayanan <mouli_7982@yahoo.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-09-21 22:01:06 +08:00
tim
95fca7df0b crypto: x86/sha - glue code for Intel SHA extensions optimized SHA1 & SHA256
This patch adds the glue code to detect and utilize the Intel SHA
extensions optimized SHA1 and SHA256 update transforms when available.

This code has been tested on Broxton for functionality.

Originally-by: Chandramouli Narayanan <mouli_7982@yahoo.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-09-21 22:01:06 +08:00
tim
600a2334e8 crypto: x86/sha - Intel SHA Extensions optimized SHA256 transform function
This patch includes the Intel SHA Extensions optimized implementation
of SHA-256 update function. This function has been tested on Broxton
platform and measured a speed up of 3.6x over the SSSE3 implementiation
for 4K blocks.

Originally-by: Chandramouli Narayanan <mouli_7982@yahoo.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-09-21 22:01:06 +08:00
tim
c356a7e975 crypto: x86/sha - Intel SHA Extensions optimized SHA1 transform function
This patch includes the Intel SHA Extensions optimized implementation
of SHA-1 update function. This function has been tested on Broxton
platform and measured a speed up of 3.6x over the SSSE3 implementiation
for 4K blocks.

Originally-by: Chandramouli Narayanan <mouli_7982@yahoo.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-09-21 22:01:05 +08:00
Dave Hansen
d91cab7813 x86/fpu: Rename XSAVE macros
There are two concepts that have some confusing naming:
 1. Extended State Component numbers (currently called
    XFEATURE_BIT_*)
 2. Extended State Component masks (currently called XSTATE_*)

The numbers are (currently) from 0-9.  State component 3 is the
bounds registers for MPX, for instance.

But when we want to enable "state component 3", we go set a bit
in XCR0.  The bit we set is 1<<3.  We can check to see if a
state component feature is enabled by looking at its bit.

The current 'xfeature_bit's are at best xfeature bit _numbers_.
Calling them bits is at best inconsistent with ending the enum
list with 'XFEATURES_NR_MAX'.

This patch renames the enum to be 'xfeature'.  These also
happen to be what the Intel documentation calls a "state
component".

We also want to differentiate these from the "XSTATE_*" macros.
The "XSTATE_*" macros are a mask, and we rename them to match.

These macros are reasonably widely used so this patch is a
wee bit big, but this really is just a rename.

The only non-mechanical part of this is the

	s/XSTATE_EXTEND_MASK/XFEATURE_MASK_EXTEND/

We need a better name for it, but that's another patch.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233126.38653250@viggo.jf.intel.com
[ Ported to v4.3-rc1. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:21:46 +02:00
Andrey Ryabinin
71c6da846b crypto: ghash-clmulni: specify context size for ghash async algorithm
Currently context size (cra_ctxsize) doesn't specified for
ghash_async_alg. Which means it's zero. Thus crypto_create_tfm()
doesn't allocate needed space for ghash_async_ctx, so any
read/write to ctx (e.g. in ghash_async_init_tfm()) is not valid.

Cc: stable@vger.kernel.org
Signed-off-by: Andrey Ryabinin <aryabinin@odin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-09-04 23:21:07 +08:00
Herbert Xu
5e4b8c1fcc crypto: aead - Remove CRYPTO_ALG_AEAD_NEW flag
This patch removes the CRYPTO_ALG_AEAD_NEW flag now that everyone
has been converted.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-08-17 16:53:53 +08:00
Martin Willi
b1ccc8f4b6 crypto: poly1305 - Add a four block AVX2 variant for x86_64
Extends the x86_64 Poly1305 authenticator by a function processing four
consecutive Poly1305 blocks in parallel using AVX2 instructions.

For large messages, throughput increases by ~15-45% compared to two
block SSE2:

testing speed of poly1305 (poly1305-simd)
test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3809514 opers/sec,  365713411 bytes/sec
test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5973423 opers/sec,  573448627 bytes/sec
test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9446779 opers/sec,  906890803 bytes/sec
test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1364814 opers/sec,  393066691 bytes/sec
test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2045780 opers/sec,  589184697 bytes/sec
test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3711946 opers/sec, 1069040592 bytes/sec
test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  573686 opers/sec,  605812732 bytes/sec
test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1647802 opers/sec, 1740079440 bytes/sec
test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  292970 opers/sec,  609378224 bytes/sec
test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  943229 opers/sec, 1961916528 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  494623 opers/sec, 2041804569 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  254045 opers/sec, 2089271014 bytes/sec

testing speed of poly1305 (poly1305-simd)
test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3826224 opers/sec,  367317552 bytes/sec
test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5948638 opers/sec,  571069267 bytes/sec
test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9439110 opers/sec,  906154627 bytes/sec
test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1367756 opers/sec,  393913872 bytes/sec
test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2056881 opers/sec,  592381958 bytes/sec
test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3711153 opers/sec, 1068812179 bytes/sec
test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  574940 opers/sec,  607136745 bytes/sec
test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1948830 opers/sec, 2057964585 bytes/sec
test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  293308 opers/sec,  610082096 bytes/sec
test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates): 1235224 opers/sec, 2569267792 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  684405 opers/sec, 2825226316 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  367101 opers/sec, 3019039446 bytes/sec

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-07-17 21:20:29 +08:00
Martin Willi
da35b22df3 crypto: poly1305 - Add a two block SSE2 variant for x86_64
Extends the x86_64 SSE2 Poly1305 authenticator by a function processing two
consecutive Poly1305 blocks in parallel using a derived key r^2. Loop
unrolling can be more effectively mapped to SSE instructions, further
increasing throughput.

For large messages, throughput increases by ~45-65% compared to single
block SSE2:

testing speed of poly1305 (poly1305-simd)
test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3790063 opers/sec,  363846076 bytes/sec
test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5913378 opers/sec,  567684355 bytes/sec
test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9352574 opers/sec,  897847104 bytes/sec
test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1362145 opers/sec,  392297990 bytes/sec
test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2007075 opers/sec,  578037628 bytes/sec
test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3709811 opers/sec, 1068425798 bytes/sec
test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  566272 opers/sec,  597984182 bytes/sec
test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1111657 opers/sec, 1173910108 bytes/sec
test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  288857 opers/sec,  600823808 bytes/sec
test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  590746 opers/sec, 1228751888 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  301825 opers/sec, 1245936902 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  153075 opers/sec, 1258896201 bytes/sec

testing speed of poly1305 (poly1305-simd)
test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3809514 opers/sec,  365713411 bytes/sec
test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5973423 opers/sec,  573448627 bytes/sec
test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9446779 opers/sec,  906890803 bytes/sec
test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1364814 opers/sec,  393066691 bytes/sec
test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2045780 opers/sec,  589184697 bytes/sec
test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3711946 opers/sec, 1069040592 bytes/sec
test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  573686 opers/sec,  605812732 bytes/sec
test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1647802 opers/sec, 1740079440 bytes/sec
test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  292970 opers/sec,  609378224 bytes/sec
test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  943229 opers/sec, 1961916528 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  494623 opers/sec, 2041804569 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  254045 opers/sec, 2089271014 bytes/sec

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-07-17 21:20:28 +08:00
Martin Willi
c70f4abef0 crypto: poly1305 - Add a SSE2 SIMD variant for x86_64
Implements an x86_64 assembler driver for the Poly1305 authenticator. This
single block variant holds the 130-bit integer in 5 32-bit words, but uses
SSE to do two multiplications/additions in parallel.

When calling updates with small blocks, the overhead for kernel_fpu_begin/
kernel_fpu_end() negates the perfmance gain. We therefore use the
poly1305-generic fallback for small updates.

For large messages, throughput increases by ~5-10% compared to
poly1305-generic:

testing speed of poly1305 (poly1305-generic)
test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 4080026 opers/sec,  391682496 bytes/sec
test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 6221094 opers/sec,  597225024 bytes/sec
test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9609750 opers/sec,  922536057 bytes/sec
test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1459379 opers/sec,  420301267 bytes/sec
test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2115179 opers/sec,  609171609 bytes/sec
test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3729874 opers/sec, 1074203856 bytes/sec
test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  593000 opers/sec,  626208000 bytes/sec
test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1081536 opers/sec, 1142102332 bytes/sec
test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  302077 opers/sec,  628320576 bytes/sec
test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  554384 opers/sec, 1153120176 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  278715 opers/sec, 1150536345 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  140202 opers/sec, 1153022070 bytes/sec

testing speed of poly1305 (poly1305-simd)
test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3790063 opers/sec,  363846076 bytes/sec
test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5913378 opers/sec,  567684355 bytes/sec
test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9352574 opers/sec,  897847104 bytes/sec
test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1362145 opers/sec,  392297990 bytes/sec
test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2007075 opers/sec,  578037628 bytes/sec
test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3709811 opers/sec, 1068425798 bytes/sec
test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  566272 opers/sec,  597984182 bytes/sec
test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1111657 opers/sec, 1173910108 bytes/sec
test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  288857 opers/sec,  600823808 bytes/sec
test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  590746 opers/sec, 1228751888 bytes/sec
test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  301825 opers/sec, 1245936902 bytes/sec
test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  153075 opers/sec, 1258896201 bytes/sec

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-07-17 21:20:27 +08:00
Martin Willi
3d1e93cdf1 crypto: chacha20 - Add an eight block AVX2 variant for x86_64
Extends the x86_64 ChaCha20 implementation by a function processing eight
ChaCha20 blocks in parallel using AVX2.

For large messages, throughput increases by ~55-70% compared to four block
SSSE3:

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes)
test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes)
test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes)
test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes)
test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes)

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 41999675 operations in 10 seconds (671994800 bytes)
test 1 (256 bit key, 64 byte blocks): 45805908 operations in 10 seconds (2931578112 bytes)
test 2 (256 bit key, 256 byte blocks): 32814947 operations in 10 seconds (8400626432 bytes)
test 3 (256 bit key, 1024 byte blocks): 19777167 operations in 10 seconds (20251819008 bytes)
test 4 (256 bit key, 8192 byte blocks): 2279321 operations in 10 seconds (18672197632 bytes)

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-07-17 21:20:25 +08:00
Martin Willi
274f938e0a crypto: chacha20 - Add a four block SSSE3 variant for x86_64
Extends the x86_64 SSSE3 ChaCha20 implementation by a function processing
four ChaCha20 blocks in parallel. This avoids the word shuffling needed
in the single block variant, further increasing throughput.

For large messages, throughput increases by ~110% compared to single block
SSSE3:

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes)
test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes)
test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes)
test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes)
test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes)

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-07-17 21:20:25 +08:00
Martin Willi
c9320b6dcb crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64
Implements an x86_64 assembler driver for the ChaCha20 stream cipher. This
single block variant works on a single state matrix using SSE instructions.
It requires SSSE3 due the use of pshufb for efficient 8/16-bit rotate
operations.

For large messages, throughput increases by ~65% compared to
chacha20-generic:

testing speed of chacha20 (chacha20-generic) encryption
test 0 (256 bit key, 16 byte blocks): 45089207 operations in 10 seconds (721427312 bytes)
test 1 (256 bit key, 64 byte blocks): 43839521 operations in 10 seconds (2805729344 bytes)
test 2 (256 bit key, 256 byte blocks): 12702056 operations in 10 seconds (3251726336 bytes)
test 3 (256 bit key, 1024 byte blocks): 3371173 operations in 10 seconds (3452081152 bytes)
test 4 (256 bit key, 8192 byte blocks): 422468 operations in 10 seconds (3460857856 bytes)

testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)

Benchmark results from a Core i5-4670T.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-07-17 21:20:24 +08:00
Herbert Xu
e9b8d2c20a crypto: aesni - Use new IV convention
This patch converts rfc4106 to the new calling convention where
the IV is now in the AD and needs to be skipped.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-07-14 14:56:47 +08:00
Tadeusz Struk
0fbafd06bd crypto: aesni - fix failing setkey for rfc4106-gcm-aesni
rfc4106(gcm(aes)) uses ctr(aes) to generate hash key. ctr(aes) needs
chainiv, but the chainiv gets initialized after aesni_intel when both
are statically linked so the setkey fails.
This patch forces aesni_intel to be initialized after chainiv.

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-29 16:06:30 +08:00
Linus Torvalds
44d21c3f3a Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Here is the crypto update for 4.2:

  API:

   - Convert RNG interface to new style.

   - New AEAD interface with one SG list for AD and plain/cipher text.
     All external AEAD users have been converted.

   - New asymmetric key interface (akcipher).

  Algorithms:

   - Chacha20, Poly1305 and RFC7539 support.

   - New RSA implementation.

   - Jitter RNG.

   - DRBG is now seeded with both /dev/random and Jitter RNG.  If kernel
     pool isn't ready then DRBG will be reseeded when it is.

   - DRBG is now the default crypto API RNG, replacing krng.

   - 842 compression (previously part of powerpc nx driver).

  Drivers:

   - Accelerated SHA-512 for arm64.

   - New Marvell CESA driver that supports DMA and more algorithms.

   - Updated powerpc nx 842 support.

   - Added support for SEC1 hardware to talitos"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (292 commits)
  crypto: marvell/cesa - remove COMPILE_TEST dependency
  crypto: algif_aead - Temporarily disable all AEAD algorithms
  crypto: af_alg - Forbid the use internal algorithms
  crypto: echainiv - Only hold RNG during initialisation
  crypto: seqiv - Add compatibility support without RNG
  crypto: eseqiv - Offer normal cipher functionality without RNG
  crypto: chainiv - Offer normal cipher functionality without RNG
  crypto: user - Add CRYPTO_MSG_DELRNG
  crypto: user - Move cryptouser.h to uapi
  crypto: rng - Do not free default RNG when it becomes unused
  crypto: skcipher - Allow givencrypt to be NULL
  crypto: sahara - propagate the error on clk_disable_unprepare() failure
  crypto: rsa - fix invalid select for AKCIPHER
  crypto: picoxcell - Update to the current clk API
  crypto: nx - Check for bogus firmware properties
  crypto: marvell/cesa - add DT bindings documentation
  crypto: marvell/cesa - add support for Kirkwood and Dove SoCs
  crypto: marvell/cesa - add support for Orion SoCs
  crypto: marvell/cesa - add allhwsupport module parameter
  crypto: marvell/cesa - add support for all armada SoCs
  ...
2015-06-22 21:04:48 -07:00
Jeremiah Mahler
de1e00871d crypto: aesni - fix crypto_fpu_exit() section mismatch
The '__init aesni_init()' function calls the '__exit crypto_fpu_exit()'
function directly.  Since they are in different sections, this generates
a warning.

  make CONFIG_DEBUG_SECTION_MISMATCH=y
  ...
  WARNING: arch/x86/crypto/aesni-intel.o(.init.text+0x12b): Section
  mismatch in reference from the function init_module() to the function
  .exit.text:crypto_fpu_exit()
  The function __init init_module() references
  a function __exit crypto_fpu_exit().
  This is often seen when error handling in the init function
  uses functionality in the exit path.
  The fix is often to remove the __exit annotation of
  crypto_fpu_exit() so it may be used outside an exit section.

Fix the warning by removing the __exit annotation.

Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-15 18:15:58 +08:00
Herbert Xu
b7c89d9e2f crypto: aesni - Convert rfc4106 to new AEAD interface
This patch converts the low-level __gcm-aes-aesni algorithm to
the new AEAD interface.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-03 10:51:24 +08:00
Herbert Xu
af05b3009b crypto: aesni - Convert top-level rfc4106 algorithm to new interface
This patch converts rfc4106-gcm-aesni to the new AEAD interface.
The low-level interface remains as is for now because we can't
touch it until cryptd itself is upgraded.

In the conversion I've also removed the duplicate copy of the
context in the top-level algorithm.  Now all processing is carried
out in the low-level __driver-gcm-aes-aesni algorithm.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-03 10:48:36 +08:00
Herbert Xu
6d7e3d8995 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Merge the crypto tree for 4.1 to pull in the changeset that disables
algif_aead.
2015-05-28 11:16:41 +08:00
Ingo Molnar
b54b4bbbf5 x86/fpu, crypto: Fix AVX2 feature tests
For some CPU models I broke the AVX2 feature detection in:

  7bc371faa9 ("x86/fpu, crypto x86/camellia_aesni_avx2: Simplify the camellia_aesni_init() xfeature checks")
  534ff06e39 ("x86/fpu, crypto x86/serpent_avx2: Simplify the init() xfeature checks")

... because I did not realize that it's possible for a CPU to support
the xstate necessary for AVX2 execution (XSTATE_YMM), but not have
the AVX2 instructions themselves.

Restore the necessary CPUID checks as well.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-22 10:58:45 +02:00
Ingo Molnar
57dd083e0c x86/fpu, crypto x86/sha1_mb: Remove FPU internal headers from sha1_mb.c
This file only uses the public FPU APIs, so remove the xcr.h, fpu/xstate.h
and fpu/internal.h headers and add the fpu/api.h include.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:59 +02:00
Ingo Molnar
534ff06e39 x86/fpu, crypto x86/serpent_avx2: Simplify the init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:59 +02:00
Ingo Molnar
d1e509660c x86/fpu, crypto x86/sha1_ssse3: Simplify the sha1_ssse3_mod_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:59 +02:00
Ingo Molnar
1debf7db2b x86/fpu, crypto x86/cast6_avx: Simplify the cast6_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:58 +02:00
Ingo Molnar
c93b8a3963 x86/fpu, crypto x86/sha512_ssse3: Simplify the sha512_ssse3_mod_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:58 +02:00
Ingo Molnar
d5d34d98d2 x86/fpu, crypto x86/cast5_avx: Simplify the cast5_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:58 +02:00
Ingo Molnar
c1c23f7e5e x86/fpu, crypto x86/serpent_avx: Simplify the serpent_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:57 +02:00
Ingo Molnar
4eecd2616d x86/fpu, crypto x86/twofish_avx: Simplify the twofish_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:57 +02:00
Ingo Molnar
7bc371faa9 x86/fpu, crypto x86/camellia_aesni_avx2: Simplify the camellia_aesni_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:57 +02:00
Ingo Molnar
70d51eb65d x86/fpu, crypto x86/sha256_ssse3: Simplify the sha256_ssse3_mod_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit.

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:57 +02:00
Ingo Molnar
ce4f5f9b65 x86/fpu, crypto x86/camellia_aesni_avx: Simplify the camellia_aesni_init() xfeature checks
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.

This has the following advantages to the driver:

 - Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.

 - Removes detection complexity from the driver, no more raw XGETBV instruction

 - Shrinks the code a bit:

     text    data     bss     dec     hex filename
     2128    2896       0    5024    13a0 camellia_aesni_avx_glue.o.before
     2067    2896       0    4963    1363 camellia_aesni_avx_glue.o.after

 - Standardizes feature name error message printouts across drivers

There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:56 +02:00
Ingo Molnar
669ebabb79 x86/fpu: Rename fpu/xsave.h to fpu/xstate.h
'xsave' is an x86 instruction name to most people - but xsave.h is
about a lot more than just the XSAVE instruction: it includes
definitions and support, both internal and external, related to
xstate and xfeatures support.

As a first step in cleaning up the various xstate uses rename this
header to 'fpu/xstate.h' to better reflect what this header file
is about.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:54 +02:00
Ingo Molnar
78f7f1e54b x86/fpu: Rename fpu-internal.h to fpu/internal.h
This unifies all the FPU related header files under a unified, hiearchical
naming scheme:

 - asm/fpu/types.h:      FPU related data types, needed for 'struct task_struct',
                         widely included in almost all kernel code, and hence kept
                         as small as possible.

 - asm/fpu/api.h:        FPU related 'public' methods exported to other subsystems.

 - asm/fpu/internal.h:   FPU subsystem internal methods

 - asm/fpu/xsave.h:      XSAVE support internal methods

(Also standardize the header guard in asm/fpu/internal.h.)

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:31 +02:00
Ingo Molnar
a137fb6bbf x86/fpu: Move xsave.h to fpu/xsave.h
Move the xsave.h header file to the FPU directory as well.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:30 +02:00
Ingo Molnar
df6b35f409 x86/fpu: Rename i387.h to fpu/api.h
We already have fpu/types.h, move i387.h to fpu/api.h.

The file name has become a misnomer anyway: it offers generic FPU APIs,
but is not limited to i387 functionality.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:47:30 +02:00