linux-stable/arch/x86/lib
Matt Fleming df13086490 x86/asm/64: Align start of __clear_user() loop to 16-bytes
commit bb5570ad3b upstream.

x86 CPUs can suffer severe performance drops if a tight loop, such as
the ones in __clear_user(), straddles a 16-byte instruction fetch
window, or worse, a 64-byte cacheline. This issues was discovered in the
SUSE kernel with the following commit,

  1153933703 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")

which increased the code object size from 10 bytes to 15 bytes and
caused the 8-byte copy loop in __clear_user() to be split across a
64-byte cacheline.

Aligning the start of the loop to 16-bytes makes this fit neatly inside
a single instruction fetch window again and restores the performance of
__clear_user() which is used heavily when reading from /dev/zero.

Here are some numbers from running libmicro's read_z* and pread_z*
microbenchmarks which read from /dev/zero:

  Zen 1 (Naples)

  libmicro-file
                                        5.7.0-rc6              5.7.0-rc6              5.7.0-rc6
                                                    revert-1153933703d9+               align16+
  Time mean95-pread_z100k       9.9195 (   0.00%)      5.9856 (  39.66%)      5.9938 (  39.58%)
  Time mean95-pread_z10k        1.1378 (   0.00%)      0.7450 (  34.52%)      0.7467 (  34.38%)
  Time mean95-pread_z1k         0.2623 (   0.00%)      0.2251 (  14.18%)      0.2252 (  14.15%)
  Time mean95-pread_zw100k      9.9974 (   0.00%)      6.0648 (  39.34%)      6.0756 (  39.23%)
  Time mean95-read_z100k        9.8940 (   0.00%)      5.9885 (  39.47%)      5.9994 (  39.36%)
  Time mean95-read_z10k         1.1394 (   0.00%)      0.7483 (  34.33%)      0.7482 (  34.33%)

Note that this doesn't affect Haswell or Broadwell microarchitectures
which seem to avoid the alignment issue by executing the loop straight
out of the Loop Stream Detector (verified using perf events).

Fixes: 1153933703 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v4.19+
Link: https://lkml.kernel.org/r/20200618102002.30034-1-matt@codeblueprint.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-30 15:37:08 -04:00
..
.gitignore
atomic64_32.c
atomic64_386_32.S treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
atomic64_cx8_32.S treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
cache-smp.c smp: Remove smp_call_function() and on_each_cpu() return values 2019-06-23 14:26:26 +02:00
checksum_32.S treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
clear_page_64.S treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
cmdline.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497 2019-06-19 17:09:53 +02:00
cmpxchg8b_emu.S treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
cmpxchg16b_emu.S treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
copy_page_64.S License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
copy_user_64.S x86/asm: Make some functions local labels 2019-09-06 10:41:11 +02:00
cpu.c x86/lib/cpu: Address missing prototypes warning 2019-08-08 08:25:53 +02:00
csum-copy_64.S x86/extable: Introduce _ASM_EXTABLE_UA for uaccess fixups 2018-09-03 15:12:09 +02:00
csum-partial_64.c x86/lib: Fix indentation issue, remove extra tab 2019-03-21 12:24:38 +01:00
csum-wrappers_64.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 223 2019-05-30 11:29:55 -07:00
delay.c x86/asm: Fix MWAITX C-state hint value 2019-10-08 13:25:24 +02:00
error-inject.c x86/asm: Mark all top level asm statements as .text 2019-04-19 17:46:55 +02:00
getuser.S x86/asm: Make some functions local labels 2019-09-06 10:41:11 +02:00
hweight.S License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
inat.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
insn-eval.c x86/insn-eval: Fix use-after-free access to LDT entry 2019-06-07 11:11:06 -07:00
insn.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
iomap_copy_64.S treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504 2019-06-19 17:09:56 +02:00
iomem.c x86: explicitly align IO accesses in memcpy_{to,from}io 2019-02-01 09:07:48 -08:00
kaslr.c x86/kaslr: Fix incorrect i8254 outb() parameters 2019-01-11 21:35:47 +01:00
Makefile Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-05-06 13:50:15 -07:00
memcpy_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
memcpy_64.S treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
memmove_64.S License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
memset_64.S License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
misc.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mmx_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
msr-reg-export.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
msr-reg.S License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
msr-smp.c x86/msr: Make rdmsrl_safe_on_cpu() scheduling safe as well 2018-03-28 10:34:13 +02:00
msr.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
putuser.S x86/asm: Make some functions local labels 2019-09-06 10:41:11 +02:00
retpoline.S Revert "x86/retpoline: Simplify vmexit_fill_RSB()" 2018-02-20 09:38:26 +01:00
string_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
strstr_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
usercopy.c x86/nmi: Fix NMI uaccess race against CR3 switching 2018-08-31 17:08:22 +02:00
usercopy_32.c docs/core-api/mm: fix user memory accessors formatting 2019-03-05 21:07:20 -08:00
usercopy_64.c x86/asm/64: Align start of __clear_user() loop to 16-bytes 2020-06-30 15:37:08 -04:00
x86-opcode-map.txt x86/decoder: Add TEST opcode to Group3-2 2020-02-24 08:36:55 +01:00