linux-stable/arch/x86/lib
Linus Torvalds 47ee3f1dd9 x86: re-introduce support for ERMS copies for user space accesses
I tried to streamline our user memory copy code fairly aggressively in
commit adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory
copies"), in order to then be able to clean up the code and inline the
modern FSRM case in commit 577e6a7fd5 ("x86: inline the 'rep movs' in
user copies for the FSRM case").

We had reports [1] of that causing regressions earlier with blogbench,
but that turned out to be a horrible benchmark for that case, and not a
sufficient reason for re-instating "rep movsb" on older machines.

However, now Eric Dumazet reported [2] a regression in performance that
seems to be a rather more real benchmark, where due to the removal of
"rep movs" a TCP stream over a 100Gbps network no longer reaches line
speed.

And it turns out that with the simplified calling convention for the
non-FSRM case in commit 427fda2c8a ("x86: improve on the non-rep
'copy_user' function"), re-introducing the ERMS case is actually fairly
simple.

Of course, that "fairly simple" is glossing over several missteps due to
having to fight our assembler alternative code.  This code really wanted
to rewrite a conditional branch to have two different targets, but that
made objtool sufficiently unhappy that this instead just ended up doing
a choice between "jump to the unrolled loop, or use 'rep movsb'
directly".
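
The final shape described above, a two-way choice between the unrolled loop and "rep movsb", can be sketched in portable C. This is only an illustration, not the kernel's code: the real choice is patched into the assembly by the alternatives mechanism rather than tested at run time, and every name below (cpu_has_erms_sketch, copy_rep_movsb_sketch, copy_unrolled_sketch, copy_large_sketch) is hypothetical. The byte loop merely stands in for "rep movsb", and the 64-byte step is an assumed unroll width.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the ERMS feature bit. The kernel does not test a
 * variable here; it patches the instruction stream once at boot. */
static bool cpu_has_erms_sketch = false;

/* Stand-in for "rep movsb": a plain byte-at-a-time copy. */
static void copy_rep_movsb_sketch(unsigned char *dst, const unsigned char *src,
                                  size_t len)
{
    while (len--)
        *dst++ = *src++;
}

/* Stand-in for the unrolled loop: 64 bytes per iteration (assumed width),
 * followed by a byte-wise tail for whatever is left. */
static void copy_unrolled_sketch(unsigned char *dst, const unsigned char *src,
                                 size_t len)
{
    while (len >= 64) {
        memcpy(dst, src, 64);
        dst += 64;
        src += 64;
        len -= 64;
    }
    while (len--)
        *dst++ = *src++;
}

/* The two-way choice: use the string instruction when the CPU handles it
 * well, otherwise jump to the unrolled loop. */
void copy_large_sketch(void *dst, const void *src, size_t len)
{
    if (cpu_has_erms_sketch)
        copy_rep_movsb_sketch(dst, src, len);
    else
        copy_unrolled_sketch(dst, src, len);
}
```

Both paths produce the same bytes; the only thing the flag changes is which copy routine runs, which is the property the commit relies on.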

Let's see if somebody finds a case where the kernel memory copies also
care (see commit 68674f94ff: "x86: don't use REP_GOOD or ERMS for
small memory copies").  But Eric does argue that the user copies are
special because networking tries to copy up to 32KB at a time, if
order-3 page allocations are possible.
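
For reference, the 32KB figure follows from the allocation order: an order-n allocation covers 2^n contiguous pages. A minimal sketch of that arithmetic, assuming the 4 KiB x86 base page size; SKETCH_PAGE_SIZE and order_to_bytes are illustrative names, not kernel identifiers.

```c
#include <stddef.h>

/* Assumption: 4 KiB base pages, as on x86. */
#define SKETCH_PAGE_SIZE 4096u

/* An order-n allocation spans 2^n contiguous pages. */
static size_t order_to_bytes(unsigned int order)
{
    return (size_t)SKETCH_PAGE_SIZE << order;
}
```

With these assumptions, order_to_bytes(3) is 8 pages of 4096 bytes, i.e. 32768 bytes.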

In-kernel memory copies are typically small, unless they are the special
"copy pages at a time" kind that still use "rep movs".

Link: https://lore.kernel.org/lkml/202305041446.71d46724-yujie.liu@intel.com/ [1]
Link: https://lore.kernel.org/lkml/CANn89iKUbyrJ=r2+_kK+sb2ZSSHifFZ7QkPLDpAtkJ8v4WUumA@mail.gmail.com/ [2]
Reported-and-tested-by: Eric Dumazet <edumazet@google.com>
Fixes: adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory copies")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-26 12:34:20 -07:00
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
atomic64_32.c
atomic64_386_32.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
atomic64_cx8_32.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
cache-smp.c smp: Remove smp_call_function() and on_each_cpu() return values 2019-06-23 14:26:26 +02:00
checksum_32.S x86/checksum_32: Remove .fixup usage 2021-12-11 09:09:49 +01:00
clear_page_64.S x86: improve on the non-rep 'clear_user' function 2023-04-18 17:05:28 -07:00
cmdline.c x86/lib: Fix compiler and kernel-doc warnings 2023-01-03 18:46:21 +01:00
cmpxchg8b_emu.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
cmpxchg16b_emu.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
copy_mc.c x86, libnvdimm/test: Remove COPY_MC_TEST 2020-10-26 18:08:35 +01:00
copy_mc_64.S x86/copy_mc_64: Remove .fixup usage 2021-12-11 09:09:46 +01:00
copy_page_64.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
copy_user_64.S x86: re-introduce support for ERMS copies for user space accesses 2023-05-26 12:34:20 -07:00
copy_user_uncached_64.S x86: rewrite '__copy_user_nocache' function 2023-04-20 18:53:49 -07:00
cpu.c x86/lib/cpu: Address missing prototypes warning 2019-08-08 08:25:53 +02:00
csum-copy_64.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
csum-partial_64.c uml/x86: use x86 load_unaligned_zeropad() 2022-01-30 21:26:39 -05:00
csum-wrappers_64.c net: unexport csum_and_copy_{from,to}_user 2022-04-29 14:37:59 -07:00
delay.c x86/delay: Fix the wrong asm constraint in delay_loop() 2022-04-05 21:21:57 +02:00
error-inject.c x86/error_inject: Align function properly 2022-10-17 16:40:59 +02:00
getuser.S x86/mm: Rework address range check in get_user() and put_user() 2023-03-16 13:08:38 -07:00
hweight.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
inat.c x86/insn: Add a __ignore_sync_check__ marker 2021-03-15 11:00:57 +01:00
insn-eval.c x86/insn: Avoid namespace clash by separating instruction decoder MMIO type from MMIO trace type 2023-01-03 18:46:06 +01:00
insn.c x86/insn: Use get_unaligned() instead of memcpy() 2021-10-06 11:56:37 +02:00
iomap_copy_64.S x86/asm: Fix an assembler warning with current binutils 2023-01-03 17:55:11 +01:00
iomem.c x86: kmsan: handle open-coded assembly in lib/iomem.c 2022-10-03 14:03:24 -07:00
kaslr.c x86/kaslr: Fix build warning in KASLR code in boot stub 2022-04-11 09:41:12 +02:00
Makefile x86: rewrite '__copy_user_nocache' function 2023-04-20 18:53:49 -07:00
memcpy_32.c x86/mem: Move memmove to out of line assembler 2022-11-01 15:44:07 -07:00
memcpy_64.S x86: don't use REP_GOOD or ERMS for small memory copies 2023-04-18 17:05:28 -07:00
memmove_32.S x86/mem: Move memmove to out of line assembler 2022-11-01 15:44:07 -07:00
memmove_64.S entry, kasan, x86: Disallow overriding mem*() functions 2023-01-13 11:48:17 +01:00
memset_64.S x86: don't use REP_GOOD or ERMS for small memory clearing 2023-04-18 17:05:28 -07:00
misc.c x86/lib: Include <asm/misc.h> to fix a missing prototypes warning at build time 2023-01-03 11:11:03 +01:00
msr-reg-export.c
msr-reg.S x86: Prepare asm files for straight-line-speculation 2021-12-08 12:25:37 +01:00
msr-smp.c x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes 2021-03-22 21:37:03 +01:00
msr.c x86/msr: Make locally used functions static 2021-04-08 11:57:40 +02:00
pc-conf-reg.c x86: Add support for 0x22/0x23 port I/O configuration space 2021-08-10 23:31:43 +02:00
putuser.S x86/mm: Rework address range check in get_user() and put_user() 2023-03-16 13:08:38 -07:00
retpoline.S x86/retbleed: Fix return thunk alignment 2023-05-12 17:19:53 -05:00
string_32.c lib/string: Move helper functions out of string.c 2021-09-25 08:20:49 -07:00
strstr_32.c
usercopy.c x86/uaccess: instrument copy_from_user_nmi() 2022-11-08 15:57:24 -08:00
usercopy_32.c x86/usercopy: Remove .fixup usage 2021-12-11 09:09:50 +01:00
usercopy_64.c - Unify duplicated __pa() and __va() definitions 2023-04-28 09:22:30 -07:00
x86-opcode-map.txt x86/opcode: Add the LKGS instruction to x86-opcode-map 2023-01-12 13:06:36 +01:00