linux-stable/arch/x86/entry
Andy Lutomirski 7536656f08 x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
Right after SYSENTER, we can get a #DB or NMI.  On x86_32, there's no IST,
so the exception handler is invoked on the temporary SYSENTER stack.

Because the SYSENTER stack is very small, we have a fixup to switch
off the stack quickly when this happens.  The old fixup had several issues:

 1. It checked the interrupt frame's CS and EIP.  This wasn't
    obviously correct on Xen or if vm86 mode was in use [1].

 2. In the NMI handler, it did some frightening digging into the
    stack frame.  I'm not convinced this digging was correct.

 3. The fixup didn't switch stacks and then switch back.  Instead, it
    synthesized a brand new stack frame that would redirect the IRET
    back to the SYSENTER code.  That frame was highly questionable.
    For one thing, if NMI nested inside #DB, we would effectively
    abort the #DB prologue, which was probably safe but was
    frightening.  For another, the code used PUSHFL to write the
    FLAGS portion of the frame, which was simply bogus -- by the time
    PUSHFL was called, at least TF, NT, VM, and all of the arithmetic
    flags were clobbered.

Simplify this considerably.  Instead of looking at the saved frame
to see where we came from, check the hardware ESP register against
the SYSENTER stack directly.  Malicious user code cannot spoof the
kernel ESP register, and by moving the check after SAVE_ALL, we can
use normal PER_CPU accesses to find all the relevant addresses.

With this patch applied, the improved syscall_nt_32 test finally
passes on 32-bit kernels.

[1] It isn't obviously correct, but it is nonetheless safe from vm86
    shenanigans as far as I can tell.  A user can't point EIP at
    entry_SYSENTER_32 while in vm86 mode because entry_SYSENTER_32,
    like all kernel addresses, is greater than 0xffff and would thus
    violate the CS segment limit.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b2cdbc037031c07ecf2c40a96069318aec0e7971.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 09:48:14 +01:00
..
syscalls x86/syscalls/64: Mark sys_iopl() as using ptregs 2016-02-01 08:53:25 +01:00
vdso x86/vdso: Use static_cpu_has() 2016-01-30 11:22:23 +01:00
vsyscall x86/vdso: Disallow vvar access to vclock IO for never-used vclocks 2016-01-12 11:59:35 +01:00
calling.h x86/asm/entry: Remove unused SAVE_ALL/RESTORE_ALL macros for !CONFIG_x86_64 2016-01-19 08:24:03 +01:00
common.c x86/entry/compat: Keep TS_COMPAT set during signal delivery 2016-02-17 09:51:06 +01:00
entry_32.S x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup 2016-03-10 09:48:14 +01:00
entry_64.S x86/entry/64: Fix fast-path syscall return register state 2016-02-01 08:53:25 +01:00
entry_64_compat.S x86/entry: Vastly simplify SYSENTER TF (single-step) handling 2016-03-10 09:48:13 +01:00
Makefile x86/entry: Move C entry and exit code to arch/x86/entry/common.c 2015-07-07 10:59:05 +02:00
syscall_32.c x86/syscalls: Add syscall entry qualifiers 2016-01-29 09:46:38 +01:00
syscall_64.c x86/entry/64: Always run ptregs-using syscalls on the slow path 2016-01-29 09:46:38 +01:00
thunk_32.S
thunk_64.S