linux-stable/arch/x86
Sebastian Andrzej Siewior 5cf0791da5 x86/mm: Disable preemption during CR3 read+write
There's a subtle preemption race on UP kernels:

Usually current->mm (and therefore mm->pgd) stays the same during the
lifetime of a task so it does not matter if a task gets preempted during
the read and write of the CR3.

But then, there is this scenario on x86-UP:

TaskA is in do_exit() and exit_mm() sets current->mm = NULL followed by:

 -> mmput()
 -> exit_mmap()
 -> tlb_finish_mmu()
 -> tlb_flush_mmu()
 -> tlb_flush_mmu_tlbonly()
 -> tlb_flush()
 -> flush_tlb_mm_range()
 -> __flush_tlb_up()
 -> __flush_tlb()
 ->  __native_flush_tlb()

At this point current->mm is NULL but current->active_mm still points to
the "old" mm.

Let's preempt taskA _after_ native_read_cr3() by taskB. TaskB has its
own mm so CR3 has changed.

Now preempt back to taskA. TaskA has no ->mm set so it borrows taskB's
mm and so CR3 remains unchanged. Once taskA gets active it continues
where it was interrupted and that means it writes its old CR3 value
back. Everything is fine because userland won't need its memory
anymore.

Now the fun part:

Let's preempt taskA one more time and get back to taskB. This
time switch_mm() won't do a thing because oldmm (->active_mm)
is the same as mm (as per context_switch()). So we remain
with a bad CR3 / PGD and return to userland.

The next thing that happens is handle_mm_fault() with an address for
the execution of its code in userland. handle_mm_fault() realizes that
it has a PTE with proper rights so it returns doing nothing. But the
CPU looks at the wrong PGD and insists that something is wrong and
faults again. And again. And one more time…

This pagefault circle continues until the scheduler gets tired of it and
puts another task on the CPU. It gets little difficult if the task is a
RT task with a high priority. The system will either freeze or it gets
fixed by the software watchdog thread which usually runs at RT-max prio.
But waiting for the watchdog will increase the latency of the RT task
which is no good.

Fix this by disabling preemption across the critical code section.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1470404259-26290-1-git-send-email-bigeasy@linutronix.de
[ Prettified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 15:37:16 +02:00
..
boot Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2016-08-02 16:37:12 -04:00
configs arch/defconfig: remove CONFIG_RESOURCE_COUNTERS 2016-05-23 17:04:14 -07:00
crypto x86, crypto: Restore MODULE_LICENSE() to glue_helper.c so it loads 2016-07-20 09:39:50 +02:00
entry x86, kasan, ftrace: Put APIC interrupt handlers into .irqentry.text 2016-08-10 14:19:33 +02:00
events Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-08-06 09:10:36 -04:00
ia32 mm: remove more IS_ERR_VALUE abuses 2016-05-27 15:57:31 -07:00
include x86/mm: Disable preemption during CR3 read+write 2016-08-10 15:37:16 +02:00
kernel x86/mm/KASLR: Fix physical memory calculation on KASLR memory randomization 2016-08-10 14:45:19 +02:00
kvm nvmx: mark ept single context invalidation as supported 2016-08-04 14:21:52 +02:00
lguest lguest: Read offset of device_cap later 2016-06-10 11:39:09 +02:00
lib x86/hweight: Don't clobber %rdi 2016-08-08 10:58:25 -07:00
math-emu
mm x86/mm/KASLR: Increase BRK pages for KASLR memory randomization 2016-08-10 14:45:19 +02:00
net bpf, x86: add support for constant blinding 2016-05-16 13:49:32 -04:00
oprofile x86/cpufeature: Replace cpu_has_apic with boot_cpu_has() usage 2016-04-13 11:37:41 +02:00
pci dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
platform RTC for 4.8 2016-08-05 09:48:22 -04:00
power x86/power/64: Do not refer to __PAGE_OFFSET from assembly code 2016-08-03 01:35:38 +02:00
purgatory Add sancov plugin 2016-06-07 22:57:10 +02:00
ras x86/RAS/AMD: Reduce the number of IPIs when prepping error injection 2016-07-08 11:29:26 +02:00
realmode Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2016-08-02 16:37:12 -04:00
tools x86/insn: Add AVX-512 support to the instruction decoder 2016-07-21 09:37:11 -03:00
um Merge branch 'for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml 2016-08-04 19:37:59 -04:00
video x86/video: Don't assume all FB devices are PCI devices 2016-03-15 11:08:26 +01:00
xen kexec: allow kdump with crash_kexec_post_notifiers 2016-08-02 19:35:30 -04:00
.gitignore
Kbuild perf/x86: Move perf_event.c ............... => x86/events/core.c 2016-02-09 10:23:49 +01:00
Kconfig Implements HARDENED_USERCOPY verification of copy_to_user/copy_from_user 2016-08-08 14:48:14 -07:00
Kconfig.cpu
Kconfig.debug Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 09:32:27 -07:00
Makefile kbuild: abort build on bad stack protector flag 2016-07-26 16:19:19 -07:00
Makefile.um
Makefile_32.cpu