Commit graph

271 commits

Author SHA1 Message Date
Borislav Petkov
de82fbc382 x86/fpu: Remove check_fpu() indirection
Rename it to fpu__init_check_bugs() and do the CPU feature check at
entry, thus getting rid of the old fpu__init_check_bugs() wrapper.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459837795-2588-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:43 +02:00
Borislav Petkov
78df526c74 x86/fpu/regset: Replace static_cpu_has() usage with boot_cpu_has()
fpregs_{g,s}et() are not sizzling-hot paths to justify the need for
static_cpu_has(). Use the normal boot_cpu_has() helper.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459837795-2588-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:42 +02:00
Borislav Petkov
782511b00f x86/cpufeature: Replace cpu_has_xsaves with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <kvm@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459801503-15600-11-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:42 +02:00
Borislav Petkov
d366bf7eb9 x86/cpufeature: Replace cpu_has_xsave with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1459801503-15600-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:41 +02:00
Borislav Petkov
01f8fd7379 x86/cpufeature: Replace cpu_has_fxsr with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459801503-15600-9-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:41 +02:00
Borislav Petkov
a402a8dffc x86/cpufeature: Replace cpu_has_fpu with boot_cpu_has() usage
Use static_cpu_has() in the timing-sensitive paths in fpstate_init() and
fpu__copy().

While at it, simplify the use in init_cyrix() and get rid of the ternary
operator.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459801503-15600-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:40 +02:00
Borislav Petkov
dda9edf7c1 x86/cpufeature: Replace cpu_has_xmm with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459801503-15600-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:40 +02:00
Linus Torvalds
d88f48e128 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - fix hotplug bugs
   - fix irq live lock
   - fix various topology handling bugs
   - fix APIC ACK ordering
   - fix PV iopl handling
   - fix speling
   - fix/tweak memcpy_mcsafe() return value
   - fix fbcon bug
   - remove stray prototypes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/msr: Remove unused native_read_tscp()
  x86/apic: Remove declaration of unused hw_nmi_is_cpu_stuck
  x86/oprofile/nmi: Add missing hotplug FROZEN handling
  x86/hpet: Use proper mask to modify hotplug action
  x86/apic/uv: Fix the hotplug notifier
  x86/apb/timer: Use proper mask to modify hotplug action
  x86/topology: Use total_cpus not nr_cpu_ids for logical packages
  x86/topology: Fix Intel HT disable
  x86/topology: Fix logical package mapping
  x86/irq: Cure live lock in fixup_irqs()
  x86/tsc: Prevent NULL pointer deref in calibrate_delay_is_known()
  x86/apic: Fix suspicious RCU usage in smp_trace_call_function_interrupt()
  x86/iopl: Fix iopl capability check on Xen PV
  x86/iopl/64: Properly context-switch IOPL on Xen PV
  selftests/x86: Add an iopl test
  x86/mm, x86/mce: Fix return type/value for memcpy_mcsafe()
  x86/video: Don't assume all FB devices are PCI devices
  arch/x86/irq: Purge useless handler declarations from hw_irq.h
  x86: Fix misspellings in comments
2016-03-24 09:47:32 -07:00
Linus Torvalds
643ad15d47 Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 protection key support from Ingo Molnar:
 "This tree adds support for a new memory protection hardware feature
  that is available in upcoming Intel CPUs: 'protection keys' (pkeys).

  There's a background article at LWN.net:

      https://lwn.net/Articles/643797/

  The gist is that protection keys allow the encoding of
  user-controllable permission masks in the pte.  So instead of having a
  fixed protection mask in the pte (which needs a system call to change
  and works on a per page basis), the user can map a (handful of)
  protection mask variants and can change the masks runtime relatively
  cheaply, without having to change every single page in the affected
  virtual memory range.

  This allows the dynamic switching of the protection bits of large
  amounts of virtual memory, via user-space instructions.  It also
  allows more precise control of MMU permission bits: for example the
  executable bit is separate from the read bit (see more about that
  below).

  This tree adds the MM infrastructure and low level x86 glue needed for
  that, plus it adds a high level API to make use of protection keys -
  if a user-space application calls:

        mmap(..., PROT_EXEC);

  or

        mprotect(ptr, sz, PROT_EXEC);

  (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
  this special case, and will set a special protection key on this
  memory range.  It also sets the appropriate bits in the Protection
  Keys User Rights (PKRU) register so that the memory becomes unreadable
  and unwritable.

  So using protection keys the kernel is able to implement 'true'
  PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
  PROT_READ as well.  Unreadable executable mappings have security
  advantages: they cannot be read via information leaks to figure out
  ASLR details, nor can they be scanned for ROP gadgets - and they
  cannot be used by exploits for data purposes either.

  We know about no user-space code that relies on pure PROT_EXEC
  mappings today, but binary loaders could start making use of this new
  feature to map binaries and libraries in a more secure fashion.

  There is other pending pkeys work that offers more high level system
  call APIs to manage protection keys - but those are not part of this
  pull request.

  Right now there's a Kconfig that controls this feature
  (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
  (like most x86 CPU feature enablement code that has no runtime
  overhead), but it's not user-configurable at the moment.  If there's
  any serious problem with this then we can make it configurable and/or
  flip the default"

* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
  mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
  x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
  mm/core, x86/mm/pkeys: Add execute-only protection keys support
  x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
  x86/mm/pkeys: Allow kernel to modify user pkey rights register
  x86/fpu: Allow setting of XSAVE state
  x86/mm: Factor out LDT init from context init
  mm/core, x86/mm/pkeys: Add arch_validate_pkey()
  mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
  x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
  x86/mm/pkeys: Add Kconfig prompt to existing config option
  x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
  x86/mm/pkeys: Dump PKRU with other kernel registers
  mm/core, x86/mm/pkeys: Differentiate instruction fetches
  x86/mm/pkeys: Optimize fault handling in access_error()
  mm/core: Do not enforce PKEY permissions on remote mm access
  um, pkeys: Add UML arch_*_access_permitted() methods
  mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
  x86/mm/gup: Simplify get_user_pages() PTE bit handling
  ...
2016-03-20 19:08:56 -07:00
Ingo Molnar
00f5268501 Merge branch 'x86/cleanups' into x86/urgent
Pull in some merge window leftovers.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-17 09:44:57 +01:00
Linus Torvalds
ecc026bff6 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
 "The biggest change in terms of impact is the changing of the FPU
  context switch model to 'eagerfpu' for all CPU types, via: commit
  58122bf1d8: "x86/fpu: Default eagerfpu=on on all CPUs"

  This makes all FPU saves and restores synchronous and makes the FPU
  code a lot more obvious to read.  In the next cycle, if this change is
  problem free, we'll remove the old lazy FPU restore code altogether.

  This change flushed out some old bugs, which should all be fixed by
  now, BYMMV"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Default eagerfpu=on on all CPUs
  x86/fpu: Speed up lazy FPU restores slightly
  x86/fpu: Fold fpu_copy() into fpu__copy()
  x86/fpu: Fix FNSAVE usage in eagerfpu mode
  x86/fpu: Fix math emulation in eager fpu mode
2016-03-15 10:23:56 -07:00
Linus Torvalds
ba33ea811e Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "This is another big update. Main changes are:

   - lots of x86 system call (and other traps/exceptions) entry code
     enhancements.  In particular the complex parts of the 64-bit entry
     code have been migrated to C code as well, and a number of dusty
     corners have been refreshed.  (Andy Lutomirski)

   - vDSO special mapping robustification and general cleanups (Andy
     Lutomirski)

   - cpufeature refactoring, cleanups and speedups (Borislav Petkov)

   - lots of other changes ..."

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  x86/cpufeature: Enable new AVX-512 features
  x86/entry/traps: Show unhandled signal for i386 in do_trap()
  x86/entry: Call enter_from_user_mode() with IRQs off
  x86/entry/32: Change INT80 to be an interrupt gate
  x86/entry: Improve system call entry comments
  x86/entry: Remove TIF_SINGLESTEP entry work
  x86/entry/32: Add and check a stack canary for the SYSENTER stack
  x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
  x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed
  x86/entry: Vastly simplify SYSENTER TF (single-step) handling
  x86/entry/traps: Clear DR6 early in do_debug() and improve the comment
  x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions
  x86/entry/32: Restore FLAGS on SYSEXIT
  x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
  selftests/x86: In syscall_nt, test NT|TF as well
  x86/asm-offsets: Remove PARAVIRT_enabled
  x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
  uprobes: __create_xol_area() must nullify xol_mapping.fault
  x86/cpufeature: Create a new synthetic cpu capability for machine check recovery
  ...
2016-03-15 09:32:27 -07:00
Fenghua Yu
d050049442 x86/cpufeature: Enable new AVX-512 features
A few new AVX-512 instruction groups/features are added in cpufeatures.h
for enuermation: AVX512DQ, AVX512BW, and AVX512VL.

Clear the flags in fpu__xstate_clear_all_cpu_caps().

The specification for latest AVX-512 including the features can be found at:

  https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf

Note, I didn't enable the flags in KVM. Hopefully the KVM guys can pick up
the flags and enable them in KVM.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1457667498-37357-1-git-send-email-fenghua.yu@intel.com
[ Added more detailed feature descriptions. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-12 17:30:53 +01:00
Borislav Petkov
6e6867093d x86/fpu: Fix eager-FPU handling on legacy FPU machines
i486 derived cores like Intel Quark support only the very old,
legacy x87 FPU (FSAVE/FRSTOR, CPUID bit FXSR is not set), and
our FPU code wasn't handling the saving and restoring there
properly in the 'eagerfpu' case.

So after we made eagerfpu the default for all CPU types:

  58122bf1d8 x86/fpu: Default eagerfpu=on on all CPUs

these old FPU designs broke. First, Andy Shevchenko reported a splat:

  WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160

which was us trying to execute FXRSTOR on those machines even though
they don't support it.

After taking care of that, Bryan O'Donoghue reported that a simple FPU
test still failed because we weren't initializing the FPU state properly
on those machines.

Take care of all that.

Reported-and-tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20160311113206.GD4312@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-12 16:13:55 +01:00
Yu-cheng Yu
a65050c6f1 x86/fpu: Revert ("x86/fpu: Disable AVX when eagerfpu is off")
Leonid Shatz noticed that the SDM interpretation of the following
recent commit:

  394db20ca2 ("x86/fpu: Disable AVX when eagerfpu is off")

... is incorrect and that the original behavior of the FPU code was correct.

Because AVX is not stated in CR0 TS bit description, it was mistakenly
believed to be not supported for lazy context switch. This turns out
to be false:

  Intel Software Developer's Manual Vol. 3A, Sec. 2.5 Control Registers:

   'TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/
    MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until
    an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed
    by the new task.'

  Intel Software Developer's Manual Vol. 2A, Sec. 2.4 Instruction Exception
  Specification:

   'AVX instructions refer to exceptions by classes that include #NM
    "Device Not Available" exception for lazy context switch.'

So revert the commit.

Reported-by: Leonid Shatz <leonid.shatz@ravellosystems.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1457569734-3785-1-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 10:15:58 +01:00
Andy Lutomirski
f363938c70 x86/fpu: Fix 'no387' regression
After fixing FPU option parsing, we now parse the 'no387' boot option
too early: no387 clears X86_FEATURE_FPU before it's even probed, so
the boot CPU promptly re-enables it.

I suspect it gets even more confused on SMP.

Fix the probing code to leave X86_FEATURE_FPU off if it's been
disabled by setup_clear_cpu_cap().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Fixes: 4f81cbafcc ("x86/fpu: Fix early FPU command-line parsing")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-09 13:54:40 +01:00
Adam Buchbinder
6a6256f9e0 x86: Fix misspellings in comments
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: trivial@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24 08:44:58 +01:00
Dave Hansen
62b5f7d013 mm/core, x86/mm/pkeys: Add execute-only protection keys support
Protection keys provide new page-based protection in hardware.
But, they have an interesting attribute: they only affect data
accesses and never affect instruction fetches.  That means that
if we set up some memory which is set as "access-disabled" via
protection keys, we can still execute from it.

This patch uses protection keys to set up mappings to do just that.
If a user calls:

	mmap(..., PROT_EXEC);
or
	mprotect(ptr, sz, PROT_EXEC);

(note PROT_EXEC-only without PROT_READ/WRITE), the kernel will
notice this, and set a special protection key on the memory.  It
also sets the appropriate bits in the Protection Keys User Rights
(PKRU) register so that the memory becomes unreadable and
unwritable.

I haven't found any userspace that does this today.  With this
facility in place, we expect userspace to move to use it
eventually.  Userspace _could_ start doing this today.  Any
PROT_EXEC calls get converted to PROT_READ inside the kernel, and
would transparently be upgraded to "true" PROT_EXEC with this
code.  IOW, userspace never has to do any PROT_EXEC runtime
detection.

This feature provides enhanced protection against leaking
executable memory contents.  This helps thwart attacks which are
attempting to find ROP gadgets on the fly.

But, the security provided by this approach is not comprehensive.
The PKRU register which controls access permissions is a normal
user register writable from unprivileged userspace.  An attacker
who can execute the 'wrpkru' instruction can easily disable the
protection provided by this feature.

The protection key that is used for execute-only support is
permanently dedicated at compile time.  This is fine for now
because there is currently no API to set a protection key other
than this one.

Despite there being a constant PKRU value across the entire
system, we do not set it unless this feature is in use in a
process.  That is to preserve the PKRU XSAVE 'init state',
which can lead to faster context switches.

PKRU *is* a user register and the kernel is modifying it.  That
means that code doing:

	pkru = rdpkru()
	pkru |= 0x100;
	mmap(..., PROT_EXEC);
	wrpkru(pkru);

could lose the bits in PKRU that enforce execute-only
permissions.  To avoid this, we suggest avoiding ever calling
mmap() or mprotect() when the PKRU value is expected to be
unstable.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Piotr Kwapulinski <kwapulinski.piotr@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: keescook@google.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210240.CB4BB5CA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 19:46:33 +01:00
Dave Hansen
8459429693 x86/mm/pkeys: Allow kernel to modify user pkey rights register
The Protection Key Rights for User memory (PKRU) is a 32-bit
user-accessible register.  It contains two bits for each
protection key: one to write-disable (WD) access to memory
covered by the key and another to access-disable (AD).

Userspace can read/write the register with the RDPKRU and WRPKRU
instructions.  But, the register is saved and restored with the
XSAVE family of instructions, which means we have to treat it
like a floating point register.

The kernel needs to write to the register if it wants to
implement execute-only memory or if it implements a system call
to change PKRU.

To do this, we need to create a 'pkru_state' buffer, read the old
contents in to it, modify it, and then tell the FPU code that
there is modified data in there so it can (possibly) move the
buffer back in to the registers.

This uses the fpu__xfeature_set_state() function that we defined
in the previous patch.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210236.0BE13217@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 19:46:32 +01:00
Dave Hansen
b8b9b6ba9d x86/fpu: Allow setting of XSAVE state
We want to modify the Protection Key rights inside the kernel, so
we need to change PKRU's contents.  But, if we do a plain
'wrpkru', when we return to userspace we might do an XRSTOR and
wipe out the kernel's 'wrpkru'.  So, we need to go after PKRU in
the xsave buffer.

We do this by:

  1. Ensuring that we have the XSAVE registers (fpregs) in the
     kernel FPU buffer (fpstate)
  2. Looking up the location of a given state in the buffer
  3. Filling in the stat
  4. Ensuring that the hardware knows that state is present there
     (basically that the 'init optimization' is not in place).
  5. Copying the newly-modified state back to the registers if
     necessary.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210235.5A3139BF@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 19:46:32 +01:00
Dave Hansen
c8df400984 x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structures
The protection keys register (PKRU) is saved and restored using
xsave.  Define the data structure that we will use to access it
inside the xsave buffer.

Note that we also have to widen the printk of the xsave feature
masks since this is feature 0x200 and we only did two characters
before.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210204.56DF8F7B@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 10:11:14 +01:00
Dave Hansen
1f96b1efba x86/fpu: Add placeholder for 'Processor Trace' XSAVE state
There is an XSAVE state component for Intel Processor Trace (PT).
But, we do not currently use it.

We add a placeholder in the code for it so it is not a mystery and
also so we do not need an explicit enum initialization for Protection
Keys in a moment.

Why don't we use it?

We might end up using this at _some_ point in the future.  But,
this is a "system" state which requires using the currently
unsupported XSAVES feature.  Unlike all the other XSAVE states,
PT state is also not directly tied to a thread.  You might
context-switch between threads, but not want to change any of the
PT state.  Or, you might switch between threads, and *do* want to
change PT state, all depending on what is being traced.

We currently just manually set some MSRs to do this PT context
switching, and it is unclear whether replacing our direct MSR use
with XSAVE will be a net win or loss, both in code complexity and
performance.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: fenghua.yu@intel.com
Cc: linux-mm@kvack.org
Cc: yu-cheng.yu@intel.com
Link: http://lkml.kernel.org/r/20160212210158.5E4BCAE2@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 10:11:13 +01:00
Andy Lutomirski
58122bf1d8 x86/fpu: Default eagerfpu=on on all CPUs
We have eager and lazy FPU modes, introduced in:

  304bceda6a ("x86, fpu: use non-lazy fpu restore for processors supporting xsave")

The result is rather messy.  There are two code paths in almost all
of the FPU code, and only one of them (the eager case) is tested
frequently, since most kernel developers have new enough hardware
that we use eagerfpu.

It seems that, on any remotely recent hardware, eagerfpu is a win:
glibc uses SSE2, so laziness is probably overoptimistic, and, in any
case, manipulating TS is far slower that saving and restoring the
full state.  (Stores to CR0.TS are serializing and are poorly
optimized.)

To try to shake out any latent issues on old hardware, this changes
the default to eager on all CPUs.  If no performance or functionality
problems show up, a subsequent patch could remove lazy mode entirely.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/ac290de61bf08d9cfc2664a4f5080257ffc1075a.1453675014.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 15:42:56 +01:00
Andy Lutomirski
a20d729704 x86/fpu: Fold fpu_copy() into fpu__copy()
Splitting it into two functions needlessly obfuscated the code.
While we're at it, improve the comment slightly.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/3eb5a63a9c5c84077b2677a7dfe684eef96fe59e.1453675014.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 15:42:55 +01:00
Andy Lutomirski
5ed73f4073 x86/fpu: Fix FNSAVE usage in eagerfpu mode
In eager fpu mode, having deactivated FPU without immediately
reloading some other context is illegal.  Therefore, to recover from
FNSAVE, we can't just deactivate the state -- we need to reload it
if we're not actively context switching.

We had this wrong in fpu__save() and fpu__copy().  Fix both.
__kernel_fpu_begin() was fine -- add a comment.

This fixes a warning triggerable with nofxsr eagerfpu=on.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/60662444e13c76f06e23c15c5dcdba31b4ac3d67.1453675014.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 15:42:55 +01:00
Andy Lutomirski
4ecd16ec70 x86/fpu: Fix math emulation in eager fpu mode
Systems without an FPU are generally old and therefore use lazy FPU
switching. Unsurprisingly, math emulation in eager FPU mode is a
bit buggy. Fix it.

There were two bugs involving kernel code trying to use the FPU
registers in eager mode even if they didn't exist and one BUG_ON()
that was incorrect.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/b4b8d112436bd6fab866e1b4011131507e8d7fbe.1453675014.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 15:42:55 +01:00
yu-cheng yu
394db20ca2 x86/fpu: Disable AVX when eagerfpu is off
When "eagerfpu=off" is given as a command-line input, the kernel
should disable AVX support.

The Task Switched bit used for lazy context switching does not
support AVX. If AVX is enabled without eagerfpu context
switching, one task's AVX state could become corrupted or leak
to other tasks. This is a bug and has bad security implications.

This only affects systems that have AVX/AVX2/AVX512 and this
issue will be found only when one actually uses AVX/AVX2/AVX512
_AND_ does eagerfpu=off.

Reference: Intel Software Developer's Manual Vol. 3A

Sec. 2.5 Control Registers:
TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the
x87 FPU/ MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch
to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4
instruction is actually executed by the new task.

Sec. 13.4.1 Using the TS Flag to Control the Saving of the X87
FPU and SSE State
When the TS flag is set, the processor monitors the instruction
stream for x87 FPU, MMX, SSE instructions. When the processor
detects one of these instructions, it raises a
device-not-available exeception (#NM) prior to executing the
instruction.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/1452119094-7252-5-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:51:21 +01:00
yu-cheng yu
a5fe93a549 x86/fpu: Disable MPX when eagerfpu is off
This issue is a fallout from the command-line parsing move.

When "eagerfpu=off" is given as a command-line input, the kernel
should disable MPX support. The decision for turning off MPX was
made in fpu__init_system_ctx_switch(), which is after the
selection of the XSAVE format. This patch fixes it by getting
that decision done earlier in fpu__init_system_xstate().

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/1452119094-7252-4-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:51:21 +01:00
yu-cheng yu
eb7c5f872e x86/fpu: Disable XGETBV1 when no XSAVE
When "noxsave" is given as a command-line input, the kernel
should disable XGETBV1. This issue currently does not cause any
actual problems. XGETBV1 is only useful if we have something
using the 'init optimization' (i.e. xsaveopt, xsaves). We
already clear both of those in fpu__xstate_clear_all_cpu_caps().
But this is good for completeness.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/1452119094-7252-3-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:51:21 +01:00
yu-cheng yu
4f81cbafcc x86/fpu: Fix early FPU command-line parsing
The function fpu__init_system() is executed before
parse_early_param(). This causes wrong FPU configuration. This
patch fixes this issue by parsing boot_command_line in the
beginning of fpu__init_system().

With all four patches in this series, each parameter disables
features as the following:

eagerfpu=off: eagerfpu, avx, avx2, avx512, mpx
no387: fpu
nofxsr: fxsr, fxsropt, xmm
noxsave: xsave, xsaveopt, xsaves, xsavec, avx, avx2, avx512,
mpx, xgetbv1 noxsaveopt: xsaveopt
noxsaves: xsaves

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/1452119094-7252-2-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:51:20 +01:00
Linus Torvalds
6896d9f7e7 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
 "This cleans up the FPU fault handling methods to be more robust, and
  moves eligible variables to .init.data"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Put a few variables in .init.data
  x86/fpu: Get rid of xstate_fault()
  x86/fpu: Add an XSTATE_OP() macro
2016-01-11 16:56:38 -08:00
Linus Torvalds
67c707e451 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "The main changes in this cycle were:

   - code patching and cpu_has cleanups (Borislav Petkov)

   - paravirt cleanups (Juergen Gross)

   - TSC cleanup (Thomas Gleixner)

   - ptrace cleanup (Chen Gang)"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arch/x86/kernel/ptrace.c: Remove unused arg_offs_table
  x86/mm: Align macro defines
  x86/cpu: Provide a config option to disable static_cpu_has
  x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros
  x86/cpufeature: Cleanup get_cpu_cap()
  x86/cpufeature: Move some of the scattered feature bits to x86_capability
  x86/paravirt: Remove paravirt ops pmd_update[_defer] and pte_update_defer
  x86/paravirt: Remove unused pv_apic_ops structure
  x86/tsc: Remove unused tsc_pre_init() hook
  x86: Remove unused function cpu_has_ht_siblings()
  x86/paravirt: Kill some unused patching functions
2016-01-11 16:26:03 -08:00
Jiri Olsa
25ec02f2c1 x86/fpu: Properly align size in CHECK_MEMBER_AT_END_OF() macro
The CHECK_MEMBER_AT_END_OF(TYPE, MEMBER) checks whether MEMBER
is last member of TYPE by evaluating:

  offsetof(TYPE::MEMBER) + sizeof(TYPE::MEMBER) == sizeof(TYPE)

and ensuring TYPE::MEMBER is the last member of the TYPE.

This condition breaks on structs that are padded to be
aligned. This patch ensures the TYPE alignment is taken
into account.

This bug was revealed after adding cacheline alignment into
struct sched_entity, which broke task_struct::thread check:

  CHECK_MEMBER_AT_END_OF(struct task_struct, thread);

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1450707930-3445-1-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:06:06 +01:00
Borislav Petkov
362f924b64 x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros
Those are stupid and code should use static_cpu_has_safe() or
boot_cpu_has() instead. Kill the least used and unused ones.

The remaining ones need more careful inspection before a conversion can
happen. On the TODO.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1449481182-27541-4-git-send-email-bp@alien8.de
Cc: David Sterba <dsterba@suse.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:49:55 +01:00
Rasmus Villemoes
e49a449b86 x86/fpu: Put a few variables in .init.data
These are clearly just used during init.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1447424312-26400-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-27 10:23:17 +01:00
Huaitong Han
a05917b6ba x86/fpu: Fix get_xsave_addr() behavior under virtualization
KVM uses the get_xsave_addr() function in a different fashion from
the native kernel, in that the 'xsave' parameter belongs to guest vcpu,
not the currently running task.

But 'xsave' is replaced with current task's (host) xsave structure, so
get_xsave_addr() will incorrectly return the bad xsave address to KVM.

Fix it so that the passed in 'xsave' address is used - as intended
originally.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Link: http://lkml.kernel.org/r/1446800423-21622-1-git-send-email-huaitong.han@intel.com
[ Tidied up the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12 09:34:58 +01:00
Dave Hansen
ab6b529475 x86/fpu: Fix 32-bit signal frame handling
(This should have gone to LKML originally. Sorry for the extra
 noise, folks on the cc.)

Background:

Signal frames on x86 have two formats:

  1. For 32-bit executables (whether on a real 32-bit kernel or
     under 32-bit emulation on a 64-bit kernel) we have a
    'fpregset_t' that includes the "FSAVE" registers.

  2. For 64-bit executables (on 64-bit kernels obviously), the
     'fpregset_t' is smaller and does not contain the "FSAVE"
     state.

When creating the signal frame, we have to be aware of whether
we are running a 32 or 64-bit executable so we create the
correct format signal frame.

Problem:

save_xstate_epilog() uses 'fx_sw_reserved_ia32' whenever it is
called for a 32-bit executable.  This is for real 32-bit and
ia32 emulation.

But, fpu__init_prepare_fx_sw_frame() only initializes
'fx_sw_reserved_ia32' when emulation is enabled, *NOT* for real
32-bit kernels.

This leads to really wierd situations where 32-bit programs
lose their extended state when returning from a signal handler.
The kernel copies the uninitialized (zero) 'fx_sw_reserved_ia32'
out to userspace in save_xstate_epilog().  But when returning
from the signal, the kernel errors out in check_for_xstate()
when it does not see FP_XSTATE_MAGIC1 present (because it was
zeroed).  This leads to the FPU/XSAVE state being initialized.

For MPX, this leads to the most permissive state and means we
silently lose bounds violations.  I think this would also mean
that we could lose *ANY* FPU/SSE/AVX state.  I'm not sure why
no one has spotted this bug.

I believe this was broken by:

	72a671ced6 ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")

way back in 2012.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@sr71.net
Cc: fenghua.yu@intel.com
Cc: yu-cheng.yu@intel.com
Link: http://lkml.kernel.org/r/20151111002354.A0799571@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12 09:23:45 +01:00
Linus Torvalds
4302d506d5 Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 sigcontext header cleanups from Ingo Molnar:
 "This series reorganizes and cleans up various aspects of the main
  sigcontext UAPI headers, such as unifying the data structures and
  updating/adding lots of comments to explain all the ABI details and
  quirks.  The headers can now also be built in user-space standalone"

* 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/headers: Clean up too long lines
  x86/headers: Remove <asm/sigcontext.h> references on the kernel side
  x86/headers: Remove direct sigcontext32.h uses
  x86/headers: Convert sigcontext_ia32 uses to sigcontext_32
  x86/headers: Unify 'struct sigcontext_ia32' and 'struct sigcontext_32'
  x86/headers: Make sigcontext pointers bit independent
  x86/headers: Move the 'struct sigcontext' definitions into the UAPI header
  x86/headers: Clean up the kernel's struct sigcontext types to be ABI-clean
  x86/headers: Convert uses of _fpstate_ia32 to _fpstate_32
  x86/headers: Unify 'struct _fpstate_ia32' and i386 struct _fpstate
  x86/headers: Unify register type definitions between 32-bit compat and i386
  x86/headers: Use ABI types consistently in sigcontext*.h
  x86/headers: Separate out legacy user-space structure definitions
  x86/headers: Clean up and better document uapi/asm/sigcontext.h
  x86/headers: Clean up uapi/asm/sigcontext32.h
  x86/headers: Fix (old) header file dependency bug in uapi/asm/sigcontext32.h
2015-11-03 21:05:40 -08:00
Dave Hansen
ef78f2a4bf x86/fpu: Check CPU-provided sizes against struct declarations
We now have C structures defined for each of the XSAVE state
components that we support.  This patch adds checks during our
verification pass to ensure that the CPU-provided data
enumerated in CPUID leaves matches our C structures.

If not, we warn and dump all the XSAVE CPUID leaves.

Note: this *actually* found an inconsistency with the MPX
'bndcsr' state.  The hardware pads it out differently from
our C structures.  This patch caught it and warned.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233131.A8DB36DA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:22:02 +02:00
Dave Hansen
e6e888f96b x86/fpu: Check to ensure increasing-offset xstate offsets
The xstate CPUID leaves enumerate where each state component is
inside the XSAVE buffer, along with the size of the entire
buffer.  Our new XSAVE sanity-checking code extrapolates an
expected _total_ buffer size by looking at the last component
that it encounters.

That method requires that the highest-numbered component also
be the one with the highest offset.  This is a pretty safe
assumption, but let's add some code to ensure it stays true.

To make this check work correctly, we also need to ensure we
only consider the offsets from enabled features because the
offset register (ebx) will return 0 on unsupported features.

This also means that we will preserve the -1's that we
initialized xstate_offsets/sizes[] with.  That will help
find bugs.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233130.0843AB15@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:22:02 +02:00
Dave Hansen
65ac2e9baa x86/fpu: Correct and check XSAVE xstate size calculations
Note: our xsaves support is currently broken and disabled.  This
patch does not fix it, but it is an incremental improvement.

This might be useful to someone backporting the entire set of
XSAVES patches at some point, but it should not be backported
alone.

Ingo said he wanted something like this (bullets 2 and 3):

  http://lkml.kernel.org/r/20150808091508.GB32641@gmail.com

There are currently two xsave buffer formats: standard and
compacted.  The standard format is waht 'XSAVE' and 'XSAVEOPT'
produce while 'XSAVES' and 'XSAVEC' produce a compacted-formet
buffer.  (The kernel never uses XSAVEC)

But, the XSAVES buffer *ALSO* contains "system state components"
which are never saved by a plain XSAVE.  So, XSAVES has two
things that might make its buffer differently-sized from an
XSAVE-produced one.

The current code assumes that an XSAVES buffer's size is simply
the sum of the sizes of the (user) states which are supported.
This seems to work in most cases, but it is not consistent with
what the SDM says, and it breaks if we 'align' a component in
the buffer.  The calculation is also unnecessary work since the
CPU *tells* us the size of the buffer directly.

This patch just reads the size of the buffer right out of the
CPUID leaf instead of trying to derive it.

But, blindly trusting the CPU like this is dangerous.  We add
a verification pass in do_extra_xstate_size_checks() to ensure
that the size we calculate matches with what we see from the
hardware.  When it comes down to it, we trust but verify the
CPU.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233130.234FE1EC@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:22:01 +02:00
Dave Hansen
633d54c47a x86/fpu: Add xfeature_enabled() helper instead of test_bit()
We currently use test_bit() in a few places to see if an
xfeature is enabled.  It ends up being a bit ugly because
'xfeatures_mask' is a u64 and test_bit wants an 'unsigned long'
so it requires a cast.  The *_bit() functions are also
techincally atomic, which we have no need for here.

So, remove the test_bit()s and replace with the new
xfeature_enabled() helper.

This also provides a central place to add a comment about the
future need to support 'system xstates'.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233129.B1534F86@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:21:59 +02:00
Dave Hansen
ee9ae257eb x86/fpu: Remove 'xfeature_nr'
xfeature_nr ended up being initialized too late for me to
use it in the "xsave size sanity check" patch which is
later in the series.  I tried to move around its initialization
but realized that it was just as easy to get rid of it.

We only have 9 XFEATURES.  Instead of dynamically calculating
and storing the last feature, just use the compile-time max:
XFEATURES_NR_MAX.  Note that even with 'xfeatures_nr' we can
had "holes" in the xfeatures_mask that we had to deal with.

We also change a 'leaf' variable to be a plain 'i'.  Although
it is used to grab a cpuid leaf in this one loop, all of the
other loops just use an 'i' and I find it much more obvious
to keep the naming consistent across all the similar loops.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233128.3F30DF5A@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:21:59 +02:00
Dave Hansen
8a93c9e0dc x86/fpu: Rework XSTATE_* macros to remove magic '2'
The 'xstate.c' code has a bunch of references to '2'.  This
is because we have a lot more work to do for the "extended"
xstates than the "legacy" ones and state component 2 is the
first "extended" state.

This patch replaces all of the instances of '2' with
FIRST_EXTENDED_XFEATURE, which clearly explains what is
going on.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233128.A8C0BF51@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:21:58 +02:00
Dave Hansen
dad8c4fe85 x86/fpu: Rename XFEATURES_NR_MAX
This is a logcal followon to the last patch.  It makes the
XFEATURE_MAX naming consistent with the other enum values.
This is what Ingo suggested.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233127.A541448F@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:21:57 +02:00
Dave Hansen
d91cab7813 x86/fpu: Rename XSAVE macros
There are two concepts that have some confusing naming:
 1. Extended State Component numbers (currently called
    XFEATURE_BIT_*)
 2. Extended State Component masks (currently called XSTATE_*)

The numbers are (currently) from 0-9.  State component 3 is the
bounds registers for MPX, for instance.

But when we want to enable "state component 3", we go set a bit
in XCR0.  The bit we set is 1<<3.  We can check to see if a
state component feature is enabled by looking at its bit.

The current 'xfeature_bit's are at best xfeature bit _numbers_.
Calling them bits is at best inconsistent with ending the enum
list with 'XFEATURES_NR_MAX'.

This patch renames the enum to be 'xfeature'.  These also
happen to be what the Intel documentation calls a "state
component".

We also want to differentiate these from the "XSTATE_*" macros.
The "XSTATE_*" macros are a mask, and we rename them to match.

These macros are reasonably widely used so this patch is a
wee bit big, but this really is just a rename.

The only non-mechanical part of this is the

	s/XSTATE_EXTEND_MASK/XFEATURE_MASK_EXTEND/

We need a better name for it, but that's another patch.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233126.38653250@viggo.jf.intel.com
[ Ported to v4.3-rc1. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:21:46 +02:00
Dave Hansen
4109ca066b x86/fpu: Remove XSTATE_RESERVE
The original purpose of XSTATE_RESERVE was to carve out space
to store all of the possible extended state components that
get saved with the XSAVE instruction(s).

However, we are now almost entirely dynamically allocating
the buffers we use for XSAVE by placing them at the end of
the task_struct and them sizing them at boot.  The one
exception for that is the init_task.

The maximum extended state component size that we have today
is on systems with space for AVX-512 and Memory Protection
Keys: 2696 bytes.  We have reserved a PAGE_SIZE buffer in
the init_task via fpregs_state->__padding.

This check ensures that even if the component sizes or
layout were changed (which we do not expect), that we will
still not overflow the init_task's buffer.

In the case that we detect we might overflow the buffer,
we completely disable XSAVE support in the kernel and try
to boot as if we had 'legacy x87 FPU' support in place.
This is a crippled state without any of the XSAVE-enabled
features (MPX, AVX, etc...).  But, it at least let us
boot safely.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233125.D948D475@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:07:56 +02:00
Dave Hansen
0a26537502 x86/fpu: Move XSAVE-disabling code to a helper
When we want to _completely_ disable XSAVE support as far as
the kernel is concerned, we have a big set of feature flags
to clear.  We currently only do this in cases where the user
asks for it to be disabled, but we are about to expand the
places where we do it to handle errors too.

Move the code in to xstate.c, and put it in the xstate.h
header.  We will use it in the next patch too.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233124.EA9A70E5@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:07:56 +02:00
Dave Hansen
b081535959 x86/fpu: Print xfeature buffer size in decimal
This is utterly a personal taste thing, but I find it way easier
to read structure sizes in decimal than in hex.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233124.1A8B04A8@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:07:55 +02:00
Ingo Molnar
86e9fc3a0e x86/headers: Convert uses of _fpstate_ia32 to _fpstate_32
Remove uses of _fpstate_ia32 from the kernel, and move the
legacy _fpstate_ia32 definition to the user-space only portion
of the header.

Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1441438363-9999-9-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08 10:03:57 +02:00
Ingo Molnar
827409b2f5 x86/fpu/math-emu: Fix crash in fork()
During later stages of math-emu bootup the following crash triggers:

	 math_emulate: 0060:c100d0a8
	 Kernel panic - not syncing: Math emulation needed in kernel
	 CPU: 0 PID: 1511 Comm: login Not tainted 4.2.0-rc7+ #1012
	 [...]
	 Call Trace:
	  [<c181d50d>] dump_stack+0x41/0x52
	  [<c181c918>] panic+0x77/0x189
	  [<c1003530>] ? math_error+0x140/0x140
	  [<c164c2d7>] math_emulate+0xba7/0xbd0
	  [<c100d0a8>] ? fpu__copy+0x138/0x1c0
	  [<c1109c3c>] ? __alloc_pages_nodemask+0x12c/0x870
	  [<c136ac20>] ? proc_clear_tty+0x40/0x70
	  [<c136ac6e>] ? session_clear_tty+0x1e/0x30
	  [<c1003530>] ? math_error+0x140/0x140
	  [<c1003575>] do_device_not_available+0x45/0x70
	  [<c100d0a8>] ? fpu__copy+0x138/0x1c0
	  [<c18258e6>] error_code+0x5a/0x60
	  [<c1003530>] ? math_error+0x140/0x140
	  [<c100d0a8>] ? fpu__copy+0x138/0x1c0
	  [<c100c205>] arch_dup_task_struct+0x25/0x30
	  [<c1048cea>] copy_process.part.51+0xea/0x1480
	  [<c115a8e5>] ? dput+0x175/0x200
	  [<c136af70>] ? no_tty+0x30/0x30
	  [<c1157242>] ? do_vfs_ioctl+0x322/0x540
	  [<c104a21a>] _do_fork+0xca/0x340
	  [<c1057b06>] ? SyS_rt_sigaction+0x66/0x90
	  [<c104a557>] SyS_clone+0x27/0x30
	  [<c1824a80>] sysenter_do_call+0x12/0x12

The reason is the incorrect assumption in fpu_copy(), that FNSAVE
can be executed from math-emu kernels as well.

Don't try to copy the registers, the soft state will be copied
by fork anyway, so the child task inherits the parent task's
soft math state.

With this fix applied math-emu kernels boot up fine on modern
hardware and the 'no387 nofxsr' boot options.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-22 10:23:03 +02:00
Ingo Molnar
5fc960380e x86/fpu/math-emu: Fix math-emu boot crash
On a math-emu bootup the following crash occurs:

	Initializing CPU#0
	------------[ cut here ]------------
	kernel BUG at arch/x86/kernel/traps.c:779!
	invalid opcode: 0000 [#1] SMP
	[...]
	EIP is at do_device_not_available+0xe/0x70
	[...]
	Call Trace:
	 [<c18238e6>] error_code+0x5a/0x60
	 [<c1002bd0>] ? math_error+0x140/0x140
	 [<c100bbd9>] ? fpu__init_cpu+0x59/0xa0
	 [<c1012322>] cpu_init+0x202/0x330
	 [<c104509f>] ? __native_set_fixmap+0x1f/0x30
	 [<c1b56ab0>] trap_init+0x305/0x346
	 [<c1b548af>] start_kernel+0x1a5/0x35d
	 [<c1b542b4>] i386_start_kernel+0x82/0x86

The reason is that in the following commit:

  b1276c48e9 ("x86/fpu: Initialize fpregs in fpu__init_cpu_generic()")

I failed to consider math-emu's limitation that it cannot execute the
FNINIT instruction in kernel mode.

The long term fix might be to allow math-emu to execute (certain) kernel
mode FPU instructions, but for now apply the safe (albeit somewhat ugly)
fix: initialize the emulation state explicitly without trapping out to
the FPU emulator.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-22 10:02:04 +02:00
Jan Beulich
5bc016f1ab x86/fpu: Disable dependent CPU features on "noxsave"
Complete the set of dependent features that need disabling at
once: XSAVEC, AVX-512 and all currently known to the kernel
extensions to it, as well as MPX need to be disabled too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/55ACC40D0200007800092E6C@mail.emea.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-21 08:20:42 +02:00
Ingo Molnar
5aaeb5c01c x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
Don't burden architectures without dynamic task_struct sizing
with the overhead of dynamic sizing.

Also optimize the x86 code a bit by caching task_struct_size.

Acked-and-Tested-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-3-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-18 03:42:51 +02:00
Dave Hansen
0c8c0f03e3 x86/fpu, sched: Dynamically allocate 'struct fpu'
The FPU rewrite removed the dynamic allocations of 'struct fpu'.
But, this potentially wastes massive amounts of memory (2k per
task on systems that do not have AVX-512 for instance).

Instead of having a separate slab, this patch just appends the
space that we need to the 'task_struct' which we dynamically
allocate already.  This saves from doing an extra slab
allocation at fork().

The only real downside here is that we have to stick everything
and the end of the task_struct.  But, I think the
BUILD_BUG_ON()s I stuck in there should keep that from being too
fragile.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-18 03:42:35 +02:00
Ingo Molnar
b96fecbfa8 x86/fpu: Fix boot crash in the early FPU code
Jan Kara and Thomas Gleixner reported boot crashes in the FPU
code:

  general protection fault: 0000 [#1] SMP
  RIP: 0010:[<ffffffff81048a6c>]  [<ffffffff81048a6c>] mxcsr_feature_mask_init+0x1c/0x40

  2b:*  0f ae 85 00 fe ff ff    fxsave -0x200(%rbp)

and bisected it down to the following FPU commit:

   91a8c2a5b4 ("x86/fpu: Clean up and fix MXCSR handling")

The reason is that the on-stack FPU registers state variable,
used by the FXSAVE instruction, did not have the required
minimum alignment of 16 bytes, causing the general protection
fault.

This is most likely a GCC bug in older GCC versions, but the
offending commit also added a bogus extra 32-byte alignment
(which GCC ignored too).

So fix this bug by making the variable static again, but also
mark it __initdata this time, because fpu__init_system_mxcsr()
is now an __init function.

Reported-and-bisected-by: Jan Kara <jack@suse.cz>
Reported-bisected-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150704075819.GA9201@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04 10:05:56 +02:00
Dave Hansen
a842400367 x86/fpu: Fix double-increment in setup_xstate_features()
I noticed that my MPX tracepoints were producing garbage for the
lower and upper bounds:

	mpx_bounds_register_exception: address referenced: 0x00007fffffffccb7 bounds: lower: 0x0 ~upper: 0xffffffffffffffff
	mpx_bounds_register_exception: address referenced: 0x00007fffffffccbf bounds: lower: 0x0 ~upper: 0xffffffffffffffff

This is, of course, bogus because 0x00007fffffffccbf is *within*
the bounds.  I assumed that my instruction decoder was bad and
went looking at it.  But I eventually realized that I was
getting a '0' offset back from xstate_offsets[BNDREGS].

It was being skipped in the initialization, which is obviously
bogus, so remove the extra leaf++.

This also goes an initializes xstate_offsets/sizes[] to -1 so
so that bugs like this will oops instead of silently failing
in interesting ways.

This was introduced by:

	39f1acd ("x86/fpu/xstate: Don't assume the first zero xfeatures zero bit means the end")

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@sr71.net
Link: http://lkml.kernel.org/r/20150611193400.2E0B00DB@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-12 10:48:12 +02:00
Dave Hansen
04cd027bcb x86/fpu/xstate: Wrap get_xsave_addr() to make it safer
The MPX code appears is calling a low-level FPU function
(copy_fpregs_to_fpstate()).  This function is not able to
be called in all contexts, although it is safe to call
directly in some cases.

Although probably correct, the current code is ugly and
potentially error-prone.  So, add a wrapper that calls
the (slightly) higher-level fpu__save() (which is preempt-
safe) and also ensures that we even *have* an FPU context
(in the case that this was called when in lazy FPU mode).

Ingo had this to say about the details about when we need
preemption disabled:

> it's indeed generally unsafe to access/copy FPU registers with preemption enabled,
> for two reasons:
>
>   - on older systems that use FSAVE the instruction destroys FPU register
>     contents, which has to be handled carefully
>
>   - even on newer systems if we copy to FPU registers (which this code doesn't)
>     then we don't want a context switch to occur in the middle of it, because a
>     context switch will write to the fpstate, potentially overwriting our new data
>     with old FPU state.
>
> But it's safe to access FPU registers with preemption enabled in a couple of
> special cases:
>
>   - potentially destructively saving FPU registers: the signal handling code does
>     this in copy_fpstate_to_sigframe(), because it can rely on the signal restore
>     side to restore the original FPU state.
>
>   - reading FPU registers on modern systems: we don't do this anywhere at the
>     moment, mostly to keep symmetry with older systems where FSAVE is
>     destructive.
>
>   - initializing FPU registers on modern systems: fpu__clear() does this. Here
>     it's safe because we don't copy from the fpstate.
>
>   - directly writing FPU registers from user-space memory (!). We do this in
>     fpu__restore_sig(), and it's safe because neither context switches nor
>     irq-handler FPU use can corrupt the source context of the copy (which is
>     user-space memory).
>
> Note that the MPX code's current use of copy_fpregs_to_fpstate() was safe I think,
> because:
>
>  - MPX is predicated on eagerfpu, so the destructive F[N]SAVE instruction won't be
>    used.
>
>  - the code was only reading FPU registers, and was doing it only in places that
>    guaranteed that an FPU state was already active (i.e. didn't do it in
>    kthreads)

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave@sr71.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: bp@alien8.de
Link: http://lkml.kernel.org/r/20150607183700.AA881696@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09 12:24:29 +02:00
Dave Hansen
0c4109bec0 x86/fpu/xstate: Fix up bad get_xsave_addr() assumptions
get_xsave_addr() assumes that if an xsave bit is present in the
hardware (pcntxt_mask) that it is present in a given xsave
buffer.  Due to an bug in the xsave code on all of the systems
that have MPX (and thus all the users of this code), that has
been a true assumption.

But, the bug is getting fixed, so our assumption is not going
to hold any more.

It's quite possible (and normal) for an enabled state to be
present on 'pcntxt_mask', but *not* in 'xstate_bv'.  We need
to consult 'xstate_bv'.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150607183700.1E739B34@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09 12:24:29 +02:00
Ingo Molnar
d65fcd608f x86/fpu: Simplify copy_kernel_to_xregs_booting()
copy_kernel_to_xregs_booting() has a second parameter that is the mask
of xfeatures that should be copied - but this parameter is always -1.

Simplify the call site of this function, this also makes it more
similar to the function call signature of other copy_kernel_to*regs()
functions.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:33 +02:00
Ingo Molnar
003e2e8b57 x86/fpu: Standardize the parameter type of copy_kernel_to_fpregs()
Bring the __copy_fpstate_to_fpregs() and copy_fpstate_to_fpregs() functions
in line with the parameter passing convention of other kernel-to-FPU-registers
copying functions: pass around an in-memory FPU register state pointer,
instead of struct fpu *.

NOTE: This patch also changes the assembly constraint of the FXSAVE-leak
      workaround from 'fpu->fpregs_active' to 'fpstate' - but that is fine,
      as we only need a valid memory address there for the FILDL instruction.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:32 +02:00
Ingo Molnar
9ccc27a5d2 x86/fpu: Remove error return values from copy_kernel_to_*regs() functions
None of the copy_kernel_to_*regs() FPU register copying functions are
supposed to fail, and all of them have debugging checks that enforce
this.

Remove their return values and simplify their call sites, which have
redundant error checks and error handling code paths.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:30 +02:00
Ingo Molnar
3e1bf47e5c x86/fpu: Rename copy_fpstate_to_fpregs() to copy_kernel_to_fpregs()
Bring the __copy_fpstate_to_fpregs() and copy_fpstate_to_fpregs() functions
in line with the naming of other kernel-to-FPU-registers copying functions.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:29 +02:00
Ingo Molnar
ce2a1e67f1 x86/fpu: Add debugging check to fpu__restore()
The copy_fpstate_to_fpregs() function is never supposed to fail,
so add a debugging check to its call site in fpu__restore().

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:27 +02:00
Ingo Molnar
343763c3b0 x86/fpu: Optimize fpu__activate_fpstate_write()
fpu__activate_fpstate_write() is used before ptrace writes to the fpstate
context. Because it expects the modified registers to be reloaded on the
nexts context switch, it's only valid to call this function for stopped
child tasks.

  - add a debugging check for this assumption

  - remove code that only runs if the current task's FPU state needs
    to be saved, which cannot occur here

  - update comments to match the implementation

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:27 +02:00
Ingo Molnar
6a81d7eb33 x86/fpu: Rename fpu__activate_fpstate() to fpu__activate_fpstate_write()
Remaining users of fpu__activate_fpstate() are all places that want to modify
FPU registers, rename the function to fpu__activate_fpstate_write() according
to this usage.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:26 +02:00
Ingo Molnar
9ba6b79102 x86/fpu: Optimize fpu__activate_fpstate_read()
fpu__activate_fpstate_read() is used before FPU registers are
read from the fpstate by ptrace and core dumping.

It's not necessary to unlazy non-current child tasks in this case,
since the reading of registers is non-destructive.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:25 +02:00
Ingo Molnar
0560281266 x86/fpu: Split out the fpu__activate_fpstate_read() method
Currently fpu__activate_fpstate() is used for two distinct purposes:

  - read access by ptrace and core dumping, where in the core dumping
    case the current task's FPU state may be examined as well.

  - write access by ptrace, which modifies FPU registers and expects
    the modified registers to be reloaded on the next context switch.

Split out the reading side into fpu__activate_fpstate_read().

( Note that this is just a pure duplication of fpu__activate_fpstate()
  for the time being, we'll optimize the new function in the next patch. )

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 14:11:24 +02:00
Ingo Molnar
47f01e8cc2 x86/fpu: Fix FPU register read access to the current task
Bobby Powers reported the following FPU warning during ELF coredumping:

   WARNING: CPU: 0 PID: 27452 at arch/x86/kernel/fpu/core.c:324 fpu__activate_stopped+0x8a/0xa0()

This warning unearthed an invalid assumption about fpu__activate_stopped()
that I added in:

  67e97fc2ec ("x86/fpu: Rename init_fpu() to fpu__unlazy_stopped() and add debugging check")

the old init_fpu() function had an (intentional but obscure) side effect:
when FPU registers are accessed for the current task, for reading, then
it synchronized live in-register FPU state with the fpstate by saving it.

So fix this bug by saving the FPU if we are the current task. We'll
still warn in fpu__save() if this is called for not yet stopped
child tasks, so the debugging check is still preserved.

Also rename the function to fpu__activate_fpstate(), because it's not
exclusively used for stopped tasks, but for the current task as well.

( Note that this bug calls for a cleaner separation of access-for-read
  and access-for-modification FPU methods, but we'll do that in separate
  patches. )

Reported-by: Bobby Powers <bobbypowers@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 12:40:18 +02:00
Ingo Molnar
6e5535940f x86/fpu: Fix fpu__init_system_xstate() comments
Remove obsolete comment about __init limitations: in the new code there aren't any.

Also standardize the comment style in the function while at it.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-25 12:49:34 +02:00
Ingo Molnar
6f56a8d024 Merge branch 'x86/urgent' into x86/fpu, to resolve a conflict
Conflicts:
	arch/x86/kernel/i387.c

This commit is conflicting:

  e88221c50c ("x86/fpu: Disable XSAVES* support for now")

These functions changed a lot, move the quirk to arch/x86/kernel/fpu/init.c's
fpu__init_system_xstate_size_legacy().

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 12:01:01 +02:00
Ingo Molnar
5856afed0c x86/fpu/init: Clean up and comment the __setup() functions
Explain the functions and also standardize their style
and naming.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 11:39:35 +02:00
Ingo Molnar
7cf82d33b6 x86/fpu/init: Move __setup() functions to fpu/init.c
We had a number of FPU init related boot option handlers
in arch/x86/kernel/cpu/common.c - move them over into
arch/x86/kernel/fpu/init.c to have them all in a
single place.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 11:35:42 +02:00
Ingo Molnar
e97131a839 x86/fpu: Add CONFIG_X86_DEBUG_FPU=y FPU debugging code
There are various internal FPU state debugging checks that never
trigger in practice, but which are useful for FPU code development.

Separate these out into CONFIG_X86_DEBUG_FPU=y, and also add a
couple of new ones.

The size difference is about 0.5K of code on defconfig:

   text        data     bss          filename
   15028906    2578816  1638400      vmlinux
   15029430    2578816  1638400      vmlinux

( Keep this enabled by default until the new FPU code is debugged. )

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:12 +02:00
Ingo Molnar
e1884d69f6 x86/fpu: Pass 'struct fpu' to fpu__restore()
This cleans up the call sites and the function a bit,
and also makes it more symmetric with the other high
level FPU state handling functions.

It's still only valid for the current task, as we copy
to the FPU registers of the current CPU.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:11 +02:00
Ingo Molnar
32231879f6 x86/fpu/init: Propagate __init annotations
Now that all the FPU init function call dependencies are
cleaned up we can propagate __init annotations deeper.

This shrinks the runtime size of the kernel a bit, and
also addresses a few section warnings.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:11 +02:00
Ingo Molnar
5fd402dfa7 x86/fpu/xstate: Clean up setup_xstate_comp() call
So call setup_xstate_comp() from the xstate init code, not
from the generic fpu__init_system() code.

This allows us to remove the protytype from xstate.h as well.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:11 +02:00
Ingo Molnar
39f1acd243 x86/fpu/xstate: Don't assume the first zero xfeatures zero bit means the end
The current xstate code in setup_xstate_features() assumes that
the first zero bit means the end of xfeatures - but that is not
so, the SDM clearly states that an arbitrary set of xfeatures
might be enabled - and it is also clear from the description
of the compaction feature that holes are possible:

  "13-6 Vol. 1MANAGING STATE USING THE XSAVE FEATURE SET
  [...]

  Compacted format. Each state component i (i ≥ 2) is located at a byte
  offset from the base address of the XSAVE area based on the XCOMP_BV
  field in the XSAVE header:

  — If XCOMP_BV[i] = 0, state component i is not in the XSAVE area.

  — If XCOMP_BV[i] = 1, the following items apply:

  • If XCOMP_BV[j] = 0 for every j, 2 ≤ j < i, state component i is
    located at a byte offset 576 from the base address of the XSAVE
    area. (This item applies if i is the first bit set in bits 62:2 of
    the XCOMP_BV; it implies that state component i is located at the
    beginning of the extended region.)

  • Otherwise, let j, 2 ≤ j < i, be the greatest value such that
    XCOMP_BV[j] = 1. Then state component i is located at a byte offset
    X from the location of state component j, where X is the number of
    bytes required for state component j as enumerated in
    CPUID.(EAX=0DH,ECX=j):EAX. (This item implies that state component i
    immediately follows the preceding state component whose bit is set
    in XCOMP_BV.)"

So don't assume that the first zero xfeatures bit means the end of
all xfeatures - iterate through all of them.

I'm not aware of hardware that triggers this currently.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:10 +02:00
Ingo Molnar
63c6680cd0 x86/fpu: Move debugging check from kernel_fpu_begin() to __kernel_fpu_begin()
kernel_fpu_begin() is __kernel_fpu_begin() with a preempt_disable().

Move the kernel_fpu_begin() debugging check into __kernel_fpu_begin(),
so that users of __kernel_fpu_begin() may benefit from it as well.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:10 +02:00
Ingo Molnar
aeb997b9f2 x86/fpu: Change fpu->fpregs_active from 'int' to 'char', add lazy switching comments
Improve the memory layout of 'struct fpu':

 - change ->fpregs_active from 'int' to 'char' - it's just a single flag
   and modern x86 CPUs can do efficient byte accesses.

 - pack related fields closer to each other: often 'fpu->state' will not be
   touched, while the other fields will - so pack them into a group.

Also add comments to each field, describing their purpose, and add
some background information about lazy restores.

Also fix an obsolete, lazy switching related comment in fpu_copy()'s description.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:09 +02:00
Ingo Molnar
c47ada305d x86/fpu: Harmonize FPU register state types
Use these consistent names:

    struct fregs_state           # was: i387_fsave_struct
    struct fxregs_state          # was: i387_fxsave_struct
    struct swregs_state          # was: i387_soft_struct
    struct xregs_state           # was: xsave_struct
    union  fpregs_state          # was: thread_xstate

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:09 +02:00
Ingo Molnar
0c306bcfba x86/fpu: Factor out the FPU regset code into fpu/regset.c
So much of fpu/core.c is the regset code, but it just obscures the generic
FPU state machine logic. Factor out the regset code into fpu/regset.c, where
it can be read in isolation.

This affects one API: fpu__activate_stopped() has to be made available
from the core to fpu/regset.c.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:09 +02:00
Ingo Molnar
b992c660d3 x86/fpu: Factor out fpu/signal.c
fpu/xstate.c has a lot of generic FPU signal frame handling routines,
move them into a separate file: fpu/signal.c.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:08 +02:00
Ingo Molnar
c681314421 x86/fpu: Rename all the fpregs, xregs, fxregs and fregs handling functions
Standardize the naming of the various functions that copy register
content in specific FPU context formats:

  copy_fxregs_to_kernel()         # was: fpu_fxsave()
  copy_xregs_to_kernel()          # was: xsave_state()

  copy_kernel_to_fregs()          # was: frstor_checking()
  copy_kernel_to_fxregs()         # was: fxrstor_checking()
  copy_kernel_to_xregs()          # was: fpu_xrstor_checking()
  copy_kernel_to_xregs_booting()  # was: xrstor_state_booting()

  copy_fregs_to_user()            # was: fsave_user()
  copy_fxregs_to_user()           # was: fxsave_user()
  copy_xregs_to_user()            # was: xsave_user()

  copy_user_to_fregs()            # was: frstor_user()
  copy_user_to_fxregs()           # was: fxrstor_user()
  copy_user_to_xregs()            # was: xrestore_user()
  copy_user_to_fpregs_zeroing()   # was: restore_user_xstate()

Eliminate fpu_xrstor_checking(), because it was just a wrapper.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:08 +02:00
Ingo Molnar
815418890e x86/fpu: Move restore_init_xstate() out of fpu/internal.h
Move restore_init_xstate() next to its sole caller.

Also rename it to copy_init_fpstate_to_fpregs() and add
some comments about what it does.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:08 +02:00
Ingo Molnar
6f57502310 x86/fpu: Generalize 'init_xstate_ctx'
So the handling of init_xstate_ctx has a layering violation: both
'struct xsave_struct' and 'union thread_xstate' have a
'struct i387_fxsave_struct' member:

   xsave_struct::i387
   thread_xstate::fxsave

The handling of init_xstate_ctx is generic, it is used on all
CPUs, with or without XSAVE instruction. So it's confusing how
the generic code passes around and handles an XSAVE specific
format.

What we really want is for init_xstate_ctx to be a proper
fpstate and we use its ::fxsave and ::xsave members, as
appropriate.

Since the xsave_struct::i387 and thread_xstate::fxsave aliases
each other this is not a functional problem.

So implement this, and move init_xstate_ctx to the generic FPU
code in the process.

Also, since init_xstate_ctx is not XSAVE specific anymore,
rename it to init_fpstate, and mark it __read_mostly,
because it's only modified once during bootup, and used
as a reference fpstate later on.

There's no change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:07 +02:00
Ingo Molnar
bf935b0b52 x86/fpu: Create 'union thread_xstate' helper for fpstate_init()
fpstate_init() only uses fpu->state, so pass that in to it.

This enables the cleanup we will do in the next patch.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:07 +02:00
Ingo Molnar
0aba697894 x86/fpu: Harmonize the names of the fpstate_init() helper functions
Harmonize the inconsistent naming of these related functions:

                          fpstate_init()
  finit_soft_fpu()   =>   fpstate_init_fsoft()
  fx_finit()         =>   fpstate_init_fxstate()
  fx_finit()         =>   fpstate_init_fstate()       # split out

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:07 +02:00
Ingo Molnar
e1cebad49c x86/fpu: Factor out the exception error code handling code
Factor out the FPU error code handling code from traps.c and fpu/internal.h
and move them close to each other.

Also convert the helper functions to 'struct fpu *', which further simplifies
them.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:06 +02:00
Ingo Molnar
acd58a3ad0 x86/fpu: Remove run-once init quirks
Remove various boot quirks that came from the old code.

The new code is cleanly split up into per-system and per-cpu
init sequences, and system init functions are only called once.

Remove the run-once quirks.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:06 +02:00
Ingo Molnar
59a36d16be x86/fpu: Factor out fpu/regset.h from fpu/internal.h
Only a few places use the regset definitions, so factor them out.

Also fix related header dependency assumptions.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:06 +02:00
Ingo Molnar
fcbc99c403 x86/fpu: Split out fpu/signal.h from fpu/internal.h for signal frame handling functions
Most of the FPU does not use them, so split it out and include
them in signal.c and ia32_signal.c

Also fix header file dependency assumption in fpu/core.c.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:05 +02:00
Ingo Molnar
fbce778246 x86/fpu: Merge fpu__reset() and fpu__clear()
With recent cleanups and fixes the fpu__reset() and fpu__clear()
functions have become almost identical in functionality: the only
difference is that fpu__reset() assumed that the fpstate
was already active in the eagerfpu case, while fpu__clear()
activated it if it was inactive.

This distinction almost never matters, the only case where such
fpstate activation happens if if the init thread (PID 1) gets exec()-ed
for the first time.

So keep fpu__clear() and change all fpu__reset() uses to
fpu__clear() to simpify the logic.

( In a later patch we'll further simplify fpu__clear() by making
  sure that all contexts it is called on are already active. )

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:05 +02:00
Ingo Molnar
82c0e45eb5 x86/fpu: Move the signal frame handling code closer to each other
Consolidate more signal frame related functions:

   text      data    bss     dec       filename
   14108070  2575280 1634304 18317654  vmlinux.before
   14107944  2575344 1634304 18317592  vmlinux.after

Also, while moving it, rename alloc_mathframe() to fpu__alloc_mathframe().

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:04 +02:00
Ingo Molnar
9dfe99b755 x86/fpu: Rename restore_xstate_sig() to fpu__restore_sig()
restore_xstate_sig() is a misnomer: it's not limited to 'xstate' at all,
it is the high level 'restore FPU state from a signal frame' function
that works with all legacy FPU formats as well.

Rename it (and its helper) accordingly, and also move it to the
fpu__*() namespace.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:04 +02:00
Ingo Molnar
04c8e01d50 x86/fpu: Move fpu__clear() to 'struct fpu *' parameter passing
Do it like all other high level FPU state handling functions: they
only know about struct fpu, not about the task.

(Also remove a dead prototype while at it.)

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:04 +02:00
Ingo Molnar
6ffc152e46 x86/fpu: Move all the fpu__*() high level methods closer to each other
The fpu__*() methods are closely related, but they are defined
in scattered places within the FPU code.

Concentrate them, and also uninline fpu__save(), fpu__drop()
and fpu__reset() to save about 5K of kernel text on 64-bit kernels:

   text            data    bss     dec        filename
   14113063        2575280 1634304 18322647   vmlinux.before
   14108070        2575280 1634304 18317654   vmlinux.after

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:04 +02:00
Ingo Molnar
0e75c54f17 x86/fpu: Rename restore_fpu_checking() to copy_fpstate_to_fpregs()
fpu_restore_checking() is a helper function of restore_fpu_checking(),
but this is not apparent from the naming.

Both copy fpstate contents to fpregs, while the fuller variant does
a full copy without leaking information.

So rename them to:

    copy_fpstate_to_fpregs()
  __copy_fpstate_to_fpregs()

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:03 +02:00
Ingo Molnar
5033861575 x86/fpu: Synchronize the naming of drop_fpu() and fpu_reset_state()
drop_fpu() and fpu_reset_state() are similar in functionality
and in scope, yet this is not apparent from their names.

drop_fpu() deactivates FPU contents (both the fpregs and the fpstate),
but leaves register contents intact in the eager-FPU case, mostly as an
optimization. It disables fpregs in the lazy FPU case. The drop_fpu()
method can be used to destroy FPU state in an optimized way, when we
know that a new state will be loaded before user-space might see
any remains of the old FPU state:

     - such as in sys_exit()'s exit_thread() where we know this task
       won't execute any user-space instructions anymore and the
       next context switch cleans up the FPU. The old FPU state
       might still be around in the eagerfpu case but won't be
       saved.

     - in __restore_xstate_sig(), where we use drop_fpu() before
       copying a new state into the fpstate and activating that one.
       No user-pace instructions can execute between those steps.

     - in sys_execve()'s fpu__clear(): there we use drop_fpu() in
       the !eagerfpu case, where it's equivalent to a full reinit.

fpu_reset_state() is a stronger version of drop_fpu(): both in
the eagerfpu and the lazy-FPU case it guarantees that fpregs
are reinitialized to init state. This method is used in cases
where we need a full reset:

     - handle_signal() uses fpu_reset_state() to reset the FPU state
       to init before executing a user-space signal handler. While we
       have already saved the original FPU state at this point, and
       always restore the original state, the signal handling code
       still has to do this reinit, because signals may interrupt
       any user-space instruction, and the FPU might be in various
       intermediate states (such as an unbalanced x87 stack) that is
       not immediately usable for general C signal handler code.

     - __restore_xstate_sig() uses fpu_reset_state() when the signal
       frame has no FP context. Since the signal handler may have
       modified the FPU state, it gets reset back to init state.

     - in another branch __restore_xstate_sig() uses fpu_reset_state()
       to handle a restoration error: when restore_user_xstate() fails
       to restore FPU state and we might have inconsistent FPU data,
       fpu_reset_state() is used to reset it back to a known good
       state.

     - __kernel_fpu_end() uses fpu_reset_state() in an error branch.
       This is in a 'must not trigger' error branch, so on bug-free
       kernels this never triggers.

     - fpu__restore() uses fpu_reset_state() in an error path
       as well: if the fpstate was set up with invalid FPU state
       (via ptrace or via a signal handler), then it's reset back
       to init state.

     - likewise, the scheduler's switch_fpu_finish() uses it in a
       restoration error path too.

Move both drop_fpu() and fpu_reset_state() to the fpu__*() namespace
and harmonize their naming with their function:

    fpu__drop()
    fpu__reset()

This clearly shows that both methods operate on the full state of the
FPU, just like fpu__restore().

Also add comments to explain what each function does.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:03 +02:00
Ingo Molnar
2e85591a6c x86/fpu: Better document fpu__clear() state handling
So prior to this fix:

  c88d47480d ("x86/fpu: Always restore_xinit_state() when use_eager_cpu()")

we leaked FPU state across execve() boundaries on eagerfpu systems:

	$ /host/home/mingo/dump-xmm-regs-exec
	# XMM state before execve():
	XMM0 : 000000000000dede
	XMM1 : 000000000000dedf
	XMM2 : 000000000000dee0
	XMM3 : 000000000000dee1
	XMM4 : 000000000000dee2
	XMM5 : 000000000000dee3
	XMM6 : 000000000000dee4
	XMM7 : 000000000000dee5
	XMM8 : 000000000000dee6
	XMM9 : 000000000000dee7
	XMM10: 000000000000dee8
	XMM11: 000000000000dee9
	XMM12: 000000000000deea
	XMM13: 000000000000deeb
	XMM14: 000000000000deec
	XMM15: 000000000000deed

	# XMM state after execve(), in the new task context:
	XMM0 : 0000000000000000
	XMM1 : 2f2f2f2f2f2f2f2f
	XMM2 : 0000000000000000
	XMM3 : 0000000000000000
	XMM4 : 00000000000000ff
	XMM5 : 00000000ff000000
	XMM6 : 000000000000dee4
	XMM7 : 000000000000dee5
	XMM8 : 0000000000000000
	XMM9 : 0000000000000000
	XMM10: 0000000000000000
	XMM11: 0000000000000000
	XMM12: 0000000000000000
	XMM13: 000000000000deeb
	XMM14: 000000000000deec
	XMM15: 000000000000deed

Better explain what this function is supposed to do and why.

Reviewed-by: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:02 +02:00