Commit graph

18721 commits

Author SHA1 Message Date
Dave Jones
b7b4839d93 perf/x86: Fix leak in uncore_type_init failure paths
The error path of uncore_type_init() frees up any allocations
that were made along the way, but it relies upon type->pmus
being set, which only happens if the function succeeds. As
type->pmus remains null in this case, the call to
uncore_type_exit will do nothing.

Moving the assignment earlier will allow us to actually free
those allocations should something go awry.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140306172028.GA552@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:59:34 +01:00
Linus Torvalds
b01d4e6893 x86: fix compile error due to X86_TRAP_NMI use in asm files
It's an enum, not a #define, you can't use it in asm files.

Introduced in commit 5fa10196bd ("x86: Ignore NMIs that come in during
early boot"), and sadly I didn't compile-test things like I should have
before pushing out.

My weak excuse is that the x86 tree generally doesn't introduce stupid
things like this (and the ARM pull afterwards doesn't cause me to do a
compile-test either, since I don't cross-compile).

Cc: Don Zickus <dzickus@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-07 18:58:40 -08:00
H. Peter Anvin
5fa10196bd x86: Ignore NMIs that come in during early boot
Don Zickus reports:

A customer generated an external NMI using their iLO to test kdump
worked.  Unfortunately, the machine hung.  Disabling the nmi_watchdog
made things work.

I speculated the external NMI fired, caused the machine to panic (as
expected) and the perf NMI from the watchdog came in and was latched.
My guess was this somehow caused the hang.

   ----

It appears that the latched NMI stays latched until the early page
table generation on 64 bits, which causes exceptions to happen which
end in IRET, which re-enable NMI.  Therefore, ignore NMIs that come in
during early execution, until we have proper exception handling.

Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/1394221143-29713-1-git-send-email-dzickus@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.5+, older with some backport effort
2014-03-07 15:08:14 -08:00
Peter Zijlstra
d4078e2322 x86, trace: Further robustify CR2 handling vs tracing
Building on commit 0ac09f9f8c ("x86, trace: Fix CR2 corruption when
tracing page faults") this patch addresses another few issues:

 - Now that read_cr2() is lifted into trace_do_page_fault(), we should
   pass the address to trace_page_fault_entries() to avoid it
   re-reading a potentially changed cr2.

 - Put both trace_do_page_fault() and trace_page_fault_entries() under
   CONFIG_TRACING.

 - Mark both fault entry functions {,trace_}do_page_fault() as notrace
   to avoid getting __mcount or other function entry trace callbacks
   before we've observed CR2.

 - Mark __do_page_fault() as noinline to guarantee the function tracer
   does get to see the fault.

Cc: <jolsa@redhat.com>
Cc: <vincent.weaver@maine.edu>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140306145300.GO9987@twins.programming.kicks-ass.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 10:58:18 -08:00
Jiri Olsa
0ac09f9f8c x86, trace: Fix CR2 corruption when tracing page faults
The trace_do_page_fault function trigger tracepoint
and then handles the actual page fault.

This could lead to error if the tracepoint caused page
fault. The original cr2 value gets lost and the original
page fault handler kills current process with SIGSEGV.

This happens if you record page faults with callchain
data, the user part of it will cause tracepoint handler
to page fault:

  # perf record -g -e exceptions:page_fault_user ls

Fixing this by saving the original cr2 value
and using it after tracepoint handler is done.

v2: Moving the cr2 read before exception_enter, because
    it could trigger tracepoint as well.

Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1402211701380.6395@vincent-weaver-1.um.maine.edu
Link: http://lkml.kernel.org/r/20140228160526.GD1133@krava.brq.redhat.com
2014-03-04 16:00:14 -08:00
H. Peter Anvin
3c0b566334 * Disable the new EFI 1:1 virtual mapping for SGI UV because using it
causes a crash during boot - Borislav Petkov
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTFmV4AAoJEC84WcCNIz1VeSgP/1gykrBiH3Vr4H4c32la/arZ
 ktXRAT+RHdiebEXopt6+A1Pv6iyRYz3fBB1cKKb7q8fhDmVVmefGVkyO4qg4NbgL
 Ic1dP6uPgiFidcYML9/c6+UIovbwDD7f5wwJEjXxoZKg7b7P0TIykd8z8YPQ/6A6
 fIY5Z2L9A8eVt4k1m6Dg2PUJ2B+XKOLa5BbL7gXB6u9avgAVAoLM9oQStss+V9Pv
 JQu+BZxeEwuxgi0LOxk1sFWXaAoRtgVDNH6nPK93CRvO2H83voWFf+OcW9hQWsF5
 tLam6aMkvjuM5Dv1IA69PBAWrE/jxcMvjEdJyrGMWRKWtH/iTN4ez4EoFiO38o4R
 IgzXh9L5xbP3o+g3rntIu4h3/5yde9TRM18mER0lLTdNFZ8QzmLt4L0TptntRoBv
 bTXffILACq0uQU6T10P+EwseT472HphMeswaWVDkRxkN2hTCipFqQX0ekb0qbw3O
 yQeRyz1/t7DeA77iAGG96SfeziMdr44d6u6zDQPrTAJV+H1ZZ3XNIlJDi01CAg3b
 PDT6nHb/9V0tPZQebntZnczVRP+5EBdn5RbJaAMldEwVgjdjlFNY5slsF01KeCSF
 6Cx6UyAI9XKuZMoayC3QDHKKWi+BTOwbKcVAdtZ1c9AMLt9pyxsDfdoOAV4zBiQa
 PsVs9t/l7TZoL1MQHlEJ
 =YIab
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' into x86/urgent

 * Disable the new EFI 1:1 virtual mapping for SGI UV because using it
   causes a crash during boot - Borislav Petkov

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-04 15:50:06 -08:00
Borislav Petkov
a5d90c923b x86/efi: Quirk out SGI UV
Alex reported hitting the following BUG after the EFI 1:1 virtual
mapping work was merged,

 kernel BUG at arch/x86/mm/init_64.c:351!
 invalid opcode: 0000 [#1] SMP
 Call Trace:
  [<ffffffff818aa71d>] init_extra_mapping_uc+0x13/0x15
  [<ffffffff818a5e20>] uv_system_init+0x22b/0x124b
  [<ffffffff8108b886>] ? clockevents_register_device+0x138/0x13d
  [<ffffffff81028dbb>] ? setup_APIC_timer+0xc5/0xc7
  [<ffffffff8108b620>] ? clockevent_delta2ns+0xb/0xd
  [<ffffffff818a3a92>] ? setup_boot_APIC_clock+0x4a8/0x4b7
  [<ffffffff8153d955>] ? printk+0x72/0x74
  [<ffffffff818a1757>] native_smp_prepare_cpus+0x389/0x3d6
  [<ffffffff818957bc>] kernel_init_freeable+0xb7/0x1fb
  [<ffffffff81535530>] ? rest_init+0x74/0x74
  [<ffffffff81535539>] kernel_init+0x9/0xff
  [<ffffffff81541dfc>] ret_from_fork+0x7c/0xb0
  [<ffffffff81535530>] ? rest_init+0x74/0x74

Getting this thing to work with the new mapping scheme would need more
work, so automatically switch to the old memmap layout for SGI UV.

Acked-by: Russ Anderson <rja@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 23:43:33 +00:00
Linus Torvalds
3154da34be Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes, most of them on the tooling side"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Fix strict alias issue for find_first_bit
  perf tools: fix BFD detection on opensuse
  perf: Fix hotplug splat
  perf/x86: Fix event scheduling
  perf symbols: Destroy unused symsrcs
  perf annotate: Check availability of annotate when processing samples
2014-03-02 11:37:07 -06:00
Linus Torvalds
55de1ed2f5 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "The VMCOREINFO patch I'll pushing for this release to avoid having a
  release with kASLR and but without that information.

  I was hoping to include the FPU patches from Suresh, but ran into a
  problem (see other thread); will try to make them happen next week"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, kaslr: add missed "static" declarations
  x86, kaslr: export offset in VMCOREINFO ELF notes
2014-03-01 22:48:14 -06:00
Linus Torvalds
d8efcf38b1 Three x86 fixes and one for ARM/ARM64. In particular, nested
virtualization on Intel is broken in 3.13 and fixed by this
 pull request.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTEIYBAAoJEBvWZb6bTYbyv7AP/iOE8ybTr7MSfZPTF4Ip13o+
 rvzlnzUtDMsZAN5dZAXhsR3lPeXygjTmeI6FWdiVwdalp9wpg2FLAZ0BDj8KIg0r
 cAINYOmJ0jhC1mTOfMJghsrE9b1aaXwWVSlXkivzyIrPJCZFlqDKOXleHJXyqNTY
 g259tPI8VWS7Efell9NclUNXCdD4g6wG/RdEjumjv9JLiXlDVviXvzvIEl/S3Ud1
 D2xxX7vvGHikyTwuls/bWzJzzRRlsb1VVOOQtBuC9NyaAY7bpQjGQvo6XzxowtIZ
 h4F4iU/umln5WcDiJU8XXiV/TOCVzqgdLk3Pr5Kgv3yO8/XbE/CcnyJmeaSgMoJB
 i7vJ6tUX5mfGsLNxfshXw0RsY/y9KMLnbt62eiPImWBxgpDZNxKpAqCA7GsOb87g
 Vjzl3poEwe+5eN6Usbpd78rRgfgbbZF+Pf2qsphtQhFQGaogz1Ltz0B0hY3MYxx3
 y9OJMJyt1MI4+hvvdjhSnmIo6APwuGSr+hhdKCPSlMiWJun2XRHXTHBNAS+dsjgs
 Sx2Bzao/lki5l7y9Ea1fR4yerigbFJF4L1iV04sSbsoh0I/nN5qjXFrc22Ju0i3i
 uIrVwfSSdX4HQwQYdBGKQQRGq/W0wOjEDoA5qZmxg3s4j8KSd7ooBtRk/VepVH7E
 kaUrekJ+KWs/sVNW2MtU
 =zQTn
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Three x86 fixes and one for ARM/ARM64.

  In particular, nested virtualization on Intel is broken in 3.13 and
  fixed by this pull request"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm, vmx: Really fix lazy FPU on nested guest
  kvm: x86: fix emulator buffer overflow (CVE-2014-0049)
  arm/arm64: KVM: detect CPU reset on CPU_PM_EXIT
  KVM: MMU: drop read-only large sptes when creating lower level sptes
2014-02-28 11:45:03 -08:00
Paolo Bonzini
1b385cbdd7 kvm, vmx: Really fix lazy FPU on nested guest
Commit e504c9098e (kvm, vmx: Fix lazy FPU on nested guest, 2013-11-13)
highlighted a real problem, but the fix was subtly wrong.

nested_read_cr0 is the CR0 as read by L2, but here we want to look at
the CR0 value reflecting L1's setup.  In other words, L2 might think
that TS=0 (so nested_read_cr0 has the bit clear); but if L1 is actually
running it with TS=1, we should inject the fault into L1.

The effective value of CR0 in L2 is contained in vmcs12->guest_cr0, use
it.

Fixes: e504c9098e
Reported-by: Kashyap Chamarty <kchamart@redhat.com>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Tested-by: Kashyap Chamarty <kchamart@redhat.com>
Tested-by: Anthoine Bourgeois <bourgeois@bertin.fr>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-27 22:54:11 +01:00
Andrew Honig
a08d3b3b99 kvm: x86: fix emulator buffer overflow (CVE-2014-0049)
The problem occurs when the guest performs a pusha with the stack
address pointing to an mmio address (or an invalid guest physical
address) to start with, but then extending into an ordinary guest
physical address.  When doing repeated emulated pushes
emulator_read_write sets mmio_needed to 1 on the first one.  On a
later push when the stack points to regular memory,
mmio_nr_fragments is set to 0, but mmio_is_needed is not set to 0.

As a result, KVM exits to userspace, and then returns to
complete_emulated_mmio.  In complete_emulated_mmio
vcpu->mmio_cur_fragment is incremented.  The termination condition of
vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved.
The code bounces back and fourth to userspace incrementing
mmio_cur_fragment past it's buffer.  If the guest does nothing else it
eventually leads to a a crash on a memcpy from invalid memory address.

However if a guest code can cause the vm to be destroyed in another
vcpu with excellent timing, then kvm_clear_async_pf_completion_queue
can be used by the guest to control the data that's pointed to by the
call to cancel_work_item, which can be used to gain execution.

Fixes: f78146b0f9
Signed-off-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org (3.5+)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-27 19:35:22 +01:00
Peter Zijlstra
26e61e8939 perf/x86: Fix event scheduling
Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures,
with perf WARN_ON()s triggering. He also provided traces of the failures.

This is I think the relevant bit:

	>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_disable: x86_pmu_disable
	>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_state: Events: {
	>    pec_1076_warn-2804  [000] d...   147.926156: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
	>    pec_1076_warn-2804  [000] d...   147.926158: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
	>    pec_1076_warn-2804  [000] d...   147.926159: x86_pmu_state: }
	>    pec_1076_warn-2804  [000] d...   147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1
	>    pec_1076_warn-2804  [000] d...   147.926161: x86_pmu_state: Assignment: {
	>    pec_1076_warn-2804  [000] d...   147.926162: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
	>    pec_1076_warn-2804  [000] d...   147.926163: x86_pmu_state: }
	>    pec_1076_warn-2804  [000] d...   147.926166: collect_events: Adding event: 1 (ffff880119ec8800)

So we add the insn:p event (fd[23]).

At this point we should have:

  n_events = 2, n_added = 1, n_txn = 1

	>    pec_1076_warn-2804  [000] d...   147.926170: collect_events: Adding event: 0 (ffff8800c9e01800)
	>    pec_1076_warn-2804  [000] d...   147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00)

We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]).
These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so
that's not visible.

	group_sched_in()
	  pmu->start_txn() /* nop - BP pmu */
	  event_sched_in()
	     event->pmu->add()

So here we should end up with:

  0: n_events = 3, n_added = 2, n_txn = 2
  4: n_events = 4, n_added = 3, n_txn = 3

But seeing the below state on x86_pmu_enable(), the must have failed,
because the 0 and 4 events aren't there anymore.

Looking at group_sched_in(), since the BP is the leader, its
event_sched_in() must have succeeded, for otherwise we would not have
seen the sibling adds.

But since neither 0 or 4 are in the below state; their event_sched_in()
must have failed; but I don't see why, the complete state: 0,0,1:p,4
fits perfectly fine on a core2.

However, since we try and schedule 4 it means the 0 event must have
succeeded!  Therefore the 4 event must have failed, its failure will
have put group_sched_in() into the fail path, which will call:

	event_sched_out()
	  event->pmu->del()

on 0 and the BP event.

Now x86_pmu_del() will reduce n_events; but it will not reduce n_added;
giving what we see below:

 n_event = 2, n_added = 2, n_txn = 2

	>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_enable: x86_pmu_enable
	>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_state: Events: {
	>    pec_1076_warn-2804  [000] d...   147.926179: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
	>    pec_1076_warn-2804  [000] d...   147.926181: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
	>    pec_1076_warn-2804  [000] d...   147.926182: x86_pmu_state: }
	>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2
	>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: Assignment: {
	>    pec_1076_warn-2804  [000] d...   147.926186: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
	>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state:   1->0 tag: 1 config: 1 (ffff880119ec8800)
	>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state: }
	>    pec_1076_warn-2804  [000] d...   147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0

So the problem is that x86_pmu_del(), when called from a
group_sched_in() that fails (for whatever reason), and without x86_pmu
TXN support (because the leader is !x86_pmu), will corrupt the n_added
state.

Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-27 12:38:02 +01:00
Marcelo Tosatti
404381c583 KVM: MMU: drop read-only large sptes when creating lower level sptes
Read-only large sptes can be created due to read-only faults as
follows:

- QEMU pagetable entry that maps guest memory is read-only
due to COW.
- Guest read faults such memory, COW is not broken, because
it is a read-only fault.
- Enable dirty logging, large spte not nuked because it is read-only.
- Write-fault on such memory causes guest to loop endlessly
(which must go down to level 1 because dirty logging is enabled).

Fix by dropping large spte when necessary.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-26 17:23:32 +01:00
Kees Cook
e290e8c59d x86, kaslr: add missed "static" declarations
This silences build warnings about unexported variables and functions.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20140209215644.GA30339@www.outflux.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-25 16:59:29 -08:00
Eugene Surovegin
b6085a8657 x86, kaslr: export offset in VMCOREINFO ELF notes
Include kASLR offset in VMCOREINFO ELF notes to assist in debugging.

[ hpa: pushing this for v3.14 to avoid having a kernel version with
  kASLR where we can't debug output. ]

Signed-off-by: Eugene Surovegin <surovegin@google.com>
Link: http://lkml.kernel.org/r/20140123173120.GA25474@www.outflux.net
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-25 16:57:47 -08:00
Linus Torvalds
208937fdcf Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:

 - a bugfix which prevents a divide by 0 panic when the newly introduced
   try_msr_calibrate_tsc() fails

 - enablement of the Baytrail platform to utilize the newfangled msr
   based calibration

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: tsc: Add missing Baytrail frequency to the table
  x86, tsc: Fallback to normal calibration if fast MSR calibration fails
2014-02-23 14:15:46 -08:00
Linus Torvalds
9b3e7c9b9a Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixlets from all around the place"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore: Fix IVT/SNB-EP uncore CBOX NID filter table
  perf/x86: Correctly use FEATURE_PDCM
  perf, nmi: Fix unknown NMI warning
  perf trace: Fix ioctl 'request' beautifier build problems on !(i386 || x86_64) arches
  perf trace: Add fallback definition of EFD_SEMAPHORE
  perf list: Fix checking for supported events on older kernels
  perf tools: Handle PERF_RECORD_HEADER_EVENT_TYPE properly
  perf probe: Do not add offset twice to uprobe address
  perf/x86: Fix Userspace RDPMC switch
  perf/x86/intel/p6: Add userspace RDPMC quirk for PPro
2014-02-22 12:11:54 -08:00
Stephane Eranian
337397f3af perf/x86/uncore: Fix IVT/SNB-EP uncore CBOX NID filter table
This patch updates the CBOX PMU filters mapping tables for SNB-EP
and IVT (model 45 and 62 respectively).

The NID umask always comes in addition to another umask.
When set, the NID filter is applied.

The current mapping tables were missing some code/umask
combinations to account for the NID umask. This patch
fixes that.

Cc: mingo@elte.hu
Cc: ak@linux.intel.com
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140219131018.GA24475@quad
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21 22:09:01 +01:00
Peter Zijlstra
c9b08884c9 perf/x86: Correctly use FEATURE_PDCM
The current code simply assumes Intel Arch PerfMon v2+ to have
the IA32_PERF_CAPABILITIES MSR; the SDM specifies that we should check
CPUID[1].ECX[15] (aka, FEATURE_PDCM) instead.

This was found by KVM which implements v2+ but didn't provide the
capabilities MSR. Change the code to DTRT; KVM will also implement the
MSR and return 0.

Cc: pbonzini@redhat.com
Reported-by: "Michael S. Tsirkin" <mst@redhat.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140203132903.GI8874@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21 22:09:01 +01:00
Markus Metzger
a3ef2229c9 perf, nmi: Fix unknown NMI warning
When using BTS on Core i7-4*, I get the below kernel warning.

$ perf record -c 1 -e branches:u ls
Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
 kernel:[  438.317893] Uhhuh. NMI received for unknown reason 31 on CPU 2.

Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
 kernel:[  438.317920] Do you have a strange power saving mode enabled?

Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
 kernel:[  438.317945] Dazed and confused, but trying to continue

Make intel_pmu_handle_irq() take the full exit path when returning early.

Cc: eranian@google.com
Cc: peterz@infradead.org
Cc: mingo@kernel.org
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392425048-5309-1-git-send-email-andi@firstfloor.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21 22:09:01 +01:00
Linus Torvalds
d49649615d Merge branch 'fixes-for-v3.14' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull DMA-mapping fixes from Marek Szyprowski:
 "This contains fixes for incorrect atomic test in dma-mapping subsystem
  for ARM and x86 architecture"

* 'fixes-for-v3.14' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  x86: dma-mapping: fix GFP_ATOMIC macro usage
  ARM: dma-mapping: fix GFP_ATOMIC macro usage
2014-02-20 11:58:56 -08:00
Mika Westerberg
3e11e818bf x86: tsc: Add missing Baytrail frequency to the table
Intel Baytrail is based on Silvermont core so MSR_FSB_FREQ[2:0] == 0 means
that the CPU reference clock runs at 83.3MHz. Add this missing frequency to
the table.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Bin Gao <bin.gao@linux.intel.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1392810750-18660-2-git-send-email-mika.westerberg@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-19 17:12:24 +01:00
Thomas Gleixner
5f0e030930 x86, tsc: Fallback to normal calibration if fast MSR calibration fails
If we cannot calibrate TSC via MSR based calibration
try_msr_calibrate_tsc() stores zero to fast_calibrate and returns that
to the caller. This value gets then propagated further to clockevents
code resulting division by zero oops like the one below:

 divide error: 0000 [#1] PREEMPT SMP
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W    3.13.0+ #47
 task: ffff880075508000 ti: ffff880075506000 task.ti: ffff880075506000
 RIP: 0010:[<ffffffff810aec14>]  [<ffffffff810aec14>] clockevents_config.part.3+0x24/0xa0
 RSP: 0000:ffff880075507e58  EFLAGS: 00010246
 RAX: ffffffffffffffff RBX: ffff880079c0cd80 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffff
 RBP: ffff880075507e70 R08: 0000000000000001 R09: 00000000000000be
 R10: 00000000000000bd R11: 0000000000000003 R12: 000000000000b008
 R13: 0000000000000008 R14: 000000000000b010 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff880079c00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: ffff880079fff000 CR3: 0000000001c0b000 CR4: 00000000001006f0
 Stack:
  ffff880079c0cd80 000000000000b008 0000000000000008 ffff880075507e88
  ffffffff810aecb0 ffff880079c0cd80 ffff880075507e98 ffffffff81030168
  ffff880075507ed8 ffffffff81d1104f 00000000000000c3 0000000000000000
 Call Trace:
  [<ffffffff810aecb0>] clockevents_config_and_register+0x20/0x30
  [<ffffffff81030168>] setup_APIC_timer+0xc8/0xd0
  [<ffffffff81d1104f>] setup_boot_APIC_clock+0x4cc/0x4d8
  [<ffffffff81d0f5de>] native_smp_prepare_cpus+0x3dd/0x3f0
  [<ffffffff81d02ee9>] kernel_init_freeable+0xc3/0x205
  [<ffffffff8177c910>] ? rest_init+0x90/0x90
  [<ffffffff8177c91e>] kernel_init+0xe/0x120
  [<ffffffff8178deec>] ret_from_fork+0x7c/0xb0
  [<ffffffff8177c910>] ? rest_init+0x90/0x90

Prevent this from happening by:
 1) Modifying try_msr_calibrate_tsc() to return calibration value or zero
    if it fails.
 2) Check this return value in native_calibrate_tsc() and in case of zero
    fallback to use normal non-MSR based calibration.

[mw: Added subject and changelog]

Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bin Gao <bin.gao@linux.intel.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1392810750-18660-1-git-send-email-mika.westerberg@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-19 17:12:24 +01:00
Linus Torvalds
9bd01b9bbd Two fixes in the tracing utility.
The first is a fix for the way the ring buffer stores timestamps.
 After a restructure of the code was done, the ring buffer timestamp
 logic missed the fact that the first event on a sub buffer is to have
 a zero delta, as the full timestamp is stored on the sub buffer itself.
 But because the delta was not cleared to zero, the timestamp for that
 event will be calculated as the real timestamp + the delta from the
 last timestamp. This can skew the timestamps of the events and
 have them say they happened when they didn't really happen. That's bad.
 
 The second fix is for modifying the function graph caller site.
 When the stop machine was removed from updating the function tracing
 code, it missed updating the function graph call site location.
 It is still modified as if it is being done via stop machine. But it's not.
 This can lead to a GPF and kernel crash if the function graph call site
 happens to lie between cache lines and one CPU is executing it while
 another CPU is doing the update. It would be a very hard condition to
 hit, but the result is sever enough to have it fixed ASAP.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQEcBAABAgAGBQJS/2leAAoJEKQekfcNnQGu7nYH/AltUO19AgM2sFLOLM7Q0dp4
 Lg7vE8CLKtFq0fjtv/ri//fJ56Lr+/WNHiiD06aIrgnMVBbWynS0m0RO+9bhFl8/
 rELiUpXTTruqljmlT2T5lPxk+ZKgtLbxK8hNywU99eLgkTwyaOwrSUol30E8pw41
 UwtKg4OAn1LbjQ8/sddVynGlFDNRdqFiGTIDvhHqI6F6/QlaEX81EeZbLThDU4D/
 l86fMuIdw5pb+efa29Rr0s7O4Xol7SJgnSMVgd0OYADRFmp4sg+MKxuJAUjPsHk7
 9vvbylOb4w5H6lo5h7kUee3w7kG+FjYVoEx+Sqq9936+KlwtN0kbiNvl0DkrXnY=
 =kUmM
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull twi tracing fixes from Steven Rostedt:
 "Two urgent fixes in the tracing utility.

  The first is a fix for the way the ring buffer stores timestamps.
  After a restructure of the code was done, the ring buffer timestamp
  logic missed the fact that the first event on a sub buffer is to have
  a zero delta, as the full timestamp is stored on the sub buffer
  itself.  But because the delta was not cleared to zero, the timestamp
  for that event will be calculated as the real timestamp + the delta
  from the last timestamp.  This can skew the timestamps of the events
  and have them say they happened when they didn't really happen.
  That's bad.

  The second fix is for modifying the function graph caller site.  When
  the stop machine was removed from updating the function tracing code,
  it missed updating the function graph call site location.  It is still
  modified as if it is being done via stop machine.  But it's not.  This
  can lead to a GPF and kernel crash if the function graph call site
  happens to lie between cache lines and one CPU is executing it while
  another CPU is doing the update.  It would be a very hard condition to
  hit, but the result is severe enough to have it fixed ASAP"

* tag 'trace-fixes-v3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace/x86: Use breakpoints for converting function graph caller
  ring-buffer: Fix first commit on sub-buffer having non-zero delta
2014-02-15 15:03:34 -08:00
Linus Torvalds
7fc9280462 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI fixes from Peter Anvin:
 "A few more EFI-related fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Check status field to validate BGRT header
  x86/efi: Fix 32-bit fallout
2014-02-15 15:02:28 -08:00
H. Peter Anvin
31fce91e7a Merge remote-tracking branch 'efi/urgent' into x86/urgent
There have been reports of EFI crashes since -rc1. The following two
commits fix known issues.

 * Fix boot failure on 32-bit EFI due to the recent EFI memmap changes
   merged during the merge window - Borislav Petkov

 * Avoid a crash during efi_bgrt_init() by detecting invalid BGRT
   headers based on the 'status' field.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-14 11:11:18 -08:00
Linus Torvalds
161aa772f9 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "A collection of small fixes:

   - There still seem to be problems with asm goto which requires the
     empty asm hack.
   - If SMAP is disabled at compile time, don't enable it nor try to
     interpret a page fault as an SMAP violation.
   - Fix a case of unbounded recursion while tracing"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, smap: smap_violation() is bogus if CONFIG_X86_SMAP is off
  x86, smap: Don't enable SMAP if CONFIG_X86_SMAP is disabled
  compiler/gcc4: Make quirk for asm_volatile_goto() unconditional
  x86: Use preempt_disable_notrace() in cycles_2_ns()
2014-02-14 11:09:11 -08:00
Matt Fleming
09503379dc x86/efi: Check status field to validate BGRT header
Madper reported seeing the following crash,

  BUG: unable to handle kernel paging request at ffffffffff340003
  IP: [<ffffffff81d85ba4>] efi_bgrt_init+0x9d/0x133
  Call Trace:
   [<ffffffff81d8525d>] efi_late_init+0x9/0xb
   [<ffffffff81d68f59>] start_kernel+0x436/0x450
   [<ffffffff81d6892c>] ? repair_env_string+0x5c/0x5c
   [<ffffffff81d68120>] ? early_idt_handlers+0x120/0x120
   [<ffffffff81d685de>] x86_64_start_reservations+0x2a/0x2c
   [<ffffffff81d6871e>] x86_64_start_kernel+0x13e/0x14d

This is caused because the layout of the ACPI BGRT header on this system
doesn't match the definition from the ACPI spec, and so we get a bogus
physical address when dereferencing ->image_address in efi_bgrt_init().

Luckily the status field in the BGRT header clearly marks it as invalid,
so we can check that field and skip BGRT initialisation.

Reported-by: Madper Xie <cxie@redhat.com>
Suggested-by: Toshi Kani <toshi.kani@hp.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-02-14 10:07:15 +00:00
Borislav Petkov
c55d016f7a x86/efi: Fix 32-bit fallout
We do not enable the new efi memmap on 32-bit and thus we need to run
runtime_code_page_mkexec() unconditionally there. Fix that.

Reported-and-tested-by: Lejun Zhu <lejun.zhu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-02-14 09:30:19 +00:00
H. Peter Anvin
4640c7ee9b x86, smap: smap_violation() is bogus if CONFIG_X86_SMAP is off
If CONFIG_X86_SMAP is disabled, smap_violation() tests for conditions
which are incorrect (as the AC flag doesn't matter), causing spurious
faults.

The dynamic disabling of SMAP (nosmap on the command line) is fine
because it disables X86_FEATURE_SMAP, therefore causing the
static_cpu_has() to return false.

Found by Fengguang Wu's test system.

[ v3: move all predicates into smap_violation() ]
[ v2: use IS_ENABLED() instead of #ifdef ]

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Link: http://lkml.kernel.org/r/20140213124550.GA30497@localhost
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.7+
2014-02-13 08:40:52 -08:00
H. Peter Anvin
03bbd596ac x86, smap: Don't enable SMAP if CONFIG_X86_SMAP is disabled
If SMAP support is not compiled into the kernel, don't enable SMAP in
CR4 -- in fact, we should clear it, because the kernel doesn't contain
the proper STAC/CLAC instructions for SMAP support.

Found by Fengguang Wu's test system.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Link: http://lkml.kernel.org/r/20140213124550.GA30497@localhost
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.7+
2014-02-13 07:50:25 -08:00
Steven Rostedt (Red Hat)
87fbb2ac60 ftrace/x86: Use breakpoints for converting function graph caller
When the conversion was made to remove stop machine and use the breakpoint
logic instead, the modification of the function graph caller is still
done directly as though it was being done under stop machine.

As it is not converted via stop machine anymore, there is a possibility
that the code could be layed across cache lines and if another CPU is
accessing that function graph call when it is being updated, it could
cause a General Protection Fault.

Convert the update of the function graph caller to use the breakpoint
method as well.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org # 3.5+
Fixes: 08d636b6d4 "ftrace/x86: Have arch x86_64 use breakpoints instead of stop machine"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-11 20:19:44 -05:00
Marek Szyprowski
c091c71ad2 x86: dma-mapping: fix GFP_ATOMIC macro usage
GFP_ATOMIC is not a single gfp flag, but a macro which expands to the other
flags, where meaningful is the LACK of __GFP_WAIT flag. To check if caller
wants to perform an atomic allocation, the code must test for a lack of the
__GFP_WAIT flag. This patch fixes the issue introduced in v3.5-rc1.

CC: stable@vger.kernel.org
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2014-02-11 09:40:15 +01:00
Mel Gorman
a9c8e4beee xen: properly account for _PAGE_NUMA during xen pte translations
Steven Noonan forwarded a users report where they had a problem starting
vsftpd on a Xen paravirtualized guest, with this in dmesg:

  BUG: Bad page map in process vsftpd  pte:8000000493b88165 pmd:e9cc01067
  page:ffffea00124ee200 count:0 mapcount:-1 mapping:     (null) index:0x0
  page flags: 0x2ffc0000000014(referenced|dirty)
  addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping:          (null) index:7f97eea74
  CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1
  Call Trace:
    dump_stack+0x45/0x56
    print_bad_pte+0x22e/0x250
    unmap_single_vma+0x583/0x890
    unmap_vmas+0x65/0x90
    exit_mmap+0xc5/0x170
    mmput+0x65/0x100
    do_exit+0x393/0x9e0
    do_group_exit+0xcc/0x140
    SyS_exit_group+0x14/0x20
    system_call_fastpath+0x1a/0x1f
  Disabling lock debugging due to kernel taint
  BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1
  BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1

The issue could not be reproduced under an HVM instance with the same
kernel, so it appears to be exclusive to paravirtual Xen guests.  He
bisected the problem to commit 1667918b64 ("mm: numa: clear numa
hinting information on mprotect") that was also included in 3.12-stable.

The problem was related to how xen translates ptes because it was not
accounting for the _PAGE_NUMA bit.  This patch splits pte_present to add
a pteval_present helper for use by xen so both bare metal and xen use
the same code when checking if a PTE is present.

[mgorman@suse.de: wrote changelog, proposed minor modifications]
[akpm@linux-foundation.org: fix typo in comment]
Reported-by: Steven Noonan <steven@uplinklabs.net>
Tested-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org>	[3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:41 -08:00
Steven Rostedt
569d6557ab x86: Use preempt_disable_notrace() in cycles_2_ns()
When debug preempt is enabled, preempt_disable() can be traced by
function and function graph tracing.

There's a place in the function graph tracer that calls trace_clock()
which eventually calls cycles_2_ns() outside of the recursion
protection. When cycles_2_ns() calls preempt_disable() it gets traced
and the graph tracer will go into a recursive loop causing a crash or
worse, a triple fault.

Simple fix is to use preempt_disable_notrace() in cycles_2_ns, which
makes sense because the preempt_disable() tracing may use that code
too, and it tracing it, even with recursion protection is rather
pointless.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140204141315.2a968a72@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:09:08 +01:00
Peter Zijlstra
0e9f2204cf perf/x86: Fix Userspace RDPMC switch
The current code forgets to change the CR4 state on the current CPU.
Use on_each_cpu() instead of smp_call_function().

Reported-by: Mark Davies <junk@eslaf.co.uk>
Suggested-by: Mark Davies <junk@eslaf.co.uk>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: fweisbec@gmail.com
Link: http://lkml.kernel.org/n/tip-69efsat90ibhnd577zy3z9gh@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:08:25 +01:00
Peter Zijlstra
e97df76377 perf/x86/intel/p6: Add userspace RDPMC quirk for PPro
PPro machines can die hard when PCE gets enabled due to a CPU erratum.
The safe way it so disable it by default and keep it disabled.

See erratum 26 in:

  http://download.intel.com/design/archives/processors/pro/docs/24268935.pdf

Reported-and-Tested-by: Mark Davies <junk@eslaf.co.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vince@deater.net>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140206170815.GW2936@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:08:24 +01:00
Linus Torvalds
c1ff84317f Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "Quite a varied little collection of fixes.  Most of them are
  relatively small or isolated; the biggest one is Mel Gorman's fixes
  for TLB range flushing.

  A couple of AMD-related fixes (including not crashing when given an
  invalid microcode image) and fix a crash when compiled with gcov"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, AMD: Unify valid container checks
  x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y
  x86/efi: Allow mapping BGRT on x86-32
  x86: Fix the initialization of physnode_map
  x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable()
  x86/intel/mid: Fix X86_INTEL_MID dependencies
  arch/x86/mm/srat: Skip NUMA_NO_NODE while parsing SLIT
  mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge
  x86: mm: change tlb_flushall_shift for IvyBridge
  x86/mm: Eliminate redundant page table walk during TLB range flushing
  x86/mm: Clean up inconsistencies when flushing TLB ranges
  mm, x86: Account for TLB flushes only when debugging
  x86/AMD/NB: Fix amd_set_subcaches() parameter type
  x86/quirks: Add workaround for AMD F16h Erratum792
  x86, doc, kconfig: Fix dud URL for Microcode data
2014-02-08 11:54:43 -08:00
H. Peter Anvin
a3b072cd18 * Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit).
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJS9P2pAAoJEC84WcCNIz1VABsP/j0fwiWdaoEn/9p9EUQcBcMn
 CILuiKI8GoBR+0nH7vm/jSgUKfUxzM6T0+qjoZPVkO/u/qup+JCT5JxoywUyEbfb
 +wPNIhBgKSwJdWXPd4ZvObA9jknrP6r5sJTsixpESVpVWmcC8egPgkyBFILjokKS
 BCJqqruHZcXfWaDQTocYErl3T217J7RfnCG1o7yP96g4jteX8EdvIkPjEf2mM83I
 GXacHLy8IhGhdyPKSXcRqZ0ivhZ2WX1iFQYIY19FhmzBs364WulyXve+oV+l/4h4
 Kx7ks4Ob5AlUIizchGNwVz7058F9o/v7m0CezTgi4Q/RrZi34samxbjk/95/Xc60
 JSvWkekm3/jOODubad5zj+ZnJmG83/ZlCUuTaqsE47ftbaedSUHBN9QSpm8iHEex
 n3d4J3AdP/3amcP33kZ5MRALDYIFKb4ZxtDkADqDcXhS56COivGAdZe5hnyCpb/9
 RPUDXTOlxfJQK/y2Atcdb400JjJ/Yr9Kew81LRIt0UMZU3dKSh05UZ+a7Ym0yCkt
 3k0NNkgsFCZbYTO/Z3aPDcprwU5Lq9UrwjB17U2ev/qK+qRYDzCzSR0XGPrLMRv7
 C5Bnov6uCn/0ZG/NlAx8UXK9wdWDsLhp1QkBz+daX3sGwRAS+OiKBv6+l8dqsdOc
 1L8PMkTX2rgtELiv4PJ/
 =hLCC
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' into x86/urgent

 * Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit).

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-07 11:27:30 -08:00
Tang Chen
7bc35fdde6 arch/x86/mm/numa.c: fix array index overflow when synchronizing nid to memblock.reserved.
The following path will cause array out of bound.

memblock_add_region() will always set nid in memblock.reserved to
MAX_NUMNODES.  In numa_register_memblks(), after we set all nid to
correct valus in memblock.reserved, we called setup_node_data(), and
used memblock_alloc_nid() to allocate memory, with nid set to
MAX_NUMNODES.

The nodemask_t type can be seen as a bit array.  And the index is 0 ~
MAX_NUMNODES-1.

After that, when we call node_set() in numa_clear_kernel_node_hotplug(),
the nodemask_t got an index of value MAX_NUMNODES, which is out of [0 ~
MAX_NUMNODES-1].

See below:

numa_init()
 |---> numa_register_memblks()
 |      |---> memblock_set_node(memory)		set correct nid in memblock.memory
 |      |---> memblock_set_node(reserved)	set correct nid in memblock.reserved
 |      |......
 |      |---> setup_node_data()
 |             |---> memblock_alloc_nid()	here, nid is set to MAX_NUMNODES (1024)
 |......
 |---> numa_clear_kernel_node_hotplug()
        |---> node_set()			here, we have an index 1024, and overflowed

This patch moves nid setting to numa_clear_kernel_node_hotplug() to fix
this problem.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reported-by: Dave Jones <davej@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Tested-by: Dave Jones <davej@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-06 13:48:51 -08:00
Tang Chen
017c217a26 arch/x86/mm/numa.c: initialize numa_kernel_nodes in numa_clear_kernel_node_hotplug()
On-stack variable numa_kernel_nodes in numa_clear_kernel_node_hotplug()
was not initialized.  So we need to initialize it.

[akpm@linux-foundation.org: use NODE_MASK_NONE, per David]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: David Rientjes <rientjes@google.com>
Tested-by: Dave Jones <davej@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-06 13:48:51 -08:00
Borislav Petkov
75a1ba5b2c x86, microcode, AMD: Unify valid container checks
For additional coverage, BorisO and friends unknowlingly did swap AMD
microcode with Intel microcode blobs in order to see what happens. What
did happen on 32-bit was

[    5.722656] BUG: unable to handle kernel paging request at be3a6008
[    5.722693] IP: [<c106d6b4>] load_microcode_amd+0x24/0x3f0
[    5.722716] *pdpt = 0000000000000000 *pde = 0000000000000000

because there was a valid initrd there but without valid microcode in it
and the container check happened *after* the relocated ramdisk handling
on 32-bit, which was clearly wrong.

While at it, take care of the ramdisk relocation on both 32- and 64-bit
as it is done on both. Also, comment what we're doing because this code
is a bit tricky.

Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391460104-7261-1-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-02-06 11:11:19 -08:00
Linus Torvalds
1cd731df09 Bug-fixes:
- Revert "xen/grant-table: Avoid m2p_override during mapping" as it broke Xen ARM build.
  - Fix CR4 not being set on AP processors in Xen PVH mode.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJS8AyQAAoJEFjIrFwIi8fJbD4IAJssMuaLI5CRsSWBgDFHHDFt
 srVJpDOYQiDr/TxkwFCVcL4sFy9Htb3KMArU4eIBl6uMqQbGa+3rHyXcHYI219YY
 XH3D8RG+9JChwsxtaeUEzwx1C8ehcygD34vtdcoQXa7eBuEi4TL3HeLifR+HrXKO
 UdFrTA34FmvpVFbSuRXkZh5sd6ca9et9xHuQHM8SIY6pVokY6xaEYOp17tfPZpwM
 7A6LFjUjXeugHC2L3+/H8UOHA9nSZQvnMiZOWq2Cusc2Dt2V7emzgk2wcc2CHttf
 EA6GbtiJzHqMPmt5EjubI9hHdSMB31HpY4hnQE38+ucl+BwiSdRE9z2Rm4TYClg=
 =IX4M
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen fixes from Konrad Rzeszutek Wilk:
 "Bug-fixes:
   - Revert "xen/grant-table: Avoid m2p_override during mapping" as it
     broke Xen ARM build.
   - Fix CR4 not being set on AP processors in Xen PVH mode"

* tag 'stable/for-linus-3.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pvh: set CR4 flags for APs
  Revert "xen/grant-table: Avoid m2p_override during mapping"
2014-02-05 16:01:11 -08:00
Matt Fleming
081cd62a01 x86/efi: Allow mapping BGRT on x86-32
CONFIG_X86_32 doesn't map the boot services regions into the EFI memory
map (see commit 700870119f ("x86, efi: Don't map Boot Services on
i386")), and so efi_lookup_mapped_addr() will fail to return a valid
address. Executing the ioremap() path in efi_bgrt_init() causes the
following warning on x86-32 because we're trying to ioremap() RAM,

 WARNING: CPU: 0 PID: 0 at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x2ad/0x2c0()
 Modules linked in:
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-0.rc5.git0.1.2.fc21.i686 #1
 Hardware name: DellInc. Venue 8 Pro 5830/09RP78, BIOS A02 10/17/2013
  00000000 00000000 c0c0df08 c09a5196 00000000 c0c0df38 c0448c1e c0b41310
  00000000 00000000 c0b37bc1 00000066 c043bbfd c043bbfd 00e7dfe0 00073eff
  00073eff c0c0df48 c0448ce2 00000009 00000000 c0c0df9c c043bbfd 00078d88
 Call Trace:
  [<c09a5196>] dump_stack+0x41/0x52
  [<c0448c1e>] warn_slowpath_common+0x7e/0xa0
  [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
  [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
  [<c0448ce2>] warn_slowpath_null+0x22/0x30
  [<c043bbfd>] __ioremap_caller+0x2ad/0x2c0
  [<c0718f92>] ? acpi_tb_verify_table+0x1c/0x43
  [<c0719c78>] ? acpi_get_table_with_size+0x63/0xb5
  [<c087cd5e>] ? efi_lookup_mapped_addr+0xe/0xf0
  [<c043bc2b>] ioremap_nocache+0x1b/0x20
  [<c0cb01c8>] ? efi_bgrt_init+0x83/0x10c
  [<c0cb01c8>] efi_bgrt_init+0x83/0x10c
  [<c0cafd82>] efi_late_init+0x8/0xa
  [<c0c9bab2>] start_kernel+0x3ae/0x3c3
  [<c0c9b53b>] ? repair_env_string+0x51/0x51
  [<c0c9b378>] i386_start_kernel+0x12e/0x131

Switch to using early_memremap(), which won't trigger this warning, and
has the added benefit of more accurately conveying what we're trying to
do - map a chunk of memory.

This patch addresses the following bug report,

  https://bugzilla.kernel.org/show_bug.cgi?id=67911

Reported-by: Adam Williamson <awilliam@redhat.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-02-05 23:39:34 +00:00
Ingo Molnar
f8f2023482 x86: Disable CONFIG_X86_DECODER_SELFTEST in allmod/allyesconfigs
It can take some time to validate the image, make sure
{allyes|allmod}config doesn't enable it.

I'd say randconfig will cover it often enough, and the failure is also
borderline build coverage related: you cannot really make the decoder
test fail via source level changes, only with changes in the build
environment, so I agree with Andi that we can disable this one too.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Paul Gortmaker paul.gortmaker@windriver.com>
Suggested-and-acked-by: Andi Kleen andi@firstfloor.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-05 14:10:30 -08:00
Mukesh Rathor
afca50132c xen/pvh: set CR4 flags for APs
During bootup in the 'probe_page_size_mask' these CR4 flags are
set in there. But for AP processors they are not set as we do not
use 'secondary_startup_64' which the baremetal kernels uses.
Instead do it in this function which we use in Xen PVH during our
startup for AP processors.

As such fix it up to make sure we have that flag set.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-02-03 15:44:18 -05:00
Konrad Rzeszutek Wilk
e85fc98055 Revert "xen/grant-table: Avoid m2p_override during mapping"
This reverts commit 08ece5bb23.

As it breaks ARM builds and needs more attention
on the ARM side.

Acked-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-02-03 06:44:49 -05:00
Petr Tesarik
85fc73a2cd x86: Fix the initialization of physnode_map
With DISCONTIGMEM, the mapping between a pfn and its owning node is
initialized using data provided by the BIOS. However, the initialization
may fail if the extents are not aligned to section boundary (64M).

The symptom of this bug is an early boot failure in pfn_to_page(),
as it tries to access NODE_DATA(__nid) using index from an unitialized
element of the physnode_map[] array.

While the bug is always present, it is more likely to be hit in kdump
kernels on large machines, because:

1. The memory map for a kdump kernel is specified as exactmap, and
   exactmap is more likely to be unaligned.

2. Large reservations are more likely to span across a 64M boundary.

[ hpa: fixed incorrect use of "pfn" instead of "start" ]

Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Link: http://lkml.kernel.org/r/20140201133019.32e56f86@hananiah.suse.cz
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-02-01 22:15:51 -08:00
Linus Torvalds
ab5318788c Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core debug changes from Ingo Molnar:
 "This contains mostly kernel debugging related updates:

   - make hung_task detection more configurable to distros
   - add final bits for x86 UV NMI debugging, with related KGDB changes
   - update the mailing-list of MAINTAINERS entries I'm involved with"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hung_task: Display every hung task warning
  sysctl: Add neg_one as a standard constraint
  x86/uv/nmi, kgdb/kdb: Fix UV NMI handler when KDB not configured
  x86/uv/nmi: Fix Sparse warnings
  kgdb/kdb: Fix no KDB config problem
  MAINTAINERS: Restore "L: linux-kernel@vger.kernel.org" entries
2014-01-31 08:59:46 -08:00