Commit Graph

289 Commits

Author SHA1 Message Date
Hans de Goede 594a30fb12 x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on CPUs without the feature
When booting 4.13 on a VirtualBox VM on a Skylake host the following
error shows up in the logs:

 [    0.000000] [Firmware Bug]: TSC_DEADLINE disabled due to Errata;
                please update microcode to version: 0xb2 (or later)

This is caused by apic_check_deadline_errata() only checking CPU model
and not the X86_FEATURE_TSC_DEADLINE_TIMER flag (which VirtualBox does
NOT export to the guest), combined with VirtualBox not exporting the
micro-code version to the guest.

This commit adds a check for X86_FEATURE_TSC_DEADLINE_TIMER to
apic_check_deadline_errata(), silencing this error on VirtualBox VMs.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frank Mehnert <frank.mehnert@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Thayer <michael.thayer@oracle.com>
Cc: Michal Necasek <michal.necasek@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: bd9240a18e ("x86/apic: Add TSC_DEADLINE quirk due to errata")
Link: http://lkml.kernel.org/r/20170830105811.27539-1-hdegoede@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-31 11:17:27 +02:00
Thomas Gleixner 61069de7a3 x86/apic: Remove the duplicated tracing versions of interrupts
The error and the spurious interrupt are really rare events and not at all
performance sensitive: two NOP5s can be tolerated when tracing is disabled.

Remove the complication.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170828064956.986009402@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:25 +02:00
Thomas Gleixner 3bec6def39 x86/apic: Use this_cpu_ptr() in local_timer_interrupt()
Accessing the per cpu data via per_cpu(, smp_processor_id()) is
pointless. Use this_cpu_ptr() instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064956.829552757@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:24 +02:00
Thomas Gleixner 302a98f896 x86/apic: Remove the duplicated tracing version of local_timer_interrupt()
The two NOP5s are noise in the rest of the work which is done by the timer
interrupt and modern CPUs are pretty good in optimizing NOPs anyway.

Get rid of the interrupt handler duplication and move the tracepoints into
the regular handler.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170828064956.751247330@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:24 +02:00
Thomas Gleixner 05161b9cbe x86/irq: Get rid of the 'first_system_vector' indirection bogosity
This variable is beyond pointless. Nothing allocates a vector via
alloc_gate() below FIRST_SYSTEM_VECTOR. So nothing can change
first_system_vector.

If there is a need for a gate below FIRST_SYSTEM_VECTOR then it can be
added to the vector defines and FIRST_SYSTEM_VECTOR can be adjusted
accordingly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064956.357109735@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:21 +02:00
Linus Torvalds 3ad918e65d Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timers updates from Thomas Gleixner:
 "This update contains:

   - The solution for the TSC deadline timer borkage, which is caused by
     a hardware problem in the TSC_ADJUST/TSC_DEADLINE_TIMER logic.

     The problem is documented now and fixed with a microcode update, so
     we can remove the workaround and just check for the microcode version.

     If the microcode is not up to date, then the TSC deadline timer is
     disabled. If the borkage is fixed by the proper microcode version,
     then the deadline timer can be used. In both cases the restrictions
     to the range of the TSC_ADJUST value, which were added as
     workarounds, are removed.

  - A few simple fixes and updates to the timer related x86 code"

* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsc: Call check_system_tsc_reliable() before unsynchronized_tsc()
  x86/hpet: Do not use smp_processor_id() in preemptible code
  x86/time: Make setup_default_timer_irq() static
  x86/tsc: Remove the TSC_ADJUST clamp
  x86/apic: Add TSC_DEADLINE quirk due to errata
  x86/apic: Change the lapic name in deadline mode
2017-07-03 18:01:50 -07:00
Thomas Gleixner c7d6c9dd87 x86/apic: Implement effective irq mask update
Add the effective irq mask update to the apic implementations and enable
effective irq masks for x86.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.878370703@linutronix.de
2017-06-22 18:21:23 +02:00
Thomas Gleixner 0e24f7c9f6 x86/apic: Add irq_data argument to apic->cpu_mask_to_apicid()
The decision to which CPUs an interrupt is effectively routed happens in
the various apic->cpu_mask_to_apicid() implementations

To support effective affinity masks this information needs to be updated in
irq_data. Add a pointer to irq_data to the callbacks and feed it through
the call chain.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.720739075@linutronix.de
2017-06-22 18:21:22 +02:00
Thomas Gleixner 91cd9cb7ee x86/apic: Move cpumask and to core code
All implementations of apic->cpu_mask_to_apicid_and() and the two incoming
cpumasks to search for the target.

Move that operation to the call site and rename it to cpu_mask_to_apicid()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.641575516@linutronix.de
2017-06-22 18:21:22 +02:00
Thomas Gleixner 52b166af40 x86/apic: Move online masking to core code
All implementations of apic->cpu_mask_to_apicid_and() mask out the offline
cpus. The callsite already has a mask available, which has the offline CPUs
removed. Use that and remove the extra bits.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.560868224@linutronix.de
2017-06-22 18:21:21 +02:00
Thomas Gleixner ad95212ee6 x86/apic: Move flat_cpu_mask_to_apicid_and() into C source
No point in having inlines assigned to function pointers at multiple
places. Just bloats the text.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.405975721@linutronix.de
2017-06-22 18:21:21 +02:00
Peter Zijlstra bd9240a18e x86/apic: Add TSC_DEADLINE quirk due to errata
Due to errata it is possible for the TSC_DEADLINE timer to misbehave
after using TSC_ADJUST. A microcode update is available to fix this
situation.

Avoid using the TSC_DEADLINE timer if it is affected by this issue and
report the required microcode version.

[ tglx: Renamed function to apic_check_deadline_errata() ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: kevin.b.stanton@intel.com
Link: http://lkml.kernel.org/r/20170531155306.050849877@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-04 21:55:53 +02:00
Peter Zijlstra c6e9f42bbe x86/apic: Change the lapic name in deadline mode
So that we can more easily see in what mode the lapic timer operates.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: kevin.b.stanton@intel.com
Link: http://lkml.kernel.org/r/20170531155305.989808008@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-04 21:55:52 +02:00
Linus Torvalds 12ca7c8db3 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Ingo Molnar:
 "Two small cleanups"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Fix a comment in init_apic_mappings()
  x86/apic: Remove the SET_APIC_ID(x) macro
2017-05-01 21:41:07 -07:00
Linus Torvalds 16b76293c5 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "The biggest changes in this cycle were:

   - reworking of the e820 code: separate in-kernel and boot-ABI data
     structures and apply a whole range of cleanups to the kernel side.

     No change in functionality.

   - enable KASLR by default: it's used by all major distros and it's
     out of the experimental stage as well.

   - ... misc fixes and cleanups"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
  x86/KASLR: Fix kexec kernel boot crash when KASLR randomization fails
  x86/reboot: Turn off KVM when halting a CPU
  x86/boot: Fix BSS corruption/overwrite bug in early x86 kernel startup
  x86: Enable KASLR by default
  boot/param: Move next_arg() function to lib/cmdline.c for later reuse
  x86/boot: Fix Sparse warning by including required header file
  x86/boot/64: Rename start_cpu()
  x86/xen: Update e820 table handling to the new core x86 E820 code
  x86/boot: Fix pr_debug() API braindamage
  xen, x86/headers: Add <linux/device.h> dependency to <asm/xen/page.h>
  x86/boot/e820: Simplify e820__update_table()
  x86/boot/e820: Separate the E820 ABI structures from the in-kernel structures
  x86/boot/e820: Fix and clean up e820_type switch() statements
  x86/boot/e820: Rename the remaining E820 APIs to the e820__*() prefix
  x86/boot/e820: Remove unnecessary #include's
  x86/boot/e820: Rename e820_mark_nosave_regions() to e820__register_nosave_regions()
  x86/boot/e820: Rename e820_reserve_resources*() to e820__reserve_resources*()
  x86/boot/e820: Use bool in query APIs
  x86/boot/e820: Document e820__reserve_setup_data()
  x86/boot/e820: Clean up __e820__update_table() et al
  ...
2017-05-01 20:51:12 -07:00
Nicolai Stange 747d04b30e x86/apic/timer: Set ->min_delta_ticks and ->max_delta_ticks
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.

Make the x86 arch's apic clockevent driver initialize these fields
properly.

This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
CC: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2017-04-14 13:11:17 -07:00
Dou Liyang 5ba039a554 x86/apic: Fix a comment in init_apic_mappings()
commit c0104d38a7 ("x86, apic: Unify identical register_lapic_address()
functions") renames acpi_register_lapic_address to register_lapic_address.

But acpi_register_lapic_address remains in a comment, and renaming it to
register_lapic_address is not suitable for this comment.

Remove acpi_register_lapic_address and rewrite the comment.

[ tglx: LAPIC address can be registered either by ACPI/MADT or MP info ]

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: yinghai@kernel.org
Link: http://lkml.kernel.org/r/1488805690-5055-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-13 21:42:11 +01:00
Dou Liyang 5d64d209c4 x86/apic: Remove the SET_APIC_ID(x) macro
The SET_APIC_ID() macro obfusates the code. Remove it to increase
readability and add a comment to the apic struct to document that the
callback is required on 64-bit.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/1488971270-14359-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-13 21:28:38 +01:00
Dou Liyang 2b85b3d229 x86/acpi: Restore the order of CPU IDs
The following commits:

  f7c28833c2 ("x86/acpi: Enable acpi to register all possible cpus at
boot time") and 8f54969dc8 ("x86/acpi: Introduce persistent storage
for cpuid <-> apicid mapping")

... registered all the possible CPUs at boot time via ACPI tables to
make the mapping of cpuid <-> apicid fixed. Both enabled and disabled
CPUs could have a logical CPU ID after boot time.

But, ACPI tables are unreliable. the number amd order of Local APIC
entries which depends on the firmware is often inconsistent with the
physical devices. Even if they are consistent, The disabled CPUs which
take up some logical CPU IDs will also make the order discontinuous.

Revert the part of disabled CPUs registration, keep the allocation
logic of logical CPU IDs and also keep some code location changes.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Tested-by: Xiaolong Ye <xiaolong.ye@intel.com>
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: guzheng1@huawei.com
Cc: izumi.taku@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1488528147-2279-4-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-11 14:41:19 +01:00
Dou Liyang 11277aabcb x86/apic: Simplify enable_IR_x2apic(), remove try_to_enable_IR()
The following commit:

  2e63ad4bd5 ("x86/apic: Do not init irq remapping if ioapic is disabled")

... added a check for skipped IO-APIC setup to enable_IR_x2apic(), but this
check is also duplicated in try_to_enable_IR() - and it will never succeed in
calling irq_remapping_enable().

Remove the whole irq_remapping_enable() complication: if the IO-APIC is
disabled we cannot enable IRQ remapping.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: nicstange@gmail.com
Cc: wanpeng.li@hotmail.com
Link: http://lkml.kernel.org/r/1487841401-1543-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-01 10:09:09 +01:00
Dou Liyang bb3f0a5263 x86/apic: Fix a warning message in logical CPU IDs allocation
The current warning message in allocate_logical_cpuid() is somewhat confusing:

  Only 1 processors supported.Processor 2/0x2 and the rest are ignored.

As it might imply that there's only one CPU in the system - while what we ran
into here is a kernel limitation.

Fix the warning message to clarify all that:

  APIC: NR_CPUS/possible_cpus limit of 2 reached. Processor 2/0x2 and the rest are ignored.

( Also update the error return from -1 to -EINVAL, which is the more
  canonical return value. )

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: nicstange@gmail.com
Cc: wanpeng.li@hotmail.com
Link: http://lkml.kernel.org/r/1488261052-25753-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-01 10:09:08 +01:00
Ingo Molnar 0871d5a66d Merge branch 'linus' into WIP.x86/boot, to fix up conflicts and to pick up updates
Conflicts:
	arch/x86/xen/setup.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-01 09:02:26 +01:00
Linus Torvalds f89db789de Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two documentation updates, plus a debugging annotation fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/crash: Update the stale comment in reserve_crashkernel()
  x86/irq, trace: Add __irq_entry annotation to x86's platform IRQ handlers
  Documentation, x86, resctrl: Recommend locking for resctrlfs
2017-02-28 11:46:00 -08:00
Dou Liyang 543113d2f4 x86/apic: Fix a typo in a comment line
s/bringin
 /bringing

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: trivial@kernel.org
Link: http://lkml.kernel.org/r/1486442688-24690-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-07 10:40:27 +01:00
Ingo Molnar 1506c8dc94 x86/boot/e820: Rename e820_reserve_resources*() to e820__reserve_resources*()
Also do some minor cleanups.

No change in functionality.

Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-28 22:55:25 +01:00
Frederic Weisbecker 914122c389 x86/apic: Implement set_state_oneshot_stopped() callback
When clock_event_device::set_state_oneshot_stopped() is not implemented,
hrtimer_cancel() can't stop the clock when there is no more timer in
the queue. So the ghost of the freshly cancelled hrtimer haunts us back
later with an extra interrupt:

          <idle>-0     [002] d..2  2248.557659: hrtimer_cancel: hrtimer=ffff88021fa92d80
          <idle>-0     [002] d.h1  2249.303659: local_timer_entry: vector=239

So let's implement this missing callback for the lapic clock. This
consist in calling its set_state_shutdown() callback. There don't seem
to be a lighter way to stop the clock. Simply writing 0 to APIC_TMICT
won't be enough to stop the clock and avoid the extra interrupt, as
opposed to what is specified in the specs. We must also mask the
timer interrupt in the device.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/1483029949-6925-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-09 11:48:42 +01:00
Dou Liyang 12bf98b91f x86/apic: Fix typos in comments
s/ID/IDs/
 s/inr_logical_cpuidi/nr_logical_cpuids/
 s/generic_processor_info()/__generic_processor_info()/

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1483610083-24314-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-06 08:40:33 +01:00
Daniel Bristot de Oliveira c4158ff536 x86/irq, trace: Add __irq_entry annotation to x86's platform IRQ handlers
This patch adds the __irq_entry annotation to the default x86
platform IRQ handlers. ftrace's function_graph tracer uses the
__irq_entry annotation to notify the entry and return of IRQ
handlers.

For example, before the patch:
  354549.667252 |   3)  d..1              |  default_idle_call() {
  354549.667252 |   3)  d..1              |    arch_cpu_idle() {
  354549.667253 |   3)  d..1              |      default_idle() {
  354549.696886 |   3)  d..1              |        smp_trace_reschedule_interrupt() {
  354549.696886 |   3)  d..1              |          irq_enter() {
  354549.696886 |   3)  d..1              |            rcu_irq_enter() {

After the patch:
  366416.254476 |   3)  d..1              |    arch_cpu_idle() {
  366416.254476 |   3)  d..1              |      default_idle() {
  366416.261566 |   3)  d..1  ==========> |
  366416.261566 |   3)  d..1              |        smp_trace_reschedule_interrupt() {
  366416.261566 |   3)  d..1              |          irq_enter() {
  366416.261566 |   3)  d..1              |            rcu_irq_enter() {

KASAN also uses this annotation. The smp_apic_timer_interrupt()
was already annotated.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Claudio Fontana <claudio.fontana@huawei.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolai Stange <nicstange@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/059fdf437c2f0c09b13c18c8fe4e69999d3ffe69.1483528431.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-05 08:58:49 +01:00
Thomas Gleixner 9d85eb9119 x86/smpboot: Make logical package management more robust
The logical package management has several issues:

 - The APIC ids provided by ACPI are not required to be the same as the
   initial APIC id which can be retrieved by CPUID. The APIC ids provided
   by ACPI are those which are written by the BIOS into the APIC. The
   initial id is set by hardware and can not be changed. The hardware
   provided ids contain the real hardware package information.

   Especially AMD sets the effective APIC id different from the hardware id
   as they need to reserve space for the IOAPIC ids starting at id 0.

   As a consequence those machines trigger the currently active firmware
   bug printouts in dmesg, These are obviously wrong.

 - Virtual machines have their own interesting of enumerating APICs and
   packages which are not reliably covered by the current implementation.

The sizing of the mapping array has been tweaked to be generously large to
handle systems which provide a wrong core count when HT is disabled so the
whole magic which checks for space in the physical hotplug case is not
needed anymore.

Simplify the whole machinery and do the mapping when the CPU starts and the
CPUID derived physical package information is available. This solves the
observed problems on AMD machines and works for the virtualization issues
as well.

Remove the extra call from XEN cpu bringup code as it is not longer
required.

Fixes: d49597fd3b ("x86/cpu: Deal with broken firmware (VMWare/XEN)")
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: M. Vefa Bicakci <m.v.b@runbox.com>
Cc: xen-devel <xen-devel@lists.xen.org>
Cc: Charles (Chas) Williams <ciwillia@brocade.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alok Kataria <akataria@vmware.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612121102260.3429@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-13 10:22:39 +01:00
Linus Torvalds 212f30008a Merge branch 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 idle updates from Ingo Molnar:
 "There were two bigger changes in this development cycle:

   - remove idle notifiers:

       32 files changed, 74 insertions(+), 803 deletions(-)

     These notifiers were of questionable value and the main usecase,
     the i7300 driver, was essentially unmaintained and can be removed,
     plus modern power management concepts don't need the callback - so
     use this golden opportunity and get rid of this opaque and fragile
     callback from a latency sensitive code path.

     (Len Brown, Thomas Gleixner)

   - improve the AMD Erratum 400 workaround that used high overhead MSR
     polling in the idle loop (Borisla Petkov, Thomas Gleixner)"

* 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Remove empty idle.h header
  x86/amd: Simplify AMD E400 aware idle routine
  x86/amd: Check for the C1E bug post ACPI subsystem init
  x86/bugs: Separate AMD E400 erratum and C1E bug
  x86/cpufeature: Provide helper to set bugs bits
  x86/idle: Remove enter_idle(), exit_idle()
  x86: Remove x86_test_and_clear_bit_percpu()
  x86/idle: Remove is_idle flag
  x86/idle: Remove idle_notifier
  i7300_idle: Remove this driver
2016-12-12 14:55:04 -08:00
Thomas Gleixner 34bc3560c6 x86: Remove empty idle.h header
One include less is always a good thing(tm). Good riddance.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20161209182912.2726-6-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09 21:23:22 +01:00
Borislav Petkov 07c94a3812 x86/amd: Simplify AMD E400 aware idle routine
Reorganize the E400 detection now that we have everything in place:
switch the CPUs to broadcast mode after the LAPIC has been initialized
and remove the facilities that were used previously on the idle path.

Unfortunately static_cpu_has_bug() cannpt be used in the E400 idle routine
because alternatives have been applied when the actual detection happens,
so the static switching does not take effect and the test will stay
false. Use boot_cpu_has_bug() instead which is definitely an improvement
over the RDMSR and the cpumask handling.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20161209182912.2726-5-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-09 21:23:21 +01:00
Wanpeng Li 8ca225520e x86/apic: Prevent tracing on apic_msr_write_eoi()
The following RCU lockdep warning led to adding irq_enter()/irq_exit() into
smp_reschedule_interrupt():

 RCU used illegally from idle CPU!
 rcu_scheduler_active = 1, debug_locks = 0
 RCU used illegally from extended quiescent state!
 no locks held by swapper/1/0.
 
  do_trace_write_msr
  native_write_msr
  native_apic_msr_eoi_write
  smp_reschedule_interrupt
  reschedule_interrupt

As Peterz pointed out:

| So now we're making a very frequent interrupt slower because of debug 
| code.
|
| The thing is, many many smp_reschedule_interrupt() invocations don't
| actually execute anything much at all and are only sent to tickle the
| return to user path (which does the actual preemption).
| 
| Having to do the whole irq_enter/irq_exit dance just for this unlikely
| debug case totally blows.

Use the wrmsr_notrace() variant in native_apic_msr_write_eoi, annotate the
kvm variant with notrace and add a native_apic_eoi callback to the apic
structure so KVM guests are covered as well.

This allows to revert the irq_enter/irq_exit dance in
smp_reschedule_interrupt().

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Mike Galbraith <efault@gmx.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1478488420-5982-3-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09 22:03:14 +01:00
Thomas Gleixner df610d6788 x86/apic: Prevent pointless warning messages
Markus reported that he sees new warnings:

  APIC: NR_CPUS/possible_cpus limit of 4 reached.  Processor 4/0x84 ignored.
  APIC: NR_CPUS/possible_cpus limit of 4 reached.  Processor 5/0x85 ignored.

This comes from the recent persistant cpuid - nodeid changes. The code
which emits the warning has been called prior to these changes only for
enabled processors. Now it's called for disabled processors as well to get
the possible cpu accounting correct. So if the kernel is compiled for the
number of actual available/enabled CPUs and the BIOS reports disabled CPUs
as well then the above warnings are printed.

That's a pointless exercise as it only makes sense if there are more CPUs
enabled than the kernel supports.

Nake the warning conditional on enabled processors so we are back to the
state before these changes.

Fixes: 8f54969dc8 ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping") 
Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: linux-acpi@vger.kernel.org
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610071549330.19804@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-08 12:18:36 +02:00
Thomas Gleixner eb6296dec1 x86/apic: Fix silent & fatal merge conflict in __generic_processor_info()
Fix up the silent merge conflict between commit c291b01515 in x86/urgent
and commit f7c28833c2 in x86/apic which both remove num_processors++
from the original location and then add it at two different locations. As a
result num_processors is incremented twice which can cut the number of
available cpus in half.

Remove the one which is added by commit c291b01515.

In hindsight I should have merged x86/urgent into x86/apic _before_ adding
the nodeid bits, but in hindsight we are always smarter.

Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Fixes: 1e1b37273c ("Merge branch 'x86/urgent' into x86/apic")
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1609261350090.5483@nanos
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-26 15:51:22 -04:00
Thomas Gleixner 1e1b37273c Merge branch 'x86/urgent' into x86/apic
Bring in the upstream modifications so we can fixup the silent merge
conflict which is introduced by this merge.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-26 15:47:03 -04:00
Gu Zheng 8f54969dc8 x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping
The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that,
when node online/offline happens, cache based on cpuid <-> nodeid mapping such as
wq_numa_possible_cpumask will not cause any problem.
It contains 4 steps:
1. Enable apic registeration flow to handle both enabled and disabled cpus.
2. Introduce a new array storing all possible cpuid <-> apicid mapping.
3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid.
4. Establish all possible cpuid <-> nodeid mapping.

This patch finishes step 2.

In this patch, we introduce a new static array named cpuid_to_apicid[],
which is large enough to store info for all possible cpus.

And then, we modify the cpuid calculation. In generic_processor_info(),
it simply finds the next unused cpuid. And it is also why the cpuid <-> nodeid
mapping changes with node hotplug.

After this patch, we find the next unused cpuid, map it to an apicid,
and store the mapping in cpuid_to_apicid[], so that cpuid <-> apicid
mapping will be persistent.

And finally we will use this array to make cpuid <-> nodeid persistent.

cpuid <-> apicid mapping is established at local apic registeration time.
But non-present or disabled cpus are ignored.

In this patch, we establish all possible cpuid <-> apicid mapping when
registering local apic.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: mika.j.penttila@gmail.com
Cc: len.brown@intel.com
Cc: rafael@kernel.org
Cc: rjw@rjwysocki.net
Cc: yasu.isimatu@gmail.com
Cc: linux-mm@kvack.org
Cc: linux-acpi@vger.kernel.org
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: gongzhaogang@inspur.com
Cc: tj@kernel.org
Cc: izumi.taku@jp.fujitsu.com
Cc: cl@linux.com
Cc: chen.tang@easystack.cn
Cc: akpm@linux-foundation.org
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1472114120-3281-4-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-21 21:18:38 +02:00
Gu Zheng f7c28833c2 x86/acpi: Enable acpi to register all possible cpus at boot time
cpuid <-> nodeid mapping is firstly established at boot time. And workqueue caches
the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.

When doing node online/offline, cpuid <-> nodeid mapping is established/destroyed,
which means, cpuid <-> nodeid mapping will change if node hotplug happens. But
workqueue does not update wq_numa_possible_cpumask.

So here is the problem:

Assume we have the following cpuid <-> nodeid in the beginning:

  Node | CPU

------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119

and we hot-remove node2 and node3, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89

and we hot-add node4 and node5, it becomes:

  Node | CPU
------------------------
node 0 |  0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119

But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.

When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
...
        /* if cpumask is contained inside a NUMA node, we belong to that node */
        if (wq_numa_enabled) {
                for_each_node(node) {
                        if (cpumask_subset(pool->attrs->cpumask,
                                           wq_numa_possible_cpumask[node])) {
                                pool->node = node;
                                break;
                        }
                }
        }

Since wq_numa_possible_cpumask is not updated, it could be mapped to an offline node,
which will lead to memory allocation failure:

 SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
  cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
  node 0: slabs: 6172, objs: 259224, free: 245741
  node 1: slabs: 3261, objs: 136962, free: 127656

It happens here:

create_worker(struct worker_pool *pool)
 |--> worker = alloc_worker(pool->node);

static struct worker *alloc_worker(int node)
{
        struct worker *worker;

        worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, useing the wrong node.

        ......

        return worker;
}

[Solution]

There are four mappings in the kernel:
1. nodeid (logical node id)   <->   pxm
2. apicid (physical cpu id)   <->   nodeid
3. cpuid (logical cpu id)     <->   apicid
4. cpuid (logical cpu id)     <->   nodeid

1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and nodeid <-> pxm
   mapping is setup at boot time. This mapping is persistent, won't change.

2. apicid <-> nodeid mapping is setup using info in 1. The mapping is setup at boot
   time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
   persistent.

3. cpuid <-> apicid mapping is setup at boot time and CPU hotadd time. cpuid is
   allocated, lower ids first, and released at CPU hotremove time, reused for other
   hotadded CPUs. So this mapping is not persistent.

4. cpuid <-> nodeid mapping is also setup at boot time and CPU hotadd time, and
   cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.

To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
cpus at boot time, and make it persistent. And according to init_cpu_to_node(),
cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
mapping. So the key point is obtaining all cpus' apicid.

apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
MADT (Multiple APIC Description Table). So we finish the job in the following steps:

1. Enable apic registeration flow to handle both enabled and disabled cpus.
   This is done by introducing an extra parameter to generic_processor_info to let the
   caller control if disabled cpus are ignored.

2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify
   the way cpuid is calculated. Establish all possible cpuid <-> apicid mapping when
   registering local apic. Store the mapping in this array.

3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid.
   This is also done by introducing an extra parameter to these apis to let the caller
   control if disabled cpus are ignored.

4. Establish all possible cpuid <-> nodeid mapping.
   This is done via an additional acpi namespace walk for processors.

This patch finished step 1.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: mika.j.penttila@gmail.com
Cc: len.brown@intel.com
Cc: rafael@kernel.org
Cc: rjw@rjwysocki.net
Cc: yasu.isimatu@gmail.com
Cc: linux-mm@kvack.org
Cc: linux-acpi@vger.kernel.org
Cc: isimatu.yasuaki@jp.fujitsu.com
Cc: gongzhaogang@inspur.com
Cc: tj@kernel.org
Cc: izumi.taku@jp.fujitsu.com
Cc: cl@linux.com
Cc: chen.tang@easystack.cn
Cc: akpm@linux-foundation.org
Cc: kamezawa.hiroyu@jp.fujitsu.com
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1472114120-3281-3-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-21 21:18:38 +02:00
Denys Vlasenko cff9ab2b29 x86/apic: Get rid of apic_version[] array
The array has a size of MAX_LOCAL_APIC, which can be as large as 32k, so it
can consume up to 128k.

The array has been there forever and was never used for anything useful
other than a version mismatch check which was introduced in 2009.

There is no reason to store the version in an array. The kernel is not
prepared to handle different APIC versions anyway, so the real important
part is to detect a version mismatch and warn about it, which can be done
with a single variable as well.

[ tglx: Massaged changelog ]

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Borislav Petkov <bp@alien8.de>
CC: Brian Gerst <brgerst@gmail.com>
CC: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20160913181232.30815-1-dvlasenk@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-20 00:31:19 +02:00
Dou Liyang c291b01515 x86/apic: Fix num_processors value in case of failure
If the topology package map check of the APIC ID and the CPU is a failure,
we don't generate the processor info for that APIC ID yet we increase
disabled_cpus by one - which is buggy.

Only increase num_processors once we are sure we don't fail.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1473214893-16481-1-git-send-email-douly.fnst@cn.fujitsu.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-08 08:11:03 +02:00
Wei Jiangang 5035da4199 x86/apic: Update comment about disabling processor focus
Fix references to discarded end_level_ioapic_irq().

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/1471576957-12961-2-git-send-email-weijg.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 11:24:33 +02:00
Wanpeng Li 2e63ad4bd5 x86/apic: Do not init irq remapping if ioapic is disabled
native_smp_prepare_cpus
  -> default_setup_apic_routing
    -> enable_IR_x2apic
      -> irq_remapping_prepare
        -> intel_prepare_irq_remapping
          -> intel_setup_irq_remapping		  

So IR table is setup even if "noapic" boot parameter is added. As a result we
crash later when the interrupt affinity is set due to a half initialized
remapping infrastructure.

Prevent remap initialization when IOAPIC is disabled.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Link: http://lkml.kernel.org/r/1471954039-3942-1-git-send-email-wanpeng.li@hotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-08-24 09:45:40 +02:00
Baoquan He 6de421198c x86/apic, ACPI: Remove the repeated lapic address override entry parsing
The ACPI MADT has a 32-bit field providing lapic address at which
each processor can access its lapic information. MADT also contains
an optional entry to provide a 64-bit address to override the 32-bit
one. However the current code does the lapic address override entry
parsing twice. One is in early_acpi_boot_init() because AMD NUMA need
get boot_cpu_id earlier. The other is in acpi_boot_init() which parses
all MADT entries.

So in this patch we remove the repeated code in the 2nd part.

Meanwhile print lapic override entry information like other MADT entry,
this will be added to boot log.

This patch is not supposed to change any runtime behavior, other than
improving kernel messages.

Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-acpi@vger.kernel.org
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/1470985033-22493-2-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-15 08:53:37 +02:00
Ingo Molnar fdbdfefbab Merge branch 'linus' into timers/urgent, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 14:36:23 +02:00
Nicolai Stange 6731b0d611 x86/timers/apic: Inform TSC deadline clockevent device about recalibration
This patch eliminates a source of imprecise APIC timer interrupts,
which imprecision may result in double interrupts or even late
interrupts.

The TSC deadline clockevent devices' configuration and registration
happens before the TSC frequency calibration is refined in
tsc_refine_calibration_work().

This results in the TSC clocksource and the TSC deadline clockevent
devices being configured with slightly different frequencies: the former
gets the refined one and the latter are configured with the inaccurate
frequency detected earlier by means of the "Fast TSC calibration using PIT".

Within the APIC code, introduce the notifier function
lapic_update_tsc_freq() which reconfigures all per-CPU TSC deadline
clockevent devices with the current tsc_khz.

Call it from the TSC code after TSC calibration refinement has happened.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christopher S. Hall <christopher.s.hall@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20160714152255.18295-3-nicstange@gmail.com
[ Pushed #ifdef CONFIG_X86_LOCAL_APIC into header, improved changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 12:38:12 +02:00
Nicolai Stange 1a9e4c564a x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error
I noticed the following bug/misbehavior on certain Intel systems: with a
single task running on a NOHZ CPU on an Intel Haswell, I recognized
that I did not only get the one expected local_timer APIC interrupt, but
two per second at minimum. (!)

Further tracing showed that the first one precedes the programmed deadline
by up to ~50us and hence, it did nothing except for reprogramming the TSC
deadline clockevent device to trigger shortly thereafter again.

The reason for this is imprecise calibration, the timeout we program into
the APIC results in 'too short' timer interrupts. The core (hr)timer code
notices this (because it has a precise ktime source and sees the short
interrupt) and fixes it up by programming an additional very short
interrupt period.

This is obviously suboptimal.

The reason for the imprecise calibration is twofold, and this patch
fixes the first reason:

In setup_APIC_timer(), the registered clockevent device's frequency
is calculated by first dividing tsc_khz by TSC_DIVISOR and multiplying
it with 1000 afterwards:

  (tsc_khz / TSC_DIVISOR) * 1000

The multiplication with 1000 is done for converting from kHz to Hz and the
division by TSC_DIVISOR is carried out in order to make sure that the final
result fits into an u32.

However, with the order given in this calculation, the roundoff error
introduced by the division gets magnified by a factor of 1000 by the
following multiplication.

To fix it, reversing the order of the division and the multiplication a la:

  (tsc_khz * 1000) / TSC_DIVISOR

... reduces the roundoff error already.

Furthermore, if TSC_DIVISOR divides 1000, associativity holds:

  (tsc_khz * 1000) / TSC_DIVISOR = tsc_khz * (1000 / TSC_DIVISOR)

and thus, the roundoff error even vanishes and the whole operation can be
carried out within 32 bits.

The powers of two that divide 1000 are 2, 4 and 8. A value of 8 for
TSC_DIVISOR still allows for TSC frequencies up to
2^32 / 10^9ns * 8 = 34.4GHz which is way larger than anything to expect
in the next years.

Thus we also replace the current TSC_DIVISOR value of 32 by 8. Reverse
the order of the divison and the multiplication in the calculation of
the registered clockevent device's frequency.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christopher S. Hall <christopher.s.hall@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20160714152255.18295-2-nicstange@gmail.com
[ Improved changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 12:37:38 +02:00
Masahiro Yamada 97f2645f35 tree-wide: replace config_enabled() with IS_ENABLED()
The use of config_enabled() against config options is ambiguous.  In
practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
author might have used it for the meaning of IS_ENABLED().  Using
IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
clearer.

This commit replaces config_enabled() with IS_ENABLED() where possible.
This commit is only touching bool config options.

I noticed two cases where config_enabled() is used against a tristate
option:

 - config_enabled(CONFIG_HWMON)
  [ drivers/net/wireless/ath/ath10k/thermal.c ]

 - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
  [ drivers/gpu/drm/gma500/opregion.c ]

I did not touch them because they should be converted to IS_BUILTIN()
in order to keep the logic, but I was not sure it was the authors'
intention.

Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Joshua Kinard <kumba@gentoo.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Will Drewry <wad@chromium.org>
Cc: Nikolay Martynov <mar.kolya@gmail.com>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Cc: Rafal Milecki <zajec5@gmail.com>
Cc: James Cowgill <James.Cowgill@imgtec.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Alex Smith <alex.smith@imgtec.com>
Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Qais Yousef <qais.yousef@imgtec.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Tony Wu <tung7970@gmail.com>
Cc: Huaitong Han <huaitong.han@intel.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Gelmini <andrea.gelmini@gelma.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Rabin Vincent <rabin@rab.in>
Cc: "Maciej W. Rozycki" <macro@imgtec.com>
Cc: David Daney <david.daney@cavium.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04 08:50:07 -04:00
Linus Torvalds aeb35d6b74 Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 header cleanups from Ingo Molnar:
 "This tree is a cleanup of the x86 tree reducing spurious uses of
  module.h - which should improve build performance a bit"

* 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, crypto: Restore MODULE_LICENSE() to glue_helper.c so it loads
  x86/apic: Remove duplicated include from probe_64.c
  x86/ce4100: Remove duplicated include from ce4100.c
  x86/headers: Include spinlock_types.h in x8664_ksyms_64.c for missing spinlock_t
  x86/platform: Delete extraneous MODULE_* tags fromm ts5500
  x86: Audit and remove any remaining unnecessary uses of module.h
  x86/kvm: Audit and remove any unnecessary uses of module.h
  x86/xen: Audit and remove any unnecessary uses of module.h
  x86/platform: Audit and remove any unnecessary uses of module.h
  x86/lib: Audit and remove any unnecessary uses of module.h
  x86/kernel: Audit and remove any unnecessary uses of module.h
  x86/mm: Audit and remove any unnecessary uses of module.h
  x86: Don't use module.h just for AUTHOR / LICENSE tags
2016-08-01 14:23:42 -04:00
Linus Torvalds 08fd8c1768 xen: features and fixes for 4.8-rc0
- ACPI support for guests on ARM platforms.
 - Generic steal time support for arm and x86.
 - Support cases where kernel cpu is not Xen VCPU number (e.g., if
   in-guest kexec is used).
 - Use the system workqueue instead of a custom workqueue in various
   places.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXmLlrAAoJEFxbo/MsZsTRvRQH/1wOMF8BmlbZfR7H3qwDfjst
 ApNifCiZE08xDtWBlwUaBFAQxyflQS9BBiNZDVK0sysIdXeOdpWV7V0ZjRoLL+xr
 czsaaGXDcmXxJxApoMDVuT7FeP6rEk6LVAYRoHpVjJjMZGW3BbX1vZaMW4DXl2WM
 9YNaF2Lj+rpc1f8iG31nUxwkpmcXFog6ct4tu7HiyCFT3hDkHt/a4ghuBdQItCkd
 vqBa1pTpcGtQBhSmWzlylN/PV2+NKcRd+kGiwd09/O/rNzogTMCTTWeHKAtMpPYb
 Cu6oSqJtlK5o0vtr0qyLSWEGIoyjE2gE92s0wN3iCzFY1PldqdsxUO622nIj+6o=
 =G6q3
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from David Vrabel:
 "Features and fixes for 4.8-rc0:

   - ACPI support for guests on ARM platforms.
   - Generic steal time support for arm and x86.
   - Support cases where kernel cpu is not Xen VCPU number (e.g., if
     in-guest kexec is used).
   - Use the system workqueue instead of a custom workqueue in various
     places"

* tag 'for-linus-4.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (47 commits)
  xen: add static initialization of steal_clock op to xen_time_ops
  xen/pvhvm: run xen_vcpu_setup() for the boot CPU
  xen/evtchn: use xen_vcpu_id mapping
  xen/events: fifo: use xen_vcpu_id mapping
  xen/events: use xen_vcpu_id mapping in events_base
  x86/xen: use xen_vcpu_id mapping when pointing vcpu_info to shared_info
  x86/xen: use xen_vcpu_id mapping for HYPERVISOR_vcpu_op
  xen: introduce xen_vcpu_id mapping
  x86/acpi: store ACPI ids from MADT for future usage
  x86/xen: update cpuid.h from Xen-4.7
  xen/evtchn: add IOCTL_EVTCHN_RESTRICT
  xen-blkback: really don't leak mode property
  xen-blkback: constify instance of "struct attribute_group"
  xen-blkfront: prefer xenbus_scanf() over xenbus_gather()
  xen-blkback: prefer xenbus_scanf() over xenbus_gather()
  xen: support runqueue steal time on xen
  arm/xen: add support for vm_assist hypercall
  xen: update xen headers
  xen-pciback: drop superfluous variables
  xen-pciback: short-circuit read path used for merging write values
  ...
2016-07-27 11:35:37 -07:00
Vitaly Kuznetsov 3e9e57fad3 x86/acpi: store ACPI ids from MADT for future usage
Currently we don't save ACPI ids (unlike LAPIC ids which go to
x86_cpu_to_apicid) from MADT and we may need this information later.
Particularly, ACPI ids is the only existent way for a PVHVM Xen guest
to figure out Xen's idea of its vCPUs ids before these CPUs boot and
in some cases these ids diverge from Linux's cpu ids.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-07-25 13:30:53 +01:00