The X86 entry, exception and interrupt code rework

This all started about 6 month ago with the attempt to move the Posix CPU
 timer heavy lifting out of the timer interrupt code and just have lockless
 quick checks in that code path. Trivial 5 patches.
 
 This unearthed an inconsistency in the KVM handling of task work and the
 review requested to move all of this into generic code so other
 architectures can share.
 
 Valid request and solved with another 25 patches but those unearthed
 inconsistencies vs. RCU and instrumentation.
 
 Digging into this made it obvious that there are quite some inconsistencies
 vs. instrumentation in general. The int3 text poke handling in particular
 was completely unprotected and with the batched update of trace events even
 more likely to expose to endless int3 recursion.
 
 In parallel the RCU implications of instrumenting fragile entry code came
 up in several discussions.
 
 The conclusion of the X86 maintainer team was to go all the way and make
 the protection against any form of instrumentation of fragile and dangerous
 code pathes enforcable and verifiable by tooling.
 
 A first batch of preparatory work hit mainline with commit d5f744f9a2.
 
 The (almost) full solution introduced a new code section '.noinstr.text'
 into which all code which needs to be protected from instrumentation of all
 sorts goes into. Any call into instrumentable code out of this section has
 to be annotated. objtool has support to validate this. Kprobes now excludes
 this section fully which also prevents BPF from fiddling with it and all
 'noinstr' annotated functions also keep ftrace off. The section, kprobes
 and objtool changes are already merged.
 
 The major changes coming with this are:
 
     - Preparatory cleanups
 
     - Annotating of relevant functions to move them into the noinstr.text
       section or enforcing inlining by marking them __always_inline so the
       compiler cannot misplace or instrument them.
 
     - Splitting and simplifying the idtentry macro maze so that it is now
       clearly separated into simple exception entries and the more
       interesting ones which use interrupt stacks and have the paranoid
       handling vs. CR3 and GS.
 
     - Move quite some of the low level ASM functionality into C code:
 
        - enter_from and exit to user space handling. The ASM code now calls
          into C after doing the really necessary ASM handling and the return
 	 path goes back out without bells and whistels in ASM.
 
        - exception entry/exit got the equivivalent treatment
 
        - move all IRQ tracepoints from ASM to C so they can be placed as
          appropriate which is especially important for the int3 recursion
          issue.
 
     - Consolidate the declaration and definition of entry points between 32
       and 64 bit. They share a common header and macros now.
 
     - Remove the extra device interrupt entry maze and just use the regular
       exception entry code.
 
     - All ASM entry points except NMI are now generated from the shared header
       file and the corresponding macros in the 32 and 64 bit entry ASM.
 
     - The C code entry points are consolidated as well with the help of
       DEFINE_IDTENTRY*() macros. This allows to ensure at one central point
       that all corresponding entry points share the same semantics. The
       actual function body for most entry points is in an instrumentable
       and sane state.
 
       There are special macros for the more sensitive entry points,
       e.g. INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF.
       They allow to put the whole entry instrumentation and RCU handling
       into safe places instead of the previous pray that it is correct
       approach.
 
     - The INT3 text poke handling is now completely isolated and the
       recursion issue banned. Aside of the entry rework this required other
       isolation work, e.g. the ability to force inline bsearch.
 
     - Prevent #DB on fragile entry code, entry relevant memory and disable
       it on NMI, #MC entry, which allowed to get rid of the nested #DB IST
       stack shifting hackery.
 
     - A few other cleanups and enhancements which have been made possible
       through this and already merged changes, e.g. consolidating and
       further restricting the IDT code so the IDT table becomes RO after
       init which removes yet another popular attack vector
 
     - About 680 lines of ASM maze are gone.
 
 There are a few open issues:
 
    - An escape out of the noinstr section in the MCE handler which needs
      some more thought but under the aspect that MCE is a complete
      trainwreck by design and the propability to survive it is low, this was
      not high on the priority list.
 
    - Paravirtualization
 
      When PV is enabled then objtool complains about a bunch of indirect
      calls out of the noinstr section. There are a few straight forward
      ways to fix this, but the other issues vs. general correctness were
      more pressing than parawitz.
 
    - KVM
 
      KVM is inconsistent as well. Patches have been posted, but they have
      not yet been commented on or picked up by the KVM folks.
 
    - IDLE
 
      Pretty much the same problems can be found in the low level idle code
      especially the parts where RCU stopped watching. This was beyond the
      scope of the more obvious and exposable problems and is on the todo
      list.
 
 The lesson learned from this brain melting exercise to morph the evolved
 code base into something which can be validated and understood is that once
 again the violation of the most important engineering principle
 "correctness first" has caused quite a few people to spend valuable time on
 problems which could have been avoided in the first place. The "features
 first" tinkering mindset really has to stop.
 
 With that I want to say thanks to everyone involved in contributing to this
 effort. Special thanks go to the following people (alphabetical order):
 
    Alexandre Chartre
    Andy Lutomirski
    Borislav Petkov
    Brian Gerst
    Frederic Weisbecker
    Josh Poimboeuf
    Juergen Gross
    Lai Jiangshan
    Macro Elver
    Paolo Bonzini
    Paul McKenney
    Peter Zijlstra
    Vitaly Kuznetsov
    Will Deacon
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7j510THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoU2WD/4refvaNm08fG7aiVYem3JJzr0+Pq5O
 /opwnI/1D973ApApj5W/Nd53sN5tVqOiXncSKgywRBWZxRCAGjVYypl9rjpvXu4l
 HlMjhEKBmWkDryxxrM98Vr7hl3hnId5laR56oFfH+G4LUsItaV6Uak/HfXZ4Mq1k
 iYVbEtl2CN+KJjvSgZ6Y1l853Ab5mmGvmeGNHHWCj8ZyjF3cOLoelDTQNnsb0wXM
 crKXBcXJSsCWKYyJ5PTvB82crQCET7Su+LgwK06w/ZbW1//2hVIjSCiN5o/V+aRJ
 06BZNMj8v9tfglkN8LEQvRIjTlnEQ2sq3GxbrVtA53zxkzbBCBJQ96w8yYzQX0ux
 yhqQ/aIZJ1wTYEjJzSkftwLNMRHpaOUnKvJndXRKAYi+eGI7syF61qcZSYGKuAQ/
 bK3b/CzU6QWr1235oTADxh4isEwxA0Pg5wtJCfDDOG0MJ9ALMSOGUkhoiz5EqpkU
 mzFAwfG/Uj7hRjlkms7Yj2OjZfnU7iypj63GgpXghLjr5ksRFKEOMw8e1GXltVHs
 zzwghUjqp2EPq0VOOQn3lp9lol5Prc3xfFHczKpO+CJW6Rpa4YVdqJmejBqJy/on
 Hh/T/ST3wa2qBeAw89vZIeWiUJZZCsQ0f//+2hAbzJY45Y6DuR9vbTAPb9agRgOM
 xg+YaCfpQqFc1A==
 =llba
 -----END PGP SIGNATURE-----

Merge tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 entry updates from Thomas Gleixner:
 "The x86 entry, exception and interrupt code rework

  This all started about 6 month ago with the attempt to move the Posix
  CPU timer heavy lifting out of the timer interrupt code and just have
  lockless quick checks in that code path. Trivial 5 patches.

  This unearthed an inconsistency in the KVM handling of task work and
  the review requested to move all of this into generic code so other
  architectures can share.

  Valid request and solved with another 25 patches but those unearthed
  inconsistencies vs. RCU and instrumentation.

  Digging into this made it obvious that there are quite some
  inconsistencies vs. instrumentation in general. The int3 text poke
  handling in particular was completely unprotected and with the batched
  update of trace events even more likely to expose to endless int3
  recursion.

  In parallel the RCU implications of instrumenting fragile entry code
  came up in several discussions.

  The conclusion of the x86 maintainer team was to go all the way and
  make the protection against any form of instrumentation of fragile and
  dangerous code pathes enforcable and verifiable by tooling.

  A first batch of preparatory work hit mainline with commit
  d5f744f9a2 ("Pull x86 entry code updates from Thomas Gleixner")

  That (almost) full solution introduced a new code section
  '.noinstr.text' into which all code which needs to be protected from
  instrumentation of all sorts goes into. Any call into instrumentable
  code out of this section has to be annotated. objtool has support to
  validate this.

  Kprobes now excludes this section fully which also prevents BPF from
  fiddling with it and all 'noinstr' annotated functions also keep
  ftrace off. The section, kprobes and objtool changes are already
  merged.

  The major changes coming with this are:

    - Preparatory cleanups

    - Annotating of relevant functions to move them into the
      noinstr.text section or enforcing inlining by marking them
      __always_inline so the compiler cannot misplace or instrument
      them.

    - Splitting and simplifying the idtentry macro maze so that it is
      now clearly separated into simple exception entries and the more
      interesting ones which use interrupt stacks and have the paranoid
      handling vs. CR3 and GS.

    - Move quite some of the low level ASM functionality into C code:

       - enter_from and exit to user space handling. The ASM code now
         calls into C after doing the really necessary ASM handling and
         the return path goes back out without bells and whistels in
         ASM.

       - exception entry/exit got the equivivalent treatment

       - move all IRQ tracepoints from ASM to C so they can be placed as
         appropriate which is especially important for the int3
         recursion issue.

    - Consolidate the declaration and definition of entry points between
      32 and 64 bit. They share a common header and macros now.

    - Remove the extra device interrupt entry maze and just use the
      regular exception entry code.

    - All ASM entry points except NMI are now generated from the shared
      header file and the corresponding macros in the 32 and 64 bit
      entry ASM.

    - The C code entry points are consolidated as well with the help of
      DEFINE_IDTENTRY*() macros. This allows to ensure at one central
      point that all corresponding entry points share the same
      semantics. The actual function body for most entry points is in an
      instrumentable and sane state.

      There are special macros for the more sensitive entry points, e.g.
      INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF.
      They allow to put the whole entry instrumentation and RCU handling
      into safe places instead of the previous pray that it is correct
      approach.

    - The INT3 text poke handling is now completely isolated and the
      recursion issue banned. Aside of the entry rework this required
      other isolation work, e.g. the ability to force inline bsearch.

    - Prevent #DB on fragile entry code, entry relevant memory and
      disable it on NMI, #MC entry, which allowed to get rid of the
      nested #DB IST stack shifting hackery.

    - A few other cleanups and enhancements which have been made
      possible through this and already merged changes, e.g.
      consolidating and further restricting the IDT code so the IDT
      table becomes RO after init which removes yet another popular
      attack vector

    - About 680 lines of ASM maze are gone.

  There are a few open issues:

   - An escape out of the noinstr section in the MCE handler which needs
     some more thought but under the aspect that MCE is a complete
     trainwreck by design and the propability to survive it is low, this
     was not high on the priority list.

   - Paravirtualization

     When PV is enabled then objtool complains about a bunch of indirect
     calls out of the noinstr section. There are a few straight forward
     ways to fix this, but the other issues vs. general correctness were
     more pressing than parawitz.

   - KVM

     KVM is inconsistent as well. Patches have been posted, but they
     have not yet been commented on or picked up by the KVM folks.

   - IDLE

     Pretty much the same problems can be found in the low level idle
     code especially the parts where RCU stopped watching. This was
     beyond the scope of the more obvious and exposable problems and is
     on the todo list.

  The lesson learned from this brain melting exercise to morph the
  evolved code base into something which can be validated and understood
  is that once again the violation of the most important engineering
  principle "correctness first" has caused quite a few people to spend
  valuable time on problems which could have been avoided in the first
  place. The "features first" tinkering mindset really has to stop.

  With that I want to say thanks to everyone involved in contributing to
  this effort. Special thanks go to the following people (alphabetical
  order): Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Brian
  Gerst, Frederic Weisbecker, Josh Poimboeuf, Juergen Gross, Lai
  Jiangshan, Macro Elver, Paolo Bonzin,i Paul McKenney, Peter Zijlstra,
  Vitaly Kuznetsov, and Will Deacon"

* tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (142 commits)
  x86/entry: Force rcu_irq_enter() when in idle task
  x86/entry: Make NMI use IDTENTRY_RAW
  x86/entry: Treat BUG/WARN as NMI-like entries
  x86/entry: Unbreak __irqentry_text_start/end magic
  x86/entry: __always_inline CR2 for noinstr
  lockdep: __always_inline more for noinstr
  x86/entry: Re-order #DB handler to avoid *SAN instrumentation
  x86/entry: __always_inline arch_atomic_* for noinstr
  x86/entry: __always_inline irqflags for noinstr
  x86/entry: __always_inline debugreg for noinstr
  x86/idt: Consolidate idt functionality
  x86/idt: Cleanup trap_init()
  x86/idt: Use proper constants for table size
  x86/idt: Add comments about early #PF handling
  x86/idt: Mark init only functions __init
  x86/entry: Rename trace_hardirqs_off_prepare()
  x86/entry: Clarify irq_{enter,exit}_rcu()
  x86/entry: Remove DBn stacks
  x86/entry: Remove debug IDT frobbing
  x86/entry: Optimize local_db_save() for virt
  ...
This commit is contained in:
Linus Torvalds 2020-06-13 10:05:47 -07:00
commit 076f14be7f
111 changed files with 2754 additions and 2441 deletions

View File

@ -181,7 +181,6 @@ config X86
select HAVE_HW_BREAKPOINT
select HAVE_IDE
select HAVE_IOREMAP_PROT
select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
select HAVE_IRQ_TIME_ACCOUNTING
select HAVE_KERNEL_BZIP2
select HAVE_KERNEL_GZIP

View File

@ -3,7 +3,13 @@
# Makefile for the x86 low level entry code
#
OBJECT_FILES_NON_STANDARD_entry_64_compat.o := y
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
KCOV_INSTRUMENT := n
CFLAGS_REMOVE_common.o = $(CC_FLAGS_FTRACE) -fstack-protector -fstack-protector-strong
CFLAGS_REMOVE_syscall_32.o = $(CC_FLAGS_FTRACE) -fstack-protector -fstack-protector-strong
CFLAGS_REMOVE_syscall_64.o = $(CC_FLAGS_FTRACE) -fstack-protector -fstack-protector-strong
CFLAGS_syscall_64.o += $(call cc-option,-Wno-override-init,)
CFLAGS_syscall_32.o += $(call cc-option,-Wno-override-init,)

View File

@ -341,30 +341,13 @@ For 32-bit we have the following conventions - kernel is built with
#endif
.endm
#endif /* CONFIG_X86_64 */
#else /* CONFIG_X86_64 */
# undef UNWIND_HINT_IRET_REGS
# define UNWIND_HINT_IRET_REGS
#endif /* !CONFIG_X86_64 */
.macro STACKLEAK_ERASE
#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
call stackleak_erase
#endif
.endm
/*
* This does 'call enter_from_user_mode' unless we can avoid it based on
* kernel config or using the static jump infrastructure.
*/
.macro CALL_enter_from_user_mode
#ifdef CONFIG_CONTEXT_TRACKING
#ifdef CONFIG_JUMP_LABEL
STATIC_JUMP_IF_FALSE .Lafter_call_\@, context_tracking_key, def=0
#endif
call enter_from_user_mode
.Lafter_call_\@:
#endif
.endm
#ifdef CONFIG_PARAVIRT_XXL
#define GET_CR2_INTO(reg) GET_CR2_INTO_AX ; _ASM_MOV %_ASM_AX, reg
#else
#define GET_CR2_INTO(reg) _ASM_MOV %cr2, reg
#endif

View File

@ -27,6 +27,11 @@
#include <linux/syscalls.h>
#include <linux/uaccess.h>
#ifdef CONFIG_XEN_PV
#include <xen/xen-ops.h>
#include <xen/events.h>
#endif
#include <asm/desc.h>
#include <asm/traps.h>
#include <asm/vdso.h>
@ -35,21 +40,67 @@
#include <asm/nospec-branch.h>
#include <asm/io_bitmap.h>
#include <asm/syscall.h>
#include <asm/irq_stack.h>
#define CREATE_TRACE_POINTS
#include <trace/events/syscalls.h>
#ifdef CONFIG_CONTEXT_TRACKING
/* Called on entry from user mode with IRQs off. */
__visible inline void enter_from_user_mode(void)
/**
* enter_from_user_mode - Establish state when coming from user mode
*
* Syscall entry disables interrupts, but user mode is traced as interrupts
* enabled. Also with NO_HZ_FULL RCU might be idle.
*
* 1) Tell lockdep that interrupts are disabled
* 2) Invoke context tracking if enabled to reactivate RCU
* 3) Trace interrupts off state
*/
static noinstr void enter_from_user_mode(void)
{
CT_WARN_ON(ct_state() != CONTEXT_USER);
enum ctx_state state = ct_state();
lockdep_hardirqs_off(CALLER_ADDR0);
user_exit_irqoff();
instrumentation_begin();
CT_WARN_ON(state != CONTEXT_USER);
trace_hardirqs_off_finish();
instrumentation_end();
}
#else
static inline void enter_from_user_mode(void) {}
static __always_inline void enter_from_user_mode(void)
{
lockdep_hardirqs_off(CALLER_ADDR0);
instrumentation_begin();
trace_hardirqs_off_finish();
instrumentation_end();
}
#endif
/**
* exit_to_user_mode - Fixup state when exiting to user mode
*
* Syscall exit enables interrupts, but the kernel state is interrupts
* disabled when this is invoked. Also tell RCU about it.
*
* 1) Trace interrupts on state
* 2) Invoke context tracking if enabled to adjust RCU state
* 3) Clear CPU buffers if CPU is affected by MDS and the migitation is on.
* 4) Tell lockdep that interrupts are enabled
*/
static __always_inline void exit_to_user_mode(void)
{
instrumentation_begin();
trace_hardirqs_on_prepare();
lockdep_hardirqs_on_prepare(CALLER_ADDR0);
instrumentation_end();
user_enter_irqoff();
mds_user_clear_cpu_buffers();
lockdep_hardirqs_on(CALLER_ADDR0);
}
static void do_audit_syscall_entry(struct pt_regs *regs, u32 arch)
{
#ifdef CONFIG_X86_64
@ -179,8 +230,7 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
}
}
/* Called with IRQs disabled. */
__visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
static void __prepare_exit_to_usermode(struct pt_regs *regs)
{
struct thread_info *ti = current_thread_info();
u32 cached_flags;
@ -219,10 +269,14 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
*/
ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
#endif
}
user_enter_irqoff();
mds_user_clear_cpu_buffers();
__visible noinstr void prepare_exit_to_usermode(struct pt_regs *regs)
{
instrumentation_begin();
__prepare_exit_to_usermode(regs);
instrumentation_end();
exit_to_user_mode();
}
#define SYSCALL_EXIT_WORK_FLAGS \
@ -251,11 +305,7 @@ static void syscall_slow_exit_work(struct pt_regs *regs, u32 cached_flags)
tracehook_report_syscall_exit(regs, step);
}
/*
* Called with IRQs on and fully valid regs. Returns with IRQs off in a
* state such that we can immediately switch to user mode.
*/
__visible inline void syscall_return_slowpath(struct pt_regs *regs)
static void __syscall_return_slowpath(struct pt_regs *regs)
{
struct thread_info *ti = current_thread_info();
u32 cached_flags = READ_ONCE(ti->flags);
@ -276,15 +326,29 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs)
syscall_slow_exit_work(regs, cached_flags);
local_irq_disable();
prepare_exit_to_usermode(regs);
__prepare_exit_to_usermode(regs);
}
/*
* Called with IRQs on and fully valid regs. Returns with IRQs off in a
* state such that we can immediately switch to user mode.
*/
__visible noinstr void syscall_return_slowpath(struct pt_regs *regs)
{
instrumentation_begin();
__syscall_return_slowpath(regs);
instrumentation_end();
exit_to_user_mode();
}
#ifdef CONFIG_X86_64
__visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
__visible noinstr void do_syscall_64(unsigned long nr, struct pt_regs *regs)
{
struct thread_info *ti;
enter_from_user_mode();
instrumentation_begin();
local_irq_enable();
ti = current_thread_info();
if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY)
@ -301,8 +365,10 @@ __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
regs->ax = x32_sys_call_table[nr](regs);
#endif
}
__syscall_return_slowpath(regs);
syscall_return_slowpath(regs);
instrumentation_end();
exit_to_user_mode();
}
#endif
@ -313,7 +379,7 @@ __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
* extremely hot in workloads that use it, and it's usually called from
* do_fast_syscall_32, so forcibly inline it to improve performance.
*/
static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
static void do_syscall_32_irqs_on(struct pt_regs *regs)
{
struct thread_info *ti = current_thread_info();
unsigned int nr = (unsigned int)regs->orig_ax;
@ -337,27 +403,62 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
regs->ax = ia32_sys_call_table[nr](regs);
}
syscall_return_slowpath(regs);
__syscall_return_slowpath(regs);
}
/* Handles int $0x80 */
__visible void do_int80_syscall_32(struct pt_regs *regs)
__visible noinstr void do_int80_syscall_32(struct pt_regs *regs)
{
enter_from_user_mode();
instrumentation_begin();
local_irq_enable();
do_syscall_32_irqs_on(regs);
instrumentation_end();
exit_to_user_mode();
}
static bool __do_fast_syscall_32(struct pt_regs *regs)
{
int res;
/* Fetch EBP from where the vDSO stashed it. */
if (IS_ENABLED(CONFIG_X86_64)) {
/*
* Micro-optimization: the pointer we're following is
* explicitly 32 bits, so it can't be out of range.
*/
res = __get_user(*(u32 *)&regs->bp,
(u32 __user __force *)(unsigned long)(u32)regs->sp);
} else {
res = get_user(*(u32 *)&regs->bp,
(u32 __user __force *)(unsigned long)(u32)regs->sp);
}
if (res) {
/* User code screwed up. */
regs->ax = -EFAULT;
local_irq_disable();
__prepare_exit_to_usermode(regs);
return false;
}
/* Now this is just like a normal syscall. */
do_syscall_32_irqs_on(regs);
return true;
}
/* Returns 0 to return using IRET or 1 to return using SYSEXIT/SYSRETL. */
__visible long do_fast_syscall_32(struct pt_regs *regs)
__visible noinstr long do_fast_syscall_32(struct pt_regs *regs)
{
/*
* Called using the internal vDSO SYSENTER/SYSCALL32 calling
* convention. Adjust regs so it looks like we entered using int80.
*/
unsigned long landing_pad = (unsigned long)current->mm->context.vdso +
vdso_image_32.sym_int80_landing_pad;
vdso_image_32.sym_int80_landing_pad;
bool success;
/*
* SYSENTER loses EIP, and even SYSCALL32 needs us to skip forward
@ -367,33 +468,17 @@ __visible long do_fast_syscall_32(struct pt_regs *regs)
regs->ip = landing_pad;
enter_from_user_mode();
instrumentation_begin();
local_irq_enable();
success = __do_fast_syscall_32(regs);
/* Fetch EBP from where the vDSO stashed it. */
if (
#ifdef CONFIG_X86_64
/*
* Micro-optimization: the pointer we're following is explicitly
* 32 bits, so it can't be out of range.
*/
__get_user(*(u32 *)&regs->bp,
(u32 __user __force *)(unsigned long)(u32)regs->sp)
#else
get_user(*(u32 *)&regs->bp,
(u32 __user __force *)(unsigned long)(u32)regs->sp)
#endif
) {
instrumentation_end();
exit_to_user_mode();
/* User code screwed up. */
local_irq_disable();
regs->ax = -EFAULT;
prepare_exit_to_usermode(regs);
return 0; /* Keep it simple: use IRET. */
}
/* Now this is just like a normal syscall. */
do_syscall_32_irqs_on(regs);
/* If it failed, keep it simple: use IRET. */
if (!success)
return 0;
#ifdef CONFIG_X86_64
/*
@ -431,3 +516,266 @@ SYSCALL_DEFINE0(ni_syscall)
{
return -ENOSYS;
}
/**
* idtentry_enter_cond_rcu - Handle state tracking on idtentry with conditional
* RCU handling
* @regs: Pointer to pt_regs of interrupted context
*
* Invokes:
* - lockdep irqflag state tracking as low level ASM entry disabled
* interrupts.
*
* - Context tracking if the exception hit user mode.
*
* - The hardirq tracer to keep the state consistent as low level ASM
* entry disabled interrupts.
*
* For kernel mode entries RCU handling is done conditional. If RCU is
* watching then the only RCU requirement is to check whether the tick has
* to be restarted. If RCU is not watching then rcu_irq_enter() has to be
* invoked on entry and rcu_irq_exit() on exit.
*
* Avoiding the rcu_irq_enter/exit() calls is an optimization but also
* solves the problem of kernel mode pagefaults which can schedule, which
* is not possible after invoking rcu_irq_enter() without undoing it.
*
* For user mode entries enter_from_user_mode() must be invoked to
* establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
* would not be possible.
*
* Returns: True if RCU has been adjusted on a kernel entry
* False otherwise
*
* The return value must be fed into the rcu_exit argument of
* idtentry_exit_cond_rcu().
*/
bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
{
if (user_mode(regs)) {
enter_from_user_mode();
return false;
}
/*
* If this entry hit the idle task invoke rcu_irq_enter() whether
* RCU is watching or not.
*
* Interupts can nest when the first interrupt invokes softirq
* processing on return which enables interrupts.
*
* Scheduler ticks in the idle task can mark quiescent state and
* terminate a grace period, if and only if the timer interrupt is
* not nested into another interrupt.
*
* Checking for __rcu_is_watching() here would prevent the nesting
* interrupt to invoke rcu_irq_enter(). If that nested interrupt is
* the tick then rcu_flavor_sched_clock_irq() would wrongfully
* assume that it is the first interupt and eventually claim
* quiescient state and end grace periods prematurely.
*
* Unconditionally invoke rcu_irq_enter() so RCU state stays
* consistent.
*
* TINY_RCU does not support EQS, so let the compiler eliminate
* this part when enabled.
*/
if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
/*
* If RCU is not watching then the same careful
* sequence vs. lockdep and tracing is required
* as in enter_from_user_mode().
*/
lockdep_hardirqs_off(CALLER_ADDR0);
rcu_irq_enter();
instrumentation_begin();
trace_hardirqs_off_finish();
instrumentation_end();
return true;
}
/*
* If RCU is watching then RCU only wants to check whether it needs
* to restart the tick in NOHZ mode. rcu_irq_enter_check_tick()
* already contains a warning when RCU is not watching, so no point
* in having another one here.
*/
instrumentation_begin();
rcu_irq_enter_check_tick();
/* Use the combo lockdep/tracing function */
trace_hardirqs_off();
instrumentation_end();
return false;
}
static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched)
{
if (may_sched && !preempt_count()) {
/* Sanity check RCU and thread stack */
rcu_irq_exit_check_preempt();
if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
WARN_ON_ONCE(!on_thread_stack());
if (need_resched())
preempt_schedule_irq();
}
/* Covers both tracing and lockdep */
trace_hardirqs_on();
}
/**
* idtentry_exit_cond_rcu - Handle return from exception with conditional RCU
* handling
* @regs: Pointer to pt_regs (exception entry regs)
* @rcu_exit: Invoke rcu_irq_exit() if true
*
* Depending on the return target (kernel/user) this runs the necessary
* preemption and work checks if possible and reguired and returns to
* the caller with interrupts disabled and no further work pending.
*
* This is the last action before returning to the low level ASM code which
* just needs to return to the appropriate context.
*
* Counterpart to idtentry_enter_cond_rcu(). The return value of the entry
* function must be fed into the @rcu_exit argument.
*/
void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
{
lockdep_assert_irqs_disabled();
/* Check whether this returns to user mode */
if (user_mode(regs)) {
prepare_exit_to_usermode(regs);
} else if (regs->flags & X86_EFLAGS_IF) {
/*
* If RCU was not watching on entry this needs to be done
* carefully and needs the same ordering of lockdep/tracing
* and RCU as the return to user mode path.
*/
if (rcu_exit) {
instrumentation_begin();
/* Tell the tracer that IRET will enable interrupts */
trace_hardirqs_on_prepare();
lockdep_hardirqs_on_prepare(CALLER_ADDR0);
instrumentation_end();
rcu_irq_exit();
lockdep_hardirqs_on(CALLER_ADDR0);
return;
}
instrumentation_begin();
idtentry_exit_cond_resched(regs, IS_ENABLED(CONFIG_PREEMPTION));
instrumentation_end();
} else {
/*
* IRQ flags state is correct already. Just tell RCU if it
* was not watching on entry.
*/
if (rcu_exit)
rcu_irq_exit();
}
}
/**
* idtentry_enter_user - Handle state tracking on idtentry from user mode
* @regs: Pointer to pt_regs of interrupted context
*
* Invokes enter_from_user_mode() to establish the proper context for
* NOHZ_FULL. Otherwise scheduling on exit would not be possible.
*/
void noinstr idtentry_enter_user(struct pt_regs *regs)
{
enter_from_user_mode();
}
/**
* idtentry_exit_user - Handle return from exception to user mode
* @regs: Pointer to pt_regs (exception entry regs)
*
* Runs the necessary preemption and work checks and returns to the caller
* with interrupts disabled and no further work pending.
*
* This is the last action before returning to the low level ASM code which
* just needs to return to the appropriate context.
*
* Counterpart to idtentry_enter_user().
*/
void noinstr idtentry_exit_user(struct pt_regs *regs)
{
lockdep_assert_irqs_disabled();
prepare_exit_to_usermode(regs);
}
#ifdef CONFIG_XEN_PV
#ifndef CONFIG_PREEMPTION
/*
* Some hypercalls issued by the toolstack can take many 10s of
* seconds. Allow tasks running hypercalls via the privcmd driver to
* be voluntarily preempted even if full kernel preemption is
* disabled.
*
* Such preemptible hypercalls are bracketed by
* xen_preemptible_hcall_begin() and xen_preemptible_hcall_end()
* calls.
*/
DEFINE_PER_CPU(bool, xen_in_preemptible_hcall);
EXPORT_SYMBOL_GPL(xen_in_preemptible_hcall);
/*
* In case of scheduling the flag must be cleared and restored after
* returning from schedule as the task might move to a different CPU.
*/
static __always_inline bool get_and_clear_inhcall(void)
{
bool inhcall = __this_cpu_read(xen_in_preemptible_hcall);
__this_cpu_write(xen_in_preemptible_hcall, false);
return inhcall;
}
static __always_inline void restore_inhcall(bool inhcall)
{
__this_cpu_write(xen_in_preemptible_hcall, inhcall);
}
#else
static __always_inline bool get_and_clear_inhcall(void) { return false; }
static __always_inline void restore_inhcall(bool inhcall) { }
#endif
static void __xen_pv_evtchn_do_upcall(void)
{
irq_enter_rcu();
inc_irq_stat(irq_hv_callback_count);
xen_hvm_evtchn_do_upcall();
irq_exit_rcu();
}
__visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs)
{
struct pt_regs *old_regs;
bool inhcall, rcu_exit;
rcu_exit = idtentry_enter_cond_rcu(regs);
old_regs = set_irq_regs(regs);
instrumentation_begin();
run_on_irqstack_cond(__xen_pv_evtchn_do_upcall, NULL, regs);
instrumentation_begin();
set_irq_regs(old_regs);
inhcall = get_and_clear_inhcall();
if (inhcall && !WARN_ON_ONCE(rcu_exit)) {
instrumentation_begin();
idtentry_exit_cond_resched(regs, true);
instrumentation_end();
restore_inhcall(inhcall);
} else {
idtentry_exit_cond_rcu(regs, rcu_exit);
}
}
#endif /* CONFIG_XEN_PV */

View File

@ -44,40 +44,13 @@
#include <asm/asm.h>
#include <asm/smap.h>
#include <asm/frame.h>
#include <asm/trapnr.h>
#include <asm/nospec-branch.h>
#include "calling.h"
.section .entry.text, "ax"
/*
* We use macros for low-level operations which need to be overridden
* for paravirtualization. The following will never clobber any registers:
* INTERRUPT_RETURN (aka. "iret")
* GET_CR0_INTO_EAX (aka. "movl %cr0, %eax")
* ENABLE_INTERRUPTS_SYSEXIT (aka "sti; sysexit").
*
* For DISABLE_INTERRUPTS/ENABLE_INTERRUPTS (aka "cli"/"sti"), you must
* specify what registers can be overwritten (CLBR_NONE, CLBR_EAX/EDX/ECX/ANY).
* Allowing a register to be clobbered can shrink the paravirt replacement
* enough to patch inline, increasing performance.
*/
#ifdef CONFIG_PREEMPTION
# define preempt_stop(clobbers) DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
#else
# define preempt_stop(clobbers)
#endif
.macro TRACE_IRQS_IRET
#ifdef CONFIG_TRACE_IRQFLAGS
testl $X86_EFLAGS_IF, PT_EFLAGS(%esp) # interrupts off?
jz 1f
TRACE_IRQS_ON
1:
#endif
.endm
#define PTI_SWITCH_MASK (1 << PAGE_SHIFT)
/*
@ -726,10 +699,68 @@
.Lend_\@:
.endm
/**
* idtentry - Macro to generate entry stubs for simple IDT entries
* @vector: Vector number
* @asmsym: ASM symbol for the entry point
* @cfunc: C function to be called
* @has_error_code: Hardware pushed error code on stack
*/
.macro idtentry vector asmsym cfunc has_error_code:req
SYM_CODE_START(\asmsym)
ASM_CLAC
cld
.if \has_error_code == 0
pushl $0 /* Clear the error code */
.endif
/* Push the C-function address into the GS slot */
pushl $\cfunc
/* Invoke the common exception entry */
jmp handle_exception
SYM_CODE_END(\asmsym)
.endm
.macro idtentry_irq vector cfunc
.p2align CONFIG_X86_L1_CACHE_SHIFT
SYM_CODE_START_LOCAL(asm_\cfunc)
ASM_CLAC
SAVE_ALL switch_stacks=1
ENCODE_FRAME_POINTER
movl %esp, %eax
movl PT_ORIG_EAX(%esp), %edx /* get the vector from stack */
movl $-1, PT_ORIG_EAX(%esp) /* no syscall to restart */
call \cfunc
jmp handle_exception_return
SYM_CODE_END(asm_\cfunc)
.endm
.macro idtentry_sysvec vector cfunc
idtentry \vector asm_\cfunc \cfunc has_error_code=0
.endm
/*
* Include the defines which emit the idt entries which are shared
* shared between 32 and 64 bit and emit the __irqentry_text_* markers
* so the stacktrace boundary checks work.
*/
.align 16
.globl __irqentry_text_start
__irqentry_text_start:
#include <asm/idtentry.h>
.align 16
.globl __irqentry_text_end
__irqentry_text_end:
/*
* %eax: prev task
* %edx: next task
*/
.pushsection .text, "ax"
SYM_CODE_START(__switch_to_asm)
/*
* Save callee-saved registers
@ -776,6 +807,7 @@ SYM_CODE_START(__switch_to_asm)
jmp __switch_to
SYM_CODE_END(__switch_to_asm)
.popsection
/*
* The unwinder expects the last frame on the stack to always be at the same
@ -784,6 +816,7 @@ SYM_CODE_END(__switch_to_asm)
* asmlinkage function so its argument has to be pushed on the stack. This
* wrapper creates a proper "end of stack" frame header before the call.
*/
.pushsection .text, "ax"
SYM_FUNC_START(schedule_tail_wrapper)
FRAME_BEGIN
@ -794,6 +827,8 @@ SYM_FUNC_START(schedule_tail_wrapper)
FRAME_END
ret
SYM_FUNC_END(schedule_tail_wrapper)
.popsection
/*
* A newly forked process directly context switches into this address.
*
@ -801,6 +836,7 @@ SYM_FUNC_END(schedule_tail_wrapper)
* ebx: kernel thread func (NULL for user thread)
* edi: kernel thread arg
*/
.pushsection .text, "ax"
SYM_CODE_START(ret_from_fork)
call schedule_tail_wrapper
@ -811,8 +847,7 @@ SYM_CODE_START(ret_from_fork)
/* When we fork, we trace the syscall return in the child, too. */
movl %esp, %eax
call syscall_return_slowpath
STACKLEAK_ERASE
jmp restore_all
jmp .Lsyscall_32_done
/* kernel thread */
1: movl %edi, %eax
@ -825,38 +860,7 @@ SYM_CODE_START(ret_from_fork)
movl $0, PT_EAX(%esp)
jmp 2b
SYM_CODE_END(ret_from_fork)
/*
* Return to user mode is not as complex as all this looks,
* but we want the default path for a system call return to
* go as quickly as possible which is why some of this is
* less clear than it otherwise should be.
*/
# userspace resumption stub bypassing syscall exit tracing
SYM_CODE_START_LOCAL(ret_from_exception)
preempt_stop(CLBR_ANY)
ret_from_intr:
#ifdef CONFIG_VM86
movl PT_EFLAGS(%esp), %eax # mix EFLAGS and CS
movb PT_CS(%esp), %al
andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax
#else
/*
* We can be coming here from child spawned by kernel_thread().
*/
movl PT_CS(%esp), %eax
andl $SEGMENT_RPL_MASK, %eax
#endif
cmpl $USER_RPL, %eax
jb restore_all_kernel # not returning to v8086 or userspace
DISABLE_INTERRUPTS(CLBR_ANY)
TRACE_IRQS_OFF
movl %esp, %eax
call prepare_exit_to_usermode
jmp restore_all
SYM_CODE_END(ret_from_exception)
.popsection
SYM_ENTRY(__begin_SYSENTER_singlestep_region, SYM_L_GLOBAL, SYM_A_NONE)
/*
@ -960,12 +964,6 @@ SYM_FUNC_START(entry_SYSENTER_32)
jnz .Lsysenter_fix_flags
.Lsysenter_flags_fixed:
/*
* User mode is traced as though IRQs are on, and SYSENTER
* turned them off.
*/
TRACE_IRQS_OFF
movl %esp, %eax
call do_fast_syscall_32
/* XEN PV guests always use IRET path */
@ -974,8 +972,7 @@ SYM_FUNC_START(entry_SYSENTER_32)
STACKLEAK_ERASE
/* Opportunistic SYSEXIT */
TRACE_IRQS_ON /* User mode traces as IRQs on. */
/* Opportunistic SYSEXIT */
/*
* Setup entry stack - we keep the pointer in %eax and do the
@ -1075,20 +1072,12 @@ SYM_FUNC_START(entry_INT80_32)
SAVE_ALL pt_regs_ax=$-ENOSYS switch_stacks=1 /* save rest */
/*
* User mode is traced as though IRQs are on, and the interrupt gate
* turned them off.
*/
TRACE_IRQS_OFF
movl %esp, %eax
call do_int80_syscall_32
.Lsyscall_32_done:
STACKLEAK_ERASE
restore_all:
TRACE_IRQS_ON
restore_all_switch_stack:
SWITCH_TO_ENTRY_STACK
CHECK_AND_APPLY_ESPFIX
@ -1107,26 +1096,10 @@ restore_all:
*/
INTERRUPT_RETURN
restore_all_kernel:
#ifdef CONFIG_PREEMPTION
DISABLE_INTERRUPTS(CLBR_ANY)
cmpl $0, PER_CPU_VAR(__preempt_count)
jnz .Lno_preempt
testl $X86_EFLAGS_IF, PT_EFLAGS(%esp) # interrupts off (exception path) ?
jz .Lno_preempt
call preempt_schedule_irq
.Lno_preempt:
#endif
TRACE_IRQS_IRET
PARANOID_EXIT_TO_KERNEL_MODE
BUG_IF_WRONG_CR3
RESTORE_REGS 4
jmp .Lirq_return
.section .fixup, "ax"
SYM_CODE_START(iret_exc)
SYM_CODE_START(asm_iret_error)
pushl $0 # no error code
pushl $do_iret_error
pushl $iret_error
#ifdef CONFIG_DEBUG_ENTRY
/*
@ -1140,10 +1113,10 @@ SYM_CODE_START(iret_exc)
popl %eax
#endif
jmp common_exception
SYM_CODE_END(iret_exc)
jmp handle_exception
SYM_CODE_END(asm_iret_error)
.previous
_ASM_EXTABLE(.Lirq_return, iret_exc)
_ASM_EXTABLE(.Lirq_return, asm_iret_error)
SYM_FUNC_END(entry_INT80_32)
.macro FIXUP_ESPFIX_STACK
@ -1193,192 +1166,21 @@ SYM_FUNC_END(entry_INT80_32)
#endif
.endm
/*
* Build the entry stubs with some assembler magic.
* We pack 1 stub into every 8-byte block.
*/
.align 8
SYM_CODE_START(irq_entries_start)
vector=FIRST_EXTERNAL_VECTOR
.rept (FIRST_SYSTEM_VECTOR - FIRST_EXTERNAL_VECTOR)
pushl $(~vector+0x80) /* Note: always in signed byte range */
vector=vector+1
jmp common_interrupt
.align 8
.endr
SYM_CODE_END(irq_entries_start)
#ifdef CONFIG_X86_LOCAL_APIC
.align 8
SYM_CODE_START(spurious_entries_start)
vector=FIRST_SYSTEM_VECTOR
.rept (NR_VECTORS - FIRST_SYSTEM_VECTOR)
pushl $(~vector+0x80) /* Note: always in signed byte range */
vector=vector+1
jmp common_spurious
.align 8
.endr
SYM_CODE_END(spurious_entries_start)
SYM_CODE_START_LOCAL(common_spurious)
ASM_CLAC
addl $-0x80, (%esp) /* Adjust vector into the [-256, -1] range */
SAVE_ALL switch_stacks=1
ENCODE_FRAME_POINTER
TRACE_IRQS_OFF
movl %esp, %eax
call smp_spurious_interrupt
jmp ret_from_intr
SYM_CODE_END(common_spurious)
#endif
/*
* the CPU automatically disables interrupts when executing an IRQ vector,
* so IRQ-flags tracing has to follow that:
*/
.p2align CONFIG_X86_L1_CACHE_SHIFT
SYM_CODE_START_LOCAL(common_interrupt)
ASM_CLAC
addl $-0x80, (%esp) /* Adjust vector into the [-256, -1] range */
SAVE_ALL switch_stacks=1
ENCODE_FRAME_POINTER
TRACE_IRQS_OFF
movl %esp, %eax
call do_IRQ
jmp ret_from_intr
SYM_CODE_END(common_interrupt)
#define BUILD_INTERRUPT3(name, nr, fn) \
SYM_FUNC_START(name) \
ASM_CLAC; \
pushl $~(nr); \
SAVE_ALL switch_stacks=1; \
ENCODE_FRAME_POINTER; \
TRACE_IRQS_OFF \
movl %esp, %eax; \
call fn; \
jmp ret_from_intr; \
SYM_FUNC_END(name)
#define BUILD_INTERRUPT(name, nr) \
BUILD_INTERRUPT3(name, nr, smp_##name); \
/* The include is where all of the SMP etc. interrupts come from */
#include <asm/entry_arch.h>
SYM_CODE_START(coprocessor_error)
ASM_CLAC
pushl $0
pushl $do_coprocessor_error
jmp common_exception
SYM_CODE_END(coprocessor_error)
SYM_CODE_START(simd_coprocessor_error)
ASM_CLAC
pushl $0
#ifdef CONFIG_X86_INVD_BUG
/* AMD 486 bug: invd from userspace calls exception 19 instead of #GP */
ALTERNATIVE "pushl $do_general_protection", \
"pushl $do_simd_coprocessor_error", \
X86_FEATURE_XMM
#else
pushl $do_simd_coprocessor_error
#endif
jmp common_exception
SYM_CODE_END(simd_coprocessor_error)
SYM_CODE_START(device_not_available)
ASM_CLAC
pushl $0
pushl $do_device_not_available
jmp common_exception
SYM_CODE_END(device_not_available)
#ifdef CONFIG_PARAVIRT
SYM_CODE_START(native_iret)
iret
_ASM_EXTABLE(native_iret, iret_exc)
_ASM_EXTABLE(native_iret, asm_iret_error)
SYM_CODE_END(native_iret)
#endif
SYM_CODE_START(overflow)
ASM_CLAC
pushl $0
pushl $do_overflow
jmp common_exception
SYM_CODE_END(overflow)
SYM_CODE_START(bounds)
ASM_CLAC
pushl $0
pushl $do_bounds
jmp common_exception
SYM_CODE_END(bounds)
SYM_CODE_START(invalid_op)
ASM_CLAC
pushl $0
pushl $do_invalid_op
jmp common_exception
SYM_CODE_END(invalid_op)
SYM_CODE_START(coprocessor_segment_overrun)
ASM_CLAC
pushl $0
pushl $do_coprocessor_segment_overrun
jmp common_exception
SYM_CODE_END(coprocessor_segment_overrun)
SYM_CODE_START(invalid_TSS)
ASM_CLAC
pushl $do_invalid_TSS
jmp common_exception
SYM_CODE_END(invalid_TSS)
SYM_CODE_START(segment_not_present)
ASM_CLAC
pushl $do_segment_not_present
jmp common_exception
SYM_CODE_END(segment_not_present)
SYM_CODE_START(stack_segment)
ASM_CLAC
pushl $do_stack_segment
jmp common_exception
SYM_CODE_END(stack_segment)
SYM_CODE_START(alignment_check)
ASM_CLAC
pushl $do_alignment_check
jmp common_exception
SYM_CODE_END(alignment_check)
SYM_CODE_START(divide_error)
ASM_CLAC
pushl $0 # no error code
pushl $do_divide_error
jmp common_exception
SYM_CODE_END(divide_error)
#ifdef CONFIG_X86_MCE
SYM_CODE_START(machine_check)
ASM_CLAC
pushl $0
pushl $do_mce
jmp common_exception
SYM_CODE_END(machine_check)
#endif
SYM_CODE_START(spurious_interrupt_bug)
ASM_CLAC
pushl $0
pushl $do_spurious_interrupt_bug
jmp common_exception
SYM_CODE_END(spurious_interrupt_bug)
#ifdef CONFIG_XEN_PV
SYM_FUNC_START(xen_hypervisor_callback)
/*
* See comment in entry_64.S for further explanation
*
* Note: This is not an actual IDT entry point. It's a XEN specific entry
* point and therefore named to match the 64-bit trampoline counterpart.
*/
SYM_FUNC_START(xen_asm_exc_xen_hypervisor_callback)
/*
* Check to see if we got the event in the critical
* region in xen_iret_direct, after we've reenabled
@ -1395,14 +1197,11 @@ SYM_FUNC_START(xen_hypervisor_callback)
pushl $-1 /* orig_ax = -1 => not a system call */
SAVE_ALL
ENCODE_FRAME_POINTER
TRACE_IRQS_OFF
mov %esp, %eax
call xen_evtchn_do_upcall
#ifndef CONFIG_PREEMPTION
call xen_maybe_preempt_hcall
#endif
jmp ret_from_intr
SYM_FUNC_END(xen_hypervisor_callback)
call xen_pv_evtchn_do_upcall
jmp handle_exception_return
SYM_FUNC_END(xen_asm_exc_xen_hypervisor_callback)
/*
* Hypervisor uses this for application faults while it executes.
@ -1429,11 +1228,11 @@ SYM_FUNC_START(xen_failsafe_callback)
popl %eax
lea 16(%esp), %esp
jz 5f
jmp iret_exc
jmp asm_iret_error
5: pushl $-1 /* orig_ax = -1 => not a system call */
SAVE_ALL
ENCODE_FRAME_POINTER
jmp ret_from_exception
jmp handle_exception_return
.section .fixup, "ax"
6: xorl %eax, %eax
@ -1456,56 +1255,7 @@ SYM_FUNC_START(xen_failsafe_callback)
SYM_FUNC_END(xen_failsafe_callback)
#endif /* CONFIG_XEN_PV */
#ifdef CONFIG_XEN_PVHVM
BUILD_INTERRUPT3(xen_hvm_callback_vector, HYPERVISOR_CALLBACK_VECTOR,
xen_evtchn_do_upcall)
#endif
#if IS_ENABLED(CONFIG_HYPERV)
BUILD_INTERRUPT3(hyperv_callback_vector, HYPERVISOR_CALLBACK_VECTOR,
hyperv_vector_handler)
BUILD_INTERRUPT3(hyperv_reenlightenment_vector, HYPERV_REENLIGHTENMENT_VECTOR,
hyperv_reenlightenment_intr)
BUILD_INTERRUPT3(hv_stimer0_callback_vector, HYPERV_STIMER0_VECTOR,
hv_stimer0_vector_handler)
#endif /* CONFIG_HYPERV */
SYM_CODE_START(page_fault)
ASM_CLAC
pushl $do_page_fault
jmp common_exception_read_cr2
SYM_CODE_END(page_fault)
SYM_CODE_START_LOCAL_NOALIGN(common_exception_read_cr2)
/* the function address is in %gs's slot on the stack */
SAVE_ALL switch_stacks=1 skip_gs=1 unwind_espfix=1
ENCODE_FRAME_POINTER
/* fixup %gs */
GS_TO_REG %ecx
movl PT_GS(%esp), %edi
REG_TO_PTGS %ecx
SET_KERNEL_GS %ecx
GET_CR2_INTO(%ecx) # might clobber %eax
/* fixup orig %eax */
movl PT_ORIG_EAX(%esp), %edx # get the error code
movl $-1, PT_ORIG_EAX(%esp) # no syscall to restart
TRACE_IRQS_OFF
movl %esp, %eax # pt_regs pointer
CALL_NOSPEC edi
jmp ret_from_exception
SYM_CODE_END(common_exception_read_cr2)
SYM_CODE_START_LOCAL_NOALIGN(common_exception)
SYM_CODE_START_LOCAL_NOALIGN(handle_exception)
/* the function address is in %gs's slot on the stack */
SAVE_ALL switch_stacks=1 skip_gs=1 unwind_espfix=1
ENCODE_FRAME_POINTER
@ -1520,23 +1270,35 @@ SYM_CODE_START_LOCAL_NOALIGN(common_exception)
movl PT_ORIG_EAX(%esp), %edx # get the error code
movl $-1, PT_ORIG_EAX(%esp) # no syscall to restart
TRACE_IRQS_OFF
movl %esp, %eax # pt_regs pointer
CALL_NOSPEC edi
jmp ret_from_exception
SYM_CODE_END(common_exception)
SYM_CODE_START(debug)
handle_exception_return:
#ifdef CONFIG_VM86
movl PT_EFLAGS(%esp), %eax # mix EFLAGS and CS
movb PT_CS(%esp), %al
andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax
#else
/*
* Entry from sysenter is now handled in common_exception
* We can be coming here from child spawned by kernel_thread().
*/
ASM_CLAC
pushl $0
pushl $do_debug
jmp common_exception
SYM_CODE_END(debug)
movl PT_CS(%esp), %eax
andl $SEGMENT_RPL_MASK, %eax
#endif
cmpl $USER_RPL, %eax # returning to v8086 or userspace ?
jnb ret_to_user
SYM_CODE_START(double_fault)
PARANOID_EXIT_TO_KERNEL_MODE
BUG_IF_WRONG_CR3
RESTORE_REGS 4
jmp .Lirq_return
ret_to_user:
movl %esp, %eax
jmp restore_all_switch_stack
SYM_CODE_END(handle_exception)
SYM_CODE_START(asm_exc_double_fault)
1:
/*
* This is a task gate handler, not an interrupt gate handler.
@ -1574,7 +1336,7 @@ SYM_CODE_START(double_fault)
1:
hlt
jmp 1b
SYM_CODE_END(double_fault)
SYM_CODE_END(asm_exc_double_fault)
/*
* NMI is doubly nasty. It can happen on the first instruction of
@ -1583,7 +1345,7 @@ SYM_CODE_END(double_fault)
* switched stacks. We handle both conditions by simply checking whether we
* interrupted kernel code running on the SYSENTER stack.
*/
SYM_CODE_START(nmi)
SYM_CODE_START(asm_exc_nmi)
ASM_CLAC
#ifdef CONFIG_X86_ESPFIX32
@ -1612,7 +1374,7 @@ SYM_CODE_START(nmi)
jb .Lnmi_from_sysenter_stack
/* Not on SYSENTER stack. */
call do_nmi
call exc_nmi
jmp .Lnmi_return
.Lnmi_from_sysenter_stack:
@ -1622,7 +1384,7 @@ SYM_CODE_START(nmi)
*/
movl %esp, %ebx
movl PER_CPU_VAR(cpu_current_top_of_stack), %esp
call do_nmi
call exc_nmi
movl %ebx, %esp
.Lnmi_return:
@ -1676,21 +1438,9 @@ SYM_CODE_START(nmi)
lss (1+5+6)*4(%esp), %esp # back to espfix stack
jmp .Lirq_return
#endif
SYM_CODE_END(nmi)
SYM_CODE_START(int3)
ASM_CLAC
pushl $0
pushl $do_int3
jmp common_exception
SYM_CODE_END(int3)
SYM_CODE_START(general_protection)
ASM_CLAC
pushl $do_general_protection
jmp common_exception
SYM_CODE_END(general_protection)
SYM_CODE_END(asm_exc_nmi)
.pushsection .text, "ax"
SYM_CODE_START(rewind_stack_do_exit)
/* Prevent any naive code from trying to unwind to our caller. */
xorl %ebp, %ebp
@ -1701,3 +1451,4 @@ SYM_CODE_START(rewind_stack_do_exit)
call do_exit
1: jmp 1b
SYM_CODE_END(rewind_stack_do_exit)
.popsection

File diff suppressed because it is too large Load Diff

View File

@ -46,12 +46,14 @@
* ebp user stack
* 0(%ebp) arg6
*/
SYM_FUNC_START(entry_SYSENTER_compat)
SYM_CODE_START(entry_SYSENTER_compat)
UNWIND_HINT_EMPTY
/* Interrupts are off on entry. */
SWAPGS
/* We are about to clobber %rsp anyway, clobbering here is OK */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
pushq %rax
SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
popq %rax
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
@ -104,6 +106,9 @@ SYM_FUNC_START(entry_SYSENTER_compat)
xorl %r14d, %r14d /* nospec r14 */
pushq $0 /* pt_regs->r15 = 0 */
xorl %r15d, %r15d /* nospec r15 */
UNWIND_HINT_REGS
cld
/*
@ -129,17 +134,11 @@ SYM_FUNC_START(entry_SYSENTER_compat)
jnz .Lsysenter_fix_flags
.Lsysenter_flags_fixed:
/*
* User mode is traced as though IRQs are on, and SYSENTER
* turned them off.
*/
TRACE_IRQS_OFF
movq %rsp, %rdi
call do_fast_syscall_32
/* XEN PV guests always use IRET path */
ALTERNATIVE "testl %eax, %eax; jz .Lsyscall_32_done", \
"jmp .Lsyscall_32_done", X86_FEATURE_XENPV
ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
"jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
jmp sysret32_from_system_call
.Lsysenter_fix_flags:
@ -147,7 +146,7 @@ SYM_FUNC_START(entry_SYSENTER_compat)
popfq
jmp .Lsysenter_flags_fixed
SYM_INNER_LABEL(__end_entry_SYSENTER_compat, SYM_L_GLOBAL)
SYM_FUNC_END(entry_SYSENTER_compat)
SYM_CODE_END(entry_SYSENTER_compat)
/*
* 32-bit SYSCALL entry.
@ -197,6 +196,7 @@ SYM_FUNC_END(entry_SYSENTER_compat)
* 0(%esp) arg6
*/
SYM_CODE_START(entry_SYSCALL_compat)
UNWIND_HINT_EMPTY
/* Interrupts are off on entry. */
swapgs
@ -247,17 +247,13 @@ SYM_INNER_LABEL(entry_SYSCALL_compat_after_hwframe, SYM_L_GLOBAL)
pushq $0 /* pt_regs->r15 = 0 */
xorl %r15d, %r15d /* nospec r15 */
/*
* User mode is traced as though IRQs are on, and SYSENTER
* turned them off.
*/
TRACE_IRQS_OFF
UNWIND_HINT_REGS
movq %rsp, %rdi
call do_fast_syscall_32
/* XEN PV guests always use IRET path */
ALTERNATIVE "testl %eax, %eax; jz .Lsyscall_32_done", \
"jmp .Lsyscall_32_done", X86_FEATURE_XENPV
ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
"jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
/* Opportunistic SYSRET */
sysret32_from_system_call:
@ -266,7 +262,7 @@ sysret32_from_system_call:
* stack. So let's erase the thread stack right now.
*/
STACKLEAK_ERASE
TRACE_IRQS_ON /* User mode traces as IRQs on. */
movq RBX(%rsp), %rbx /* pt_regs->rbx */
movq RBP(%rsp), %rbp /* pt_regs->rbp */
movq EFLAGS(%rsp), %r11 /* pt_regs->flags (in r11) */
@ -340,6 +336,7 @@ SYM_CODE_END(entry_SYSCALL_compat)
* ebp arg6
*/
SYM_CODE_START(entry_INT80_compat)
UNWIND_HINT_EMPTY
/*
* Interrupts are off on entry.
*/
@ -361,8 +358,11 @@ SYM_CODE_START(entry_INT80_compat)
/* Need to switch before accessing the thread stack. */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
/* In the Xen PV case we already run on the thread stack. */
ALTERNATIVE "movq %rsp, %rdi", "jmp .Lint80_keep_stack", X86_FEATURE_XENPV
ALTERNATIVE "", "jmp .Lint80_keep_stack", X86_FEATURE_XENPV
movq %rsp, %rdi
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
pushq 6*8(%rdi) /* regs->ss */
@ -401,19 +401,12 @@ SYM_CODE_START(entry_INT80_compat)
xorl %r14d, %r14d /* nospec r14 */
pushq %r15 /* pt_regs->r15 */
xorl %r15d, %r15d /* nospec r15 */
cld
/*
* User mode is traced as though IRQs are on, and the interrupt
* gate turned them off.
*/
TRACE_IRQS_OFF
UNWIND_HINT_REGS
cld
movq %rsp, %rdi
call do_int80_syscall_32
.Lsyscall_32_done:
/* Go back to user mode. */
TRACE_IRQS_ON
jmp swapgs_restore_regs_and_return_to_usermode
SYM_CODE_END(entry_INT80_compat)

View File

@ -3,7 +3,6 @@
* Save registers before calling assembly functions. This avoids
* disturbance of register allocation in some inline assembly constructs.
* Copyright 2001,2002 by Andi Kleen, SuSE Labs.
* Added trace_hardirqs callers - Copyright 2007 Steven Rostedt, Red Hat, Inc.
*/
#include <linux/linkage.h>
#include "calling.h"
@ -37,15 +36,6 @@ SYM_FUNC_END(\name)
_ASM_NOKPROBE(\name)
.endm
#ifdef CONFIG_TRACE_IRQFLAGS
THUNK trace_hardirqs_on_thunk,trace_hardirqs_on_caller,1
THUNK trace_hardirqs_off_thunk,trace_hardirqs_off_caller,1
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
THUNK lockdep_sys_exit_thunk,lockdep_sys_exit
#endif
#ifdef CONFIG_PREEMPTION
THUNK preempt_schedule_thunk, preempt_schedule
THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
@ -53,9 +43,7 @@ SYM_FUNC_END(\name)
EXPORT_SYMBOL(preempt_schedule_notrace_thunk)
#endif
#if defined(CONFIG_TRACE_IRQFLAGS) \
|| defined(CONFIG_DEBUG_LOCK_ALLOC) \
|| defined(CONFIG_PREEMPTION)
#ifdef CONFIG_PREEMPTION
SYM_CODE_START_LOCAL_NOALIGN(.L_restore)
popq %r11
popq %r10

View File

@ -15,6 +15,7 @@
#include <asm/hypervisor.h>
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>
#include <asm/idtentry.h>
#include <linux/version.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>
@ -152,15 +153,11 @@ static inline bool hv_reenlightenment_available(void)
ms_hyperv.features & HV_X64_ACCESS_REENLIGHTENMENT;
}
__visible void __irq_entry hyperv_reenlightenment_intr(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_reenlightenment)
{
entering_ack_irq();
ack_APIC_irq();
inc_irq_stat(irq_hv_reenlightenment_count);
schedule_delayed_work(&hv_reenlightenment_work, HZ/10);
exiting_irq();
}
void set_hv_tscchange_cb(void (*cb)(void))

View File

@ -1,11 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_X86_ACRN_H
#define _ASM_X86_ACRN_H
extern void acrn_hv_callback_vector(void);
#ifdef CONFIG_TRACING
#define trace_acrn_hv_callback_vector acrn_hv_callback_vector
#endif
extern void acrn_hv_vector_handler(struct pt_regs *regs);
#endif /* _ASM_X86_ACRN_H */

View File

@ -519,39 +519,6 @@ static inline bool apic_id_is_primary_thread(unsigned int id) { return false; }
static inline void apic_smt_update(void) { }
#endif
extern void irq_enter(void);
extern void irq_exit(void);
static inline void entering_irq(void)
{
irq_enter();
kvm_set_cpu_l1tf_flush_l1d();
}
static inline void entering_ack_irq(void)
{
entering_irq();
ack_APIC_irq();
}
static inline void ipi_entering_ack_irq(void)
{
irq_enter();
ack_APIC_irq();
kvm_set_cpu_l1tf_flush_l1d();
}
static inline void exiting_irq(void)
{
irq_exit();
}
static inline void exiting_ack_irq(void)
{
ack_APIC_irq();
irq_exit();
}
extern void ioapic_zap_locks(void);
#endif /* _ASM_X86_APIC_H */

View File

@ -205,13 +205,13 @@ static __always_inline bool arch_atomic_try_cmpxchg(atomic_t *v, int *old, int n
}
#define arch_atomic_try_cmpxchg arch_atomic_try_cmpxchg
static inline int arch_atomic_xchg(atomic_t *v, int new)
static __always_inline int arch_atomic_xchg(atomic_t *v, int new)
{
return arch_xchg(&v->counter, new);
}
#define arch_atomic_xchg arch_atomic_xchg
static inline void arch_atomic_and(int i, atomic_t *v)
static __always_inline void arch_atomic_and(int i, atomic_t *v)
{
asm volatile(LOCK_PREFIX "andl %1,%0"
: "+m" (v->counter)
@ -219,7 +219,7 @@ static inline void arch_atomic_and(int i, atomic_t *v)
: "memory");
}
static inline int arch_atomic_fetch_and(int i, atomic_t *v)
static __always_inline int arch_atomic_fetch_and(int i, atomic_t *v)
{
int val = arch_atomic_read(v);
@ -229,7 +229,7 @@ static inline int arch_atomic_fetch_and(int i, atomic_t *v)
}
#define arch_atomic_fetch_and arch_atomic_fetch_and
static inline void arch_atomic_or(int i, atomic_t *v)
static __always_inline void arch_atomic_or(int i, atomic_t *v)
{
asm volatile(LOCK_PREFIX "orl %1,%0"
: "+m" (v->counter)
@ -237,7 +237,7 @@ static inline void arch_atomic_or(int i, atomic_t *v)
: "memory");
}
static inline int arch_atomic_fetch_or(int i, atomic_t *v)
static __always_inline int arch_atomic_fetch_or(int i, atomic_t *v)
{
int val = arch_atomic_read(v);
@ -247,7 +247,7 @@ static inline int arch_atomic_fetch_or(int i, atomic_t *v)
}
#define arch_atomic_fetch_or arch_atomic_fetch_or
static inline void arch_atomic_xor(int i, atomic_t *v)
static __always_inline void arch_atomic_xor(int i, atomic_t *v)
{
asm volatile(LOCK_PREFIX "xorl %1,%0"
: "+m" (v->counter)
@ -255,7 +255,7 @@ static inline void arch_atomic_xor(int i, atomic_t *v)
: "memory");
}
static inline int arch_atomic_fetch_xor(int i, atomic_t *v)
static __always_inline int arch_atomic_fetch_xor(int i, atomic_t *v)
{
int val = arch_atomic_read(v);

View File

@ -70,14 +70,17 @@ do { \
#define HAVE_ARCH_BUG
#define BUG() \
do { \
instrumentation_begin(); \
_BUG_FLAGS(ASM_UD2, 0); \
unreachable(); \
} while (0)
#define __WARN_FLAGS(flags) \
do { \
instrumentation_begin(); \
_BUG_FLAGS(ASM_UD2, BUGFLAG_WARNING|(flags)); \
annotate_reachable(); \
instrumentation_end(); \
} while (0)
#include <asm-generic/bug.h>

View File

@ -11,15 +11,11 @@
#ifdef CONFIG_X86_64
/* Macro to enforce the same ordering and stack sizes */
#define ESTACKS_MEMBERS(guardsize, db2_holesize)\
#define ESTACKS_MEMBERS(guardsize) \
char DF_stack_guard[guardsize]; \
char DF_stack[EXCEPTION_STKSZ]; \
char NMI_stack_guard[guardsize]; \
char NMI_stack[EXCEPTION_STKSZ]; \
char DB2_stack_guard[guardsize]; \
char DB2_stack[db2_holesize]; \
char DB1_stack_guard[guardsize]; \
char DB1_stack[EXCEPTION_STKSZ]; \
char DB_stack_guard[guardsize]; \
char DB_stack[EXCEPTION_STKSZ]; \
char MCE_stack_guard[guardsize]; \
@ -28,12 +24,12 @@
/* The exception stacks' physical storage. No guard pages required */
struct exception_stacks {
ESTACKS_MEMBERS(0, 0)
ESTACKS_MEMBERS(0)
};
/* The effective cpu entry area mapping with guard pages. */
struct cea_exception_stacks {
ESTACKS_MEMBERS(PAGE_SIZE, EXCEPTION_STKSZ)
ESTACKS_MEMBERS(PAGE_SIZE)
};
/*
@ -42,8 +38,6 @@ struct cea_exception_stacks {
enum exception_stack_ordering {
ESTACK_DF,
ESTACK_NMI,
ESTACK_DB2,
ESTACK_DB1,
ESTACK_DB,
ESTACK_MCE,
N_EXCEPTION_STACKS

View File

@ -18,7 +18,7 @@ DECLARE_PER_CPU(unsigned long, cpu_dr7);
native_set_debugreg(register, value)
#endif
static inline unsigned long native_get_debugreg(int regno)
static __always_inline unsigned long native_get_debugreg(int regno)
{
unsigned long val = 0; /* Damn you, gcc! */
@ -47,7 +47,7 @@ static inline unsigned long native_get_debugreg(int regno)
return val;
}
static inline void native_set_debugreg(int regno, unsigned long value)
static __always_inline void native_set_debugreg(int regno, unsigned long value)
{
switch (regno) {
case 0:
@ -85,7 +85,7 @@ static inline void hw_breakpoint_disable(void)
set_debugreg(0UL, 3);
}
static inline int hw_breakpoint_active(void)
static __always_inline bool hw_breakpoint_active(void)
{
return __this_cpu_read(cpu_dr7) & DR_GLOBAL_ENABLE_MASK;
}
@ -94,24 +94,38 @@ extern void aout_dump_debugregs(struct user *dump);
extern void hw_breakpoint_restore(void);
#ifdef CONFIG_X86_64
DECLARE_PER_CPU(int, debug_stack_usage);
static inline void debug_stack_usage_inc(void)
static __always_inline unsigned long local_db_save(void)
{
__this_cpu_inc(debug_stack_usage);
unsigned long dr7;
if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active())
return 0;
get_debugreg(dr7, 7);
dr7 &= ~0x400; /* architecturally set bit */
if (dr7)
set_debugreg(0, 7);
/*
* Ensure the compiler doesn't lower the above statements into
* the critical section; disabling breakpoints late would not
* be good.
*/
barrier();
return dr7;
}
static inline void debug_stack_usage_dec(void)
static __always_inline void local_db_restore(unsigned long dr7)
{
__this_cpu_dec(debug_stack_usage);
/*
* Ensure the compiler doesn't raise this statement into
* the critical section; enabling breakpoints early would
* not be good.
*/
barrier();
if (dr7)
set_debugreg(dr7, 7);
}
void debug_stack_set_zero(void);
void debug_stack_reset(void);
#else /* !X86_64 */
static inline void debug_stack_set_zero(void) { }
static inline void debug_stack_reset(void) { }
static inline void debug_stack_usage_inc(void) { }
static inline void debug_stack_usage_dec(void) { }
#endif /* X86_64 */
#ifdef CONFIG_CPU_SUP_AMD
extern void set_dr_addr_mask(unsigned long mask, int dr);

View File

@ -40,11 +40,6 @@ static inline void fill_ldt(struct desc_struct *desc, const struct user_desc *in
desc->l = 0;
}
extern struct desc_ptr idt_descr;
extern gate_desc idt_table[];
extern const struct desc_ptr debug_idt_descr;
extern gate_desc debug_idt_table[];
struct gdt_page {
struct desc_struct gdt[GDT_ENTRIES];
} __attribute__((aligned(PAGE_SIZE)));
@ -214,7 +209,7 @@ static inline void native_load_gdt(const struct desc_ptr *dtr)
asm volatile("lgdt %0"::"m" (*dtr));
}
static inline void native_load_idt(const struct desc_ptr *dtr)
static __always_inline void native_load_idt(const struct desc_ptr *dtr)
{
asm volatile("lidt %0"::"m" (*dtr));
}
@ -386,64 +381,23 @@ static inline void set_desc_limit(struct desc_struct *desc, unsigned long limit)
desc->limit1 = (limit >> 16) & 0xf;
}
void update_intr_gate(unsigned int n, const void *addr);
void alloc_intr_gate(unsigned int n, const void *addr);
extern unsigned long system_vectors[];
#ifdef CONFIG_X86_64
DECLARE_PER_CPU(u32, debug_idt_ctr);
static inline bool is_debug_idt_enabled(void)
{
if (this_cpu_read(debug_idt_ctr))
return true;
return false;
}
static inline void load_debug_idt(void)
{
load_idt((const struct desc_ptr *)&debug_idt_descr);
}
#else
static inline bool is_debug_idt_enabled(void)
{
return false;
}
static inline void load_debug_idt(void)
{
}
#endif
/*
* The load_current_idt() must be called with interrupts disabled
* to avoid races. That way the IDT will always be set back to the expected
* descriptor. It's also called when a CPU is being initialized, and
* that doesn't need to disable interrupts, as nothing should be
* bothering the CPU then.
*/
static inline void load_current_idt(void)
{
if (is_debug_idt_enabled())
load_debug_idt();
else
load_idt((const struct desc_ptr *)&idt_descr);
}
extern void load_current_idt(void);
extern void idt_setup_early_handler(void);
extern void idt_setup_early_traps(void);
extern void idt_setup_traps(void);
extern void idt_setup_apic_and_irq_gates(void);
extern bool idt_is_f00f_address(unsigned long address);
#ifdef CONFIG_X86_64
extern void idt_setup_early_pf(void);
extern void idt_setup_ist_traps(void);
extern void idt_setup_debugidt_traps(void);
#else
static inline void idt_setup_early_pf(void) { }
static inline void idt_setup_ist_traps(void) { }
static inline void idt_setup_debugidt_traps(void) { }
#endif
extern void idt_invalidate(void *addr);

View File

@ -1,56 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* This file is designed to contain the BUILD_INTERRUPT specifications for
* all of the extra named interrupt vectors used by the architecture.
* Usually this is the Inter Process Interrupts (IPIs)
*/
/*
* The following vectors are part of the Linux architecture, there
* is no hardware IRQ pin equivalent for them, they are triggered
* through the ICC by us (IPIs)
*/
#ifdef CONFIG_SMP
BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR)
BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR)
BUILD_INTERRUPT(call_function_single_interrupt,CALL_FUNCTION_SINGLE_VECTOR)
BUILD_INTERRUPT(irq_move_cleanup_interrupt, IRQ_MOVE_CLEANUP_VECTOR)
BUILD_INTERRUPT(reboot_interrupt, REBOOT_VECTOR)
#endif
#ifdef CONFIG_HAVE_KVM
BUILD_INTERRUPT(kvm_posted_intr_ipi, POSTED_INTR_VECTOR)
BUILD_INTERRUPT(kvm_posted_intr_wakeup_ipi, POSTED_INTR_WAKEUP_VECTOR)
BUILD_INTERRUPT(kvm_posted_intr_nested_ipi, POSTED_INTR_NESTED_VECTOR)
#endif
/*
* every pentium local APIC has two 'local interrupts', with a
* soft-definable vector attached to both interrupts, one of
* which is a timer interrupt, the other one is error counter
* overflow. Linux uses the local APIC timer interrupt to get
* a much simpler SMP time architecture:
*/
#ifdef CONFIG_X86_LOCAL_APIC
BUILD_INTERRUPT(apic_timer_interrupt,LOCAL_TIMER_VECTOR)
BUILD_INTERRUPT(error_interrupt,ERROR_APIC_VECTOR)
BUILD_INTERRUPT(spurious_interrupt,SPURIOUS_APIC_VECTOR)
BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR)
#ifdef CONFIG_IRQ_WORK
BUILD_INTERRUPT(irq_work_interrupt, IRQ_WORK_VECTOR)
#endif
#ifdef CONFIG_X86_THERMAL_VECTOR
BUILD_INTERRUPT(thermal_interrupt,THERMAL_APIC_VECTOR)
#endif
#ifdef CONFIG_X86_MCE_THRESHOLD
BUILD_INTERRUPT(threshold_interrupt,THRESHOLD_APIC_VECTOR)
#endif
#ifdef CONFIG_X86_MCE_AMD
BUILD_INTERRUPT(deferred_error_interrupt, DEFERRED_ERROR_VECTOR)
#endif
#endif

View File

@ -28,28 +28,6 @@
#include <asm/irq.h>
#include <asm/sections.h>
/* Interrupt handlers registered during init_IRQ */
extern asmlinkage void apic_timer_interrupt(void);
extern asmlinkage void x86_platform_ipi(void);
extern asmlinkage void kvm_posted_intr_ipi(void);
extern asmlinkage void kvm_posted_intr_wakeup_ipi(void);
extern asmlinkage void kvm_posted_intr_nested_ipi(void);
extern asmlinkage void error_interrupt(void);
extern asmlinkage void irq_work_interrupt(void);
extern asmlinkage void uv_bau_message_intr1(void);
extern asmlinkage void spurious_interrupt(void);
extern asmlinkage void thermal_interrupt(void);
extern asmlinkage void reschedule_interrupt(void);
extern asmlinkage void irq_move_cleanup_interrupt(void);
extern asmlinkage void reboot_interrupt(void);
extern asmlinkage void threshold_interrupt(void);
extern asmlinkage void deferred_error_interrupt(void);
extern asmlinkage void call_function_interrupt(void);
extern asmlinkage void call_function_single_interrupt(void);
#ifdef CONFIG_X86_LOCAL_APIC
struct irq_data;
struct pci_dev;

View File

@ -0,0 +1,652 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_X86_IDTENTRY_H
#define _ASM_X86_IDTENTRY_H
/* Interrupts/Exceptions */
#include <asm/trapnr.h>
#ifndef __ASSEMBLY__
#include <linux/hardirq.h>
#include <asm/irq_stack.h>
void idtentry_enter_user(struct pt_regs *regs);
void idtentry_exit_user(struct pt_regs *regs);
bool idtentry_enter_cond_rcu(struct pt_regs *regs);
void idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit);
/**
* DECLARE_IDTENTRY - Declare functions for simple IDT entry points
* No error code pushed by hardware
* @vector: Vector number (ignored for C)
* @func: Function name of the entry point
*
* Declares three functions:
* - The ASM entry point: asm_##func
* - The XEN PV trap entry point: xen_##func (maybe unused)
* - The C handler called from the ASM entry point
*
* Note: This is the C variant of DECLARE_IDTENTRY(). As the name says it
* declares the entry points for usage in C code. There is an ASM variant
* as well which is used to emit the entry stubs in entry_32/64.S.
*/
#define DECLARE_IDTENTRY(vector, func) \
asmlinkage void asm_##func(void); \
asmlinkage void xen_asm_##func(void); \
__visible void func(struct pt_regs *regs)
/**
* DEFINE_IDTENTRY - Emit code for simple IDT entry points
* @func: Function name of the entry point
*
* @func is called from ASM entry code with interrupts disabled.
*
* The macro is written so it acts as function definition. Append the
* body with a pair of curly brackets.
*
* idtentry_enter() contains common code which has to be invoked before
* arbitrary code in the body. idtentry_exit() contains common code
* which has to run before returning to the low level assembly code.
*/
#define DEFINE_IDTENTRY(func) \
static __always_inline void __##func(struct pt_regs *regs); \
\
__visible noinstr void func(struct pt_regs *regs) \
{ \
bool rcu_exit = idtentry_enter_cond_rcu(regs); \
\
instrumentation_begin(); \
__##func (regs); \
instrumentation_end(); \
idtentry_exit_cond_rcu(regs, rcu_exit); \
} \
\
static __always_inline void __##func(struct pt_regs *regs)
/* Special case for 32bit IRET 'trap' */
#define DECLARE_IDTENTRY_SW DECLARE_IDTENTRY
#define DEFINE_IDTENTRY_SW DEFINE_IDTENTRY
/**
* DECLARE_IDTENTRY_ERRORCODE - Declare functions for simple IDT entry points
* Error code pushed by hardware
* @vector: Vector number (ignored for C)
* @func: Function name of the entry point
*
* Declares three functions:
* - The ASM entry point: asm_##func
* - The XEN PV trap entry point: xen_##func (maybe unused)
* - The C handler called from the ASM entry point
*
* Same as DECLARE_IDTENTRY, but has an extra error_code argument for the
* C-handler.
*/
#define DECLARE_IDTENTRY_ERRORCODE(vector, func) \
asmlinkage void asm_##func(void); \
asmlinkage void xen_asm_##func(void); \
__visible void func(struct pt_regs *regs, unsigned long error_code)
/**
* DEFINE_IDTENTRY_ERRORCODE - Emit code for simple IDT entry points
* Error code pushed by hardware
* @func: Function name of the entry point
*
* Same as DEFINE_IDTENTRY, but has an extra error_code argument
*/
#define DEFINE_IDTENTRY_ERRORCODE(func) \
static __always_inline void __##func(struct pt_regs *regs, \
unsigned long error_code); \
\
__visible noinstr void func(struct pt_regs *regs, \
unsigned long error_code) \
{ \
bool rcu_exit = idtentry_enter_cond_rcu(regs); \
\
instrumentation_begin(); \
__##func (regs, error_code); \
instrumentation_end(); \
idtentry_exit_cond_rcu(regs, rcu_exit); \
} \
\
static __always_inline void __##func(struct pt_regs *regs, \
unsigned long error_code)
/**
* DECLARE_IDTENTRY_RAW - Declare functions for raw IDT entry points
* No error code pushed by hardware
* @vector: Vector number (ignored for C)
* @func: Function name of the entry point
*
* Maps to DECLARE_IDTENTRY().
*/
#define DECLARE_IDTENTRY_RAW(vector, func) \
DECLARE_IDTENTRY(vector, func)
/**
* DEFINE_IDTENTRY_RAW - Emit code for raw IDT entry points
* @func: Function name of the entry point
*
* @func is called from ASM entry code with interrupts disabled.
*
* The macro is written so it acts as function definition. Append the
* body with a pair of curly brackets.
*
* Contrary to DEFINE_IDTENTRY() this does not invoke the
* idtentry_enter/exit() helpers before and after the body invocation. This
* needs to be done in the body itself if applicable. Use if extra work
* is required before the enter/exit() helpers are invoked.
*/
#define DEFINE_IDTENTRY_RAW(func) \
__visible noinstr void func(struct pt_regs *regs)
/**
* DECLARE_IDTENTRY_RAW_ERRORCODE - Declare functions for raw IDT entry points
* Error code pushed by hardware
* @vector: Vector number (ignored for C)
* @func: Function name of the entry point
*
* Maps to DECLARE_IDTENTRY_ERRORCODE()
*/
#define DECLARE_IDTENTRY_RAW_ERRORCODE(vector, func) \
DECLARE_IDTENTRY_ERRORCODE(vector, func)
/**
* DEFINE_IDTENTRY_RAW_ERRORCODE - Emit code for raw IDT entry points
* @func: Function name of the entry point
*
* @func is called from ASM entry code with interrupts disabled.
*
* The macro is written so it acts as function definition. Append the
* body with a pair of curly brackets.
*
* Contrary to DEFINE_IDTENTRY_ERRORCODE() this does not invoke the
* idtentry_enter/exit() helpers before and after the body invocation. This
* needs to be done in the body itself if applicable. Use if extra work
* is required before the enter/exit() helpers are invoked.
*/
#define DEFINE_IDTENTRY_RAW_ERRORCODE(func) \
__visible noinstr void func(struct pt_regs *regs, unsigned long error_code)
/**
* DECLARE_IDTENTRY_IRQ - Declare functions for device interrupt IDT entry
* points (common/spurious)
* @vector: Vector number (ignored for C)
* @func: Function name of the entry point
*
* Maps to DECLARE_IDTENTRY_ERRORCODE()
*/
#define DECLARE_IDTENTRY_IRQ(vector, func) \
DECLARE_IDTENTRY_ERRORCODE(vector, func)
/**
* DEFINE_IDTENTRY_IRQ - Emit code for device interrupt IDT entry points
* @func: Function name of the entry point
*
* The vector number is pushed by the low level entry stub and handed
* to the function as error_code argument which needs to be truncated
* to an u8 because the push is sign extending.
*
* On 64-bit idtentry_enter/exit() are invoked in the ASM entry code before
* and after switching to the interrupt stack. On 32-bit this happens in C.
*
* irq_enter/exit_rcu() are invoked before the function body and the
* KVM L1D flush request is set.
*/
#define DEFINE_IDTENTRY_IRQ(func) \
static __always_inline void __##func(struct pt_regs *regs, u8 vector); \
\
__visible noinstr void func(struct pt_regs *regs, \
unsigned long error_code) \
{ \
bool rcu_exit = idtentry_enter_cond_rcu(regs); \
\
instrumentation_begin(); \
irq_enter_rcu(); \
kvm_set_cpu_l1tf_flush_l1d(); \
__##func (regs, (u8)error_code); \
irq_exit_rcu(); \
instrumentation_end(); \
idtentry_exit_cond_rcu(regs, rcu_exit); \
} \
\
static __always_inline void __##func(struct pt_regs *regs, u8 vector)
/**
* DECLARE_IDTENTRY_SYSVEC - Declare functions for system vector entry points
* @vector: Vector number (ignored for C)
* @func: Function name of the entry point
*
* Declares three functions:
* - The ASM entry point: asm_##func
* - The XEN PV trap entry point: xen_##func (maybe unused)
* - The C handler called from the ASM entry point
*
* Maps to DECLARE_IDTENTRY().
*/
#define DECLARE_IDTENTRY_SYSVEC(vector, func) \
DECLARE_IDTENTRY(vector, func)
/**
* DEFINE_IDTENTRY_SYSVEC - Emit code for system vector IDT entry points
* @func: Function name of the entry point
*
* idtentry_enter/exit() and irq_enter/exit_rcu() are invoked before the
* function body. KVM L1D flush request is set.
*
* Runs the function on the interrupt stack if the entry hit kernel mode
*/
#define DEFINE_IDTENTRY_SYSVEC(func) \
static void __##func(struct pt_regs *regs); \
\
__visible noinstr void func(struct pt_regs *regs) \
{ \
bool rcu_exit = idtentry_enter_cond_rcu(regs); \
\
instrumentation_begin(); \
irq_enter_rcu(); \
kvm_set_cpu_l1tf_flush_l1d(); \
run_on_irqstack_cond(__##func, regs, regs); \
irq_exit_rcu(); \
instrumentation_end(); \
idtentry_exit_cond_rcu(regs, rcu_exit); \
} \
\
static noinline void __##func(struct pt_regs *regs)
/**
* DEFINE_IDTENTRY_SYSVEC_SIMPLE - Emit code for simple system vector IDT
* entry points
* @func: Function name of the entry point
*
* Runs the function on the interrupted stack. No switch to IRQ stack and
* only the minimal __irq_enter/exit() handling.
*
* Only use for 'empty' vectors like reschedule IPI and KVM posted
* interrupt vectors.
*/
#define DEFINE_IDTENTRY_SYSVEC_SIMPLE(func) \
static __always_inline void __##func(struct pt_regs *regs); \
\
__visible noinstr void func(struct pt_regs *regs) \
{ \
bool rcu_exit = idtentry_enter_cond_rcu(regs); \
\
instrumentation_begin(); \
__irq_enter_raw(); \
kvm_set_cpu_l1tf_flush_l1d(); \
__##func (regs); \
__irq_exit_raw(); \
instrumentation_end(); \
idtentry_exit_cond_rcu(regs, rcu_exit); \
} \
\
static __always_inline void __##func(struct pt_regs *regs)
/**
* DECLARE_IDTENTRY_XENCB - Declare functions for XEN HV callback entry point
* @vector: Vector number (ignored for C)
* @func: Function name of the entry point
*
* Declares three functions:
* - The ASM entry point: asm_##func
* - The XEN PV trap entry point: xen_##func (maybe unused)
* - The C handler called from the ASM entry point
*
* Maps to DECLARE_IDTENTRY(). Distinct entry point to handle the 32/64-bit
* difference
*/
#define DECLARE_IDTENTRY_XENCB(vector, func) \
DECLARE_IDTENTRY(vector, func)
#ifdef CONFIG_X86_64
/**
* DECLARE_IDTENTRY_IST - Declare functions for IST handling IDT entry points
* @vector: Vector number (ignored for C)
* @func: Function name of the entry point
*
* Maps to DECLARE_IDTENTRY_RAW, but declares also the NOIST C handler
* which is called from the ASM entry point on user mode entry
*/
#define DECLARE_IDTENTRY_IST(vector, func) \
DECLARE_IDTENTRY_RAW(vector, func); \
__visible void noist_##func(struct pt_regs *regs)
/**
* DEFINE_IDTENTRY_IST - Emit code for IST entry points
* @func: Function name of the entry point
*
* Maps to DEFINE_IDTENTRY_RAW
*/
#define DEFINE_IDTENTRY_IST(func) \
DEFINE_IDTENTRY_RAW(func)
/**
* DEFINE_IDTENTRY_NOIST - Emit code for NOIST entry points which
* belong to a IST entry point (MCE, DB)
* @func: Function name of the entry point. Must be the same as
* the function name of the corresponding IST variant
*
* Maps to DEFINE_IDTENTRY_RAW().
*/
#define DEFINE_IDTENTRY_NOIST(func) \
DEFINE_IDTENTRY_RAW(noist_##func)
/**
* DECLARE_IDTENTRY_DF - Declare functions for double fault
* @vector: Vector number (ignored for C)
* @func: Function name of the entry point
*
* Maps to DECLARE_IDTENTRY_RAW_ERRORCODE
*/
#define DECLARE_IDTENTRY_DF(vector, func) \
DECLARE_IDTENTRY_RAW_ERRORCODE(vector, func)
/**
* DEFINE_IDTENTRY_DF - Emit code for double fault
* @func: Function name of the entry point
*
* Maps to DEFINE_IDTENTRY_RAW_ERRORCODE
*/
#define DEFINE_IDTENTRY_DF(func) \
DEFINE_IDTENTRY_RAW_ERRORCODE(func)
#else /* CONFIG_X86_64 */
/* Maps to a regular IDTENTRY on 32bit for now */
# define DECLARE_IDTENTRY_IST DECLARE_IDTENTRY
# define DEFINE_IDTENTRY_IST DEFINE_IDTENTRY
/**
* DECLARE_IDTENTRY_DF - Declare functions for double fault 32bit variant
* @vector: Vector number (ignored for C)
* @func: Function name of the entry point
*
* Declares two functions:
* - The ASM entry point: asm_##func
* - The C handler called from the C shim
*/
#define DECLARE_IDTENTRY_DF(vector, func) \
asmlinkage void asm_##func(void); \
__visible void func(struct pt_regs *regs, \
unsigned long error_code, \
unsigned long address)
/**
* DEFINE_IDTENTRY_DF - Emit code for double fault on 32bit
* @func: Function name of the entry point
*
* This is called through the doublefault shim which already provides
* cr2 in the address argument.
*/
#define DEFINE_IDTENTRY_DF(func) \
__visible noinstr void func(struct pt_regs *regs, \
unsigned long error_code, \
unsigned long address)
#endif /* !CONFIG_X86_64 */
/* C-Code mapping */
#define DECLARE_IDTENTRY_MCE DECLARE_IDTENTRY_IST
#define DEFINE_IDTENTRY_MCE DEFINE_IDTENTRY_IST
#define DEFINE_IDTENTRY_MCE_USER DEFINE_IDTENTRY_NOIST
#define DECLARE_IDTENTRY_NMI DECLARE_IDTENTRY_RAW
#define DEFINE_IDTENTRY_NMI DEFINE_IDTENTRY_RAW
#define DECLARE_IDTENTRY_DEBUG DECLARE_IDTENTRY_IST
#define DEFINE_IDTENTRY_DEBUG DEFINE_IDTENTRY_IST
#define DEFINE_IDTENTRY_DEBUG_USER DEFINE_IDTENTRY_NOIST
/**
* DECLARE_IDTENTRY_XEN - Declare functions for XEN redirect IDT entry points
* @vector: Vector number (ignored for C)
* @func: Function name of the entry point
*
* Used for xennmi and xendebug redirections. No DEFINE as this is all ASM
* indirection magic.
*/
#define DECLARE_IDTENTRY_XEN(vector, func) \
asmlinkage void xen_asm_exc_xen##func(void); \
asmlinkage void asm_exc_xen##func(void)
#else /* !__ASSEMBLY__ */
/*
* The ASM variants for DECLARE_IDTENTRY*() which emit the ASM entry stubs.
*/
#define DECLARE_IDTENTRY(vector, func) \
idtentry vector asm_##func func has_error_code=0
#define DECLARE_IDTENTRY_ERRORCODE(vector, func) \
idtentry vector asm_##func func has_error_code=1
/* Special case for 32bit IRET 'trap'. Do not emit ASM code */
#define DECLARE_IDTENTRY_SW(vector, func)
#define DECLARE_IDTENTRY_RAW(vector, func) \
DECLARE_IDTENTRY(vector, func)
#define DECLARE_IDTENTRY_RAW_ERRORCODE(vector, func) \
DECLARE_IDTENTRY_ERRORCODE(vector, func)
/* Entries for common/spurious (device) interrupts */
#define DECLARE_IDTENTRY_IRQ(vector, func) \
idtentry_irq vector func
/* System vector entries */
#define DECLARE_IDTENTRY_SYSVEC(vector, func) \
idtentry_sysvec vector func
#ifdef CONFIG_X86_64
# define DECLARE_IDTENTRY_MCE(vector, func) \
idtentry_mce_db vector asm_##func func
# define DECLARE_IDTENTRY_DEBUG(vector, func) \
idtentry_mce_db vector asm_##func func
# define DECLARE_IDTENTRY_DF(vector, func) \
idtentry_df vector asm_##func func
# define DECLARE_IDTENTRY_XENCB(vector, func) \
DECLARE_IDTENTRY(vector, func)
#else
# define DECLARE_IDTENTRY_MCE(vector, func) \
DECLARE_IDTENTRY(vector, func)
# define DECLARE_IDTENTRY_DEBUG(vector, func) \
DECLARE_IDTENTRY(vector, func)
/* No ASM emitted for DF as this goes through a C shim */
# define DECLARE_IDTENTRY_DF(vector, func)
/* No ASM emitted for XEN hypervisor callback */
# define DECLARE_IDTENTRY_XENCB(vector, func)
#endif
/* No ASM code emitted for NMI */
#define DECLARE_IDTENTRY_NMI(vector, func)
/* XEN NMI and DB wrapper */
#define DECLARE_IDTENTRY_XEN(vector, func) \
idtentry vector asm_exc_xen##func exc_##func has_error_code=0
/*
* ASM code to emit the common vector entry stubs where each stub is
* packed into 8 bytes.
*
* Note, that the 'pushq imm8' is emitted via '.byte 0x6a, vector' because
* GCC treats the local vector variable as unsigned int and would expand
* all vectors above 0x7F to a 5 byte push. The original code did an
* adjustment of the vector number to be in the signed byte range to avoid
* this. While clever it's mindboggling counterintuitive and requires the
* odd conversion back to a real vector number in the C entry points. Using
* .byte achieves the same thing and the only fixup needed in the C entry
* point is to mask off the bits above bit 7 because the push is sign
* extending.
*/
.align 8
SYM_CODE_START(irq_entries_start)
vector=FIRST_EXTERNAL_VECTOR
pos = .
.rept (FIRST_SYSTEM_VECTOR - FIRST_EXTERNAL_VECTOR)
UNWIND_HINT_IRET_REGS
.byte 0x6a, vector
jmp asm_common_interrupt
nop
/* Ensure that the above is 8 bytes max */
. = pos + 8
pos=pos+8
vector=vector+1
.endr
SYM_CODE_END(irq_entries_start)
#ifdef CONFIG_X86_LOCAL_APIC
.align 8
SYM_CODE_START(spurious_entries_start)
vector=FIRST_SYSTEM_VECTOR
pos = .
.rept (NR_VECTORS - FIRST_SYSTEM_VECTOR)
UNWIND_HINT_IRET_REGS
.byte 0x6a, vector
jmp asm_spurious_interrupt
nop
/* Ensure that the above is 8 bytes max */
. = pos + 8
pos=pos+8
vector=vector+1
.endr
SYM_CODE_END(spurious_entries_start)
#endif
#endif /* __ASSEMBLY__ */
/*
* The actual entry points. Note that DECLARE_IDTENTRY*() serves two
* purposes:
* - provide the function declarations when included from C-Code
* - emit the ASM stubs when included from entry_32/64.S
*
* This avoids duplicate defines and ensures that everything is consistent.
*/
/*
* Dummy trap number so the low level ASM macro vector number checks do not
* match which results in emitting plain IDTENTRY stubs without bells and
* whistels.
*/
#define X86_TRAP_OTHER 0xFFFF
/* Simple exception entry points. No hardware error code */
DECLARE_IDTENTRY(X86_TRAP_DE, exc_divide_error);
DECLARE_IDTENTRY(X86_TRAP_OF, exc_overflow);
DECLARE_IDTENTRY(X86_TRAP_BR, exc_bounds);
DECLARE_IDTENTRY(X86_TRAP_NM, exc_device_not_available);
DECLARE_IDTENTRY(X86_TRAP_OLD_MF, exc_coproc_segment_overrun);
DECLARE_IDTENTRY(X86_TRAP_SPURIOUS, exc_spurious_interrupt_bug);
DECLARE_IDTENTRY(X86_TRAP_MF, exc_coprocessor_error);
DECLARE_IDTENTRY(X86_TRAP_XF, exc_simd_coprocessor_error);
/* 32bit software IRET trap. Do not emit ASM code */
DECLARE_IDTENTRY_SW(X86_TRAP_IRET, iret_error);
/* Simple exception entries with error code pushed by hardware */
DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_TS, exc_invalid_tss);
DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_NP, exc_segment_not_present);
DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_SS, exc_stack_segment);
DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_GP, exc_general_protection);
DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_AC, exc_alignment_check);
/* Raw exception entries which need extra work */
DECLARE_IDTENTRY_RAW(X86_TRAP_UD, exc_invalid_op);
DECLARE_IDTENTRY_RAW(X86_TRAP_BP, exc_int3);
DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_PF, exc_page_fault);
#ifdef CONFIG_X86_MCE
DECLARE_IDTENTRY_MCE(X86_TRAP_MC, exc_machine_check);
#endif
/* NMI */
DECLARE_IDTENTRY_NMI(X86_TRAP_NMI, exc_nmi);
DECLARE_IDTENTRY_XEN(X86_TRAP_NMI, nmi);
/* #DB */
DECLARE_IDTENTRY_DEBUG(X86_TRAP_DB, exc_debug);
DECLARE_IDTENTRY_XEN(X86_TRAP_DB, debug);
/* #DF */
DECLARE_IDTENTRY_DF(X86_TRAP_DF, exc_double_fault);
#ifdef CONFIG_XEN_PV
DECLARE_IDTENTRY_XENCB(X86_TRAP_OTHER, exc_xen_hypervisor_callback);
#endif
/* Device interrupts common/spurious */
DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER, common_interrupt);
#ifdef CONFIG_X86_LOCAL_APIC
DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER, spurious_interrupt);
#endif
/* System vector entry points */
#ifdef CONFIG_X86_LOCAL_APIC
DECLARE_IDTENTRY_SYSVEC(ERROR_APIC_VECTOR, sysvec_error_interrupt);
DECLARE_IDTENTRY_SYSVEC(SPURIOUS_APIC_VECTOR, sysvec_spurious_apic_interrupt);
DECLARE_IDTENTRY_SYSVEC(LOCAL_TIMER_VECTOR, sysvec_apic_timer_interrupt);
DECLARE_IDTENTRY_SYSVEC(X86_PLATFORM_IPI_VECTOR, sysvec_x86_platform_ipi);
#endif
#ifdef CONFIG_SMP
DECLARE_IDTENTRY(RESCHEDULE_VECTOR, sysvec_reschedule_ipi);
DECLARE_IDTENTRY_SYSVEC(IRQ_MOVE_CLEANUP_VECTOR, sysvec_irq_move_cleanup);
DECLARE_IDTENTRY_SYSVEC(REBOOT_VECTOR, sysvec_reboot);
DECLARE_IDTENTRY_SYSVEC(CALL_FUNCTION_SINGLE_VECTOR, sysvec_call_function_single);
DECLARE_IDTENTRY_SYSVEC(CALL_FUNCTION_VECTOR, sysvec_call_function);
#endif
#ifdef CONFIG_X86_LOCAL_APIC
# ifdef CONFIG_X86_UV
DECLARE_IDTENTRY_SYSVEC(UV_BAU_MESSAGE, sysvec_uv_bau_message);
# endif
# ifdef CONFIG_X86_MCE_THRESHOLD
DECLARE_IDTENTRY_SYSVEC(THRESHOLD_APIC_VECTOR, sysvec_threshold);
# endif
# ifdef CONFIG_X86_MCE_AMD
DECLARE_IDTENTRY_SYSVEC(DEFERRED_ERROR_VECTOR, sysvec_deferred_error);
# endif
# ifdef CONFIG_X86_THERMAL_VECTOR
DECLARE_IDTENTRY_SYSVEC(THERMAL_APIC_VECTOR, sysvec_thermal);
# endif
# ifdef CONFIG_IRQ_WORK
DECLARE_IDTENTRY_SYSVEC(IRQ_WORK_VECTOR, sysvec_irq_work);
# endif
#endif
#ifdef CONFIG_HAVE_KVM
DECLARE_IDTENTRY_SYSVEC(POSTED_INTR_VECTOR, sysvec_kvm_posted_intr_ipi);
DECLARE_IDTENTRY_SYSVEC(POSTED_INTR_WAKEUP_VECTOR, sysvec_kvm_posted_intr_wakeup_ipi);
DECLARE_IDTENTRY_SYSVEC(POSTED_INTR_NESTED_VECTOR, sysvec_kvm_posted_intr_nested_ipi);
#endif
#if IS_ENABLED(CONFIG_HYPERV)
DECLARE_IDTENTRY_SYSVEC(HYPERVISOR_CALLBACK_VECTOR, sysvec_hyperv_callback);
DECLARE_IDTENTRY_SYSVEC(HYPERVISOR_REENLIGHTENMENT_VECTOR, sysvec_hyperv_reenlightenment);
DECLARE_IDTENTRY_SYSVEC(HYPERVISOR_STIMER0_VECTOR, sysvec_hyperv_stimer0);
#endif
#if IS_ENABLED(CONFIG_ACRN_GUEST)
DECLARE_IDTENTRY_SYSVEC(HYPERVISOR_CALLBACK_VECTOR, sysvec_acrn_hv_callback);
#endif
#ifdef CONFIG_XEN_PVHVM
DECLARE_IDTENTRY_SYSVEC(HYPERVISOR_CALLBACK_VECTOR, sysvec_xen_hvm_callback);
#endif
#undef X86_TRAP_OTHER
#endif

View File

@ -11,6 +11,13 @@
#include <asm/apicdef.h>
#include <asm/irq_vectors.h>
/*
* The irq entry code is in the noinstr section and the start/end of
* __irqentry_text is emitted via labels. Make the build fail if
* something moves a C function into the __irq_entry section.
*/
#define __irq_entry __invalid_section
static inline int irq_canonicalize(int irq)
{
return ((irq == 2) ? 9 : irq);
@ -26,17 +33,14 @@ extern void fixup_irqs(void);
#ifdef CONFIG_HAVE_KVM
extern void kvm_set_posted_intr_wakeup_handler(void (*handler)(void));
extern __visible void smp_kvm_posted_intr_ipi(struct pt_regs *regs);
extern __visible void smp_kvm_posted_intr_wakeup_ipi(struct pt_regs *regs);
extern __visible void smp_kvm_posted_intr_nested_ipi(struct pt_regs *regs);
#endif
extern void (*x86_platform_ipi_callback)(void);
extern void native_init_IRQ(void);
extern void handle_irq(struct irq_desc *desc, struct pt_regs *regs);
extern void __handle_irq(struct irq_desc *desc, struct pt_regs *regs);
extern __visible void do_IRQ(struct pt_regs *regs);
extern __visible void do_IRQ(struct pt_regs *regs, unsigned long vector);
extern void init_ISA_irqs(void);
@ -46,7 +50,6 @@ extern void __init init_IRQ(void);
void arch_trigger_cpumask_backtrace(const struct cpumask *mask,
bool exclude_self);
extern __visible void smp_x86_platform_ipi(struct pt_regs *regs);
#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
#endif

View File

@ -1,32 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Per-cpu current frame pointer - the location of the last exception frame on
* the stack, stored in the per-cpu area.
*
* Jeremy Fitzhardinge <jeremy@goop.org>
*/
#ifndef _ASM_X86_IRQ_REGS_H
#define _ASM_X86_IRQ_REGS_H
#include <asm/percpu.h>
#define ARCH_HAS_OWN_IRQ_REGS
DECLARE_PER_CPU(struct pt_regs *, irq_regs);
static inline struct pt_regs *get_irq_regs(void)
{
return __this_cpu_read(irq_regs);
}
static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
{
struct pt_regs *old_regs;
old_regs = get_irq_regs();
__this_cpu_write(irq_regs, new_regs);
return old_regs;
}
#endif /* _ASM_X86_IRQ_REGS_32_H */

View File

@ -0,0 +1,53 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_X86_IRQ_STACK_H
#define _ASM_X86_IRQ_STACK_H
#include <linux/ptrace.h>
#include <asm/processor.h>
#ifdef CONFIG_X86_64
static __always_inline bool irqstack_active(void)
{
return __this_cpu_read(irq_count) != -1;
}
void asm_call_on_stack(void *sp, void *func, void *arg);
static __always_inline void __run_on_irqstack(void *func, void *arg)
{
void *tos = __this_cpu_read(hardirq_stack_ptr);
__this_cpu_add(irq_count, 1);
asm_call_on_stack(tos - 8, func, arg);
__this_cpu_sub(irq_count, 1);
}
#else /* CONFIG_X86_64 */
static inline bool irqstack_active(void) { return false; }
static inline void __run_on_irqstack(void *func, void *arg) { }
#endif /* !CONFIG_X86_64 */
static __always_inline bool irq_needs_irq_stack(struct pt_regs *regs)
{
if (IS_ENABLED(CONFIG_X86_32))
return false;
if (!regs)
return !irqstack_active();
return !user_mode(regs) && !irqstack_active();
}
static __always_inline void run_on_irqstack_cond(void *func, void *arg,
struct pt_regs *regs)
{
void (*__func)(void *arg) = func;
lockdep_assert_irqs_disabled();
if (irq_needs_irq_stack(regs))
__run_on_irqstack(__func, arg);
else
__func(arg);
}
#endif

View File

@ -10,7 +10,6 @@ static inline bool arch_irq_work_has_interrupt(void)
return boot_cpu_has(X86_FEATURE_APIC);
}
extern void arch_irq_work_raise(void);
extern __visible void smp_irq_work_interrupt(struct pt_regs *regs);
#else
static inline bool arch_irq_work_has_interrupt(void)
{

View File

@ -17,7 +17,7 @@
/* Declaration required for gcc < 4.9 to prevent -Werror=missing-prototypes */
extern inline unsigned long native_save_fl(void);
extern inline unsigned long native_save_fl(void)
extern __always_inline unsigned long native_save_fl(void)
{
unsigned long flags;
@ -44,12 +44,12 @@ extern inline void native_restore_fl(unsigned long flags)
:"memory", "cc");
}
static inline void native_irq_disable(void)
static __always_inline void native_irq_disable(void)
{
asm volatile("cli": : :"memory");
}
static inline void native_irq_enable(void)
static __always_inline void native_irq_enable(void)
{
asm volatile("sti": : :"memory");
}
@ -74,22 +74,22 @@ static inline __cpuidle void native_halt(void)
#ifndef __ASSEMBLY__
#include <linux/types.h>
static inline notrace unsigned long arch_local_save_flags(void)
static __always_inline unsigned long arch_local_save_flags(void)
{
return native_save_fl();
}
static inline notrace void arch_local_irq_restore(unsigned long flags)
static __always_inline void arch_local_irq_restore(unsigned long flags)
{
native_restore_fl(flags);
}
static inline notrace void arch_local_irq_disable(void)
static __always_inline void arch_local_irq_disable(void)
{
native_irq_disable();
}
static inline notrace void arch_local_irq_enable(void)
static __always_inline void arch_local_irq_enable(void)
{
native_irq_enable();
}
@ -115,7 +115,7 @@ static inline __cpuidle void halt(void)
/*
* For spinlocks, etc:
*/
static inline notrace unsigned long arch_local_irq_save(void)
static __always_inline unsigned long arch_local_irq_save(void)
{
unsigned long flags = arch_local_save_flags();
arch_local_irq_disable();
@ -159,12 +159,12 @@ static inline notrace unsigned long arch_local_irq_save(void)
#endif /* CONFIG_PARAVIRT_XXL */
#ifndef __ASSEMBLY__
static inline int arch_irqs_disabled_flags(unsigned long flags)
static __always_inline int arch_irqs_disabled_flags(unsigned long flags)
{
return !(flags & X86_EFLAGS_IF);
}
static inline int arch_irqs_disabled(void)
static __always_inline int arch_irqs_disabled(void)
{
unsigned long flags = arch_local_save_flags();
@ -172,38 +172,4 @@ static inline int arch_irqs_disabled(void)
}
#endif /* !__ASSEMBLY__ */
#ifdef __ASSEMBLY__
#ifdef CONFIG_TRACE_IRQFLAGS
# define TRACE_IRQS_ON call trace_hardirqs_on_thunk;
# define TRACE_IRQS_OFF call trace_hardirqs_off_thunk;
#else
# define TRACE_IRQS_ON
# define TRACE_IRQS_OFF
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# ifdef CONFIG_X86_64
# define LOCKDEP_SYS_EXIT call lockdep_sys_exit_thunk
# define LOCKDEP_SYS_EXIT_IRQ \
TRACE_IRQS_ON; \
sti; \
call lockdep_sys_exit_thunk; \
cli; \
TRACE_IRQS_OFF;
# else
# define LOCKDEP_SYS_EXIT \
pushl %eax; \
pushl %ecx; \
pushl %edx; \
call lockdep_sys_exit; \
popl %edx; \
popl %ecx; \
popl %eax;
# define LOCKDEP_SYS_EXIT_IRQ
# endif
#else
# define LOCKDEP_SYS_EXIT
# define LOCKDEP_SYS_EXIT_IRQ
#endif
#endif /* __ASSEMBLY__ */
#endif

View File

@ -141,7 +141,7 @@ static inline void kvm_disable_steal_time(void)
return;
}
static inline bool kvm_handle_async_pf(struct pt_regs *regs, u32 token)
static __always_inline bool kvm_handle_async_pf(struct pt_regs *regs, u32 token)
{
return false;
}

View File

@ -238,7 +238,7 @@ extern void mce_disable_bank(int bank);
/*
* Exception handler
*/
void do_machine_check(struct pt_regs *, long);
void do_machine_check(struct pt_regs *pt_regs);
/*
* Threshold handler

View File

@ -54,20 +54,8 @@ typedef int (*hyperv_fill_flush_list_func)(
vclocks_set_used(VDSO_CLOCKMODE_HVCLOCK);
#define hv_get_raw_timer() rdtsc_ordered()
void hyperv_callback_vector(void);
void hyperv_reenlightenment_vector(void);
#ifdef CONFIG_TRACING
#define trace_hyperv_callback_vector hyperv_callback_vector
#endif
void hyperv_vector_handler(struct pt_regs *regs);
/*
* Routines for stimer0 Direct Mode handling.
* On x86/x64, there are no percpu actions to take.
*/
void hv_stimer0_vector_handler(struct pt_regs *regs);
void hv_stimer0_callback_vector(void);
static inline void hv_enable_stimer0_percpu_irq(int irq) {}
static inline void hv_disable_stimer0_percpu_irq(int irq) {}
@ -226,7 +214,6 @@ void hyperv_setup_mmu_ops(void);
void *hv_alloc_hyperv_page(void);
void *hv_alloc_hyperv_zeroed_page(void);
void hv_free_hyperv_page(unsigned long addr);
void hyperv_reenlightenment_intr(struct pt_regs *regs);
void set_hv_tscchange_cb(void (*cb)(void));
void clear_hv_tscchange_cb(void);
void hyperv_stop_tsc_emulation(void);

View File

@ -262,7 +262,7 @@ DECLARE_STATIC_KEY_FALSE(mds_idle_clear);
* combination with microcode which triggers a CPU buffer flush when the
* instruction is executed.
*/
static inline void mds_clear_cpu_buffers(void)
static __always_inline void mds_clear_cpu_buffers(void)
{
static const u16 ds = __KERNEL_DS;
@ -283,7 +283,7 @@ static inline void mds_clear_cpu_buffers(void)
*
* Clear CPU buffers if the corresponding static key is enabled
*/
static inline void mds_user_clear_cpu_buffers(void)
static __always_inline void mds_user_clear_cpu_buffers(void)
{
if (static_branch_likely(&mds_user_clear))
mds_clear_cpu_buffers();

View File

@ -823,7 +823,7 @@ static inline void prefetch(const void *x)
* Useful for spinlocks to avoid one state transition in the
* cache coherency protocol:
*/
static inline void prefetchw(const void *x)
static __always_inline void prefetchw(const void *x)
{
alternative_input(BASE_PREFETCH, "prefetchw %P1",
X86_FEATURE_3DNOWPREFETCH,

View File

@ -123,7 +123,7 @@ static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
* On x86_64, vm86 mode is mercifully nonexistent, and we don't need
* the extra check.
*/
static inline int user_mode(struct pt_regs *regs)
static __always_inline int user_mode(struct pt_regs *regs)
{
#ifdef CONFIG_X86_32
return ((regs->cs & SEGMENT_RPL_MASK) | (regs->flags & X86_VM_MASK)) >= USER_RPL;

View File

@ -7,6 +7,7 @@
#include <asm/nops.h>
#include <asm/processor-flags.h>
#include <linux/irqflags.h>
#include <linux/jump_label.h>
/*
@ -27,14 +28,14 @@ static inline unsigned long native_read_cr0(void)
return val;
}
static inline unsigned long native_read_cr2(void)
static __always_inline unsigned long native_read_cr2(void)
{
unsigned long val;
asm volatile("mov %%cr2,%0\n\t" : "=r" (val), "=m" (__force_order));
return val;
}
static inline void native_write_cr2(unsigned long val)
static __always_inline void native_write_cr2(unsigned long val)
{
asm volatile("mov %0,%%cr2": : "r" (val), "m" (__force_order));
}
@ -129,7 +130,16 @@ static inline void native_wbinvd(void)
asm volatile("wbinvd": : :"memory");
}
extern asmlinkage void native_load_gs_index(unsigned);
extern asmlinkage void asm_load_gs_index(unsigned int selector);
static inline void native_load_gs_index(unsigned int selector)
{
unsigned long flags;
local_irq_save(flags);
asm_load_gs_index(selector);
local_irq_restore(flags);
}
static inline unsigned long __read_cr4(void)
{
@ -150,12 +160,12 @@ static inline void write_cr0(unsigned long x)
native_write_cr0(x);
}
static inline unsigned long read_cr2(void)
static __always_inline unsigned long read_cr2(void)
{
return native_read_cr2();
}
static inline void write_cr2(unsigned long x)
static __always_inline void write_cr2(unsigned long x)
{
native_write_cr2(x);
}
@ -186,7 +196,7 @@ static inline void wbinvd(void)
#ifdef CONFIG_X86_64
static inline void load_gs_index(unsigned selector)
static inline void load_gs_index(unsigned int selector)
{
native_load_gs_index(selector);
}

View File

@ -64,7 +64,7 @@ extern void text_poke_finish(void);
#define DISP32_SIZE 4
static inline int text_opcode_size(u8 opcode)
static __always_inline int text_opcode_size(u8 opcode)
{
int size = 0;
@ -118,12 +118,14 @@ extern __ro_after_init struct mm_struct *poking_mm;
extern __ro_after_init unsigned long poking_addr;
#ifndef CONFIG_UML_X86
static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
static __always_inline
void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
{
regs->ip = ip;
}
static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val)
static __always_inline
void int3_emulate_push(struct pt_regs *regs, unsigned long val)
{
/*
* The int3 handler in entry_64.S adds a gap between the
@ -138,7 +140,8 @@ static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val)
*(unsigned long *)regs->sp = val;
}
static inline void int3_emulate_call(struct pt_regs *regs, unsigned long func)
static __always_inline
void int3_emulate_call(struct pt_regs *regs, unsigned long func)
{
int3_emulate_push(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);
int3_emulate_jmp(regs, func);

View File

@ -5,12 +5,8 @@
DECLARE_STATIC_KEY_FALSE(trace_pagefault_key);
#define trace_pagefault_enabled() \
static_branch_unlikely(&trace_pagefault_key)
DECLARE_STATIC_KEY_FALSE(trace_resched_ipi_key);
#define trace_resched_ipi_enabled() \
static_branch_unlikely(&trace_resched_ipi_key)
#else
static inline bool trace_pagefault_enabled(void) { return false; }
static inline bool trace_resched_ipi_enabled(void) { return false; }
#endif
#endif

View File

@ -10,9 +10,6 @@
#ifdef CONFIG_X86_LOCAL_APIC
extern int trace_resched_ipi_reg(void);
extern void trace_resched_ipi_unreg(void);
DECLARE_EVENT_CLASS(x86_irq_vector,
TP_PROTO(int vector),
@ -37,18 +34,6 @@ DEFINE_EVENT_FN(x86_irq_vector, name##_exit, \
TP_PROTO(int vector), \
TP_ARGS(vector), NULL, NULL);
#define DEFINE_RESCHED_IPI_EVENT(name) \
DEFINE_EVENT_FN(x86_irq_vector, name##_entry, \
TP_PROTO(int vector), \
TP_ARGS(vector), \
trace_resched_ipi_reg, \
trace_resched_ipi_unreg); \
DEFINE_EVENT_FN(x86_irq_vector, name##_exit, \
TP_PROTO(int vector), \
TP_ARGS(vector), \
trace_resched_ipi_reg, \
trace_resched_ipi_unreg);
/*
* local_timer - called when entering/exiting a local timer interrupt
* vector handler
@ -99,7 +84,7 @@ TRACE_EVENT_PERF_PERM(irq_work_exit, is_sampling_event(p_event) ? -EPERM : 0);
/*
* reschedule - called when entering/exiting a reschedule vector handler
*/
DEFINE_RESCHED_IPI_EVENT(reschedule);
DEFINE_IRQ_VECTOR_EVENT(reschedule);
/*
* call_function - called when entering/exiting a call function interrupt

View File

@ -0,0 +1,31 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_X86_TRAPNR_H
#define _ASM_X86_TRAPNR_H
/* Interrupts/Exceptions */
#define X86_TRAP_DE 0 /* Divide-by-zero */
#define X86_TRAP_DB 1 /* Debug */
#define X86_TRAP_NMI 2 /* Non-maskable Interrupt */
#define X86_TRAP_BP 3 /* Breakpoint */
#define X86_TRAP_OF 4 /* Overflow */
#define X86_TRAP_BR 5 /* Bound Range Exceeded */
#define X86_TRAP_UD 6 /* Invalid Opcode */
#define X86_TRAP_NM 7 /* Device Not Available */
#define X86_TRAP_DF 8 /* Double Fault */
#define X86_TRAP_OLD_MF 9 /* Coprocessor Segment Overrun */
#define X86_TRAP_TS 10 /* Invalid TSS */
#define X86_TRAP_NP 11 /* Segment Not Present */
#define X86_TRAP_SS 12 /* Stack Segment Fault */
#define X86_TRAP_GP 13 /* General Protection Fault */
#define X86_TRAP_PF 14 /* Page Fault */
#define X86_TRAP_SPURIOUS 15 /* Spurious Interrupt */
#define X86_TRAP_MF 16 /* x87 Floating-Point Exception */
#define X86_TRAP_AC 17 /* Alignment Check */
#define X86_TRAP_MC 18 /* Machine Check */
#define X86_TRAP_XF 19 /* SIMD Floating-Point Exception */
#define X86_TRAP_VE 20 /* Virtualization Exception */
#define X86_TRAP_CP 21 /* Control Protection Exception */
#define X86_TRAP_IRET 32 /* IRET Exception */
#endif

View File

@ -6,85 +6,9 @@
#include <linux/kprobes.h>
#include <asm/debugreg.h>
#include <asm/idtentry.h>
#include <asm/siginfo.h> /* TRAP_TRACE, ... */
#define dotraplinkage __visible
asmlinkage void divide_error(void);
asmlinkage void debug(void);
asmlinkage void nmi(void);
asmlinkage void int3(void);
asmlinkage void overflow(void);
asmlinkage void bounds(void);
asmlinkage void invalid_op(void);
asmlinkage void device_not_available(void);
#ifdef CONFIG_X86_64
asmlinkage void double_fault(void);
#endif
asmlinkage void coprocessor_segment_overrun(void);
asmlinkage void invalid_TSS(void);
asmlinkage void segment_not_present(void);
asmlinkage void stack_segment(void);
asmlinkage void general_protection(void);
asmlinkage void page_fault(void);
asmlinkage void async_page_fault(void);
asmlinkage void spurious_interrupt_bug(void);
asmlinkage void coprocessor_error(void);
asmlinkage void alignment_check(void);
#ifdef CONFIG_X86_MCE
asmlinkage void machine_check(void);
#endif /* CONFIG_X86_MCE */
asmlinkage void simd_coprocessor_error(void);
#if defined(CONFIG_X86_64) && defined(CONFIG_XEN_PV)
asmlinkage void xen_divide_error(void);
asmlinkage void xen_xennmi(void);
asmlinkage void xen_xendebug(void);
asmlinkage void xen_int3(void);
asmlinkage void xen_overflow(void);
asmlinkage void xen_bounds(void);
asmlinkage void xen_invalid_op(void);
asmlinkage void xen_device_not_available(void);
asmlinkage void xen_double_fault(void);
asmlinkage void xen_coprocessor_segment_overrun(void);
asmlinkage void xen_invalid_TSS(void);
asmlinkage void xen_segment_not_present(void);
asmlinkage void xen_stack_segment(void);
asmlinkage void xen_general_protection(void);
asmlinkage void xen_page_fault(void);
asmlinkage void xen_spurious_interrupt_bug(void);
asmlinkage void xen_coprocessor_error(void);
asmlinkage void xen_alignment_check(void);
#ifdef CONFIG_X86_MCE
asmlinkage void xen_machine_check(void);
#endif /* CONFIG_X86_MCE */
asmlinkage void xen_simd_coprocessor_error(void);
#endif
dotraplinkage void do_divide_error(struct pt_regs *regs, long error_code);
dotraplinkage void do_debug(struct pt_regs *regs, long error_code);
dotraplinkage void do_nmi(struct pt_regs *regs, long error_code);
dotraplinkage void do_int3(struct pt_regs *regs, long error_code);
dotraplinkage void do_overflow(struct pt_regs *regs, long error_code);
dotraplinkage void do_bounds(struct pt_regs *regs, long error_code);
dotraplinkage void do_invalid_op(struct pt_regs *regs, long error_code);
dotraplinkage void do_device_not_available(struct pt_regs *regs, long error_code);
dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsigned long cr2);
dotraplinkage void do_coprocessor_segment_overrun(struct pt_regs *regs, long error_code);
dotraplinkage void do_invalid_TSS(struct pt_regs *regs, long error_code);
dotraplinkage void do_segment_not_present(struct pt_regs *regs, long error_code);
dotraplinkage void do_stack_segment(struct pt_regs *regs, long error_code);
dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code);
dotraplinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address);
dotraplinkage void do_spurious_interrupt_bug(struct pt_regs *regs, long error_code);
dotraplinkage void do_coprocessor_error(struct pt_regs *regs, long error_code);
dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code);
dotraplinkage void do_simd_coprocessor_error(struct pt_regs *regs, long error_code);
#ifdef CONFIG_X86_32
dotraplinkage void do_iret_error(struct pt_regs *regs, long error_code);
#endif
dotraplinkage void do_mce(struct pt_regs *regs, long error_code);
#ifdef CONFIG_X86_64
asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs);
asmlinkage __visible notrace
@ -92,6 +16,11 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s);
void __init trap_init(void);
#endif
#ifdef CONFIG_X86_F00F_BUG
/* For handling the FOOF bug */
void handle_invalid_op(struct pt_regs *regs);
#endif
static inline int get_si_code(unsigned long condition)
{
if (condition & DR_STEP)
@ -105,16 +34,6 @@ static inline int get_si_code(unsigned long condition)
extern int panic_on_unrecovered_nmi;
void math_emulate(struct math_emu_info *);
#ifndef CONFIG_X86_32
asmlinkage void smp_thermal_interrupt(struct pt_regs *regs);
asmlinkage void smp_threshold_interrupt(struct pt_regs *regs);
asmlinkage void smp_deferred_error_interrupt(struct pt_regs *regs);
#endif
void smp_apic_timer_interrupt(struct pt_regs *regs);
void smp_spurious_interrupt(struct pt_regs *regs);
void smp_error_interrupt(struct pt_regs *regs);
asmlinkage void smp_irq_move_cleanup_interrupt(void);
#ifdef CONFIG_VMAP_STACK
void __noreturn handle_stack_overflow(const char *message,
@ -122,31 +41,6 @@ void __noreturn handle_stack_overflow(const char *message,
unsigned long fault_address);
#endif
/* Interrupts/Exceptions */
enum {
X86_TRAP_DE = 0, /* 0, Divide-by-zero */
X86_TRAP_DB, /* 1, Debug */
X86_TRAP_NMI, /* 2, Non-maskable Interrupt */
X86_TRAP_BP, /* 3, Breakpoint */
X86_TRAP_OF, /* 4, Overflow */
X86_TRAP_BR, /* 5, Bound Range Exceeded */
X86_TRAP_UD, /* 6, Invalid Opcode */
X86_TRAP_NM, /* 7, Device Not Available */
X86_TRAP_DF, /* 8, Double Fault */
X86_TRAP_OLD_MF, /* 9, Coprocessor Segment Overrun */
X86_TRAP_TS, /* 10, Invalid TSS */
X86_TRAP_NP, /* 11, Segment Not Present */
X86_TRAP_SS, /* 12, Stack Segment Fault */
X86_TRAP_GP, /* 13, General Protection Fault */
X86_TRAP_PF, /* 14, Page Fault */
X86_TRAP_SPURIOUS, /* 15, Spurious Interrupt */
X86_TRAP_MF, /* 16, x87 Floating-Point Exception */
X86_TRAP_AC, /* 17, Alignment Check */
X86_TRAP_MC, /* 18, Machine Check */
X86_TRAP_XF, /* 19, SIMD Floating-Point Exception */
X86_TRAP_IRET = 32, /* 32, IRET Exception */
};
/*
* Page fault error code bits:
*

View File

@ -12,6 +12,8 @@
#define _ASM_X86_UV_UV_BAU_H
#include <linux/bitmap.h>
#include <asm/idtentry.h>
#define BITSPERBYTE 8
/*
@ -799,12 +801,6 @@ static inline void bau_cpubits_clear(struct bau_local_cpumask *dstp, int nbits)
bitmap_zero(&dstp->bits, nbits);
}
extern void uv_bau_message_intr1(void);
#ifdef CONFIG_TRACING
#define trace_uv_bau_message_intr1 uv_bau_message_intr1
#endif
extern void uv_bau_timeout_intr1(void);
struct atomic_short {
short counter;
};

View File

@ -1011,28 +1011,29 @@ struct bp_patching_desc {
static struct bp_patching_desc *bp_desc;
static inline struct bp_patching_desc *try_get_desc(struct bp_patching_desc **descp)
static __always_inline
struct bp_patching_desc *try_get_desc(struct bp_patching_desc **descp)
{
struct bp_patching_desc *desc = READ_ONCE(*descp); /* rcu_dereference */
struct bp_patching_desc *desc = __READ_ONCE(*descp); /* rcu_dereference */
if (!desc || !atomic_inc_not_zero(&desc->refs))
if (!desc || !arch_atomic_inc_not_zero(&desc->refs))
return NULL;
return desc;
}
static inline void put_desc(struct bp_patching_desc *desc)
static __always_inline void put_desc(struct bp_patching_desc *desc)
{
smp_mb__before_atomic();
atomic_dec(&desc->refs);
arch_atomic_dec(&desc->refs);
}
static inline void *text_poke_addr(struct text_poke_loc *tp)
static __always_inline void *text_poke_addr(struct text_poke_loc *tp)
{
return _stext + tp->rel_addr;
}
static int notrace patch_cmp(const void *key, const void *elt)
static __always_inline int patch_cmp(const void *key, const void *elt)
{
struct text_poke_loc *tp = (struct text_poke_loc *) elt;
@ -1042,9 +1043,8 @@ static int notrace patch_cmp(const void *key, const void *elt)
return 1;
return 0;
}
NOKPROBE_SYMBOL(patch_cmp);
int notrace poke_int3_handler(struct pt_regs *regs)
int noinstr poke_int3_handler(struct pt_regs *regs)
{
struct bp_patching_desc *desc;
struct text_poke_loc *tp;
@ -1077,9 +1077,9 @@ int notrace poke_int3_handler(struct pt_regs *regs)
* Skip the binary search if there is a single member in the vector.
*/
if (unlikely(desc->nr_entries > 1)) {
tp = bsearch(ip, desc->vec, desc->nr_entries,
sizeof(struct text_poke_loc),
patch_cmp);
tp = __inline_bsearch(ip, desc->vec, desc->nr_entries,
sizeof(struct text_poke_loc),
patch_cmp);
if (!tp)
goto out_put;
} else {
@ -1118,7 +1118,6 @@ out_put:
put_desc(desc);
return ret;
}
NOKPROBE_SYMBOL(poke_int3_handler);
#define TP_VEC_MAX (PAGE_SIZE / sizeof(struct text_poke_loc))
static struct text_poke_loc tp_vec[TP_VEC_MAX];

View File

@ -1088,23 +1088,14 @@ static void local_apic_timer_interrupt(void)
* [ if a single-CPU system runs an SMP kernel then we call the local
* interrupt as well. Thus we cannot inline the local irq ... ]
*/
__visible void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_apic_timer_interrupt)
{
struct pt_regs *old_regs = set_irq_regs(regs);
/*
* NOTE! We'd better ACK the irq immediately,
* because timer handling can be slow.
*
* update_process_times() expects us to have done irq_enter().
* Besides, if we don't timer interrupts ignore the global
* interrupt lock, which is the WrongThing (tm) to do.
*/
entering_ack_irq();
ack_APIC_irq();
trace_local_timer_entry(LOCAL_TIMER_VECTOR);
local_apic_timer_interrupt();
trace_local_timer_exit(LOCAL_TIMER_VECTOR);
exiting_irq();
set_irq_regs(old_regs);
}
@ -2120,15 +2111,21 @@ void __init register_lapic_address(unsigned long address)
* Local APIC interrupts
*/
/*
* This interrupt should _never_ happen with our APIC/SMP architecture
/**
* spurious_interrupt - Catch all for interrupts raised on unused vectors
* @regs: Pointer to pt_regs on stack
* @vector: The vector number
*
* This is invoked from ASM entry code to catch all interrupts which
* trigger on an entry which is routed to the common_spurious idtentry
* point.
*
* Also called from sysvec_spurious_apic_interrupt().
*/
__visible void __irq_entry smp_spurious_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_IRQ(spurious_interrupt)
{
u8 vector = ~regs->orig_ax;
u32 v;
entering_irq();
trace_spurious_apic_entry(vector);
inc_irq_stat(irq_spurious_count);
@ -2158,13 +2155,17 @@ __visible void __irq_entry smp_spurious_interrupt(struct pt_regs *regs)
}
out:
trace_spurious_apic_exit(vector);
exiting_irq();
}
DEFINE_IDTENTRY_SYSVEC(sysvec_spurious_apic_interrupt)
{
__spurious_interrupt(regs, SPURIOUS_APIC_VECTOR);
}
/*
* This interrupt should never happen with our APIC/SMP architecture
*/
__visible void __irq_entry smp_error_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_error_interrupt)
{
static const char * const error_interrupt_reason[] = {
"Send CS error", /* APIC Error Bit 0 */
@ -2178,7 +2179,6 @@ __visible void __irq_entry smp_error_interrupt(struct pt_regs *regs)
};
u32 v, i = 0;
entering_irq();
trace_error_apic_entry(ERROR_APIC_VECTOR);
/* First tickle the hardware, only then report what went on. -- REW */
@ -2202,7 +2202,6 @@ __visible void __irq_entry smp_error_interrupt(struct pt_regs *regs)
apic_printk(APIC_DEBUG, KERN_CONT "\n");
trace_error_apic_exit(ERROR_APIC_VECTOR);
exiting_irq();
}
/**

View File

@ -115,7 +115,8 @@ msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)
* denote it as spurious which is no harm as this is a rare event
* and interrupt handlers have to cope with spurious interrupts
* anyway. If the vector is unused, then it is marked so it won't
* trigger the 'No irq handler for vector' warning in do_IRQ().
* trigger the 'No irq handler for vector' warning in
* common_interrupt().
*
* This requires to hold vector lock to prevent concurrent updates to
* the affected vector.

View File

@ -861,13 +861,13 @@ static void free_moved_vector(struct apic_chip_data *apicd)
apicd->move_in_progress = 0;
}
asmlinkage __visible void __irq_entry smp_irq_move_cleanup_interrupt(void)
DEFINE_IDTENTRY_SYSVEC(sysvec_irq_move_cleanup)
{
struct hlist_head *clhead = this_cpu_ptr(&cleanup_list);
struct apic_chip_data *apicd;
struct hlist_node *tmp;
entering_ack_irq();
ack_APIC_irq();
/* Prevent vectors vanishing under us */
raw_spin_lock(&vector_lock);
@ -892,7 +892,6 @@ asmlinkage __visible void __irq_entry smp_irq_move_cleanup_interrupt(void)
}
raw_spin_unlock(&vector_lock);
exiting_irq();
}
static void __send_cleanup_vector(struct apic_chip_data *apicd)

View File

@ -57,9 +57,6 @@ int main(void)
BLANK();
#undef ENTRY
OFFSET(TSS_ist, tss_struct, x86_tss.ist);
DEFINE(DB_STACK_OFFSET, offsetof(struct cea_exception_stacks, DB_stack) -
offsetof(struct cea_exception_stacks, DB1_stack));
BLANK();
#ifdef CONFIG_STACKPROTECTOR

View File

@ -10,10 +10,10 @@
*/
#include <linux/interrupt.h>
#include <asm/acrn.h>
#include <asm/apic.h>
#include <asm/desc.h>
#include <asm/hypervisor.h>
#include <asm/idtentry.h>
#include <asm/irq_regs.h>
static uint32_t __init acrn_detect(void)
@ -24,7 +24,7 @@ static uint32_t __init acrn_detect(void)
static void __init acrn_init_platform(void)
{
/* Setup the IDT for ACRN hypervisor callback */
alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, acrn_hv_callback_vector);
alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_acrn_hv_callback);
}
static bool acrn_x2apic_available(void)
@ -39,7 +39,7 @@ static bool acrn_x2apic_available(void)
static void (*acrn_intr_handler)(void);
__visible void __irq_entry acrn_hv_vector_handler(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_acrn_hv_callback)
{
struct pt_regs *old_regs = set_irq_regs(regs);
@ -50,13 +50,12 @@ __visible void __irq_entry acrn_hv_vector_handler(struct pt_regs *regs)
* will block the interrupt whose vector is lower than
* HYPERVISOR_CALLBACK_VECTOR.
*/
entering_ack_irq();
ack_APIC_irq();
inc_irq_stat(irq_hv_callback_count);
if (acrn_intr_handler)
acrn_intr_handler();
exiting_irq();
set_irq_regs(old_regs);
}

View File

@ -1706,25 +1706,6 @@ void syscall_init(void)
X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
}
DEFINE_PER_CPU(int, debug_stack_usage);
DEFINE_PER_CPU(u32, debug_idt_ctr);
void debug_stack_set_zero(void)
{
this_cpu_inc(debug_idt_ctr);
load_current_idt();
}
NOKPROBE_SYMBOL(debug_stack_set_zero);
void debug_stack_reset(void)
{
if (WARN_ON(!this_cpu_read(debug_idt_ctr)))
return;
if (this_cpu_dec_return(debug_idt_ctr) == 0)
load_current_idt();
}
NOKPROBE_SYMBOL(debug_stack_reset);
#else /* CONFIG_X86_64 */
DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task;

View File

@ -907,14 +907,13 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
mce_log(&m);
}
asmlinkage __visible void __irq_entry smp_deferred_error_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_deferred_error)
{
entering_irq();
trace_deferred_error_apic_entry(DEFERRED_ERROR_VECTOR);
inc_irq_stat(irq_deferred_error_count);
deferred_error_int_vector();
trace_deferred_error_apic_exit(DEFERRED_ERROR_VECTOR);
exiting_ack_irq();
ack_APIC_irq();
}
/*

View File

@ -130,7 +130,7 @@ static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);
/* Do initial initialization of a struct mce */
void mce_setup(struct mce *m)
noinstr void mce_setup(struct mce *m)
{
memset(m, 0, sizeof(struct mce));
m->cpu = m->extcpu = smp_processor_id();
@ -140,12 +140,12 @@ void mce_setup(struct mce *m)
m->cpuid = cpuid_eax(1);
m->socketid = cpu_data(m->extcpu).phys_proc_id;
m->apicid = cpu_data(m->extcpu).initial_apicid;
rdmsrl(MSR_IA32_MCG_CAP, m->mcgcap);
m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
if (this_cpu_has(X86_FEATURE_INTEL_PPIN))
rdmsrl(MSR_PPIN, m->ppin);
m->ppin = __rdmsr(MSR_PPIN);
else if (this_cpu_has(X86_FEATURE_AMD_PPIN))
rdmsrl(MSR_AMD_PPIN, m->ppin);
m->ppin = __rdmsr(MSR_AMD_PPIN);
m->microcode = boot_cpu_data.microcode;
}
@ -1100,13 +1100,15 @@ static void mce_clear_state(unsigned long *toclear)
* kdump kernel establishing a new #MC handler where a broadcasted MCE
* might not get handled properly.
*/
static bool __mc_check_crashing_cpu(int cpu)
static noinstr bool mce_check_crashing_cpu(void)
{
unsigned int cpu = smp_processor_id();
if (cpu_is_offline(cpu) ||
(crashing_cpu != -1 && crashing_cpu != cpu)) {
u64 mcgstatus;
mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
mcgstatus = __rdmsr(MSR_IA32_MCG_STATUS);
if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN) {
if (mcgstatus & MCG_STATUS_LMCES)
@ -1114,7 +1116,7 @@ static bool __mc_check_crashing_cpu(int cpu)
}
if (mcgstatus & MCG_STATUS_RIPV) {
mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
__wrmsr(MSR_IA32_MCG_STATUS, 0, 0);
return true;
}
}
@ -1230,12 +1232,11 @@ static void kill_me_maybe(struct callback_head *cb)
* backing the user stack, tracing that reads the user stack will cause
* potentially infinite recursion.
*/
void noinstr do_machine_check(struct pt_regs *regs, long error_code)
void noinstr do_machine_check(struct pt_regs *regs)
{
DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
DECLARE_BITMAP(toclear, MAX_NR_BANKS);
struct mca_config *cfg = &mca_cfg;
int cpu = smp_processor_id();
struct mce m, *final;
char *msg = NULL;
int worst = 0;
@ -1264,11 +1265,6 @@ void noinstr do_machine_check(struct pt_regs *regs, long error_code)
*/
int lmce = 1;
if (__mc_check_crashing_cpu(cpu))
return;
nmi_enter();
this_cpu_inc(mce_exception_count);
mce_gather_info(&m, regs);
@ -1356,7 +1352,7 @@ void noinstr do_machine_check(struct pt_regs *regs, long error_code)
sync_core();
if (worst != MCE_AR_SEVERITY && !kill_it)
goto out_ist;
return;
/* Fault was in user mode and we need to take some action */
if ((m.cs & 3) == 3) {
@ -1370,12 +1366,9 @@ void noinstr do_machine_check(struct pt_regs *regs, long error_code)
current->mce_kill_me.func = kill_me_now;
task_work_add(current, &current->mce_kill_me, true);
} else {
if (!fixup_exception(regs, X86_TRAP_MC, error_code, 0))
if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
mce_panic("Failed kernel mode recovery", &m, msg);
}
out_ist:
nmi_exit();
}
EXPORT_SYMBOL_GPL(do_machine_check);
@ -1902,21 +1895,84 @@ bool filter_mce(struct mce *m)
}
/* Handle unconfigured int18 (should never happen) */
static void unexpected_machine_check(struct pt_regs *regs, long error_code)
static noinstr void unexpected_machine_check(struct pt_regs *regs)
{
instrumentation_begin();
pr_err("CPU#%d: Unexpected int18 (Machine Check)\n",
smp_processor_id());
instrumentation_end();
}
/* Call the installed machine check handler for this CPU setup. */
void (*machine_check_vector)(struct pt_regs *, long error_code) =
unexpected_machine_check;
void (*machine_check_vector)(struct pt_regs *) = unexpected_machine_check;
dotraplinkage notrace void do_mce(struct pt_regs *regs, long error_code)
static __always_inline void exc_machine_check_kernel(struct pt_regs *regs)
{
machine_check_vector(regs, error_code);
/*
* Only required when from kernel mode. See
* mce_check_crashing_cpu() for details.
*/
if (machine_check_vector == do_machine_check &&
mce_check_crashing_cpu())
return;
nmi_enter();
/*
* The call targets are marked noinstr, but objtool can't figure
* that out because it's an indirect call. Annotate it.
*/
instrumentation_begin();
trace_hardirqs_off_finish();
machine_check_vector(regs);
if (regs->flags & X86_EFLAGS_IF)
trace_hardirqs_on_prepare();
instrumentation_end();
nmi_exit();
}
NOKPROBE_SYMBOL(do_mce);
static __always_inline void exc_machine_check_user(struct pt_regs *regs)
{
idtentry_enter_user(regs);
instrumentation_begin();
machine_check_vector(regs);
instrumentation_end();
idtentry_exit_user(regs);
}
#ifdef CONFIG_X86_64
/* MCE hit kernel mode */
DEFINE_IDTENTRY_MCE(exc_machine_check)
{
unsigned long dr7;
dr7 = local_db_save();
exc_machine_check_kernel(regs);
local_db_restore(dr7);
}
/* The user mode variant. */
DEFINE_IDTENTRY_MCE_USER(exc_machine_check)
{
unsigned long dr7;
dr7 = local_db_save();
exc_machine_check_user(regs);
local_db_restore(dr7);
}
#else
/* 32bit unified entry point */
DEFINE_IDTENTRY_MCE(exc_machine_check)
{
unsigned long dr7;
dr7 = local_db_save();
if (user_mode(regs))
exc_machine_check_user(regs);
else
exc_machine_check_kernel(regs);
local_db_restore(dr7);
}
#endif
/*
* Called for each booted CPU to set up machine checks.

View File

@ -146,9 +146,9 @@ static void raise_exception(struct mce *m, struct pt_regs *pregs)
regs.cs = m->cs;
pregs = &regs;
}
/* in mcheck exeception handler, irq will be disabled */
/* do_machine_check() expects interrupts disabled -- at least */
local_irq_save(flags);
do_machine_check(pregs, 0);
do_machine_check(pregs);
local_irq_restore(flags);
m->finished = 0;
}

View File

@ -9,7 +9,7 @@
#include <asm/mce.h>
/* Pointer to the installed machine check handler for this CPU setup. */
extern void (*machine_check_vector)(struct pt_regs *, long error_code);
extern void (*machine_check_vector)(struct pt_regs *);
enum severity_level {
MCE_NO_SEVERITY,

View File

@ -21,12 +21,11 @@
int mce_p5_enabled __read_mostly;
/* Machine check handler for Pentium class Intel CPUs: */
static void pentium_machine_check(struct pt_regs *regs, long error_code)
static noinstr void pentium_machine_check(struct pt_regs *regs)
{
u32 loaddr, hi, lotype;
nmi_enter();
instrumentation_begin();
rdmsr(MSR_IA32_P5_MC_ADDR, loaddr, hi);
rdmsr(MSR_IA32_P5_MC_TYPE, lotype, hi);
@ -39,8 +38,7 @@ static void pentium_machine_check(struct pt_regs *regs, long error_code)
}
add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
nmi_exit();
instrumentation_end();
}
/* Set up machine check reporting for processors with Intel style MCE: */

View File

@ -614,14 +614,13 @@ static void unexpected_thermal_interrupt(void)
static void (*smp_thermal_vector)(void) = unexpected_thermal_interrupt;
asmlinkage __visible void __irq_entry smp_thermal_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_thermal)
{
entering_irq();
trace_thermal_apic_entry(THERMAL_APIC_VECTOR);
inc_irq_stat(irq_thermal_count);
smp_thermal_vector();
trace_thermal_apic_exit(THERMAL_APIC_VECTOR);
exiting_ack_irq();
ack_APIC_irq();
}
/* Thermal monitoring depends on APIC, ACPI and clock modulation */

View File

@ -21,12 +21,11 @@ static void default_threshold_interrupt(void)
void (*mce_threshold_vector)(void) = default_threshold_interrupt;
asmlinkage __visible void __irq_entry smp_threshold_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_threshold)
{
entering_irq();
trace_threshold_apic_entry(THRESHOLD_APIC_VECTOR);
inc_irq_stat(irq_threshold_count);
mce_threshold_vector();
trace_threshold_apic_exit(THRESHOLD_APIC_VECTOR);
exiting_ack_irq();
ack_APIC_irq();
}

View File

@ -17,14 +17,12 @@
#include "internal.h"
/* Machine check handler for WinChip C6: */
static void winchip_machine_check(struct pt_regs *regs, long error_code)
static noinstr void winchip_machine_check(struct pt_regs *regs)
{
nmi_enter();
instrumentation_begin();
pr_emerg("CPU0: Machine Check Exception.\n");
add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
nmi_exit();
instrumentation_end();
}
/* Set up machine check reporting on the Winchip C6 series */

View File

@ -23,6 +23,7 @@
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>
#include <asm/desc.h>
#include <asm/idtentry.h>
#include <asm/irq_regs.h>
#include <asm/i8259.h>
#include <asm/apic.h>
@ -40,11 +41,10 @@ static void (*hv_stimer0_handler)(void);
static void (*hv_kexec_handler)(void);
static void (*hv_crash_handler)(struct pt_regs *regs);
__visible void __irq_entry hyperv_vector_handler(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_callback)
{
struct pt_regs *old_regs = set_irq_regs(regs);
entering_irq();
inc_irq_stat(irq_hv_callback_count);
if (vmbus_handler)
vmbus_handler();
@ -52,7 +52,6 @@ __visible void __irq_entry hyperv_vector_handler(struct pt_regs *regs)
if (ms_hyperv.hints & HV_DEPRECATING_AEOI_RECOMMENDED)
ack_APIC_irq();
exiting_irq();
set_irq_regs(old_regs);
}
@ -73,19 +72,16 @@ EXPORT_SYMBOL_GPL(hv_remove_vmbus_irq);
* Routines to do per-architecture handling of stimer0
* interrupts when in Direct Mode
*/
__visible void __irq_entry hv_stimer0_vector_handler(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_stimer0)
{
struct pt_regs *old_regs = set_irq_regs(regs);
entering_irq();
inc_irq_stat(hyperv_stimer0_count);
if (hv_stimer0_handler)
hv_stimer0_handler();
add_interrupt_randomness(HYPERV_STIMER0_VECTOR, 0);
ack_APIC_irq();
exiting_irq();
set_irq_regs(old_regs);
}
@ -331,17 +327,19 @@ static void __init ms_hyperv_init_platform(void)
x86_platform.apic_post_init = hyperv_init;
hyperv_setup_mmu_ops();
/* Setup the IDT for hypervisor callback */
alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, hyperv_callback_vector);
alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_hyperv_callback);
/* Setup the IDT for reenlightenment notifications */
if (ms_hyperv.features & HV_X64_ACCESS_REENLIGHTENMENT)
if (ms_hyperv.features & HV_X64_ACCESS_REENLIGHTENMENT) {
alloc_intr_gate(HYPERV_REENLIGHTENMENT_VECTOR,
hyperv_reenlightenment_vector);
asm_sysvec_hyperv_reenlightenment);
}
/* Setup the IDT for stimer0 */
if (ms_hyperv.misc_features & HV_STIMER_DIRECT_MODE_AVAILABLE)
if (ms_hyperv.misc_features & HV_STIMER_DIRECT_MODE_AVAILABLE) {
alloc_intr_gate(HYPERV_STIMER0_VECTOR,
hv_stimer0_callback_vector);
asm_sysvec_hyperv_stimer0);
}
# ifdef CONFIG_SMP
smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;

View File

@ -10,7 +10,6 @@
#include <asm/desc.h>
#include <asm/traps.h>
extern void double_fault(void);
#define ptr_ok(x) ((x) > PAGE_OFFSET && (x) < PAGE_OFFSET + MAXMEM)
#define TSS(x) this_cpu_read(cpu_tss_rw.x86_tss.x)
@ -21,7 +20,7 @@ static void set_df_gdt_entry(unsigned int cpu);
* Called by double_fault with CR0.TS and EFLAGS.NT cleared. The CPU thinks
* we're running the doublefault task. Cannot return.
*/
asmlinkage notrace void __noreturn doublefault_shim(void)
asmlinkage noinstr void __noreturn doublefault_shim(void)
{
unsigned long cr2;
struct pt_regs regs;
@ -40,7 +39,7 @@ asmlinkage notrace void __noreturn doublefault_shim(void)
* Fill in pt_regs. A downside of doing this in C is that the unwinder
* won't see it (no ENCODE_FRAME_POINTER), so a nested stack dump
* won't successfully unwind to the source of the double fault.
* The main dump from do_double_fault() is fine, though, since it
* The main dump from exc_double_fault() is fine, though, since it
* uses these regs directly.
*
* If anyone ever cares, this could be moved to asm.
@ -70,7 +69,7 @@ asmlinkage notrace void __noreturn doublefault_shim(void)
regs.cx = TSS(cx);
regs.bx = TSS(bx);
do_double_fault(&regs, 0, cr2);
exc_double_fault(&regs, 0, cr2);
/*
* x86_32 does not save the original CR3 anywhere on a task switch.
@ -84,7 +83,6 @@ asmlinkage notrace void __noreturn doublefault_shim(void)
*/
panic("cannot return from double fault\n");
}
NOKPROBE_SYMBOL(doublefault_shim);
DEFINE_PER_CPU_PAGE_ALIGNED(struct doublefault_stack, doublefault_stack) = {
.tss = {
@ -95,7 +93,7 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct doublefault_stack, doublefault_stack) = {
.ldt = 0,
.io_bitmap_base = IO_BITMAP_OFFSET_INVALID,
.ip = (unsigned long) double_fault,
.ip = (unsigned long) asm_exc_double_fault,
.flags = X86_EFLAGS_FIXED,
.es = __USER_DS,
.cs = __KERNEL_CS,

View File

@ -22,15 +22,13 @@
static const char * const exception_stack_names[] = {
[ ESTACK_DF ] = "#DF",
[ ESTACK_NMI ] = "NMI",
[ ESTACK_DB2 ] = "#DB2",
[ ESTACK_DB1 ] = "#DB1",
[ ESTACK_DB ] = "#DB",
[ ESTACK_MCE ] = "#MC",
};
const char *stack_type_name(enum stack_type type)
{
BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
if (type == STACK_TYPE_IRQ)
return "IRQ";
@ -79,7 +77,6 @@ static const
struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = {
EPAGERANGE(DF),
EPAGERANGE(NMI),
EPAGERANGE(DB1),
EPAGERANGE(DB),
EPAGERANGE(MCE),
};
@ -91,7 +88,7 @@ static bool in_exception_stack(unsigned long *stack, struct stack_info *info)
struct pt_regs *regs;
unsigned int k;
BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
BUILD_BUG_ON(N_EXCEPTION_STACKS != 4);
begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
/*

View File

@ -12,7 +12,7 @@
#include <asm/frame.h>
.code64
.section .entry.text, "ax"
.section .text, "ax"
#ifdef CONFIG_FRAME_POINTER
/* Save parent and function stack frames (rip and rbp) */

View File

@ -29,15 +29,16 @@
#ifdef CONFIG_PARAVIRT_XXL
#include <asm/asm-offsets.h>
#include <asm/paravirt.h>
#define GET_CR2_INTO(reg) GET_CR2_INTO_AX ; _ASM_MOV %_ASM_AX, reg
#else
#define INTERRUPT_RETURN iretq
#define GET_CR2_INTO(reg) _ASM_MOV %cr2, reg
#endif
/* we are not able to switch in one step to the final KERNEL ADDRESS SPACE
/*
* We are not able to switch in one step to the final KERNEL ADDRESS SPACE
* because we need identity-mapped pages.
*
*/
#define l4_index(x) (((x) >> 39) & 511)
#define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))

View File

@ -32,6 +32,8 @@
#include <asm/processor.h>
#include <asm/debugreg.h>
#include <asm/user.h>
#include <asm/desc.h>
#include <asm/tlbflush.h>
/* Per cpu debug control register value */
DEFINE_PER_CPU(unsigned long, cpu_dr7);
@ -97,6 +99,8 @@ int arch_install_hw_breakpoint(struct perf_event *bp)
unsigned long *dr7;
int i;
lockdep_assert_irqs_disabled();
for (i = 0; i < HBP_NUM; i++) {
struct perf_event **slot = this_cpu_ptr(&bp_per_reg[i]);
@ -115,6 +119,12 @@ int arch_install_hw_breakpoint(struct perf_event *bp)
dr7 = this_cpu_ptr(&cpu_dr7);
*dr7 |= encode_dr7(i, info->len, info->type);
/*
* Ensure we first write cpu_dr7 before we set the DR7 register.
* This ensures an NMI never see cpu_dr7 0 when DR7 is not.
*/
barrier();
set_debugreg(*dr7, 7);
if (info->mask)
set_dr_addr_mask(info->mask, i);
@ -134,9 +144,11 @@ int arch_install_hw_breakpoint(struct perf_event *bp)
void arch_uninstall_hw_breakpoint(struct perf_event *bp)
{
struct arch_hw_breakpoint *info = counter_arch_bp(bp);
unsigned long *dr7;
unsigned long dr7;
int i;
lockdep_assert_irqs_disabled();
for (i = 0; i < HBP_NUM; i++) {
struct perf_event **slot = this_cpu_ptr(&bp_per_reg[i]);
@ -149,12 +161,20 @@ void arch_uninstall_hw_breakpoint(struct perf_event *bp)
if (WARN_ONCE(i == HBP_NUM, "Can't find any breakpoint slot"))
return;
dr7 = this_cpu_ptr(&cpu_dr7);
*dr7 &= ~__encode_dr7(i, info->len, info->type);
dr7 = this_cpu_read(cpu_dr7);
dr7 &= ~__encode_dr7(i, info->len, info->type);
set_debugreg(*dr7, 7);
set_debugreg(dr7, 7);
if (info->mask)
set_dr_addr_mask(0, i);
/*
* Ensure the write to cpu_dr7 is after we've set the DR7 register.
* This ensures an NMI never see cpu_dr7 0 when DR7 is not.
*/
barrier();
this_cpu_write(cpu_dr7, dr7);
}
static int arch_bp_generic_len(int x86_len)
@ -227,10 +247,76 @@ int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw)
return (va >= TASK_SIZE_MAX) || ((va + len - 1) >= TASK_SIZE_MAX);
}
/*
* Checks whether the range [addr, end], overlaps the area [base, base + size).
*/
static inline bool within_area(unsigned long addr, unsigned long end,
unsigned long base, unsigned long size)
{
return end >= base && addr < (base + size);
}
/*
* Checks whether the range from addr to end, inclusive, overlaps the fixed
* mapped CPU entry area range or other ranges used for CPU entry.
*/
static inline bool within_cpu_entry(unsigned long addr, unsigned long end)
{
int cpu;
/* CPU entry erea is always used for CPU entry */
if (within_area(addr, end, CPU_ENTRY_AREA_BASE,
CPU_ENTRY_AREA_TOTAL_SIZE))
return true;
for_each_possible_cpu(cpu) {
/* The original rw GDT is being used after load_direct_gdt() */
if (within_area(addr, end, (unsigned long)get_cpu_gdt_rw(cpu),
GDT_SIZE))
return true;
/*
* cpu_tss_rw is not directly referenced by hardware, but
* cpu_tss_rw is also used in CPU entry code,
*/
if (within_area(addr, end,
(unsigned long)&per_cpu(cpu_tss_rw, cpu),
sizeof(struct tss_struct)))
return true;
/*
* cpu_tlbstate.user_pcid_flush_mask is used for CPU entry.
* If a data breakpoint on it, it will cause an unwanted #DB.
* Protect the full cpu_tlbstate structure to be sure.
*/
if (within_area(addr, end,
(unsigned long)&per_cpu(cpu_tlbstate, cpu),
sizeof(struct tlb_state)))
return true;
}
return false;
}
static int arch_build_bp_info(struct perf_event *bp,
const struct perf_event_attr *attr,
struct arch_hw_breakpoint *hw)
{
unsigned long bp_end;
bp_end = attr->bp_addr + attr->bp_len - 1;
if (bp_end < attr->bp_addr)
return -EINVAL;
/*
* Prevent any breakpoint of any type that overlaps the CPU
* entry area and data. This protects the IST stacks and also
* reduces the chance that we ever find out what happens if
* there's a data breakpoint on the GDT, IDT, or TSS.
*/
if (within_cpu_entry(attr->bp_addr, bp_end))
return -EINVAL;
hw->address = attr->bp_addr;
hw->mask = 0;
@ -439,7 +525,7 @@ static int hw_breakpoint_handler(struct die_args *args)
{
int i, cpu, rc = NOTIFY_STOP;
struct perf_event *bp;
unsigned long dr7, dr6;
unsigned long dr6;
unsigned long *dr6_p;
/* The DR6 value is pointed by args->err */
@ -454,9 +540,6 @@ static int hw_breakpoint_handler(struct die_args *args)
if ((dr6 & DR_TRAP_BITS) == 0)
return NOTIFY_DONE;
get_debugreg(dr7, 7);
/* Disable breakpoints during exception handling */
set_debugreg(0UL, 7);
/*
* Assert that local interrupts are disabled
* Reset the DRn bits in the virtualized register value.
@ -513,7 +596,6 @@ static int hw_breakpoint_handler(struct die_args *args)
(dr6 & (~DR_TRAP_BITS)))
rc = NOTIFY_DONE;
set_debugreg(dr7, 7);
put_cpu();
return rc;

View File

@ -4,6 +4,8 @@
*/
#include <linux/interrupt.h>
#include <asm/cpu_entry_area.h>
#include <asm/set_memory.h>
#include <asm/traps.h>
#include <asm/proto.h>
#include <asm/desc.h>
@ -51,15 +53,23 @@ struct idt_data {
#define TSKG(_vector, _gdt) \
G(_vector, NULL, DEFAULT_STACK, GATE_TASK, DPL0, _gdt << 3)
#define IDT_TABLE_SIZE (IDT_ENTRIES * sizeof(gate_desc))
static bool idt_setup_done __initdata;
/*
* Early traps running on the DEFAULT_STACK because the other interrupt
* stacks work only after cpu_init().
*/
static const __initconst struct idt_data early_idts[] = {
INTG(X86_TRAP_DB, debug),
SYSG(X86_TRAP_BP, int3),
INTG(X86_TRAP_DB, asm_exc_debug),
SYSG(X86_TRAP_BP, asm_exc_int3),
#ifdef CONFIG_X86_32
INTG(X86_TRAP_PF, page_fault),
/*
* Not possible on 64-bit. See idt_setup_early_pf() for details.
*/
INTG(X86_TRAP_PF, asm_exc_page_fault),
#endif
};
@ -70,33 +80,33 @@ static const __initconst struct idt_data early_idts[] = {
* set up TSS.
*/
static const __initconst struct idt_data def_idts[] = {
INTG(X86_TRAP_DE, divide_error),
INTG(X86_TRAP_NMI, nmi),
INTG(X86_TRAP_BR, bounds),
INTG(X86_TRAP_UD, invalid_op),
INTG(X86_TRAP_NM, device_not_available),
INTG(X86_TRAP_OLD_MF, coprocessor_segment_overrun),
INTG(X86_TRAP_TS, invalid_TSS),
INTG(X86_TRAP_NP, segment_not_present),
INTG(X86_TRAP_SS, stack_segment),
INTG(X86_TRAP_GP, general_protection),
INTG(X86_TRAP_SPURIOUS, spurious_interrupt_bug),
INTG(X86_TRAP_MF, coprocessor_error),
INTG(X86_TRAP_AC, alignment_check),
INTG(X86_TRAP_XF, simd_coprocessor_error),
INTG(X86_TRAP_DE, asm_exc_divide_error),
INTG(X86_TRAP_NMI, asm_exc_nmi),
INTG(X86_TRAP_BR, asm_exc_bounds),
INTG(X86_TRAP_UD, asm_exc_invalid_op),
INTG(X86_TRAP_NM, asm_exc_device_not_available),
INTG(X86_TRAP_OLD_MF, asm_exc_coproc_segment_overrun),
INTG(X86_TRAP_TS, asm_exc_invalid_tss),
INTG(X86_TRAP_NP, asm_exc_segment_not_present),
INTG(X86_TRAP_SS, asm_exc_stack_segment),
INTG(X86_TRAP_GP, asm_exc_general_protection),
INTG(X86_TRAP_SPURIOUS, asm_exc_spurious_interrupt_bug),
INTG(X86_TRAP_MF, asm_exc_coprocessor_error),
INTG(X86_TRAP_AC, asm_exc_alignment_check),
INTG(X86_TRAP_XF, asm_exc_simd_coprocessor_error),
#ifdef CONFIG_X86_32
TSKG(X86_TRAP_DF, GDT_ENTRY_DOUBLEFAULT_TSS),
#else
INTG(X86_TRAP_DF, double_fault),
INTG(X86_TRAP_DF, asm_exc_double_fault),
#endif
INTG(X86_TRAP_DB, debug),
INTG(X86_TRAP_DB, asm_exc_debug),
#ifdef CONFIG_X86_MCE
INTG(X86_TRAP_MC, &machine_check),
INTG(X86_TRAP_MC, asm_exc_machine_check),
#endif
SYSG(X86_TRAP_OF, overflow),
SYSG(X86_TRAP_OF, asm_exc_overflow),
#if defined(CONFIG_IA32_EMULATION)
SYSG(IA32_SYSCALL_VECTOR, entry_INT80_compat),
#elif defined(CONFIG_X86_32)
@ -109,95 +119,63 @@ static const __initconst struct idt_data def_idts[] = {
*/
static const __initconst struct idt_data apic_idts[] = {
#ifdef CONFIG_SMP
INTG(RESCHEDULE_VECTOR, reschedule_interrupt),
INTG(CALL_FUNCTION_VECTOR, call_function_interrupt),
INTG(CALL_FUNCTION_SINGLE_VECTOR, call_function_single_interrupt),
INTG(IRQ_MOVE_CLEANUP_VECTOR, irq_move_cleanup_interrupt),
INTG(REBOOT_VECTOR, reboot_interrupt),
INTG(RESCHEDULE_VECTOR, asm_sysvec_reschedule_ipi),
INTG(CALL_FUNCTION_VECTOR, asm_sysvec_call_function),
INTG(CALL_FUNCTION_SINGLE_VECTOR, asm_sysvec_call_function_single),
INTG(IRQ_MOVE_CLEANUP_VECTOR, asm_sysvec_irq_move_cleanup),
INTG(REBOOT_VECTOR, asm_sysvec_reboot),
#endif
#ifdef CONFIG_X86_THERMAL_VECTOR
INTG(THERMAL_APIC_VECTOR, thermal_interrupt),
INTG(THERMAL_APIC_VECTOR, asm_sysvec_thermal),
#endif
#ifdef CONFIG_X86_MCE_THRESHOLD
INTG(THRESHOLD_APIC_VECTOR, threshold_interrupt),
INTG(THRESHOLD_APIC_VECTOR, asm_sysvec_threshold),
#endif
#ifdef CONFIG_X86_MCE_AMD
INTG(DEFERRED_ERROR_VECTOR, deferred_error_interrupt),
INTG(DEFERRED_ERROR_VECTOR, asm_sysvec_deferred_error),
#endif
#ifdef CONFIG_X86_LOCAL_APIC
INTG(LOCAL_TIMER_VECTOR, apic_timer_interrupt),
INTG(X86_PLATFORM_IPI_VECTOR, x86_platform_ipi),
INTG(LOCAL_TIMER_VECTOR, asm_sysvec_apic_timer_interrupt),
INTG(X86_PLATFORM_IPI_VECTOR, asm_sysvec_x86_platform_ipi),
# ifdef CONFIG_HAVE_KVM
INTG(POSTED_INTR_VECTOR, kvm_posted_intr_ipi),
INTG(POSTED_INTR_WAKEUP_VECTOR, kvm_posted_intr_wakeup_ipi),
INTG(POSTED_INTR_NESTED_VECTOR, kvm_posted_intr_nested_ipi),
INTG(POSTED_INTR_VECTOR, asm_sysvec_kvm_posted_intr_ipi),
INTG(POSTED_INTR_WAKEUP_VECTOR, asm_sysvec_kvm_posted_intr_wakeup_ipi),
INTG(POSTED_INTR_NESTED_VECTOR, asm_sysvec_kvm_posted_intr_nested_ipi),
# endif
# ifdef CONFIG_IRQ_WORK
INTG(IRQ_WORK_VECTOR, irq_work_interrupt),
INTG(IRQ_WORK_VECTOR, asm_sysvec_irq_work),
# endif
#ifdef CONFIG_X86_UV
INTG(UV_BAU_MESSAGE, uv_bau_message_intr1),
#endif
INTG(SPURIOUS_APIC_VECTOR, spurious_interrupt),
INTG(ERROR_APIC_VECTOR, error_interrupt),
# ifdef CONFIG_X86_UV
INTG(UV_BAU_MESSAGE, asm_sysvec_uv_bau_message),
# endif
INTG(SPURIOUS_APIC_VECTOR, asm_sysvec_spurious_apic_interrupt),
INTG(ERROR_APIC_VECTOR, asm_sysvec_error_interrupt),
#endif
};
#ifdef CONFIG_X86_64
/*
* Early traps running on the DEFAULT_STACK because the other interrupt
* stacks work only after cpu_init().
*/
static const __initconst struct idt_data early_pf_idts[] = {
INTG(X86_TRAP_PF, page_fault),
};
/*
* Override for the debug_idt. Same as the default, but with interrupt
* stack set to DEFAULT_STACK (0). Required for NMI trap handling.
*/
static const __initconst struct idt_data dbg_idts[] = {
INTG(X86_TRAP_DB, debug),
};
#endif
/* Must be page-aligned because the real IDT is used in a fixmap. */
gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
/* Must be page-aligned because the real IDT is used in the cpu entry area */
static gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
struct desc_ptr idt_descr __ro_after_init = {
.size = (IDT_ENTRIES * 2 * sizeof(unsigned long)) - 1,
.size = IDT_TABLE_SIZE - 1,
.address = (unsigned long) idt_table,
};
#ifdef CONFIG_X86_64
/* No need to be aligned, but done to keep all IDTs defined the same way. */
gate_desc debug_idt_table[IDT_ENTRIES] __page_aligned_bss;
void load_current_idt(void)
{
lockdep_assert_irqs_disabled();
load_idt(&idt_descr);
}
/*
* The exceptions which use Interrupt stacks. They are setup after
* cpu_init() when the TSS has been initialized.
*/
static const __initconst struct idt_data ist_idts[] = {
ISTG(X86_TRAP_DB, debug, IST_INDEX_DB),
ISTG(X86_TRAP_NMI, nmi, IST_INDEX_NMI),
ISTG(X86_TRAP_DF, double_fault, IST_INDEX_DF),
#ifdef CONFIG_X86_MCE
ISTG(X86_TRAP_MC, &machine_check, IST_INDEX_MCE),
#endif
};
/*
* Override for the debug_idt. Same as the default, but with interrupt
* stack set to DEFAULT_STACK (0). Required for NMI trap handling.
*/
const struct desc_ptr debug_idt_descr = {
.size = IDT_ENTRIES * 16 - 1,
.address = (unsigned long) debug_idt_table,
};
#ifdef CONFIG_X86_F00F_BUG
bool idt_is_f00f_address(unsigned long address)
{
return ((address - idt_descr.address) >> 3) == 6;
}
#endif
static inline void idt_init_desc(gate_desc *gate, const struct idt_data *d)
@ -214,7 +192,7 @@ static inline void idt_init_desc(gate_desc *gate, const struct idt_data *d)
#endif
}
static void
static __init void
idt_setup_from_table(gate_desc *idt, const struct idt_data *t, int size, bool sys)
{
gate_desc desc;
@ -227,7 +205,7 @@ idt_setup_from_table(gate_desc *idt, const struct idt_data *t, int size, bool sy
}
}
static void set_intr_gate(unsigned int n, const void *addr)
static __init void set_intr_gate(unsigned int n, const void *addr)
{
struct idt_data data;
@ -266,6 +244,27 @@ void __init idt_setup_traps(void)
}
#ifdef CONFIG_X86_64
/*
* Early traps running on the DEFAULT_STACK because the other interrupt
* stacks work only after cpu_init().
*/
static const __initconst struct idt_data early_pf_idts[] = {
INTG(X86_TRAP_PF, asm_exc_page_fault),
};
/*
* The exceptions which use Interrupt stacks. They are setup after
* cpu_init() when the TSS has been initialized.
*/
static const __initconst struct idt_data ist_idts[] = {
ISTG(X86_TRAP_DB, asm_exc_debug, IST_INDEX_DB),
ISTG(X86_TRAP_NMI, asm_exc_nmi, IST_INDEX_NMI),
ISTG(X86_TRAP_DF, asm_exc_double_fault, IST_INDEX_DF),
#ifdef CONFIG_X86_MCE
ISTG(X86_TRAP_MC, asm_exc_machine_check, IST_INDEX_MCE),
#endif
};
/**
* idt_setup_early_pf - Initialize the idt table with early pagefault handler
*
@ -273,8 +272,10 @@ void __init idt_setup_traps(void)
* cpu_init() is invoked and sets up TSS. The IST variant is installed
* after that.
*
* FIXME: Why is 32bit and 64bit installing the PF handler at different
* places in the early setup code?
* Note, that X86_64 cannot install the real #PF handler in
* idt_setup_early_traps() because the memory intialization needs the #PF
* handler from the early_idt_handler_array to initialize the early page
* tables.
*/
void __init idt_setup_early_pf(void)
{
@ -289,18 +290,21 @@ void __init idt_setup_ist_traps(void)
{
idt_setup_from_table(idt_table, ist_idts, ARRAY_SIZE(ist_idts), true);
}
/**
* idt_setup_debugidt_traps - Initialize the debug idt table with debug traps
*/
void __init idt_setup_debugidt_traps(void)
{
memcpy(&debug_idt_table, &idt_table, IDT_ENTRIES * 16);
idt_setup_from_table(debug_idt_table, dbg_idts, ARRAY_SIZE(dbg_idts), false);
}
#endif
static void __init idt_map_in_cea(void)
{
/*
* Set the IDT descriptor to a fixed read-only location in the cpu
* entry area, so that the "sidt" instruction will not leak the
* location of the kernel, and to defend the IDT against arbitrary
* memory write vulnerabilities.
*/
cea_set_pte(CPU_ENTRY_AREA_RO_IDT_VADDR, __pa_symbol(idt_table),
PAGE_KERNEL_RO);
idt_descr.address = CPU_ENTRY_AREA_RO_IDT;
}
/**
* idt_setup_apic_and_irq_gates - Setup APIC/SMP and normal interrupt gates
*/
@ -318,11 +322,23 @@ void __init idt_setup_apic_and_irq_gates(void)
#ifdef CONFIG_X86_LOCAL_APIC
for_each_clear_bit_from(i, system_vectors, NR_VECTORS) {
set_bit(i, system_vectors);
/*
* Don't set the non assigned system vectors in the
* system_vectors bitmap. Otherwise they show up in
* /proc/interrupts.
*/
entry = spurious_entries_start + 8 * (i - FIRST_SYSTEM_VECTOR);
set_intr_gate(i, entry);
}
#endif
/* Map IDT into CPU entry area and reload it. */
idt_map_in_cea();
load_idt(&idt_descr);
/* Make the IDT table read only */
set_memory_ro((unsigned long)&idt_table, 1);
idt_setup_done = true;
}
/**
@ -352,16 +368,14 @@ void idt_invalidate(void *addr)
load_idt(&idt);
}
void __init update_intr_gate(unsigned int n, const void *addr)
void __init alloc_intr_gate(unsigned int n, const void *addr)
{
if (WARN_ON_ONCE(!test_bit(n, system_vectors)))
if (WARN_ON(n < FIRST_SYSTEM_VECTOR))
return;
set_intr_gate(n, addr);
}
void alloc_intr_gate(unsigned int n, const void *addr)
{
BUG_ON(n < FIRST_SYSTEM_VECTOR);
if (!test_and_set_bit(n, system_vectors))
if (WARN_ON(idt_setup_done))
return;
if (!WARN_ON(test_and_set_bit(n, system_vectors)))
set_intr_gate(n, addr);
}

View File

@ -13,12 +13,14 @@
#include <linux/export.h>
#include <linux/irq.h>
#include <asm/irq_stack.h>
#include <asm/apic.h>
#include <asm/io_apic.h>
#include <asm/irq.h>
#include <asm/mce.h>
#include <asm/hw_irq.h>
#include <asm/desc.h>
#include <asm/traps.h>
#define CREATE_TRACE_POINTS
#include <asm/trace/irq_vectors.h>
@ -26,9 +28,6 @@
DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
EXPORT_PER_CPU_SYMBOL(irq_stat);
DEFINE_PER_CPU(struct pt_regs *, irq_regs);
EXPORT_PER_CPU_SYMBOL(irq_regs);
atomic_t irq_err_count;
/*
@ -224,35 +223,35 @@ u64 arch_irq_stat(void)
return sum;
}
static __always_inline void handle_irq(struct irq_desc *desc,
struct pt_regs *regs)
{
if (IS_ENABLED(CONFIG_X86_64))
run_on_irqstack_cond(desc->handle_irq, desc, regs);
else
__handle_irq(desc, regs);
}
/*
* do_IRQ handles all normal device IRQ's (the special
* SMP cross-CPU interrupts have their own specific
* handlers).
* common_interrupt() handles all normal device IRQ's (the special SMP
* cross-CPU interrupts have their own entry points).
*/
__visible void __irq_entry do_IRQ(struct pt_regs *regs)
DEFINE_IDTENTRY_IRQ(common_interrupt)
{
struct pt_regs *old_regs = set_irq_regs(regs);
struct irq_desc * desc;
/* high bit used in ret_from_ code */
unsigned vector = ~regs->orig_ax;
struct irq_desc *desc;
entering_irq();
/* entering_irq() tells RCU that we're not quiescent. Check it. */
/* entry code tells RCU that we're not quiescent. Check it. */
RCU_LOCKDEP_WARN(!rcu_is_watching(), "IRQ failed to wake up RCU");
desc = __this_cpu_read(vector_irq[vector]);
if (likely(!IS_ERR_OR_NULL(desc))) {
if (IS_ENABLED(CONFIG_X86_32))
handle_irq(desc, regs);
else
generic_handle_irq_desc(desc);
handle_irq(desc, regs);
} else {
ack_APIC_irq();
if (desc == VECTOR_UNUSED) {
pr_emerg_ratelimited("%s: %d.%d No irq handler for vector\n",
pr_emerg_ratelimited("%s: %d.%u No irq handler for vector\n",
__func__, smp_processor_id(),
vector);
} else {
@ -260,8 +259,6 @@ __visible void __irq_entry do_IRQ(struct pt_regs *regs)
}
}
exiting_irq();
set_irq_regs(old_regs);
}
@ -271,17 +268,16 @@ void (*x86_platform_ipi_callback)(void) = NULL;
/*
* Handler for X86_PLATFORM_IPI_VECTOR.
*/
__visible void __irq_entry smp_x86_platform_ipi(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_x86_platform_ipi)
{
struct pt_regs *old_regs = set_irq_regs(regs);
entering_ack_irq();
ack_APIC_irq();
trace_x86_platform_ipi_entry(X86_PLATFORM_IPI_VECTOR);
inc_irq_stat(x86_platform_ipis);
if (x86_platform_ipi_callback)
x86_platform_ipi_callback();
trace_x86_platform_ipi_exit(X86_PLATFORM_IPI_VECTOR);
exiting_irq();
set_irq_regs(old_regs);
}
#endif
@ -302,41 +298,29 @@ EXPORT_SYMBOL_GPL(kvm_set_posted_intr_wakeup_handler);
/*
* Handler for POSTED_INTERRUPT_VECTOR.
*/
__visible void smp_kvm_posted_intr_ipi(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_kvm_posted_intr_ipi)
{
struct pt_regs *old_regs = set_irq_regs(regs);
entering_ack_irq();
ack_APIC_irq();
inc_irq_stat(kvm_posted_intr_ipis);
exiting_irq();
set_irq_regs(old_regs);
}
/*
* Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
*/
__visible void smp_kvm_posted_intr_wakeup_ipi(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_kvm_posted_intr_wakeup_ipi)
{
struct pt_regs *old_regs = set_irq_regs(regs);
entering_ack_irq();
ack_APIC_irq();
inc_irq_stat(kvm_posted_intr_wakeup_ipis);
kvm_posted_intr_wakeup_handler();
exiting_irq();
set_irq_regs(old_regs);
}
/*
* Handler for POSTED_INTERRUPT_NESTED_VECTOR.
*/
__visible void smp_kvm_posted_intr_nested_ipi(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_kvm_posted_intr_nested_ipi)
{
struct pt_regs *old_regs = set_irq_regs(regs);
entering_ack_irq();
ack_APIC_irq();
inc_irq_stat(kvm_posted_intr_nested_ipis);
exiting_irq();
set_irq_regs(old_regs);
}
#endif

View File

@ -148,7 +148,7 @@ void do_softirq_own_stack(void)
call_on_stack(__do_softirq, isp);
}
void handle_irq(struct irq_desc *desc, struct pt_regs *regs)
void __handle_irq(struct irq_desc *desc, struct pt_regs *regs)
{
int overflow = check_stack_overflow();

View File

@ -20,6 +20,7 @@
#include <linux/sched/task_stack.h>
#include <asm/cpu_entry_area.h>
#include <asm/irq_stack.h>
#include <asm/io_apic.h>
#include <asm/apic.h>
@ -70,3 +71,8 @@ int irq_init_percpu_irqstack(unsigned int cpu)
return 0;
return map_irq_stack(cpu);
}
void do_softirq_own_stack(void)
{
run_on_irqstack_cond(__do_softirq, NULL, NULL);
}

View File

@ -9,18 +9,18 @@
#include <linux/irq_work.h>
#include <linux/hardirq.h>
#include <asm/apic.h>
#include <asm/idtentry.h>
#include <asm/trace/irq_vectors.h>
#include <linux/interrupt.h>
#ifdef CONFIG_X86_LOCAL_APIC
__visible void __irq_entry smp_irq_work_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_irq_work)
{
ipi_entering_ack_irq();
ack_APIC_irq();
trace_irq_work_entry(IRQ_WORK_VECTOR);
inc_irq_stat(apic_irq_work_irqs);
irq_work_run();
trace_irq_work_exit(IRQ_WORK_VECTOR);
exiting_irq();
}
void arch_irq_work_raise(void)

View File

@ -1073,13 +1073,6 @@ NOKPROBE_SYMBOL(kprobe_fault_handler);
int __init arch_populate_kprobe_blacklist(void)
{
int ret;
ret = kprobe_add_area_blacklist((unsigned long)__irqentry_text_start,
(unsigned long)__irqentry_text_end);
if (ret)
return ret;
return kprobe_add_area_blacklist((unsigned long)__entry_text_start,
(unsigned long)__entry_text_end);
}

View File

@ -286,9 +286,7 @@ static int can_optimize(unsigned long paddr)
* stack handling and registers setup.
*/
if (((paddr >= (unsigned long)__entry_text_start) &&
(paddr < (unsigned long)__entry_text_end)) ||
((paddr >= (unsigned long)__irqentry_text_start) &&
(paddr < (unsigned long)__irqentry_text_end)))
(paddr < (unsigned long)__entry_text_end)))
return 0;
/* Check there is enough space for a relative jump. */

View File

@ -217,7 +217,7 @@ again:
}
EXPORT_SYMBOL_GPL(kvm_async_pf_task_wake);
u32 kvm_read_and_reset_apf_flags(void)
noinstr u32 kvm_read_and_reset_apf_flags(void)
{
u32 flags = 0;
@ -229,11 +229,11 @@ u32 kvm_read_and_reset_apf_flags(void)
return flags;
}
EXPORT_SYMBOL_GPL(kvm_read_and_reset_apf_flags);
NOKPROBE_SYMBOL(kvm_read_and_reset_apf_flags);
bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
{
u32 reason = kvm_read_and_reset_apf_flags();
bool rcu_exit;
switch (reason) {
case KVM_PV_REASON_PAGE_NOT_PRESENT:
@ -243,6 +243,9 @@ bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
return false;
}
rcu_exit = idtentry_enter_cond_rcu(regs);
instrumentation_begin();
/*
* If the host managed to inject an async #PF into an interrupt
* disabled region, then die hard as this is not going to end well
@ -257,13 +260,13 @@ bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
/* Page is swapped out by the host. */
kvm_async_pf_task_wait_schedule(token);
} else {
rcu_irq_enter();
kvm_async_pf_task_wake(token);
rcu_irq_exit();
}
instrumentation_end();
idtentry_exit_cond_rcu(regs, rcu_exit);
return true;
}
NOKPROBE_SYMBOL(__kvm_handle_async_pf);
static void __init paravirt_ops_setup(void)
{

View File

@ -303,7 +303,7 @@ NOKPROBE_SYMBOL(unknown_nmi_error);
static DEFINE_PER_CPU(bool, swallow_nmi);
static DEFINE_PER_CPU(unsigned long, last_nmi_rip);
static void default_do_nmi(struct pt_regs *regs)
static noinstr void default_do_nmi(struct pt_regs *regs)
{
unsigned char reason = 0;
int handled;
@ -329,6 +329,9 @@ static void default_do_nmi(struct pt_regs *regs)
__this_cpu_write(last_nmi_rip, regs->ip);
instrumentation_begin();
trace_hardirqs_off_finish();
handled = nmi_handle(NMI_LOCAL, regs);
__this_cpu_add(nmi_stats.normal, handled);
if (handled) {
@ -342,7 +345,7 @@ static void default_do_nmi(struct pt_regs *regs)
*/
if (handled > 1)
__this_cpu_write(swallow_nmi, true);
return;
goto out;
}
/*
@ -374,7 +377,7 @@ static void default_do_nmi(struct pt_regs *regs)
#endif
__this_cpu_add(nmi_stats.external, 1);
raw_spin_unlock(&nmi_reason_lock);
return;
goto out;
}
raw_spin_unlock(&nmi_reason_lock);
@ -412,8 +415,12 @@ static void default_do_nmi(struct pt_regs *regs)
__this_cpu_add(nmi_stats.swallow, 1);
else
unknown_nmi_error(reason, regs);
out:
if (regs->flags & X86_EFLAGS_IF)
trace_hardirqs_on_prepare();
instrumentation_end();
}
NOKPROBE_SYMBOL(default_do_nmi);
/*
* NMIs can page fault or hit breakpoints which will cause it to lose
@ -467,44 +474,9 @@ enum nmi_states {
};
static DEFINE_PER_CPU(enum nmi_states, nmi_state);
static DEFINE_PER_CPU(unsigned long, nmi_cr2);
static DEFINE_PER_CPU(unsigned long, nmi_dr7);
#ifdef CONFIG_X86_64
/*
* In x86_64, we need to handle breakpoint -> NMI -> breakpoint. Without
* some care, the inner breakpoint will clobber the outer breakpoint's
* stack.
*
* If a breakpoint is being processed, and the debug stack is being
* used, if an NMI comes in and also hits a breakpoint, the stack
* pointer will be set to the same fixed address as the breakpoint that
* was interrupted, causing that stack to be corrupted. To handle this
* case, check if the stack that was interrupted is the debug stack, and
* if so, change the IDT so that new breakpoints will use the current
* stack and not switch to the fixed address. On return of the NMI,
* switch back to the original IDT.
*/
static DEFINE_PER_CPU(int, update_debug_stack);
static bool notrace is_debug_stack(unsigned long addr)
{
struct cea_exception_stacks *cs = __this_cpu_read(cea_exception_stacks);
unsigned long top = CEA_ESTACK_TOP(cs, DB);
unsigned long bot = CEA_ESTACK_BOT(cs, DB1);
if (__this_cpu_read(debug_stack_usage))
return true;
/*
* Note, this covers the guard page between DB and DB1 as well to
* avoid two checks. But by all means @addr can never point into
* the guard page.
*/
return addr >= bot && addr < top;
}
NOKPROBE_SYMBOL(is_debug_stack);
#endif
dotraplinkage notrace void
do_nmi(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY_RAW(exc_nmi)
{
if (IS_ENABLED(CONFIG_SMP) && cpu_is_offline(smp_processor_id()))
return;
@ -517,18 +489,7 @@ do_nmi(struct pt_regs *regs, long error_code)
this_cpu_write(nmi_cr2, read_cr2());
nmi_restart:
#ifdef CONFIG_X86_64
/*
* If we interrupted a breakpoint, it is possible that
* the nmi handler will have breakpoints too. We need to
* change the IDT such that breakpoints that happen here
* continue to use the NMI stack.
*/
if (unlikely(is_debug_stack(regs->sp))) {
debug_stack_set_zero();
this_cpu_write(update_debug_stack, 1);
}
#endif
this_cpu_write(nmi_dr7, local_db_save());
nmi_enter();
@ -539,12 +500,7 @@ nmi_restart:
nmi_exit();
#ifdef CONFIG_X86_64
if (unlikely(this_cpu_read(update_debug_stack))) {
debug_stack_reset();
this_cpu_write(update_debug_stack, 0);
}
#endif
local_db_restore(this_cpu_read(nmi_dr7));
if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
write_cr2(this_cpu_read(nmi_cr2));
@ -554,7 +510,6 @@ nmi_restart:
if (user_mode(regs))
mds_user_clear_cpu_buffers();
}
NOKPROBE_SYMBOL(do_nmi);
void stop_nmi(void)
{

View File

@ -27,6 +27,7 @@
#include <asm/mmu_context.h>
#include <asm/proto.h>
#include <asm/apic.h>
#include <asm/idtentry.h>
#include <asm/nmi.h>
#include <asm/mce.h>
#include <asm/trace/irq_vectors.h>
@ -130,13 +131,11 @@ static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
/*
* this function calls the 'stop' function on all other CPUs in the system.
*/
asmlinkage __visible void smp_reboot_interrupt(void)
DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
{
ipi_entering_ack_irq();
ack_APIC_irq();
cpu_emergency_vmxoff();
stop_this_cpu(NULL);
irq_exit();
}
static int register_stop_handler(void)
@ -221,47 +220,33 @@ static void native_stop_other_cpus(int wait)
/*
* Reschedule call back. KVM uses this interrupt to force a cpu out of
* guest mode
* guest mode.
*/
__visible void __irq_entry smp_reschedule_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_reschedule_ipi)
{
ack_APIC_irq();
trace_reschedule_entry(RESCHEDULE_VECTOR);
inc_irq_stat(irq_resched_count);
kvm_set_cpu_l1tf_flush_l1d();
if (trace_resched_ipi_enabled()) {
/*
* scheduler_ipi() might call irq_enter() as well, but
* nested calls are fine.
*/
irq_enter();
trace_reschedule_entry(RESCHEDULE_VECTOR);
scheduler_ipi();
trace_reschedule_exit(RESCHEDULE_VECTOR);
irq_exit();
return;
}
scheduler_ipi();
trace_reschedule_exit(RESCHEDULE_VECTOR);
}
__visible void __irq_entry smp_call_function_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_call_function)
{
ipi_entering_ack_irq();
ack_APIC_irq();
trace_call_function_entry(CALL_FUNCTION_VECTOR);
inc_irq_stat(irq_call_count);
generic_smp_call_function_interrupt();
trace_call_function_exit(CALL_FUNCTION_VECTOR);
exiting_irq();
}
__visible void __irq_entry smp_call_function_single_interrupt(struct pt_regs *r)
DEFINE_IDTENTRY_SYSVEC(sysvec_call_function_single)
{
ipi_entering_ack_irq();
ack_APIC_irq();
trace_call_function_single_entry(CALL_FUNCTION_SINGLE_VECTOR);
inc_irq_stat(irq_call_count);
generic_smp_call_function_single_interrupt();
trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR);
exiting_irq();
}
static int __init nonmi_ipi_setup(char *str)

View File

@ -25,20 +25,3 @@ void trace_pagefault_unreg(void)
{
static_branch_dec(&trace_pagefault_key);
}
#ifdef CONFIG_SMP
DEFINE_STATIC_KEY_FALSE(trace_resched_ipi_key);
int trace_resched_ipi_reg(void)
{
static_branch_inc(&trace_resched_ipi_key);
return 0;
}
void trace_resched_ipi_unreg(void)
{
static_branch_dec(&trace_resched_ipi_key);
}
#endif

View File

@ -97,24 +97,6 @@ int is_valid_bugaddr(unsigned long addr)
return ud == INSN_UD0 || ud == INSN_UD2;
}
int fixup_bug(struct pt_regs *regs, int trapnr)
{
if (trapnr != X86_TRAP_UD)
return 0;
switch (report_bug(regs->ip, regs)) {
case BUG_TRAP_TYPE_NONE:
case BUG_TRAP_TYPE_BUG:
break;
case BUG_TRAP_TYPE_WARN:
regs->ip += LEN_UD2;
return 1;
}
return 0;
}
static nokprobe_inline int
do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str,
struct pt_regs *regs, long error_code)
@ -145,7 +127,7 @@ do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str,
* process no chance to handle the signal and notice the
* kernel fault information, so that won't result in polluting
* the information about previously queued, but not yet
* delivered, faults. See also do_general_protection below.
* delivered, faults. See also exc_general_protection below.
*/
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = trapnr;
@ -190,42 +172,120 @@ static void do_error_trap(struct pt_regs *regs, long error_code, char *str,
{
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
/*
* WARN*()s end up here; fix them up before we call the
* notifier chain.
*/
if (!user_mode(regs) && fixup_bug(regs, trapnr))
return;
if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) !=
NOTIFY_STOP) {
cond_local_irq_enable(regs);
do_trap(trapnr, signr, str, regs, error_code, sicode, addr);
cond_local_irq_disable(regs);
}
}
#define IP ((void __user *)uprobe_get_trap_addr(regs))
#define DO_ERROR(trapnr, signr, sicode, addr, str, name) \
dotraplinkage void do_##name(struct pt_regs *regs, long error_code) \
{ \
do_error_trap(regs, error_code, str, trapnr, signr, sicode, addr); \
/*
* Posix requires to provide the address of the faulting instruction for
* SIGILL (#UD) and SIGFPE (#DE) in the si_addr member of siginfo_t.
*
* This address is usually regs->ip, but when an uprobe moved the code out
* of line then regs->ip points to the XOL code which would confuse
* anything which analyzes the fault address vs. the unmodified binary. If
* a trap happened in XOL code then uprobe maps regs->ip back to the
* original instruction address.
*/
static __always_inline void __user *error_get_trap_addr(struct pt_regs *regs)
{
return (void __user *)uprobe_get_trap_addr(regs);
}
DO_ERROR(X86_TRAP_DE, SIGFPE, FPE_INTDIV, IP, "divide error", divide_error)
DO_ERROR(X86_TRAP_OF, SIGSEGV, 0, NULL, "overflow", overflow)
DO_ERROR(X86_TRAP_UD, SIGILL, ILL_ILLOPN, IP, "invalid opcode", invalid_op)
DO_ERROR(X86_TRAP_OLD_MF, SIGFPE, 0, NULL, "coprocessor segment overrun", coprocessor_segment_overrun)
DO_ERROR(X86_TRAP_TS, SIGSEGV, 0, NULL, "invalid TSS", invalid_TSS)
DO_ERROR(X86_TRAP_NP, SIGBUS, 0, NULL, "segment not present", segment_not_present)
DO_ERROR(X86_TRAP_SS, SIGBUS, 0, NULL, "stack segment", stack_segment)
#undef IP
DEFINE_IDTENTRY(exc_divide_error)
{
do_error_trap(regs, 0, "divide_error", X86_TRAP_DE, SIGFPE,
FPE_INTDIV, error_get_trap_addr(regs));
}
dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY(exc_overflow)
{
do_error_trap(regs, 0, "overflow", X86_TRAP_OF, SIGSEGV, 0, NULL);
}
#ifdef CONFIG_X86_F00F_BUG
void handle_invalid_op(struct pt_regs *regs)
#else
static inline void handle_invalid_op(struct pt_regs *regs)
#endif
{
do_error_trap(regs, 0, "invalid opcode", X86_TRAP_UD, SIGILL,
ILL_ILLOPN, error_get_trap_addr(regs));
}
DEFINE_IDTENTRY_RAW(exc_invalid_op)
{
bool rcu_exit;
/*
* Handle BUG/WARN like NMIs instead of like normal idtentries:
* if we bugged/warned in a bad RCU context, for example, the last
* thing we want is to BUG/WARN again in the idtentry code, ad
* infinitum.
*/
if (!user_mode(regs) && is_valid_bugaddr(regs->ip)) {
enum bug_trap_type type;
nmi_enter();
instrumentation_begin();
trace_hardirqs_off_finish();
type = report_bug(regs->ip, regs);
if (regs->flags & X86_EFLAGS_IF)
trace_hardirqs_on_prepare();
instrumentation_end();
nmi_exit();
if (type == BUG_TRAP_TYPE_WARN) {
/* Skip the ud2. */
regs->ip += LEN_UD2;
return;
}
/*
* Else, if this was a BUG and report_bug returns or if this
* was just a normal #UD, we want to continue onward and
* crash.
*/
}
rcu_exit = idtentry_enter_cond_rcu(regs);
instrumentation_begin();
handle_invalid_op(regs);
instrumentation_end();
idtentry_exit_cond_rcu(regs, rcu_exit);
}
DEFINE_IDTENTRY(exc_coproc_segment_overrun)
{
do_error_trap(regs, 0, "coprocessor segment overrun",
X86_TRAP_OLD_MF, SIGFPE, 0, NULL);
}
DEFINE_IDTENTRY_ERRORCODE(exc_invalid_tss)
{
do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV,
0, NULL);
}
DEFINE_IDTENTRY_ERRORCODE(exc_segment_not_present)
{
do_error_trap(regs, error_code, "segment not present", X86_TRAP_NP,
SIGBUS, 0, NULL);
}
DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
{
do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
0, NULL);
}
DEFINE_IDTENTRY_ERRORCODE(exc_alignment_check)
{
char *str = "alignment check";
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
return;
@ -271,12 +331,19 @@ __visible void __noreturn handle_stack_overflow(const char *message,
* from the TSS. Returning is, in principle, okay, but changes to regs will
* be lost. If, for some reason, we need to return to a context with modified
* regs, the shim code could be adjusted to synchronize the registers.
*
* The 32bit #DF shim provides CR2 already as an argument. On 64bit it needs
* to be read before doing anything else.
*/
dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsigned long cr2)
DEFINE_IDTENTRY_DF(exc_double_fault)
{
static const char str[] = "double fault";
struct task_struct *tsk = current;
#ifdef CONFIG_VMAP_STACK
unsigned long address = read_cr2();
#endif
#ifdef CONFIG_X86_ESPFIX64
extern unsigned char native_irq_return_iret[];
@ -299,6 +366,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
regs->ip == (unsigned long)native_irq_return_iret)
{
struct pt_regs *gpregs = (struct pt_regs *)this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
unsigned long *p = (unsigned long *)regs->sp;
/*
* regs->sp points to the failing IRET frame on the
@ -306,7 +374,11 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
* in gpregs->ss through gpregs->ip.
*
*/
memmove(&gpregs->ip, (void *)regs->sp, 5*8);
gpregs->ip = p[0];
gpregs->cs = p[1];
gpregs->flags = p[2];
gpregs->sp = p[3];
gpregs->ss = p[4];
gpregs->orig_ax = 0; /* Missing (lost) #GP error code */
/*
@ -320,7 +392,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
* which is what the stub expects, given that the faulting
* RIP will be the IRET instruction.
*/
regs->ip = (unsigned long)general_protection;
regs->ip = (unsigned long)asm_exc_general_protection;
regs->sp = (unsigned long)&gpregs->orig_ax;
return;
@ -328,6 +400,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
#endif
nmi_enter();
instrumentation_begin();
notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);
tsk->thread.error_code = error_code;
@ -371,27 +444,31 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
* stack even if the actual trigger for the double fault was
* something else.
*/
if ((unsigned long)task_stack_page(tsk) - 1 - cr2 < PAGE_SIZE)
handle_stack_overflow("kernel stack overflow (double-fault)", regs, cr2);
if ((unsigned long)task_stack_page(tsk) - 1 - address < PAGE_SIZE) {
handle_stack_overflow("kernel stack overflow (double-fault)",
regs, address);
}
#endif
pr_emerg("PANIC: double fault, error_code: 0x%lx\n", error_code);
die("double fault", regs, error_code);
panic("Machine halted.");
instrumentation_end();
}
dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY(exc_bounds)
{
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
if (notify_die(DIE_TRAP, "bounds", regs, error_code,
if (notify_die(DIE_TRAP, "bounds", regs, 0,
X86_TRAP_BR, SIGSEGV) == NOTIFY_STOP)
return;
cond_local_irq_enable(regs);
if (!user_mode(regs))
die("bounds", regs, error_code);
die("bounds", regs, 0);
do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, 0, NULL);
do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, 0, 0, NULL);
cond_local_irq_disable(regs);
}
enum kernel_gp_hint {
@ -438,7 +515,7 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
#define GPFSTR "general protection fault"
dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
{
char desc[sizeof(GPFSTR) + 50 + 2*sizeof(unsigned long) + 1] = GPFSTR;
enum kernel_gp_hint hint = GP_NO_HINT;
@ -446,17 +523,17 @@ dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
unsigned long gp_addr;
int ret;
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
cond_local_irq_enable(regs);
if (static_cpu_has(X86_FEATURE_UMIP)) {
if (user_mode(regs) && fixup_umip_exception(regs))
return;
goto exit;
}
if (v8086_mode(regs)) {
local_irq_enable();
handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
local_irq_disable();
return;
}
@ -468,12 +545,11 @@ dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
show_signal(tsk, SIGSEGV, "", desc, regs, error_code);
force_sig(SIGSEGV);
return;
goto exit;
}
if (fixup_exception(regs, X86_TRAP_GP, error_code, 0))
return;
goto exit;
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = X86_TRAP_GP;
@ -485,11 +561,11 @@ dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
if (!preemptible() &&
kprobe_running() &&
kprobe_fault_handler(regs, X86_TRAP_GP))
return;
goto exit;
ret = notify_die(DIE_GPF, desc, regs, error_code, X86_TRAP_GP, SIGSEGV);
if (ret == NOTIFY_STOP)
return;
goto exit;
if (error_code)
snprintf(desc, sizeof(desc), "segment-related " GPFSTR);
@ -511,47 +587,74 @@ dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
die_addr(desc, regs, error_code, gp_addr);
exit:
cond_local_irq_disable(regs);
}
NOKPROBE_SYMBOL(do_general_protection);
dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
static bool do_int3(struct pt_regs *regs)
{
if (poke_int3_handler(regs))
return;
/*
* Unlike any other non-IST entry, we can be called from pretty much
* any location in the kernel through kprobes -- text_poke() will most
* likely be handled by poke_int3_handler() above. This means this
* handler is effectively NMI-like.
*/
if (!user_mode(regs))
nmi_enter();
int res;
#ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
SIGTRAP) == NOTIFY_STOP)
goto exit;
if (kgdb_ll_trap(DIE_INT3, "int3", regs, 0, X86_TRAP_BP,
SIGTRAP) == NOTIFY_STOP)
return true;
#endif /* CONFIG_KGDB_LOW_LEVEL_TRAP */
#ifdef CONFIG_KPROBES
if (kprobe_int3_handler(regs))
goto exit;
return true;
#endif
res = notify_die(DIE_INT3, "int3", regs, 0, X86_TRAP_BP, SIGTRAP);
if (notify_die(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
SIGTRAP) == NOTIFY_STOP)
goto exit;
return res == NOTIFY_STOP;
}
static void do_int3_user(struct pt_regs *regs)
{
if (do_int3(regs))
return;
cond_local_irq_enable(regs);
do_trap(X86_TRAP_BP, SIGTRAP, "int3", regs, error_code, 0, NULL);
do_trap(X86_TRAP_BP, SIGTRAP, "int3", regs, 0, 0, NULL);
cond_local_irq_disable(regs);
exit:
if (!user_mode(regs))
nmi_exit();
}
NOKPROBE_SYMBOL(do_int3);
DEFINE_IDTENTRY_RAW(exc_int3)
{
/*
* poke_int3_handler() is completely self contained code; it does (and
* must) *NOT* call out to anything, lest it hits upon yet another
* INT3.
*/
if (poke_int3_handler(regs))
return;
/*
* idtentry_enter_user() uses static_branch_{,un}likely() and therefore
* can trigger INT3, hence poke_int3_handler() must be done
* before. If the entry came from kernel mode, then use nmi_enter()
* because the INT3 could have been hit in any context including
* NMI.
*/
if (user_mode(regs)) {
idtentry_enter_user(regs);
instrumentation_begin();
do_int3_user(regs);
instrumentation_end();
idtentry_exit_user(regs);
} else {
nmi_enter();
instrumentation_begin();
trace_hardirqs_off_finish();
if (!do_int3(regs))
die("int3", regs, 0);
if (regs->flags & X86_EFLAGS_IF)
trace_hardirqs_on_prepare();
instrumentation_end();
nmi_exit();
}
}
#ifdef CONFIG_X86_64
/*
@ -559,21 +662,20 @@ NOKPROBE_SYMBOL(do_int3);
* to switch to the normal thread stack if the interrupted code was in
* user mode. The actual stack switch is done in entry_64.S
*/
asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs)
asmlinkage __visible noinstr struct pt_regs *sync_regs(struct pt_regs *eregs)
{
struct pt_regs *regs = (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1;
if (regs != eregs)
*regs = *eregs;
return regs;
}
NOKPROBE_SYMBOL(sync_regs);
struct bad_iret_stack {
void *error_entry_ret;
struct pt_regs regs;
};
asmlinkage __visible notrace
asmlinkage __visible noinstr
struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
{
/*
@ -584,19 +686,21 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
* just below the IRET frame) and we want to pretend that the
* exception came from the IRET target.
*/
struct bad_iret_stack *new_stack =
(struct bad_iret_stack *)this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
struct bad_iret_stack tmp, *new_stack =
(struct bad_iret_stack *)__this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
/* Copy the IRET target to the new stack. */
memmove(&new_stack->regs.ip, (void *)s->regs.sp, 5*8);
/* Copy the IRET target to the temporary storage. */
memcpy(&tmp.regs.ip, (void *)s->regs.sp, 5*8);
/* Copy the remainder of the stack from the current stack. */
memmove(new_stack, s, offsetof(struct bad_iret_stack, regs.ip));
memcpy(&tmp, s, offsetof(struct bad_iret_stack, regs.ip));
/* Update the entry stack */
memcpy(new_stack, &tmp, sizeof(tmp));
BUG_ON(!user_mode(&new_stack->regs));
return new_stack;
}
NOKPROBE_SYMBOL(fixup_bad_iret);
#endif
static bool is_sysenter_singlestep(struct pt_regs *regs)
@ -622,6 +726,43 @@ static bool is_sysenter_singlestep(struct pt_regs *regs)
#endif
}
static __always_inline void debug_enter(unsigned long *dr6, unsigned long *dr7)
{
/*
* Disable breakpoints during exception handling; recursive exceptions
* are exceedingly 'fun'.
*
* Since this function is NOKPROBE, and that also applies to
* HW_BREAKPOINT_X, we can't hit a breakpoint before this (XXX except a
* HW_BREAKPOINT_W on our stack)
*
* Entry text is excluded for HW_BP_X and cpu_entry_area, which
* includes the entry stack is excluded for everything.
*/
*dr7 = local_db_save();
/*
* The Intel SDM says:
*
* Certain debug exceptions may clear bits 0-3. The remaining
* contents of the DR6 register are never cleared by the
* processor. To avoid confusion in identifying debug
* exceptions, debug handlers should clear the register before
* returning to the interrupted task.
*
* Keep it simple: clear DR6 immediately.
*/
get_debugreg(*dr6, 6);
set_debugreg(0, 6);
/* Filter out all the reserved bits which are preset to 1 */
*dr6 &= ~DR6_RESERVED;
}
static __always_inline void debug_exit(unsigned long dr7)
{
local_db_restore(dr7);
}
/*
* Our handling of the processor debug registers is non-trivial.
* We do not clear them on entry and exit from the kernel. Therefore
@ -646,86 +787,54 @@ static bool is_sysenter_singlestep(struct pt_regs *regs)
*
* May run on IST stack.
*/
dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
static void handle_debug(struct pt_regs *regs, unsigned long dr6, bool user)
{
struct task_struct *tsk = current;
int user_icebp = 0;
unsigned long dr6;
bool user_icebp;
int si_code;
nmi_enter();
get_debugreg(dr6, 6);
/*
* The Intel SDM says:
*
* Certain debug exceptions may clear bits 0-3. The remaining
* contents of the DR6 register are never cleared by the
* processor. To avoid confusion in identifying debug
* exceptions, debug handlers should clear the register before
* returning to the interrupted task.
*
* Keep it simple: clear DR6 immediately.
*/
set_debugreg(0, 6);
/* Filter out all the reserved bits which are preset to 1 */
dr6 &= ~DR6_RESERVED;
/*
* The SDM says "The processor clears the BTF flag when it
* generates a debug exception." Clear TIF_BLOCKSTEP to keep
* TIF_BLOCKSTEP in sync with the hardware BTF flag.
*/
clear_tsk_thread_flag(tsk, TIF_BLOCKSTEP);
clear_thread_flag(TIF_BLOCKSTEP);
if (unlikely(!user_mode(regs) && (dr6 & DR_STEP) &&
is_sysenter_singlestep(regs))) {
dr6 &= ~DR_STEP;
if (!dr6)
goto exit;
/*
* else we might have gotten a single-step trap and hit a
* watchpoint at the same time, in which case we should fall
* through and handle the watchpoint.
*/
}
/*
* If DR6 is zero, no point in trying to handle it. The kernel is
* not using INT1.
*/
if (!user && !dr6)
return;
/*
* If dr6 has no reason to give us about the origin of this trap,
* then it's very likely the result of an icebp/int01 trap.
* User wants a sigtrap for that.
*/
if (!dr6 && user_mode(regs))
user_icebp = 1;
user_icebp = user && !dr6;
/* Store the virtualized DR6 value */
tsk->thread.debugreg6 = dr6;
#ifdef CONFIG_KPROBES
if (kprobe_debug_handler(regs))
goto exit;
if (kprobe_debug_handler(regs)) {
return;
}
#endif
if (notify_die(DIE_DEBUG, "debug", regs, (long)&dr6, error_code,
SIGTRAP) == NOTIFY_STOP)
goto exit;
/*
* Let others (NMI) know that the debug stack is in use
* as we may switch to the interrupt stack.
*/
debug_stack_usage_inc();
if (notify_die(DIE_DEBUG, "debug", regs, (long)&dr6, 0,
SIGTRAP) == NOTIFY_STOP) {
return;
}
/* It's safe to allow irq's after DR6 has been saved */
cond_local_irq_enable(regs);
if (v8086_mode(regs)) {
handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code,
X86_TRAP_DB);
cond_local_irq_disable(regs);
debug_stack_usage_dec();
goto exit;
handle_vm86_trap((struct kernel_vm86_regs *) regs, 0,
X86_TRAP_DB);
goto out;
}
if (WARN_ON_ONCE((dr6 & DR_STEP) && !user_mode(regs))) {
@ -739,23 +848,91 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
regs->flags &= ~X86_EFLAGS_TF;
}
si_code = get_si_code(tsk->thread.debugreg6);
if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS) || user_icebp)
send_sigtrap(regs, error_code, si_code);
cond_local_irq_disable(regs);
debug_stack_usage_dec();
send_sigtrap(regs, 0, si_code);
exit:
out:
cond_local_irq_disable(regs);
}
static __always_inline void exc_debug_kernel(struct pt_regs *regs,
unsigned long dr6)
{
nmi_enter();
instrumentation_begin();
trace_hardirqs_off_finish();
/*
* Catch SYSENTER with TF set and clear DR_STEP. If this hit a
* watchpoint at the same time then that will still be handled.
*/
if ((dr6 & DR_STEP) && is_sysenter_singlestep(regs))
dr6 &= ~DR_STEP;
handle_debug(regs, dr6, false);
if (regs->flags & X86_EFLAGS_IF)
trace_hardirqs_on_prepare();
instrumentation_end();
nmi_exit();
}
NOKPROBE_SYMBOL(do_debug);
static __always_inline void exc_debug_user(struct pt_regs *regs,
unsigned long dr6)
{
idtentry_enter_user(regs);
instrumentation_begin();
handle_debug(regs, dr6, true);
instrumentation_end();
idtentry_exit_user(regs);
}
#ifdef CONFIG_X86_64
/* IST stack entry */
DEFINE_IDTENTRY_DEBUG(exc_debug)
{
unsigned long dr6, dr7;
debug_enter(&dr6, &dr7);
exc_debug_kernel(regs, dr6);
debug_exit(dr7);
}
/* User entry, runs on regular task stack */
DEFINE_IDTENTRY_DEBUG_USER(exc_debug)
{
unsigned long dr6, dr7;
debug_enter(&dr6, &dr7);
exc_debug_user(regs, dr6);
debug_exit(dr7);
}
#else
/* 32 bit does not have separate entry points. */
DEFINE_IDTENTRY_DEBUG(exc_debug)
{
unsigned long dr6, dr7;
debug_enter(&dr6, &dr7);
if (user_mode(regs))
exc_debug_user(regs, dr6);
else
exc_debug_kernel(regs, dr6);
debug_exit(dr7);
}
#endif
/*
* Note that we play around with the 'TS' bit in an attempt to get
* the correct behaviour even in the presence of the asynchronous
* IRQ13 behaviour
*/
static void math_error(struct pt_regs *regs, int error_code, int trapnr)
static void math_error(struct pt_regs *regs, int trapnr)
{
struct task_struct *task = current;
struct fpu *fpu = &task->thread.fpu;
@ -766,16 +943,16 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
cond_local_irq_enable(regs);
if (!user_mode(regs)) {
if (fixup_exception(regs, trapnr, error_code, 0))
return;
if (fixup_exception(regs, trapnr, 0, 0))
goto exit;
task->thread.error_code = error_code;
task->thread.error_code = 0;
task->thread.trap_nr = trapnr;
if (notify_die(DIE_TRAP, str, regs, error_code,
trapnr, SIGFPE) != NOTIFY_STOP)
die(str, regs, error_code);
return;
if (notify_die(DIE_TRAP, str, regs, 0, trapnr,
SIGFPE) != NOTIFY_STOP)
die(str, regs, 0);
goto exit;
}
/*
@ -784,32 +961,37 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
fpu__save(fpu);
task->thread.trap_nr = trapnr;
task->thread.error_code = error_code;
task->thread.error_code = 0;
si_code = fpu__exception_code(fpu, trapnr);
/* Retry when we get spurious exceptions: */
if (!si_code)
return;
goto exit;
force_sig_fault(SIGFPE, si_code,
(void __user *)uprobe_get_trap_addr(regs));
exit:
cond_local_irq_disable(regs);
}
dotraplinkage void do_coprocessor_error(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY(exc_coprocessor_error)
{
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
math_error(regs, error_code, X86_TRAP_MF);
math_error(regs, X86_TRAP_MF);
}
dotraplinkage void
do_simd_coprocessor_error(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY(exc_simd_coprocessor_error)
{
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
math_error(regs, error_code, X86_TRAP_XF);
if (IS_ENABLED(CONFIG_X86_INVD_BUG)) {
/* AMD 486 bug: INVD in CPL 0 raises #XF instead of #GP */
if (!static_cpu_has(X86_FEATURE_XMM)) {
__exc_general_protection(regs, 0);
return;
}
}
math_error(regs, X86_TRAP_XF);
}
dotraplinkage void
do_spurious_interrupt_bug(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY(exc_spurious_interrupt_bug)
{
/*
* This addresses a Pentium Pro Erratum:
@ -832,13 +1014,10 @@ do_spurious_interrupt_bug(struct pt_regs *regs, long error_code)
*/
}
dotraplinkage void
do_device_not_available(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY(exc_device_not_available)
{
unsigned long cr0 = read_cr0();
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
#ifdef CONFIG_MATH_EMULATION
if (!boot_cpu_has(X86_FEATURE_FPU) && (cr0 & X86_CR0_EM)) {
struct math_emu_info info = { };
@ -847,6 +1026,8 @@ do_device_not_available(struct pt_regs *regs, long error_code)
info.regs = regs;
math_emulate(&info);
cond_local_irq_disable(regs);
return;
}
#endif
@ -861,22 +1042,20 @@ do_device_not_available(struct pt_regs *regs, long error_code)
* to kill the task than getting stuck in a never-ending
* loop of #NM faults.
*/
die("unexpected #NM exception", regs, error_code);
die("unexpected #NM exception", regs, 0);
}
}
NOKPROBE_SYMBOL(do_device_not_available);
#ifdef CONFIG_X86_32
dotraplinkage void do_iret_error(struct pt_regs *regs, long error_code)
DEFINE_IDTENTRY_SW(iret_error)
{
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
local_irq_enable();
if (notify_die(DIE_TRAP, "iret exception", regs, error_code,
if (notify_die(DIE_TRAP, "iret exception", regs, 0,
X86_TRAP_IRET, SIGILL) != NOTIFY_STOP) {
do_trap(X86_TRAP_IRET, SIGILL, "iret exception", regs, error_code,
do_trap(X86_TRAP_IRET, SIGILL, "iret exception", regs, 0,
ILL_BADSTK, (void __user *)NULL);
}
local_irq_disable();
}
#endif
@ -887,21 +1066,10 @@ void __init trap_init(void)
idt_setup_traps();
/*
* Set the IDT descriptor to a fixed read-only location, so that the
* "sidt" instruction will not leak the location of the kernel, and
* to defend the IDT against arbitrary memory write vulnerabilities.
* It will be reloaded in cpu_init() */
cea_set_pte(CPU_ENTRY_AREA_RO_IDT_VADDR, __pa_symbol(idt_table),
PAGE_KERNEL_RO);
idt_descr.address = CPU_ENTRY_AREA_RO_IDT;
/*
* Should be a barrier for any external CPU state:
*/
cpu_init();
idt_setup_ist_traps();
idt_setup_debugidt_traps();
}

View File

@ -74,13 +74,7 @@ static bool in_entry_code(unsigned long ip)
{
char *addr = (char *)ip;
if (addr >= __entry_text_start && addr < __entry_text_end)
return true;
if (addr >= __irqentry_text_start && addr < __irqentry_text_end)
return true;
return false;
return addr >= __entry_text_start && addr < __entry_text_end;
}
static inline unsigned long *last_frame(struct unwind_state *state)

View File

@ -134,7 +134,6 @@ SECTIONS
KPROBES_TEXT
ALIGN_ENTRY_TEXT_BEGIN
ENTRY_TEXT
IRQENTRY_TEXT
ALIGN_ENTRY_TEXT_END
SOFTIRQENTRY_TEXT
*(.fixup)

View File

@ -1839,7 +1839,7 @@ static void kvm_machine_check(void)
.flags = X86_EFLAGS_IF,
};
do_machine_check(&regs, 0);
do_machine_check(&regs);
#endif
}

View File

@ -3087,9 +3087,9 @@ static int nested_vmx_check_vmentry_hw(struct kvm_vcpu *vcpu)
/*
* VMExit clears RFLAGS.IF and DR7, even on a consistency check.
*/
local_irq_enable();
if (hw_breakpoint_active())
set_debugreg(__this_cpu_read(cpu_dr7), 7);
local_irq_enable();
preempt_enable();
/*

View File

@ -4709,7 +4709,7 @@ static void kvm_machine_check(void)
.flags = X86_EFLAGS_IF,
};
do_machine_check(&regs, 0);
do_machine_check(&regs);
#endif
}

View File

@ -107,7 +107,6 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
*/
cea_map_stack(DF);
cea_map_stack(NMI);
cea_map_stack(DB1);
cea_map_stack(DB);
cea_map_stack(MCE);
}

View File

@ -204,8 +204,19 @@ void __init early_fixup_exception(struct pt_regs *regs, int trapnr)
if (fixup_exception(regs, trapnr, regs->orig_ax, 0))
return;
if (fixup_bug(regs, trapnr))
return;
if (trapnr == X86_TRAP_UD) {
if (report_bug(regs->ip, regs) == BUG_TRAP_TYPE_WARN) {
/* Skip the ud2. */
regs->ip += LEN_UD2;
return;
}
/*
* If this was a BUG and report_bug returns or if this
* was just a normal #UD, we want to continue onward and
* crash.
*/
}
fail:
early_printk("PANIC: early exception 0x%02x IP %lx:%lx error %lx cr2 0x%lx\n",

View File

@ -414,21 +414,13 @@ static int is_errata100(struct pt_regs *regs, unsigned long address)
return 0;
}
/* Pentium F0 0F C7 C8 bug workaround: */
static int is_f00f_bug(struct pt_regs *regs, unsigned long address)
{
#ifdef CONFIG_X86_F00F_BUG
unsigned long nr;
/*
* Pentium F0 0F C7 C8 bug workaround:
*/
if (boot_cpu_has_bug(X86_BUG_F00F)) {
nr = (address - idt_descr.address) >> 3;
if (nr == 6) {
do_invalid_op(regs, 0);
return 1;
}
if (boot_cpu_has_bug(X86_BUG_F00F) && idt_is_f00f_address(address)) {
handle_invalid_op(regs);
return 1;
}
#endif
return 0;
@ -786,6 +778,8 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
force_sig_fault(SIGSEGV, si_code, (void __user *)address);
local_irq_disable();
return;
}
@ -1355,11 +1349,38 @@ trace_page_fault_entries(struct pt_regs *regs, unsigned long error_code,
trace_page_fault_kernel(address, regs, error_code);
}
dotraplinkage void
do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
unsigned long address)
static __always_inline void
handle_page_fault(struct pt_regs *regs, unsigned long error_code,
unsigned long address)
{
trace_page_fault_entries(regs, error_code, address);
if (unlikely(kmmio_fault(regs, address)))
return;
/* Was the fault on kernel-controlled part of the address space? */
if (unlikely(fault_in_kernel_space(address))) {
do_kern_addr_fault(regs, error_code, address);
} else {
do_user_addr_fault(regs, error_code, address);
/*
* User address page fault handling might have reenabled
* interrupts. Fixing up all potential exit points of
* do_user_addr_fault() and its leaf functions is just not
* doable w/o creating an unholy mess or turning the code
* upside down.
*/
local_irq_disable();
}
}
DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
{
unsigned long address = read_cr2();
bool rcu_exit;
prefetchw(&current->mm->mmap_lock);
/*
* KVM has two types of events that are, logically, interrupts, but
* are unfortunately delivered using the #PF vector. These events are
@ -1374,19 +1395,28 @@ do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
* getting values from real and async page faults mixed up.
*
* Fingers crossed.
*
* The async #PF handling code takes care of idtentry handling
* itself.
*/
if (kvm_handle_async_pf(regs, (u32)address))
return;
trace_page_fault_entries(regs, hw_error_code, address);
/*
* Entry handling for valid #PF from kernel mode is slightly
* different: RCU is already watching and rcu_irq_enter() must not
* be invoked because a kernel fault on a user space address might
* sleep.
*
* In case the fault hit a RCU idle region the conditional entry
* code reenabled RCU to avoid subsequent wreckage which helps
* debugability.
*/
rcu_exit = idtentry_enter_cond_rcu(regs);
if (unlikely(kmmio_fault(regs, address)))
return;
instrumentation_begin();
handle_page_fault(regs, error_code, address);
instrumentation_end();
/* Was the fault on kernel-controlled part of the address space? */
if (unlikely(fault_in_kernel_space(address)))
do_kern_addr_fault(regs, hw_error_code, address);
else
do_user_addr_fault(regs, hw_error_code, address);
idtentry_exit_cond_rcu(regs, rcu_exit);
}
NOKPROBE_SYMBOL(do_page_fault);

View File

@ -492,12 +492,12 @@ static void __init pti_setup_espfix64(void)
}
/*
* Clone the populated PMDs of the entry and irqentry text and force it RO.
* Clone the populated PMDs of the entry text and force it RO.
*/
static void pti_clone_entry_text(void)
{
pti_clone_pgtable((unsigned long) __entry_text_start,
(unsigned long) __irqentry_text_end,
(unsigned long) __entry_text_end,
PTI_CLONE_PMD);
}

View File

@ -1272,7 +1272,7 @@ static void process_uv2_message(struct msg_desc *mdp, struct bau_control *bcp)
* (the resource will not be freed until noninterruptable cpus see this
* interrupt; hardware may timeout the s/w ack and reply ERROR)
*/
void uv_bau_message_interrupt(struct pt_regs *regs)
DEFINE_IDTENTRY_SYSVEC(sysvec_uv_bau_message)
{
int count = 0;
cycles_t time_start;

View File

@ -13,6 +13,7 @@
#include <asm/smp.h>
#include <asm/reboot.h>
#include <asm/setup.h>
#include <asm/idtentry.h>
#include <asm/hypervisor.h>
#include <asm/e820/api.h>
#include <asm/early_ioremap.h>
@ -118,6 +119,17 @@ static void __init init_hvm_pv_info(void)
this_cpu_write(xen_vcpu_id, smp_processor_id());
}
DEFINE_IDTENTRY_SYSVEC(sysvec_xen_hvm_callback)
{
struct pt_regs *old_regs = set_irq_regs(regs);
inc_irq_stat(irq_hv_callback_count);
xen_hvm_evtchn_do_upcall();
set_irq_regs(old_regs);
}
#ifdef CONFIG_KEXEC_CORE
static void xen_hvm_shutdown(void)
{

View File

@ -604,32 +604,42 @@ struct trap_array_entry {
bool ist_okay;
};
#define TRAP_ENTRY(func, ist_ok) { \
.orig = asm_##func, \
.xen = xen_asm_##func, \
.ist_okay = ist_ok }
#define TRAP_ENTRY_REDIR(func, xenfunc, ist_ok) { \
.orig = asm_##func, \
.xen = xen_asm_##xenfunc, \
.ist_okay = ist_ok }
static struct trap_array_entry trap_array[] = {
{ debug, xen_xendebug, true },
{ double_fault, xen_double_fault, true },
TRAP_ENTRY_REDIR(exc_debug, exc_xendebug, true ),
TRAP_ENTRY(exc_double_fault, true ),
#ifdef CONFIG_X86_MCE
{ machine_check, xen_machine_check, true },
TRAP_ENTRY(exc_machine_check, true ),
#endif
{ nmi, xen_xennmi, true },
{ int3, xen_int3, false },
{ overflow, xen_overflow, false },
TRAP_ENTRY_REDIR(exc_nmi, exc_xennmi, true ),
TRAP_ENTRY(exc_int3, false ),
TRAP_ENTRY(exc_overflow, false ),
#ifdef CONFIG_IA32_EMULATION
{ entry_INT80_compat, xen_entry_INT80_compat, false },
#endif
{ page_fault, xen_page_fault, false },
{ divide_error, xen_divide_error, false },
{ bounds, xen_bounds, false },
{ invalid_op, xen_invalid_op, false },
{ device_not_available, xen_device_not_available, false },
{ coprocessor_segment_overrun, xen_coprocessor_segment_overrun, false },
{ invalid_TSS, xen_invalid_TSS, false },
{ segment_not_present, xen_segment_not_present, false },
{ stack_segment, xen_stack_segment, false },
{ general_protection, xen_general_protection, false },
{ spurious_interrupt_bug, xen_spurious_interrupt_bug, false },
{ coprocessor_error, xen_coprocessor_error, false },
{ alignment_check, xen_alignment_check, false },
{ simd_coprocessor_error, xen_simd_coprocessor_error, false },
TRAP_ENTRY(exc_page_fault, false ),
TRAP_ENTRY(exc_divide_error, false ),
TRAP_ENTRY(exc_bounds, false ),
TRAP_ENTRY(exc_invalid_op, false ),
TRAP_ENTRY(exc_device_not_available, false ),
TRAP_ENTRY(exc_coproc_segment_overrun, false ),
TRAP_ENTRY(exc_invalid_tss, false ),
TRAP_ENTRY(exc_segment_not_present, false ),
TRAP_ENTRY(exc_stack_segment, false ),
TRAP_ENTRY(exc_general_protection, false ),
TRAP_ENTRY(exc_spurious_interrupt_bug, false ),
TRAP_ENTRY(exc_coprocessor_error, false ),
TRAP_ENTRY(exc_alignment_check, false ),
TRAP_ENTRY(exc_simd_coprocessor_error, false ),
};
static bool __ref get_trap_addr(void **addr, unsigned int ist)
@ -641,7 +651,7 @@ static bool __ref get_trap_addr(void **addr, unsigned int ist)
* Replace trap handler addresses by Xen specific ones.
* Check for known traps using IST and whitelist them.
* The debugger ones are the only ones we care about.
* Xen will handle faults like double_fault, * so we should never see
* Xen will handle faults like double_fault, so we should never see
* them. Warn if there's an unexpected IST-using fault handler.
*/
for (nr = 0; nr < ARRAY_SIZE(trap_array); nr++) {

View File

@ -20,6 +20,7 @@
#include <asm/setup.h>
#include <asm/acpi.h>
#include <asm/numa.h>
#include <asm/idtentry.h>
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>
@ -993,7 +994,8 @@ static void __init xen_pvmmu_arch_setup(void)
HYPERVISOR_vm_assist(VMASST_CMD_enable,
VMASST_TYPE_pae_extended_cr3);
if (register_callback(CALLBACKTYPE_event, xen_hypervisor_callback) ||
if (register_callback(CALLBACKTYPE_event,
xen_asm_exc_xen_hypervisor_callback) ||
register_callback(CALLBACKTYPE_failsafe, xen_failsafe_callback))
BUG();

View File

@ -26,6 +26,7 @@
#include <linux/pgtable.h>
#include <asm/paravirt.h>
#include <asm/idtentry.h>
#include <asm/desc.h>
#include <asm/cpu.h>
@ -348,7 +349,7 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
ctxt->gs_base_kernel = per_cpu_offset(cpu);
#endif
ctxt->event_callback_eip =
(unsigned long)xen_hypervisor_callback;
(unsigned long)xen_asm_exc_xen_hypervisor_callback;
ctxt->failsafe_callback_eip =
(unsigned long)xen_failsafe_callback;
per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);

View File

@ -2,6 +2,7 @@
#include <linux/types.h>
#include <xen/xen.h>
#include <xen/hvm.h>
#include <xen/features.h>
#include <xen/interface/features.h>
@ -13,6 +14,6 @@ void xen_hvm_post_suspend(int suspend_cancelled)
xen_hvm_init_shared_info();
xen_vcpu_restore();
}
xen_callback_vector();
xen_setup_callback_vector();
xen_unplug_emulated_devices();
}

View File

@ -93,7 +93,7 @@ xen_iret_start_crit:
/*
* If there's something pending, mask events again so we can
* jump back into xen_hypervisor_callback. Otherwise do not
* jump back into exc_xen_hypervisor_callback. Otherwise do not
* touch XEN_vcpu_info_mask.
*/
jne 1f
@ -113,11 +113,11 @@ iret_restore_end:
* Events are masked, so jumping out of the critical region is
* OK.
*/
je xen_hypervisor_callback
je xen_asm_exc_xen_hypervisor_callback
1: iret
xen_iret_end_crit:
_ASM_EXTABLE(1b, iret_exc)
_ASM_EXTABLE(1b, asm_iret_error)
hyper_iret:
/* put this out of line since its very rarely used */
@ -127,7 +127,7 @@ SYM_CODE_END(xen_iret)
.globl xen_iret_start_crit, xen_iret_end_crit
/*
* This is called by xen_hypervisor_callback in entry_32.S when it sees
* This is called by xen_asm_exc_xen_hypervisor_callback in entry_32.S when it sees
* that the EIP at the time of interrupt was between
* xen_iret_start_crit and xen_iret_end_crit.
*
@ -144,7 +144,7 @@ SYM_CODE_END(xen_iret)
* eflags }
* cs } nested exception info
* eip }
* return address : (into xen_hypervisor_callback)
* return address : (into xen_asm_exc_xen_hypervisor_callback)
*
* In order to deliver the nested exception properly, we need to discard the
* nested exception frame such that when we handle the exception, we do it
@ -152,7 +152,8 @@ SYM_CODE_END(xen_iret)
*
* The only caveat is that if the outer eax hasn't been restored yet (i.e.
* it's still on stack), we need to restore its value here.
*/
*/
.pushsection .noinstr.text, "ax"
SYM_CODE_START(xen_iret_crit_fixup)
/*
* Paranoia: Make sure we're really coming from kernel space.
@ -181,3 +182,4 @@ SYM_CODE_START(xen_iret_crit_fixup)
2:
ret
SYM_CODE_END(xen_iret_crit_fixup)
.popsection

View File

@ -28,33 +28,33 @@ SYM_CODE_END(xen_\name)
_ASM_NOKPROBE(xen_\name)
.endm
xen_pv_trap divide_error
xen_pv_trap debug
xen_pv_trap xendebug
xen_pv_trap int3
xen_pv_trap xennmi
xen_pv_trap overflow
xen_pv_trap bounds
xen_pv_trap invalid_op
xen_pv_trap device_not_available
xen_pv_trap double_fault
xen_pv_trap coprocessor_segment_overrun
xen_pv_trap invalid_TSS
xen_pv_trap segment_not_present
xen_pv_trap stack_segment
xen_pv_trap general_protection
xen_pv_trap page_fault
xen_pv_trap spurious_interrupt_bug
xen_pv_trap coprocessor_error
xen_pv_trap alignment_check
xen_pv_trap asm_exc_divide_error
xen_pv_trap asm_exc_debug
xen_pv_trap asm_exc_xendebug
xen_pv_trap asm_exc_int3
xen_pv_trap asm_exc_xennmi
xen_pv_trap asm_exc_overflow
xen_pv_trap asm_exc_bounds
xen_pv_trap asm_exc_invalid_op
xen_pv_trap asm_exc_device_not_available
xen_pv_trap asm_exc_double_fault
xen_pv_trap asm_exc_coproc_segment_overrun
xen_pv_trap asm_exc_invalid_tss
xen_pv_trap asm_exc_segment_not_present
xen_pv_trap asm_exc_stack_segment
xen_pv_trap asm_exc_general_protection
xen_pv_trap asm_exc_page_fault
xen_pv_trap asm_exc_spurious_interrupt_bug
xen_pv_trap asm_exc_coprocessor_error
xen_pv_trap asm_exc_alignment_check
#ifdef CONFIG_X86_MCE
xen_pv_trap machine_check
xen_pv_trap asm_exc_machine_check
#endif /* CONFIG_X86_MCE */
xen_pv_trap simd_coprocessor_error
xen_pv_trap asm_exc_simd_coprocessor_error
#ifdef CONFIG_IA32_EMULATION
xen_pv_trap entry_INT80_compat
#endif
xen_pv_trap hypervisor_callback
xen_pv_trap asm_exc_xen_hypervisor_callback
__INIT
SYM_CODE_START(xen_early_idt_handler_array)

View File

@ -8,7 +8,6 @@
#include <xen/xen-ops.h>
/* These are code, but not functions. Defined in entry.S */
extern const char xen_hypervisor_callback[];
extern const char xen_failsafe_callback[];
void xen_sysenter_target(void);
@ -55,7 +54,6 @@ void xen_enable_sysenter(void);
void xen_enable_syscall(void);
void xen_vcpu_restore(void);
void xen_callback_vector(void);
void xen_hvm_init_shared_info(void);
void xen_unplug_emulated_devices(void);

View File

@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o
obj-y += grant-table.o features.o balloon.o manage.o preempt.o time.o
obj-y += grant-table.o features.o balloon.o manage.o time.o
obj-y += mem-reservation.o
obj-y += events/
obj-y += xenbus/

View File

@ -37,6 +37,7 @@
#ifdef CONFIG_X86
#include <asm/desc.h>
#include <asm/ptrace.h>
#include <asm/idtentry.h>
#include <asm/irq.h>
#include <asm/io_apic.h>
#include <asm/i8259.h>
@ -1236,9 +1237,6 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
struct pt_regs *old_regs = set_irq_regs(regs);
irq_enter();
#ifdef CONFIG_X86
inc_irq_stat(irq_hv_callback_count);
#endif
__xen_evtchn_do_upcall();
@ -1639,26 +1637,30 @@ EXPORT_SYMBOL_GPL(xen_set_callback_via);
/* Vector callbacks are better than PCI interrupts to receive event
* channel notifications because we can receive vector callbacks on any
* vcpu and we don't need PCI support or APIC interactions. */
void xen_callback_vector(void)
void xen_setup_callback_vector(void)
{
int rc;
uint64_t callback_via;
if (xen_have_vector_callback) {
callback_via = HVM_CALLBACK_VECTOR(HYPERVISOR_CALLBACK_VECTOR);
rc = xen_set_callback_via(callback_via);
if (rc) {
if (xen_set_callback_via(callback_via)) {
pr_err("Request for Xen HVM callback vector failed\n");
xen_have_vector_callback = 0;
return;
}
pr_info_once("Xen HVM callback vector for event delivery is enabled\n");
alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR,
xen_hvm_callback_vector);
}
}
static __init void xen_alloc_callback_vector(void)
{
if (!xen_have_vector_callback)
return;
pr_info("Xen HVM callback vector for event delivery is enabled\n");
alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_xen_hvm_callback);
}
#else
void xen_callback_vector(void) {}
void xen_setup_callback_vector(void) {}
static inline void xen_alloc_callback_vector(void) {}
#endif
#undef MODULE_PARAM_PREFIX
@ -1692,8 +1694,10 @@ void __init xen_init_IRQ(void)
if (xen_initial_domain())
pci_xen_initial_domain();
}
if (xen_feature(XENFEAT_hvm_callback_vector))
xen_callback_vector();
if (xen_feature(XENFEAT_hvm_callback_vector)) {
xen_setup_callback_vector();
xen_alloc_callback_vector();
}
if (xen_hvm_domain()) {
native_init_IRQ();

View File

@ -1,42 +0,0 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Preemptible hypercalls
*
* Copyright (C) 2014 Citrix Systems R&D ltd.
*/
#include <linux/sched.h>
#include <xen/xen-ops.h>
#ifndef CONFIG_PREEMPTION
/*
* Some hypercalls issued by the toolstack can take many 10s of
* seconds. Allow tasks running hypercalls via the privcmd driver to
* be voluntarily preempted even if full kernel preemption is
* disabled.
*
* Such preemptible hypercalls are bracketed by
* xen_preemptible_hcall_begin() and xen_preemptible_hcall_end()
* calls.
*/
DEFINE_PER_CPU(bool, xen_in_preemptible_hcall);
EXPORT_SYMBOL_GPL(xen_in_preemptible_hcall);
asmlinkage __visible void xen_maybe_preempt_hcall(void)
{
if (unlikely(__this_cpu_read(xen_in_preemptible_hcall)
&& need_resched())) {
/*
* Clear flag as we may be rescheduled on a different
* cpu.
*/
__this_cpu_write(xen_in_preemptible_hcall, false);
local_irq_enable();
cond_resched();
local_irq_disable();
__this_cpu_write(xen_in_preemptible_hcall, true);
}
}
#endif /* CONFIG_PREEMPTION */

View File

@ -83,14 +83,19 @@ extern __printf(4, 5)
void warn_slowpath_fmt(const char *file, const int line, unsigned taint,
const char *fmt, ...);
#define __WARN() __WARN_printf(TAINT_WARN, NULL)
#define __WARN_printf(taint, arg...) \
warn_slowpath_fmt(__FILE__, __LINE__, taint, arg)
#define __WARN_printf(taint, arg...) do { \
instrumentation_begin(); \
warn_slowpath_fmt(__FILE__, __LINE__, taint, arg); \
instrumentation_end(); \
} while (0)
#else
extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
#define __WARN() __WARN_FLAGS(BUGFLAG_TAINT(TAINT_WARN))
#define __WARN_printf(taint, arg...) do { \
instrumentation_begin(); \
__warn_printk(arg); \
__WARN_FLAGS(BUGFLAG_NO_CUT_HERE | BUGFLAG_TAINT(taint));\
instrumentation_end(); \
} while (0)
#define WARN_ON_ONCE(condition) ({ \
int __ret_warn_on = !!(condition); \

View File

@ -4,7 +4,29 @@
#include <linux/types.h>
void *bsearch(const void *key, const void *base, size_t num, size_t size,
cmp_func_t cmp);
static __always_inline
void *__inline_bsearch(const void *key, const void *base, size_t num, size_t size, cmp_func_t cmp)
{
const char *pivot;
int result;
while (num > 0) {
pivot = base + (num >> 1) * size;
result = cmp(key, pivot);
if (result == 0)
return (void *)pivot;
if (result > 0) {
base = pivot + size;
num--;
}
num >>= 1;
}
return NULL;
}
extern void *bsearch(const void *key, const void *base, size_t num, size_t size, cmp_func_t cmp);
#endif /* _LINUX_BSEARCH_H */

View File

@ -33,13 +33,13 @@ static inline void user_exit(void)
}
/* Called with interrupts disabled. */
static inline void user_enter_irqoff(void)
static __always_inline void user_enter_irqoff(void)
{
if (context_tracking_enabled())
__context_tracking_enter(CONTEXT_USER);
}
static inline void user_exit_irqoff(void)
static __always_inline void user_exit_irqoff(void)
{
if (context_tracking_enabled())
__context_tracking_exit(CONTEXT_USER);
@ -75,7 +75,7 @@ static inline void exception_exit(enum ctx_state prev_ctx)
* is enabled. If context tracking is disabled, returns
* CONTEXT_DISABLED. This should be used primarily for debugging.
*/
static inline enum ctx_state ct_state(void)
static __always_inline enum ctx_state ct_state(void)
{
return context_tracking_enabled() ?
this_cpu_read(context_tracking.state) : CONTEXT_DISABLED;

View File

@ -26,12 +26,12 @@ struct context_tracking {
extern struct static_key_false context_tracking_key;
DECLARE_PER_CPU(struct context_tracking, context_tracking);
static inline bool context_tracking_enabled(void)
static __always_inline bool context_tracking_enabled(void)
{
return static_branch_unlikely(&context_tracking_key);
}
static inline bool context_tracking_enabled_cpu(int cpu)
static __always_inline bool context_tracking_enabled_cpu(int cpu)
{
return context_tracking_enabled() && per_cpu(context_tracking.active, cpu);
}
@ -41,7 +41,7 @@ static inline bool context_tracking_enabled_this_cpu(void)
return context_tracking_enabled() && __this_cpu_read(context_tracking.active);
}
static inline bool context_tracking_in_user(void)
static __always_inline bool context_tracking_in_user(void)
{
return __this_cpu_read(context_tracking.state) == CONTEXT_USER;
}

View File

@ -12,7 +12,7 @@ extern int debug_locks __read_mostly;
extern int debug_locks_silent __read_mostly;
static inline int __debug_locks_off(void)
static __always_inline int __debug_locks_off(void)
{
return xchg(&debug_locks, 0);
}

View File

@ -37,10 +37,25 @@ static __always_inline void rcu_irq_enter_check_tick(void)
lockdep_hardirq_enter(); \
} while (0)
/*
* Like __irq_enter() without time accounting for fast
* interrupts, e.g. reschedule IPI where time accounting
* is more expensive than the actual interrupt.
*/
#define __irq_enter_raw() \
do { \
preempt_count_add(HARDIRQ_OFFSET); \
lockdep_hardirq_enter(); \
} while (0)
/*
* Enter irq context (on NO_HZ, update jiffies):
*/
extern void irq_enter(void);
void irq_enter(void);
/*
* Like irq_enter(), but RCU is already watching.
*/
void irq_enter_rcu(void);
/*
* Exit irq context without processing softirqs:
@ -52,10 +67,24 @@ extern void irq_enter(void);
preempt_count_sub(HARDIRQ_OFFSET); \
} while (0)
/*
* Like __irq_exit() without time accounting
*/
#define __irq_exit_raw() \
do { \
lockdep_hardirq_exit(); \
preempt_count_sub(HARDIRQ_OFFSET); \
} while (0)
/*
* Exit irq context and process softirqs if needed:
*/
extern void irq_exit(void);
void irq_exit(void);
/*
* Like irq_exit(), but return with RCU watching.
*/
void irq_exit_rcu(void);
#ifndef arch_nmi_enter
#define arch_nmi_enter() do { } while (0)
@ -87,20 +116,24 @@ extern void rcu_nmi_exit(void);
arch_nmi_enter(); \
printk_nmi_enter(); \
lockdep_off(); \
ftrace_nmi_enter(); \
BUG_ON(in_nmi() == NMI_MASK); \
__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
rcu_nmi_enter(); \
lockdep_hardirq_enter(); \
instrumentation_begin(); \
ftrace_nmi_enter(); \
instrumentation_end(); \
} while (0)
#define nmi_exit() \
do { \
instrumentation_begin(); \
ftrace_nmi_exit(); \
instrumentation_end(); \
lockdep_hardirq_exit(); \
rcu_nmi_exit(); \
BUG_ON(!in_nmi()); \
__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
ftrace_nmi_exit(); \
lockdep_on(); \
printk_nmi_exit(); \
arch_nmi_exit(); \

View File

@ -760,8 +760,10 @@ extern int arch_early_irq_init(void);
/*
* We want to know which function is an entrypoint of a hardirq or a softirq.
*/
#define __irq_entry __attribute__((__section__(".irqentry.text")))
#define __softirq_entry \
__attribute__((__section__(".softirqentry.text")))
#ifndef __irq_entry
# define __irq_entry __attribute__((__section__(".irqentry.text")))
#endif
#define __softirq_entry __attribute__((__section__(".softirqentry.text")))
#endif

View File

@ -32,7 +32,7 @@
#ifdef CONFIG_TRACE_IRQFLAGS
extern void trace_hardirqs_on_prepare(void);
extern void trace_hardirqs_off_prepare(void);
extern void trace_hardirqs_off_finish(void);
extern void trace_hardirqs_on(void);
extern void trace_hardirqs_off(void);
# define lockdep_hardirq_context(p) ((p)->hardirq_context)
@ -101,7 +101,7 @@ do { \
#else
# define trace_hardirqs_on_prepare() do { } while (0)
# define trace_hardirqs_off_prepare() do { } while (0)
# define trace_hardirqs_off_finish() do { } while (0)
# define trace_hardirqs_on() do { } while (0)
# define trace_hardirqs_off() do { } while (0)
# define lockdep_hardirq_context(p) 0

View File

@ -90,13 +90,6 @@ unsigned int irq_from_evtchn(evtchn_port_t evtchn);
int irq_from_virq(unsigned int cpu, unsigned int virq);
evtchn_port_t evtchn_from_irq(unsigned irq);
#ifdef CONFIG_XEN_PVHVM
/* Xen HVM evtchn vector callback */
void xen_hvm_callback_vector(void);
#ifdef CONFIG_TRACING
#define trace_xen_hvm_callback_vector xen_hvm_callback_vector
#endif
#endif
int xen_set_callback_via(uint64_t via);
void xen_evtchn_do_upcall(struct pt_regs *regs);
void xen_hvm_evtchn_do_upcall(void);

Some files were not shown because too many files have changed in this diff Show More