linux-stable/kernel
Valentin Schneider 0cf2833400 panic, kexec: make __crash_kexec() NMI safe
commit 05c6257433 upstream.

Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
panic() doesn't work.  The cause of that lies in the PREEMPT_RT definition
of mutex_trylock():

	if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
		return 0;

This prevents an nmi_panic() from executing the main body of
__crash_kexec() which does the actual kexec into the kdump kernel.  The
warning and return are explained by:

  6ce47fd961 ("rtmutex: Warn if trylock is called from hard/softirq context")
  [...]
  The reasons for this are:

      1) There is a potential deadlock in the slowpath

      2) Another cpu which blocks on the rtmutex will boost the task
	 which allegedly locked the rtmutex, but that cannot work
	 because the hard/softirq context borrows the task context.

Furthermore, grabbing the lock isn't NMI safe, so do away with kexec_mutex
and replace it with an atomic variable.  This is somewhat overzealous as
*some* callsites could keep using a mutex (e.g.  the sysfs-facing ones
like crash_shrink_memory()), but this has the benefit of involving a
single unified lock and preventing any future NMI-related surprises.

Tested by triggering NMI panics via:

  $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
  $ echo 1 > /proc/sys/kernel/unknown_nmi_panic
  $ echo 1 > /proc/sys/kernel/panic

  $ ipmitool power diag

Link: https://lkml.kernel.org/r/20220630223258.4144112-3-vschneid@redhat.com
Fixes: 6ce47fd961 ("rtmutex: Warn if trylock is called from hard/softirq context")
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:13:57 +02:00
..
bpf bpf: hash map, avoid deadlock with suitable hash mask 2023-04-13 16:48:16 +02:00
cgroup cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach() 2023-04-20 12:13:55 +02:00
configs
debug lockdown: also lock down previous kgdb use 2022-05-25 09:57:37 +02:00
dma swiotlb: max mapping size takes min align mask into account 2022-10-05 10:39:40 +02:00
entry entry/rcu: Check TIF_RESCHED _after_ delayed RCU wake-up 2023-03-30 12:47:51 +02:00
events perf/core: Fix the same task check in perf_event_set_output 2023-04-13 16:48:24 +02:00
futex futex: Resend potentially swallowed owner death notification 2022-12-31 13:14:04 +01:00
gcov gcov: add support for checksum field 2022-12-31 13:14:47 +01:00
irq irqdomain: Fix mapping-creation race 2023-03-17 08:48:58 +01:00
kcsan kcsan: avoid passing -g for test 2023-04-05 11:24:52 +02:00
livepatch livepatch: fix race between fork and KLP transition 2022-10-26 12:34:30 +02:00
locking locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath 2023-03-10 09:39:57 +01:00
power PM: EM: fix memory leak with using debugfs_lookup() 2023-03-10 09:39:51 +01:00
printk kernel/printk/index.c: fix memory leak with using debugfs_lookup() 2023-03-11 13:57:32 +01:00
rcu rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug 2023-03-10 09:39:47 +01:00
sched sched/fair: Fix imbalance overflow 2023-04-20 12:13:56 +02:00
time time/debug: Fix memory leak with using debugfs_lookup() 2023-03-10 09:39:51 +01:00
trace tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance 2023-04-20 12:13:55 +02:00
.gitignore
acct.c acct: fix potential integer overflow in encode_comp_t() 2022-12-31 13:14:40 +01:00
async.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2022-02-23 12:03:07 +01:00
audit.c audit: improve audit queue handling when "audit=1" on cmdline 2022-02-08 18:34:03 +01:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:23:06 +02:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-08-31 17:16:33 +02:00
audit_tree.c audit: move put_tree() to avoid trim_trees refcount underflow and UAF 2021-08-24 18:52:36 -04:00
audit_watch.c
auditfilter.c
auditsc.c audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:23:06 +02:00
backtracetest.c
bounds.c
capability.c
cfi.c cfi: Fix __cfi_slowpath_diag RCU usage with cpuidle 2022-06-22 14:22:04 +02:00
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-04-05 11:24:53 +02:00
configs.c
context_tracking.c
cpu.c cpu/hotplug: Do not bail-out in DYING/STARTING sections 2022-12-31 13:14:04 +01:00
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-08-16 18:55:32 +02:00
crash_core.c kernel/crash_core: suppress unknown crashkernel parameter warning 2021-12-29 12:28:49 +01:00
crash_dump.c
cred.c ucounts: Base set_cred_ucounts changes on the real user 2022-02-23 12:03:20 +01:00
delayacct.c
dma.c
exec_domain.c
exit.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-02-01 08:27:22 +01:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-03-11 13:57:38 +01:00
fork.c fork: allow CLONE_NEWTIME in clone3 flags 2023-03-17 08:48:47 +01:00
freezer.c
gen_kheaders.sh
groups.c
hung_task.c
iomem.c
irq_work.c
jump_label.c
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/rwlock: Provide RT variant 2021-08-17 17:50:51 +02:00
Kconfig.preempt
kcov.c
kexec.c panic, kexec: make __crash_kexec() NMI safe 2023-04-20 12:13:57 +02:00
kexec_core.c panic, kexec: make __crash_kexec() NMI safe 2023-04-20 12:13:57 +02:00
kexec_elf.c
kexec_file.c panic, kexec: make __crash_kexec() NMI safe 2023-04-20 12:13:57 +02:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2023-04-20 12:13:57 +02:00
kheaders.c
kmod.c
kprobes.c x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range 2023-03-10 09:40:01 +01:00
ksysfs.c kexec: turn all kexec_mutex acquisitions into trylocks 2023-04-20 12:13:57 +02:00
kthread.c kthread: add the helper function kthread_run_on_cpu() 2023-03-30 12:47:42 +02:00
latencytop.c
Makefile futex: Move to kernel/futex/ 2022-12-31 13:14:04 +01:00
module-internal.h
module.c module: Don't wait for GOING modules 2023-02-01 08:27:23 +01:00
module_signature.c
module_signing.c
notifier.c notifier: Remove atomic_notifier_call_chain_robust() 2021-08-16 18:55:32 +02:00
nsproxy.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
padata.c padata: Fix list iterator in padata_do_serial() 2022-12-31 13:14:24 +01:00
panic.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-02-01 08:27:22 +01:00
params.c params: lift param_set_uint_minmax to common code 2021-08-16 14:42:22 +02:00
pid.c kernel/pid.c: implement additional checks upon pidfd_create() parameters 2021-08-10 12:53:07 +02:00
pid_namespace.c rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes() 2023-03-10 09:39:09 +01:00
profile.c profiling: fix shift too large makes kernel panic 2022-08-17 14:24:04 +02:00
ptrace.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2022-06-09 10:22:29 +02:00
range.c
reboot.c
regset.c
relay.c relay: fix type mismatch when allocating memory in relay_create_buf() 2022-12-31 13:14:05 +01:00
resource.c dax/kmem: Fix leak of memory-hotplug resources 2023-03-10 09:40:08 +01:00
resource_kunit.c
rseq.c rseq: Remove broken uapi field layout on 32-bit little endian 2022-04-08 14:23:10 +02:00
scftorture.c scftorture: Fix distribution of short handler delays 2022-06-09 10:22:46 +02:00
scs.c scs: Release kasan vmalloc poison in scs_free process 2021-11-18 19:16:29 +01:00
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2022-02-16 12:56:38 +01:00
signal.c signal handling: don't use BUG_ON() for debugging 2022-07-21 21:24:42 +02:00
smp.c locking/csd_lock: Change csdlock_debug from early_param to __setup 2022-08-17 14:24:24 +02:00
smpboot.c smpboot: Replace deprecated CPU-hotplug functions. 2021-08-10 14:57:42 +02:00
smpboot.h
softirq.c genirq: Change force_irqthreads to a static key 2021-08-10 22:50:07 +02:00
stackleak.c gcc-plugins/stackleak: Use noinstr in favor of notrace 2022-02-23 12:03:07 +01:00
stacktrace.c stacktrace: move filter_irq_stacks() to kernel/stacktrace.c 2022-04-13 20:59:28 +02:00
static_call.c static_call: Don't make __static_call_return0 static 2022-04-13 20:59:28 +02:00
static_call_inline.c static_call: Don't make __static_call_return0 static 2022-04-13 20:59:28 +02:00
stop_machine.c
sys.c prlimit: do_prlimit needs to have a speculation check 2023-01-24 07:22:44 +01:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2022-08-31 17:16:33 +02:00
sysctl-test.c
sysctl.c kernel/panic: move panic sysctls to its own file 2023-02-01 08:27:20 +01:00
task_work.c
taskstats.c
test_kprobes.c
torture.c torture: Replace deprecated CPU-hotplug functions. 2021-08-10 10:48:07 -07:00
tracepoint.c tracepoint: Fix kerneldoc comments 2021-08-16 11:39:51 -04:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-01-27 11:05:35 +01:00
ucount.c ucounts: Handle wrapping in is_ucounts_overlimit 2022-02-23 12:03:20 +01:00
uid16.c
uid16.h
umh.c
up.c
user-return-notifier.c
user.c fs/epoll: use a per-cpu counter for user's watches count 2021-09-08 11:50:27 -07:00
user_namespace.c ucounts: Fix systemd LimitNPROC with private users regression 2022-03-08 19:12:42 +01:00
usermode_driver.c
utsname.c
utsname_sysctl.c
watch_queue.c watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths 2023-03-17 08:48:59 +01:00
watchdog.c watchdog: export lockup_detector_reconfigure 2022-08-25 11:40:43 +02:00
watchdog_hld.c
workqueue.c workqueue: don't skip lockdep work dependency in cancel_work_sync() 2022-09-28 11:11:56 +02:00
workqueue_internal.h workqueue: Assign a color to barrier work items 2021-08-17 07:49:10 -10:00