linux-stable/kernel
Mel Gorman 693e7e52a8 rtmutex: Add acquire semantics for rtmutex lock acquisition slow path
commit 1c0908d8e4 upstream.

Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench
on XFS on arm64.

 kernel BUG at fs/inode.c:625!
 Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
 CPU: 11 PID: 6611 Comm: dbench Tainted: G            E   6.0.0-rt14-rt+ #1
 pc : clear_inode+0xa0/0xc0
 lr : clear_inode+0x38/0xc0
 Call trace:
  clear_inode+0xa0/0xc0
  evict+0x160/0x180
  iput+0x154/0x240
  do_unlinkat+0x184/0x300
  __arm64_sys_unlinkat+0x48/0xc0
  el0_svc_common.constprop.4+0xe4/0x2c0
  do_el0_svc+0xac/0x100
  el0_svc+0x78/0x200
  el0t_64_sync_handler+0x9c/0xc0
  el0t_64_sync+0x19c/0x1a0

It also affects 6.1-rc7-rt5 and affects a preempt-rt fork of 5.14 so this
is likely a bug that existed forever and only became visible when ARM
support was added to preempt-rt. The same problem does not occur on x86-64
and he also reported that converting sb->s_inode_wblist_lock to
raw_spinlock_t makes the problem disappear indicating that the RT spinlock
variant is the problem.

Which in turn means that RT mutexes on ARM64 and any other weakly ordered
architecture are affected by this independent of RT.

Will Deacon observed:

  "I'd be more inclined to be suspicious of the slowpath tbh, as we need to
   make sure that we have acquire semantics on all paths where the lock can
   be taken. Looking at the rtmutex code, this really isn't obvious to me
   -- for example, try_to_take_rt_mutex() appears to be able to return via
   the 'takeit' label without acquire semantics and it looks like we might
   be relying on the caller's subsequent _unlock_ of the wait_lock for
   ordering, but that will give us release semantics which aren't correct."

Sebastian Andrzej Siewior prototyped a fix that does work based on that
comment but it was a little bit overkill and added some fences that should
not be necessary.

The lock owner is updated with an IRQ-safe raw spinlock held, but the
spin_unlock does not provide acquire semantics which are needed when
acquiring a mutex.

Adds the necessary acquire semantics for lock owner updates in the slow path
acquisition and the waiter bit logic.

It successfully completed 10 iterations of the dbench workload while the
vanilla kernel fails on the first iteration.

[ bigeasy@linutronix.de: Initial prototype fix ]

Fixes: 700318d1d7 ("locking/rtmutex: Use acquire/release semantics")
Fixes: 23f78d4a03 ("[PATCH] pi-futex: rt mutex core")
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-04 11:26:28 +01:00
..
bpf bpf: Prevent decl_tag from being referenced in func_proto arg 2022-12-31 13:26:45 +01:00
cgroup memcg: fix possible use-after-free in memcg_write_event_control() 2022-12-14 11:40:52 +01:00
configs xen: branch for v6.0-rc1b 2022-08-14 09:28:54 -07:00
debug Modules updates for v5.19-rc1 2022-05-26 17:13:43 -07:00
dma dma-mapping: mark dma_supported static 2022-09-07 10:38:28 +02:00
entry context_tracking: Take NMI eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
events perf: Fix possible memleak in pmu_dev_alloc() 2022-12-31 13:25:42 +01:00
futex futex: Fix futex_waitv() hrtimer debug object leak on kcalloc error 2023-01-04 11:26:28 +01:00
gcov gcov: add support for checksum field 2022-12-31 13:26:53 +01:00
irq genirq/irqdesc: Don't try to remove non-existing sysfs files 2022-12-31 13:25:45 +01:00
kcsan kcsan: test: Add a .kunitconfig to run KCSAN tests 2022-07-22 09:22:59 -06:00
livepatch livepatch: fix race between fork and KLP transition 2022-10-21 12:37:51 +02:00
locking rtmutex: Add acquire semantics for rtmutex lock acquisition slow path 2023-01-04 11:26:28 +01:00
module module: Fix NULL vs IS_ERR checking for module_get_next_page 2022-12-31 13:25:57 +01:00
power PM: hibernate: Fix mistake in kerneldoc comment 2022-12-31 13:25:42 +01:00
printk printk: do not wait for consoles when suspended 2022-07-15 10:52:11 +02:00
rcu rcu: Fix __this_cpu_read() lockdep warning in rcu_force_quiescent_state() 2022-12-31 13:26:40 +01:00
sched sched/psi: Fix possible missing or delayed pending event 2022-12-31 13:25:42 +01:00
time time: Correct the prototype of ns_to_kernel_old_timeval and ns_to_timespec64 2022-08-09 20:02:13 +02:00
trace tracing/hist: Fix issue of losting command info in error_log 2022-12-31 13:26:28 +01:00
.gitignore
acct.c acct: fix potential integer overflow in encode_comp_t() 2022-12-31 13:26:41 +01:00
async.c
audit.c audit: make is_audit_feature_set() static 2022-06-13 14:08:57 -04:00
audit.h
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-08-22 18:50:06 -04:00
audit_tree.c audit: use fsnotify group lock helpers 2022-04-25 14:37:28 +02:00
audit_watch.c fsnotify: pass flags argument to fsnotify_alloc_group() 2022-04-25 14:37:12 +02:00
auditfilter.c
auditsc.c audit: free audit_proctitle only on task exit 2022-10-21 12:38:04 +02:00
backtracetest.c
bounds.c
capability.c
cfi.c context_tracking: Take IRQ eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
compat.c
configs.c
context_tracking.c MAINTAINERS: Add Paul as context tracking maintainer 2022-07-05 13:33:00 -07:00
cpu.c cpu/hotplug: Do not bail-out in DYING/STARTING sections 2022-12-31 13:25:46 +01:00
cpu_pm.c context_tracking: Take IRQ eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
crash_core.c vmcoreinfo: add kallsyms_num_syms symbol 2022-08-28 14:02:44 -07:00
crash_dump.c
cred.c x86: Mark __invalid_creds() __noreturn 2022-03-15 10:32:44 +01:00
delayacct.c delayacct: track delays from write-protect copy 2022-06-01 15:55:25 -07:00
dma.c
exec_domain.c
exit.c exit: Fix typo in comment: s/sub-theads/sub-threads 2022-08-03 10:44:54 +02:00
extable.c context_tracking: Take NMI eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
fail_function.c
fork.c seccomp: Move copy_seccomp() to no failure path. 2022-12-31 13:25:40 +01:00
freezer.c
gen_kheaders.sh kheaders: Have cpio unconditionally replace files 2022-05-08 03:16:59 +09:00
groups.c security: Add LSM hook to setgroups() syscall 2022-07-15 18:21:49 +00:00
hung_task.c kernel/hung_task: fix address space of proc_dohung_task_timeout_secs 2022-07-29 18:12:35 -07:00
iomem.c
irq_work.c irq_work: use kasan_record_aux_stack_noalloc() record callstack 2022-04-15 14:49:55 -07:00
jump_label.c jump_label: make initial NOP patching the special case 2022-06-24 09:48:55 +02:00
kallsyms.c Updates to various subsystems which I help look after. lib, ocfs2, 2022-08-07 10:03:24 -07:00
kallsyms_internal.h kallsyms: move declarations to internal header 2022-07-17 17:31:39 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels" 2022-03-31 10:36:55 +02:00
kcov.c kcov: update pos before writing pc in trace function 2022-05-25 13:05:42 -07:00
kexec.c
kexec_core.c kexec: drop weak attribute from functions 2022-07-15 12:21:16 -04:00
kexec_elf.c
kexec_file.c Updates to various subsystems which I help look after. lib, ocfs2, 2022-08-07 10:03:24 -07:00
kexec_internal.h
kheaders.c
kmod.c
kprobes.c kprobes: kretprobe events missing on 2-core KVM guest 2023-01-04 11:26:27 +01:00
ksysfs.c kernel/ksysfs.c: use helper macro __ATTR_RW 2022-03-23 19:00:33 -07:00
kthread.c kthread: make it clear that kthread_create_on_node() might be terminated by any fatal signal 2022-06-16 19:11:30 -07:00
latencytop.c latencytop: move sysctl to its own file 2022-04-21 11:40:59 -07:00
Makefile kernel: remove platform_has() infrastructure 2022-08-01 07:42:56 +02:00
module_signature.c
notifier.c notifier: Add blocking/atomic_notifier_chain_register_unique_prio() 2022-05-19 19:30:30 +02:00
nsproxy.c Revert "fs/exec: allow to unshare a time namespace on vfork+exec" 2022-09-13 10:38:43 -07:00
padata.c padata: Fix list iterator in padata_do_serial() 2022-12-31 13:26:20 +01:00
panic.c linux-kselftest-kunit-5.20-rc1 2022-08-02 19:34:45 -07:00
params.c
pid.c
pid_namespace.c kernel: pid_namespace: use NULL instead of using plain integer as pointer 2022-04-29 14:38:00 -07:00
profile.c profile: setup_profiling_timer() is moslty not implemented 2022-07-29 18:12:36 -07:00
ptrace.c ptrace: fix clearing of JOBCTL_TRACED in ptrace_unfreeze_traced() 2022-07-09 11:06:19 -07:00
range.c
reboot.c Merge branch 'rework/kthreads' into for-linus 2022-06-23 19:11:28 +02:00
regset.c
relay.c relay: fix type mismatch when allocating memory in relay_create_buf() 2022-12-31 13:25:47 +01:00
resource.c resource: Introduce alloc_free_mem_region() 2022-07-21 17:19:25 -07:00
resource_kunit.c
rseq.c rseq: Use pr_warn_once() when deprecated/unknown ABI flags are encountered 2022-11-26 09:27:55 +01:00
scftorture.c scftorture: Fix distribution of short handler delays 2022-04-11 17:07:29 -07:00
scs.c kasan, vmalloc: only tag normal vmalloc allocations 2022-03-24 19:06:48 -07:00
seccomp.c seccomp: Add wait_killable semantic to seccomp user notifier 2022-05-03 14:11:58 -07:00
signal.c signal handling: don't use BUG_ON() for debugging 2022-07-07 09:53:43 -07:00
smp.c locking/csd_lock: Change csdlock_debug from early_param to __setup 2022-07-19 11:40:00 -07:00
smpboot.c cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again. 2022-04-12 14:13:01 +02:00
smpboot.h
softirq.c context_tracking: Take IRQ eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
stackleak.c stackleak: add on/off stack variants 2022-05-08 01:33:09 -07:00
stacktrace.c
static_call.c static_call: Don't make __static_call_return0 static 2022-04-05 09:59:38 +02:00
static_call_inline.c static_call: Don't make __static_call_return0 static 2022-04-05 09:59:38 +02:00
stop_machine.c Scheduler changes in this cycle were: 2022-05-24 11:11:13 -07:00
sys.c arm64/sme: Implement vector length configuration prctl()s 2022-04-22 18:50:54 +01:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2022-08-20 15:17:45 -07:00
sysctl-test.c
sysctl.c proc: proc_skip_spaces() shouldn't think it is working on C strings 2022-12-08 11:30:22 +01:00
task_work.c task_work: allow TWA_SIGNAL without a rescheduling IPI 2022-04-30 08:39:32 -06:00
taskstats.c kernel: make taskstats available from all net namespaces 2022-04-29 14:38:03 -07:00
torture.c
tracepoint.c tracepoint: Allow trace events in modules with TAINT_TEST 2022-09-06 22:26:00 -04:00
tsacct.c taskstats: version 12 with thread group and exe info 2022-04-29 14:38:03 -07:00
ucount.c
uid16.c
uid16.h
umh.c kthread: Don't allocate kthread_struct for init and umh 2022-05-06 14:49:44 -05:00
up.c
user-return-notifier.c
user.c
user_namespace.c
usermode_driver.c blob_to_mnt(): kern_unmount() is needed to undo kern_mount() 2022-05-19 23:25:47 -04:00
utsname.c
utsname_sysctl.c
watch_queue.c This was a moderately busy cycle for documentation, but nothing all that 2022-08-02 19:24:24 -07:00
watchdog.c powerpc updates for 6.0 2022-08-06 16:38:17 -07:00
watchdog_hld.c Revert "printk: add functions to prefer direct printing" 2022-06-23 18:41:40 +02:00
workqueue.c workqueue: don't skip lockdep work dependency in cancel_work_sync() 2022-08-16 06:27:35 -10:00
workqueue_internal.h