linux-stable/kernel/locking
Mel Gorman 693e7e52a8 rtmutex: Add acquire semantics for rtmutex lock acquisition slow path
commit 1c0908d8e4 upstream.

Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench
on XFS on arm64.

 kernel BUG at fs/inode.c:625!
 Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
 CPU: 11 PID: 6611 Comm: dbench Tainted: G            E   6.0.0-rt14-rt+ #1
 pc : clear_inode+0xa0/0xc0
 lr : clear_inode+0x38/0xc0
 Call trace:
  clear_inode+0xa0/0xc0
  evict+0x160/0x180
  iput+0x154/0x240
  do_unlinkat+0x184/0x300
  __arm64_sys_unlinkat+0x48/0xc0
  el0_svc_common.constprop.4+0xe4/0x2c0
  do_el0_svc+0xac/0x100
  el0_svc+0x78/0x200
  el0t_64_sync_handler+0x9c/0xc0
  el0t_64_sync+0x19c/0x1a0

It also affects 6.1-rc7-rt5 and affects a preempt-rt fork of 5.14 so this
is likely a bug that existed forever and only became visible when ARM
support was added to preempt-rt. The same problem does not occur on x86-64
and he also reported that converting sb->s_inode_wblist_lock to
raw_spinlock_t makes the problem disappear indicating that the RT spinlock
variant is the problem.

Which in turn means that RT mutexes on ARM64 and any other weakly ordered
architecture are affected by this independent of RT.

Will Deacon observed:

  "I'd be more inclined to be suspicious of the slowpath tbh, as we need to
   make sure that we have acquire semantics on all paths where the lock can
   be taken. Looking at the rtmutex code, this really isn't obvious to me
   -- for example, try_to_take_rt_mutex() appears to be able to return via
   the 'takeit' label without acquire semantics and it looks like we might
   be relying on the caller's subsequent _unlock_ of the wait_lock for
   ordering, but that will give us release semantics which aren't correct."

Sebastian Andrzej Siewior prototyped a fix that does work based on that
comment but it was a little bit overkill and added some fences that should
not be necessary.

The lock owner is updated with an IRQ-safe raw spinlock held, but the
spin_unlock does not provide acquire semantics which are needed when
acquiring a mutex.

Adds the necessary acquire semantics for lock owner updates in the slow path
acquisition and the waiter bit logic.

It successfully completed 10 iterations of the dbench workload while the
vanilla kernel fails on the first iteration.

[ bigeasy@linutronix.de: Initial prototype fix ]

Fixes: 700318d1d7 ("locking/rtmutex: Use acquire/release semantics")
Fixes: 23f78d4a03 ("[PATCH] pi-futex: rt mutex core")
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-04 11:26:28 +01:00
..
Makefile locking/ww_mutex: Implement rtmutex based ww_mutex API functions 2021-08-17 19:05:26 +02:00
irqflag-debug.c lockdep: Noinstr annotate warn_bogus_irq_restore() 2021-02-10 14:44:39 +01:00
lock_events.c
lock_events.h locking/lock_events: Use raw_cpu_{add,inc}() for stats 2019-06-03 12:32:56 +02:00
lock_events_list.h locking/rwsem: Remove reader optimistic spinning 2020-12-09 17:08:48 +01:00
lockdep.c RCU pull request for v5.20 (or whatever) 2022-08-02 19:12:45 -07:00
lockdep_internals.h locking/lockdep: Iterate lock_classes directly when reading lockdep files 2022-02-16 15:57:58 +01:00
lockdep_proc.c locking/lockdep: Iterate lock_classes directly when reading lockdep files 2022-02-16 15:57:58 +01:00
lockdep_states.h
locktorture.c locktorture,rcutorture,torture: Always log error message 2021-12-07 16:36:17 -08:00
mcs_spinlock.h locking: Fix typos in comments 2021-03-22 02:45:52 +01:00
mutex-debug.c locking/ww_mutex: Gather mutex_waiter initialization 2021-08-17 19:04:41 +02:00
mutex.c locking/mutex: Make contention tracepoints more consistent wrt adaptive spinning 2022-04-05 10:24:36 +02:00
mutex.h locking/mutex: Move the 'struct mutex_waiter' definition from <linux/mutex.h> to the internal header 2021-08-17 18:24:31 +02:00
osq_lock.c locking: Fix typos in comments 2021-03-22 02:45:52 +01:00
percpu-rwsem.c locking: Apply contention tracepoints in the slow path 2022-04-05 10:24:35 +02:00
qrwlock.c locking/qrwlock: Change "queue rwlock" to "queued rwlock" 2022-05-11 16:27:04 +02:00
qspinlock.c locking: Apply contention tracepoints in the slow path 2022-04-05 10:24:35 +02:00
qspinlock_paravirt.h Revert "locking/pvqspinlock: Don't wait if vCPU is preempted" 2019-09-25 10:22:37 +02:00
qspinlock_stat.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 2019-05-30 11:26:37 -07:00
rtmutex.c rtmutex: Add acquire semantics for rtmutex lock acquisition slow path 2023-01-04 11:26:28 +01:00
rtmutex_api.c rtmutex: Add acquire semantics for rtmutex lock acquisition slow path 2023-01-04 11:26:28 +01:00
rtmutex_common.h locking/rtmutex: Dont dereference waiter lockless 2021-08-25 15:42:32 +02:00
rwbase_rt.c locking: Apply contention tracepoints in the slow path 2022-04-05 10:24:35 +02:00
rwsem.c locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter 2022-07-30 10:58:28 +02:00
semaphore.c locking: Apply contention tracepoints in the slow path 2022-04-05 10:24:35 +02:00
spinlock.c locking/rwlocks: introduce write_lock_nested 2022-01-22 08:33:37 +02:00
spinlock_debug.c locking/rwlock: Provide RT variant 2021-08-17 17:50:51 +02:00
spinlock_rt.c locking/rwlocks: introduce write_lock_nested 2022-01-22 08:33:37 +02:00
test-ww_mutex.c locking/ww-mutex: Fix uninitialized use of ret in test_aa() 2021-10-01 13:57:49 +02:00
ww_mutex.h locking/ww_mutex: Add rt_mutex based lock type and accessors 2021-08-17 19:05:11 +02:00
ww_rt_mutex.c kernel/locking: Use a pointer in ww_mutex_trylock(). 2021-11-17 14:48:49 +01:00