linux-stable/kernel/locking
Peter Zijlstra f7853c3424 locking/rtmutex: Fix task->pi_waiters integrity
Henry reported that rt_mutex_adjust_prio_chain() has an ordering
problem and puts the lie to the comment in [7]. Sharing the sort key
between lock->waiters and owner->pi_waiters *does* create problems,
since unlike what the comment claims, holding [L] is insufficient.

Notably, consider:

        A
      /   \
     M1   M2
     |     |
     B     C

That is, task A owns both M1 and M2, and B and C block on them. In this
case concurrent chain walks (from B and C) will modify their respective
sort keys in [7] while holding M1->wait_lock and M2->wait_lock. So
holding [L] is meaningless, they're different Ls.

This then gives rise to a race condition between [7] and [11], where
the requeue of pi_waiters will observe an inconsistent tree order.

	B				C

  (holds M1->wait_lock,		(holds M2->wait_lock,
   holds B->pi_lock)		 holds A->pi_lock)

  [7]
  waiter_update_prio();
  ...
  [8]
  raw_spin_unlock(B->pi_lock);
  ...
  [10]
  raw_spin_lock(A->pi_lock);

				[11]
				rt_mutex_enqueue_pi();
				// observes inconsistent A->pi_waiters
				// tree order
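
A minimal single-threaded model of the failure, for illustration only.
It assumes a sorted array as a stand-in for the A->pi_waiters rb-tree;
struct shared_waiter, pi_order_ok() and shared_key_demo() are
hypothetical names, not kernel code. It shows that rewriting a shared
sort key in place, without requeueing, leaves the container in an order
that a later insert can no longer trust.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waiter with ONE sort key shared by both trees. */
struct shared_waiter {
	int prio;	/* lower value = higher priority */
};

/* Check that the stand-in for owner->pi_waiters is still sorted. */
static bool pi_order_ok(struct shared_waiter *const *tree, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (tree[i - 1]->prio > tree[i]->prio)
			return false;
	return true;
}

static void shared_key_demo(void)
{
	struct shared_waiter b = { .prio = 10 }, c = { .prio = 20 };
	struct shared_waiter *pi_waiters[] = { &b, &c };

	assert(pi_order_ok(pi_waiters, 2));

	/*
	 * [7] analogue: B's chain walk rewrites the shared key while
	 * holding only M1->wait_lock; B has not been requeued in
	 * A's pi_waiters yet.
	 */
	b.prio = 30;

	/* [11] analogue: anyone walking the tree now sees a broken order. */
	assert(!pi_order_ok(pi_waiters, 2));
}
```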

Fixing this means either extending the range of the owner lock from
[10-13] to [6-13], with the immediate problem that this means [6-8]
hold both blocked and owner locks, or duplicating the sort key.

Since the locking in chain walk is horrible enough without having to
consider pi_lock nesting rules, duplicate the sort key instead.

By giving each tree its own sort key, the above race becomes
harmless: if C sees B at the old location, then B will correct things
(if they need correcting) when it walks up the chain and reaches A.
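
The idea can be sketched as follows. This is a simplified model under
stated assumptions, not the actual patch: struct sort_key, struct waiter,
update_lock_key(), update_pi_key() and key_less() are hypothetical names.
Each waiter carries one key per tree, and each key is only written under
the lock that protects the tree it orders.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sort key; lower prio value = higher priority. */
struct sort_key {
	int prio;
	unsigned long long deadline;
};

/* Hypothetical waiter with a separate key for each tree. */
struct waiter {
	struct sort_key tree;		/* orders lock->waiters */
	struct sort_key pi_tree;	/* orders owner->pi_waiters */
};

/* [7] analogue: caller holds the blocked lock's wait_lock only. */
static void update_lock_key(struct waiter *w, int prio,
			    unsigned long long deadline)
{
	w->tree.prio = prio;
	w->tree.deadline = deadline;
}

/* Later step: caller holds the owner's pi_lock, copy the key over. */
static void update_pi_key(struct waiter *w)
{
	w->pi_tree = w->tree;
}

static bool key_less(const struct sort_key *a, const struct sort_key *b)
{
	if (a->prio != b->prio)
		return a->prio < b->prio;
	return a->deadline < b->deadline;
}

static void split_key_demo(void)
{
	struct waiter b = { { 10, 0 }, { 10, 0 } };
	struct waiter c = { { 20, 0 }, { 20, 0 } };

	/* B's chain walk changes only the lock-tree key. */
	update_lock_key(&b, 30, 0);

	/* C still sees B at its old, self-consistent pi position. */
	assert(key_less(&b.pi_tree, &c.pi_tree));

	/* When B reaches A it updates the pi key under A's pi_lock. */
	update_pi_key(&b);
	assert(!key_less(&b.pi_tree, &c.pi_tree));
}
```

In this model a stale pi key is never wrong with respect to the tree it
orders, it is merely out of date until the waiter's own walk catches up.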

Fixes: fb00aca474 ("rtmutex: Turn the plist into an rb-tree")
Reported-by: Henry Wu <triangletrap12@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Henry Wu <triangletrap12@gmail.com>
Link: https://lkml.kernel.org/r/20230707161052.GF2883469%40hirez.programming.kicks-ass.net
2023-07-17 13:59:10 +02:00
..
irqflag-debug.c
lock_events.c
lock_events.h locking: add lockevent_read() prototype 2023-06-09 17:44:15 -07:00
lock_events_list.h
lockdep.c Locking changes for v6.5: 2023-06-27 14:14:30 -07:00
lockdep_internals.h
lockdep_proc.c
lockdep_states.h
locktorture.c locktorture: Add long_hold to adjust lock-hold delays 2023-05-11 13:46:36 -07:00
Makefile
mcs_spinlock.h
mutex-debug.c
mutex.c
mutex.h
osq_lock.c
percpu-rwsem.c
qrwlock.c
qspinlock.c locking/qspinlock: Micro-optimize pending state waiting for unlock 2023-01-05 11:01:50 +01:00
qspinlock_paravirt.h
qspinlock_stat.h
rtmutex.c locking/rtmutex: Fix task->pi_waiters integrity 2023-07-17 13:59:10 +02:00
rtmutex_api.c locking/rtmutex: Fix task->pi_waiters integrity 2023-07-17 13:59:10 +02:00
rtmutex_common.h locking/rtmutex: Fix task->pi_waiters integrity 2023-07-17 13:59:10 +02:00
rwbase_rt.c locking/rwbase: Mitigate indefinite writer starvation 2023-04-29 09:08:52 +02:00
rwsem.c locking/rwsem: Add __always_inline annotation to __down_read_common() and inlined callers 2023-05-08 10:58:24 +02:00
semaphore.c
spinlock.c
spinlock_debug.c
spinlock_rt.c
test-ww_mutex.c locking: Reduce the number of locks in ww_mutex stress tests 2023-03-27 11:16:01 -07:00
ww_mutex.h locking/rtmutex: Fix task->pi_waiters integrity 2023-07-17 13:59:10 +02:00
ww_rt_mutex.c