linux-stable/kernel/rcu
Frederic Weisbecker 4991cb2d43 rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation
[ Upstream commit 55d4669ef1 ]

When rcu_barrier() calls rcu_rdp_cpu_online() and observes a CPU's bit
cleared from rnp->qsmaskinitnext, it means that all accesses from the
offline CPU preceding CPUHP_TEARDOWN_CPU are visible to rcu_barrier(),
including callback invocations and counter updates.
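
For reference, the online check amounts to a bitmask test. Below is a
simplified sketch of rcu_rdp_cpu_online() from kernel/rcu/tree.c, with
the helper that reads rnp->qsmaskinitnext inlined:

    /* A CPU is online from RCU's perspective iff its bit is still
     * set in its leaf rcu_node's ->qsmaskinitnext mask. */
    static bool rcu_rdp_cpu_online(struct rcu_data *rdp)
    {
            return !!(rdp->grpmask & READ_ONCE(rdp->mynode->qsmaskinitnext));
    }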

However, interrupts can still fire after stop_machine() re-enables
interrupts and before rcutree_report_cpu_dead() runs. The related
accesses, happening between CPUHP_TEARDOWN_CPU and the clearing of
rnp->qsmaskinitnext, are _NOT_ guaranteed to be seen by rcu_barrier()
without proper ordering, especially when callbacks are invoked to
completion there, making rcutree_migrate_callbacks() take its early
exit and bypass the barrier_lock.

The following theoretical race example can make rcu_barrier() hang:

CPU 0                                               CPU 1
-----                                               -----
//cpu_down()
smpboot_park_threads()
//ksoftirqd is parked now
<IRQ>
rcu_sched_clock_irq()
   invoke_rcu_core()
do_softirq()
   rcu_core()
      rcu_do_batch()
         // callback storm
         // rcu_do_batch() returns
         // before completing all
         // of them
   // do_softirq also returns early because of
   // timeout. It defers to ksoftirqd but
   // it's parked
</IRQ>
stop_machine()
   take_cpu_down()
                                                    rcu_barrier()
                                                        spin_lock(barrier_lock)
                                                        // observes rcu_segcblist_n_cbs(&rdp->cblist) != 0
<IRQ>
do_softirq()
   rcu_core()
      rcu_do_batch()
         //completes all pending callbacks
         //smp_mb() implied _after_ callback number dec
</IRQ>

rcutree_report_cpu_dead()
   rnp->qsmaskinitnext &= ~rdp->grpmask;

rcutree_migrate_callbacks()
   // no callback, early return without locking
   // barrier_lock
                                                        //observes !rcu_rdp_cpu_online(rdp)
                                                        rcu_barrier_entrain()
                                                           rcu_segcblist_entrain()
                                                               // Fails to observe rcu_segcblist_n_cbs(rsclp) == 0
                                                               // (reads a stale nonzero rsclp->len) because there
                                                               // is no barrier between reading rnp->qsmaskinitnext
                                                               // and rsclp->len
                                                              rcu_segcblist_add_len()
                                                                 smp_mb__before_atomic()
                                                                 // will now observe the 0 count and empty
                                                                 // list, but too late, we enqueue regardless
                                                                 WRITE_ONCE(rsclp->len, rsclp->len + v);
                                                        // ignored barrier callback
                                                        // rcu barrier stall...
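
The entrain decision above hinges on an unordered read of rsclp->len.
Roughly, as a simplified sketch of rcu_segcblist_entrain() from
kernel/rcu/rcu_segcblist.c with the enqueue details elided:

    bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
                               struct rcu_head *rhp)
    {
            /* Nothing orders this read of ->len against the earlier
             * read of rnp->qsmaskinitnext, so a stale nonzero value
             * may be observed here. */
            if (rcu_segcblist_n_cbs(rsclp) == 0)
                    return false;
            rcu_segcblist_inc_len(rsclp);  /* Full barrier, too late. */
            smp_mb();  /* Order count update before entraining. */
            /* ... enqueue rhp at the end of the last nonempty segment,
             * where it will never be invoked if the CPU is already
             * dead and rcutree_migrate_callbacks() has come and
             * gone ... */
            return true;
    }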

This could be solved with a read memory barrier, enforcing the message
passing between rnp->qsmaskinitnext and rsclp->len, to match the full
memory barrier that rcu_segcblist_add_len() executes after the
rsclp->len update at the end of rcu_do_batch().
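
Such a hypothetical (NOT applied) fix would have placed an smp_rmb()
in the offline branch of rcu_barrier()'s per-CPU loop, along these
lines:

    if (!rcu_rdp_cpu_online(rdp)) {
            /* Hypothetical alternative: pair with the full barrier
             * after the rsclp->len update at the end of
             * rcu_do_batch(), so that observing the cleared
             * qsmaskinitnext bit guarantees observing ->len == 0. */
            smp_rmb();
            rcu_barrier_entrain(rdp);
            raw_spin_unlock_irqrestore(&rcu_state.barrier_lock, flags);
            continue;
    }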

However, rcu_barrier() is complicated enough already and probably
doesn't need more subtleties. CPU down is a slowpath and the
barrier_lock is seldom contended. Solve the issue by unconditionally
locking the barrier_lock in rcutree_migrate_callbacks(). This makes
sure that either rcu_barrier() sees the empty queue or its entrained
callback will be migrated.
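
Schematically, the early-return path of rcutree_migrate_callbacks()
now takes the lock first. A sketch of the idea, not the literal diff:

    /* Serialize with rcu_barrier(): either it sees the empty cblist
     * and skips this CPU, or the callback it entrained is still
     * there to be migrated below. */
    raw_spin_lock_irqsave(&rcu_state.barrier_lock, flags);
    if (rcu_segcblist_empty(&rdp->cblist)) {
            raw_spin_unlock_irqrestore(&rcu_state.barrier_lock, flags);
            return;  /* No callbacks to migrate. */
    }
    /* ... proceed with the migration, now properly ordered against
     * rcu_barrier()'s checks via barrier_lock ... */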

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-14 13:58:41 +02:00
Kconfig rcu: Employ jiffies-based backstop to callback time limit 2023-05-11 13:42:39 -07:00
Kconfig.debug rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts 2023-01-09 12:09:52 -08:00
Makefile rcuperf: Change rcuperf to rcuscale 2020-08-24 18:39:24 -07:00
rcu.h rcu: Introduce rcu_cpu_online() 2024-01-10 17:16:56 +01:00
rcu_segcblist.c rcu: Throttle callback invocation based on number of ready callbacks 2023-01-03 17:28:34 -08:00
rcu_segcblist.h rcu: Throttle callback invocation based on number of ready callbacks 2023-01-03 17:28:34 -08:00
rcuscale.c rcuscale: Move rcu_scale_writer() schedule_timeout_uninterruptible() to _idle() 2023-07-14 15:01:49 -07:00
rcutorture.c rcutorture: Fix rcu_torture_fwd_cb_cr() data race 2024-08-14 13:58:41 +02:00
refscale.c refscale: Add a "jiffies" test 2023-07-14 15:01:04 -07:00
srcutiny.c rcu: Annotate SRCU's update-side lockdep dependencies 2023-03-27 11:15:59 -07:00
srcutree.c srcu: Only accelerate on enqueue time 2023-11-28 17:19:36 +00:00
sync.c rcu/sync: Use call_rcu_hurry() instead of call_rcu 2022-11-29 14:04:33 -08:00
tasks.h rcu/tasks: Fix stale task snaphot for Tasks Trace 2024-08-03 08:53:20 +02:00
tiny.c rcu: Refactor kvfree_call_rcu() and high-level helpers 2023-01-03 17:48:40 -08:00
tree.c rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation 2024-08-14 13:58:41 +02:00
tree.h rcu/tree: Defer setting of jiffies during stall reset 2023-11-28 17:20:02 +00:00
tree_exp.h rcu/exp: Handle RCU expedited grace period kworker allocation failure 2024-03-26 18:19:17 -04:00
tree_nocb.h rcu/nocb: Fix WARN_ON_ONCE() in the rcu_nocb_bypass_lock() 2024-04-13 13:07:34 +02:00
tree_plugin.h rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp 2023-05-11 13:42:39 -07:00
tree_stall.h rcu: Fix buffer overflow in print_cpu_stall_info() 2024-06-12 11:11:32 +02:00
update.c Merge branch 'stall.2023.01.09a' into HEAD 2023-02-02 16:40:07 -08:00