linux-stable/kernel/rcu
Frederic Weisbecker c817f5c016 rcu: Defer RCU kthreads wakeup when CPU is dying
[ Upstream commit e787644caf ]

When the CPU goes idle for the last time during the CPU down hotplug
process, RCU reports a final quiescent state for the current CPU. If
this quiescent state propagates up to the top, some tasks may then be
woken up to complete the grace period: the main grace period kthread
and/or the expedited main workqueue (or kworker).

If those kthreads have a SCHED_FIFO policy, the wake up can indirectly
arm the RT bandwith timer to the local offline CPU. Since this happens
after hrtimers have been migrated at CPUHP_AP_HRTIMERS_DYING stage, the
timer gets ignored. Therefore if the RCU kthreads are waiting for RT
bandwidth to be available, they may never be actually scheduled.

This triggers TREE03 rcutorture hangs:

	 rcu: INFO: rcu_preempt self-detected stall on CPU
	 rcu:     4-...!: (1 GPs behind) idle=9874/1/0x4000000000000000 softirq=0/0 fqs=20 rcuc=21071 jiffies(starved)
	 rcu:     (t=21035 jiffies g=938281 q=40787 ncpus=6)
	 rcu: rcu_preempt kthread starved for 20964 jiffies! g938281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
	 rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
	 rcu: RCU grace-period kthread stack dump:
	 task:rcu_preempt     state:R  running task     stack:14896 pid:14    tgid:14    ppid:2      flags:0x00004000
	 Call Trace:
	  <TASK>
	  __schedule+0x2eb/0xa80
	  schedule+0x1f/0x90
	  schedule_timeout+0x163/0x270
	  ? __pfx_process_timeout+0x10/0x10
	  rcu_gp_fqs_loop+0x37c/0x5b0
	  ? __pfx_rcu_gp_kthread+0x10/0x10
	  rcu_gp_kthread+0x17c/0x200
	  kthread+0xde/0x110
	  ? __pfx_kthread+0x10/0x10
	  ret_from_fork+0x2b/0x40
	  ? __pfx_kthread+0x10/0x10
	  ret_from_fork_asm+0x1b/0x30
	  </TASK>

The situation can't be solved with just unpinning the timer. The hrtimer
infrastructure and the nohz heuristics involved in finding the best
remote target for an unpinned timer would then also need to handle
enqueues from an offline CPU in the most horrendous way.

So fix this on the RCU side instead and defer the wake up to an online
CPU if it's too late for the local one.

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Fixes: 5c0930ccaa ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-31 16:17:05 -08:00
..
Kconfig Merge branch 'ctxt.2022.07.05a' into HEAD 2022-07-21 17:46:18 -07:00
Kconfig.debug Char / Misc driver changes for 6.0-rc1 2022-08-04 11:05:48 -07:00
Makefile
rcu.h rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs 2023-07-19 16:21:01 +02:00
rcu_segcblist.c rcu: Clarify fill-the-gap comment in rcu_segcblist_advance() 2022-04-11 17:28:48 -07:00
rcu_segcblist.h rcu: Mark writes to the rcu_segcblist structure's ->flags field 2022-02-14 10:36:58 -08:00
rcuscale.c rcuscale: Move rcu_scale_writer() schedule_timeout_uninterruptible() to _idle() 2023-09-23 11:11:00 +02:00
rcutorture.c Merge branches 'doc.2022.08.31b', 'fixes.2022.08.31b', 'kvfree.2022.08.31b', 'nocb.2022.09.01a', 'poll.2022.08.31b', 'poll-srcu.2022.08.31b' and 'tasks.2022.08.31b' into HEAD 2022-09-01 10:55:57 -07:00
refscale.c refscale: Fix uninitalized use of wait_queue_head_t 2023-09-13 09:42:28 +02:00
srcutiny.c srcu: Make Tiny SRCU use full-sized grace-period counters 2022-08-31 05:10:15 -07:00
srcutree.c srcu: Fix callbacks acceleration mishandling 2024-01-10 17:10:26 +01:00
sync.c rcu_sync: Fix comment to properly reflect rcu_sync_exit() behavior 2022-04-20 16:51:11 -07:00
tasks.h rcu-tasks: Provide rcu_trace_implies_rcu_gp() 2024-01-25 15:27:26 -08:00
tiny.c Merge branches 'doc.2022.08.31b', 'fixes.2022.08.31b', 'kvfree.2022.08.31b', 'nocb.2022.09.01a', 'poll.2022.08.31b', 'poll-srcu.2022.08.31b' and 'tasks.2022.08.31b' into HEAD 2022-09-01 10:55:57 -07:00
tree.c rcu: Defer RCU kthreads wakeup when CPU is dying 2024-01-31 16:17:05 -08:00
tree.h rcu/tree: Defer setting of jiffies during stall reset 2023-11-28 17:07:11 +00:00
tree_exp.h rcu: Defer RCU kthreads wakeup when CPU is dying 2024-01-31 16:17:05 -08:00
tree_nocb.h rcu/nocb: Add CPU number to CPU-{,de}offload failure messages 2022-08-31 05:07:19 -07:00
tree_plugin.h rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp 2023-07-27 08:50:33 +02:00
tree_stall.h rcu/tree: Defer setting of jiffies during stall reset 2023-11-28 17:07:11 +00:00
update.c Merge branch 'ctxt.2022.07.05a' into HEAD 2022-07-21 17:46:18 -07:00