Commit graph

899 commits

Author SHA1 Message Date
Frederic Weisbecker
1721145527 rcu/context-tracking: Move RCU-dynticks internal functions to context_tracking
Move the core RCU eqs/dynticks functions to context tracking so that
we can later merge all that code within context tracking.

Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05 13:32:59 -07:00
Frederic Weisbecker
564506495c rcu/context-tracking: Move deferred nocb resched to context tracking
To prepare for migrating the RCU eqs accounting code to context tracking,
split the last-resort deferred nocb resched from rcu_user_enter() and
move it into a separate call from context tracking.

Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05 13:32:59 -07:00
Frederic Weisbecker
95e04f48ec rcu/context_tracking: Move dynticks_nmi_nesting to context tracking
The RCU eqs tracking is going to be performed by the context tracking
subsystem. The related nesting counters thus need to be moved to the
context tracking structure.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05 13:32:59 -07:00
Frederic Weisbecker
904e600e60 rcu/context_tracking: Move dynticks_nesting to context tracking
The RCU eqs tracking is going to be performed by the context tracking
subsystem. The related nesting counters thus need to be moved to the
context tracking structure.

Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05 13:32:59 -07:00
Frederic Weisbecker
62e2412df4 rcu/context_tracking: Move dynticks counter to context tracking
In order to prepare for merging RCU dynticks counter into the context
tracking state, move the rcu_data's dynticks field to the context
tracking structure. It will later be mixed within the context tracking
state itself.

[ paulmck: Move enum ctx_state into global scope. ]

Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05 13:32:59 -07:00
Frederic Weisbecker
3864caafe7 rcu/context-tracking: Remove rcu_irq_enter/exit()
Now rcu_irq_enter/exit() is an unnecessary middle call between
ct_irq_enter/exit() and nmi_irq_enter/exit(). Take this opportunity
to remove the former functions and move the comments above them to the
new entrypoints.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05 13:32:59 -07:00
Frederic Weisbecker
e67198cc05 context_tracking: Take idle eqs entrypoints over RCU
The RCU dynticks counter is going to be merged into the context tracking
subsystem. Start with moving the idle extended quiescent states
entrypoints to context tracking. For now those are dumb redirections to
existing RCU calls.

[ paulmck: Apply kernel test robot feedback. ]

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
2022-07-05 13:32:16 -07:00
Paul E. McKenney
ed4ae5eff4 rcu: Apply noinstr to rcu_idle_enter() and rcu_idle_exit()
This commit applies the "noinstr" tag to the rcu_idle_enter() and
rcu_idle_exit() functions, which are invoked from portions of the idle
loop that cannot be instrumented.  These tags require reworking the
rcu_eqs_enter() and rcu_eqs_exit() functions that these two functions
invoke in order to cause them to use normal assertions rather than
lockdep.  In addition, within rcu_idle_exit(), the raw versions of
local_irq_save() and local_irq_restore() are used, again to avoid issues
with lockdep in uninstrumented code.

This patch is based in part on an earlier patch by Jiri Olsa, discussions
with Peter Zijlstra and Frederic Weisbecker, earlier changes by Thomas
Gleixner, and off-list discussions with Yonghong Song.

Link: https://lore.kernel.org/lkml/20220515203653.4039075-1-jolsa@kernel.org/
Reported-by: Jiri Olsa <jolsa@kernel.org>
Reported-by: Alexei Starovoitov <ast@kernel.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Yonghong Song <yhs@fb.com>
2022-06-20 09:30:10 -07:00
Paul E. McKenney
ce13389053 Merge branch 'exp.2022.05.11a' into HEAD
exp.2022.05.11a: Expedited-grace-period latency-reduction updates.
2022-05-11 11:49:35 -07:00
Kalesh Singh
9621fbee44 rcu: Move expedited grace period (GP) work to RT kthread_worker
Enabling CONFIG_RCU_BOOST did not reduce RCU expedited grace-period
latency because its workqueues run at SCHED_OTHER, and thus can be
delayed by normal processes.  This commit avoids these delays by moving
the expedited GP work items to a real-time-priority kthread_worker.

This option is controlled by CONFIG_RCU_EXP_KTHREAD and disabled by
default on PREEMPT_RT=y kernels which disable expedited grace periods
after boot by unconditionally setting rcupdate.rcu_normal_after_boot=1.

The results were evaluated on arm64 Android devices (6GB ram) running
5.10 kernel, and capturing trace data in critical user-level code.

The table below shows the resulting order-of-magnitude improvements
in synchronize_rcu_expedited() latency:

------------------------------------------------------------------------
|                          |   workqueues  |  kthread_worker |  Diff   |
------------------------------------------------------------------------
| Count                    |          725  |            688  |         |
------------------------------------------------------------------------
| Min Duration       (ns)  |          326  |            447  |  37.12% |
------------------------------------------------------------------------
| Q1                 (ns)  |       39,428  |         38,971  |  -1.16% |
------------------------------------------------------------------------
| Q2 - Median        (ns)  |       98,225  |         69,743  | -29.00% |
------------------------------------------------------------------------
| Q3                 (ns)  |      342,122  |        126,638  | -62.98% |
------------------------------------------------------------------------
| Max Duration       (ns)  |  372,766,967  |      2,329,671  | -99.38% |
------------------------------------------------------------------------
| Avg Duration       (ns)  |    2,746,353  |        151,242  | -94.49% |
------------------------------------------------------------------------
| Standard Deviation (ns)  |   19,327,765  |        294,408  |         |
------------------------------------------------------------------------

The below table show the range of maximums/minimums for
synchronize_rcu_expedited() latency from all experiments:

------------------------------------------------------------------------
|                          |   workqueues  |  kthread_worker |  Diff   |
------------------------------------------------------------------------
| Total No. of Experiments |           25  |             23  |         |
------------------------------------------------------------------------
| Largest  Maximum   (ns)  |  372,766,967  |      2,329,671  | -99.38% |
------------------------------------------------------------------------
| Smallest Maximum   (ns)  |       38,819  |         86,954  | 124.00% |
------------------------------------------------------------------------
| Range of Maximums  (ns)  |  372,728,148  |      2,242,717  |         |
------------------------------------------------------------------------
| Largest  Minimum   (ns)  |       88,623  |         27,588  | -68.87% |
------------------------------------------------------------------------
| Smallest Minimum   (ns)  |          326  |            447  |  37.12% |
------------------------------------------------------------------------
| Range of Minimums  (ns)  |       88,297  |         27,141  |         |
------------------------------------------------------------------------

Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Reported-by: Tim Murray <timmurray@google.com>
Reported-by: Wei Wang <wvw@google.com>
Tested-by: Kyle Lin <kylelin@google.com>
Tested-by: Chunwei Lu <chunweilu@google.com>
Tested-by: Lulu Wang <luluw@google.com>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-05-11 11:47:10 -07:00
Paul E. McKenney
be05ee5437 Merge branches 'docs.2022.04.20a', 'fixes.2022.04.20a', 'nocb.2022.04.11b', 'rcu-tasks.2022.04.11b', 'srcu.2022.05.03a', 'torture.2022.04.11b', 'torture-tasks.2022.04.20a' and 'torturescript.2022.04.20a' into HEAD
docs.2022.04.20a: Documentation updates.
fixes.2022.04.20a: Miscellaneous fixes.
nocb.2022.04.11b: Callback-offloading updates.
rcu-tasks.2022.04.11b: RCU-tasks updates.
srcu.2022.05.03a: Put SRCU on a memory diet.
torture.2022.04.11b: Torture-test updates.
torture-tasks.2022.04.20a: Avoid torture testing changing RCU configuration.
torturescript.2022.04.20a: Torture-test scripting updates.
2022-05-03 10:21:40 -07:00
Frederic Weisbecker
70ae7b0ce0 rcu: Fix preemption mode check on synchronize_rcu[_expedited]()
An early check on synchronize_rcu[_expedited]() tries to determine if
the current CPU is in UP mode on an SMP no-preempt kernel, in which case
there is no need to start a grace period since the current assumed
quiescent state is all we need.

However the preemption mode doesn't take into account the boot selected
preemption mode under CONFIG_PREEMPT_DYNAMIC=y, missing a possible
early return if the running flavour is "none" or "voluntary".

Use the shiny new preempt mode accessors to fix this.  However,
avoid invoking them during early boot because doing so triggers a
WARN_ON_ONCE().

[ paulmck: Update for mainlined API. ]

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-20 16:51:11 -07:00
Paul E. McKenney
75182a4eaa rcu: Add comments to final rcu_gp_cleanup() "if" statement
The final "if" statement in rcu_gp_cleanup() has proven to be rather
confusing, straightforward though it might have seemed when initially
written.  This commit therefore adds comments to its "then" and "else"
clauses to at least provide a more elevated form of confusion.

Reported-by: Boqun Feng <boqun.feng@gmail.com>
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reported-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-20 16:51:10 -07:00
Paul E. McKenney
c708b08c65 rcu: Check for jiffies going backwards
A report of a 12-jiffy normal RCU CPU stall warning raises interesting
questions about the nature of time on the offending system.  This commit
instruments rcu_sched_clock_irq(), which is RCU's hook into the
scheduling-clock interrupt, checking for the jiffies counter going
backwards.

Reported-by: Saravanan D <sarvanand@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 17:28:48 -07:00
Paul E. McKenney
99d6a2acb8 rcutorture: Suppress debugging grace period delays during flooding
Tree RCU supports grace-period delays using the rcutree.gp_cleanup_delay,
rcutree.gp_init_delay, and rcutree.gp_preinit_delay kernel boot
parameters.  These delays are strictly for debugging purposes, and have
proven quite effective at exposing bugs involving race with CPU-hotplug
operations.  However, these delays can result in false positives when
used in conjunction with callback flooding, for example, those generated
by the rcutorture.fwd_progress kernel boot parameter.

This commit therefore suppresses grace-period delays while callback
flooding is in progress.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 17:07:28 -07:00
Paul E. McKenney
5d90070816 rcu-tasks: Make Tasks RCU account for userspace execution
The main Tasks RCU quiescent state is voluntary context switch.  However,
userspace execution is also a valid quiescent state, and is a valuable one
for userspace applications that spin repeatedly executing light-weight
non-sleeping system calls.  Currently, such an application can delay a
Tasks RCU grace period for many tens of seconds.

This commit therefore enlists the aid of the scheduler-clock interrupt to
provide a Tasks RCU quiescent state when it interrupted a task executing
in userspace.

[ paulmck: Apply feedback from kernel test robot. ]

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Neil Spring <ntspring@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 17:06:42 -07:00
Frederic Weisbecker
87c5adf06b rcu/nocb: Initialize nocb kthreads only for boot CPU prior SMP initialization
The rcu_spawn_gp_kthread() function is called as an early initcall, which
means that SMP initialization hasn't happened yet and only the boot CPU is
online. Therefore, create only the NOCB kthreads related to the boot CPU.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 17:05:58 -07:00
Frederic Weisbecker
3352911fa9 rcu: Initialize boost kthread only for boot node prior SMP initialization
The rcu_spawn_gp_kthread() function is called as an early initcall,
which means that SMP initialization hasn't happened yet and only the
boot CPU is online.  Therefore, create only the boost kthread for the
leaf node of the boot CPU.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 17:05:58 -07:00
Frederic Weisbecker
2eed973adc rcu: Assume rcu_init() is called before smp
The rcu_init() function is called way before SMP is initialized and
therefore only the boot CPU should be online at this stage.

Simplify the boot per-cpu initialization accordingly.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-11 17:05:58 -07:00
Linus Torvalds
35dc0352bb RCU pull request for v5.18
This pull request contains the following branches:
 
 exp.2022.02.24a: Contains a fix for idle detection from Neeraj Upadhyay
 	and missing access marking detected by KCSAN.
 
 fixes.2022.02.14a: Miscellaneous fixes.
 
 rcu_barrier.2022.02.08a: Reduces coupling between rcu_barrier() and
 	CPU-hotplug operations, so that rcu_barrier() no longer needs
 	to do cpus_read_lock().  This may also someday allow system
 	boot to bring CPUs online concurrently.
 
 rcu-tasks.2022.02.08a: Enable more aggressive movement to per-CPU
 	queueing when reacting to excessive lock contention due
 	to workloads placing heavy update-side stress on RCU tasks.
 
 rt.2022.02.01b: Improvements to RCU priority boosting, including
 	changes from Neeraj Upadhyay, Zqiang, and Alison Chaiken.
 
 torture.2022.02.01b: Various fixes improving test robustness and
 	debug information.
 
 torturescript.2022.02.08a: Add tests for SRCU size transitions, further
 	compress torture.sh build products, and improve debug output.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmIusb0THHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jAklD/9VXLK7crcg2YeRXUIg1IOdnancsVCV
 MNtTfxNYqYIis+W2UfuHKuQu2yEXF5fihdY0J9TQv0byHsprp6FIZT+i1An4Ukgd
 0vyHjd/DaIKgs2txsB1DjhlatWlJUfQuBwhtNUkpYFLFwKdCI1l813bPbNlL+GiL
 p0ZejVMpBC5HgE6sDOtaaQSAB+AEUp+Lgr+yaG/On8hfzwWFKO8KldxhiKY9n07v
 SNDfKDgXB+80hx4RBVGbkuogV3s9brFULoNRXJy7Uf79DtiY09uazhhA3G0TjO34
 zGwmF91dqsXDF/Uz8g4aZO0xYRXUchOrsQ5lgO/GhTVbM9I0wWlMHEk/8WHyBJkU
 vlXOMuwzBc9/5uwZE3rnkA4a3nkXhPQjLlCr+/I7A/7Vsv9IBW9WSlgMvUN0Qf4S
 XAwTnIqfErnR60a+L0+HRr5kIV5VoXcxqI/Nv0/4/BMLRubS/c7cYjOTxXNJL9SU
 50pv5vty9xk3HSpuz0JAOyLf+PUT773uUQhFr5xCBSCVqbAm5WFg6hWPAgrN/tUS
 wstBc0wlA73rKVJxeLDQwHc/oT1zTUEzswVZITQ5zLHK0t0GbeR6QHccsdeaJyTe
 DisX+66A6YQrEuJmx5xUZqjYHqtYLDOBTbHA3ZwQmvjKu8ibWZ8Fg9ioURLCS4bF
 +FVkp/5KdcAN9w==
 =ljVY
 -----END PGP SIGNATURE-----

Merge tag 'rcu.2022.03.13a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU updates from Paul McKenney:

 - Fix idle detection (Neeraj Upadhyay) and missing access marking
   detected by KCSAN.

 - Reduce coupling between rcu_barrier() and CPU-hotplug operations, so
   that rcu_barrier() no longer needs to do cpus_read_lock(). This may
   also someday allow system boot to bring CPUs online concurrently.

 - Enable more aggressive movement to per-CPU queueing when reacting to
   excessive lock contention due to workloads placing heavy update-side
   stress on RCU tasks.

 - Improvements to RCU priority boosting, including changes from Neeraj
   Upadhyay, Zqiang, and Alison Chaiken.

 - Various fixes improving test robustness and debug information.

 - Add tests for SRCU size transitions, further compress torture.sh
   build products, and improve debug output.

 - Miscellaneous fixes.

* tag 'rcu.2022.03.13a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (49 commits)
  rcu: Replace cpumask_weight with cpumask_empty where appropriate
  rcu: Remove __read_mostly annotations from rcu_scheduler_active externs
  rcu: Uninline multi-use function: finish_rcuwait()
  rcu: Mark writes to the rcu_segcblist structure's ->flags field
  kasan: Record work creation stack trace with interrupts enabled
  rcu: Inline __call_rcu() into call_rcu()
  rcu: Add mutex for rcu boost kthread spawning and affinity setting
  rcu: Fix description of kvfree_rcu()
  MAINTAINERS:  Add Frederic and Neeraj to their RCU files
  rcutorture: Provide non-power-of-two Tasks RCU scenarios
  rcutorture: Test SRCU size transitions
  torture: Make torture.sh help message match reality
  rcu-tasks: Set ->percpu_enqueue_shift to zero upon contention
  rcu-tasks: Use order_base_2() instead of ilog2()
  rcu: Create and use an rcu_rdp_cpu_online()
  rcu: Make rcu_barrier() no longer block CPU-hotplug operations
  rcu: Rework rcu_barrier() and callback-migration logic
  rcu: Refactor rcu_barrier() empty-list handling
  rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion
  torture: Change KVM environment variable to RCUTORTURE
  ...
2022-03-21 14:00:56 -07:00
Frederic Weisbecker
2984539959 tick/rcu: Remove obsolete rcu_needs_cpu() parameters
With the removal of CONFIG_RCU_FAST_NO_HZ, the parameters in
rcu_needs_cpu() are not necessary anymore. Simply remove them.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
2022-03-07 23:01:26 +01:00
Paul E. McKenney
d5578190be Merge branches 'exp.2022.02.24a', 'fixes.2022.02.14a', 'rcu_barrier.2022.02.08a', 'rcu-tasks.2022.02.08a', 'rt.2022.02.01b', 'torture.2022.02.01b' and 'torturescript.2022.02.08a' into HEAD
exp.2022.02.24a: Expedited grace-period updates.
fixes.2022.02.14a: Miscellaneous fixes.
rcu_barrier.2022.02.08a: Make rcu_barrier() no longer exclude CPU hotplug.
rcu-tasks.2022.02.08a: RCU-tasks updates.
rt.2022.02.01b: Real-time-related updates.
torture.2022.02.01b: Torture-test updates.
torturescript.2022.02.08a: Torture-test scripting updates.
2022-02-24 09:38:46 -08:00
Zqiang
d818cc76e2 kasan: Record work creation stack trace with interrupts enabled
Recording the work creation stack trace for KASAN reports in
call_rcu() is expensive, due to unwinding the stack, but also
due to acquiring depot_lock inside stackdepot (which may be contended).
Because calling kasan_record_aux_stack_noalloc() does not require
interrupts to already be disabled, this may unnecessarily extend
the time with interrupts disabled.

Therefore, move calling kasan_record_aux_stack() before the section
with interrupts disabled.

Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-02-14 10:36:58 -08:00
Paul E. McKenney
1fe09ebe7a rcu: Inline __call_rcu() into call_rcu()
Because __call_rcu() is invoked only by call_rcu(), this commit inlines
the former into the latter.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-02-14 10:36:58 -08:00
David Woodhouse
218b957a69 rcu: Add mutex for rcu boost kthread spawning and affinity setting
As we handle parallel CPU bringup, we will need to take care to avoid
spawning multiple boost threads, or race conditions when setting their
affinity. Spotted by Paul McKenney.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-02-14 10:36:35 -08:00
Paul E. McKenney
5ae0f1b58b rcu: Create and use an rcu_rdp_cpu_online()
The pattern "rdp->grpmask & rcu_rnp_online_cpus(rnp)" occurs frequently
in RCU code in order to determine whether rdp->cpu is online from an
RCU perspective.  This commit therefore creates an rcu_rdp_cpu_online()
function to replace it.

[ paulmck: Apply kernel test robot unused-variable feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-02-08 10:12:28 -08:00
Paul E. McKenney
80b3fd474c rcu: Make rcu_barrier() no longer block CPU-hotplug operations
This commit removes the cpus_read_lock() and cpus_read_unlock() calls
from rcu_barrier(), thus allowing CPUs to come and go during the course
of rcu_barrier() execution.  Posting of the ->barrier_head callbacks does
synchronize with portions of RCU's CPU-hotplug notifiers, but these locks
are held for short time periods on both sides.  Thus, full CPU-hotplug
operations could both start and finish during the execution of a given
rcu_barrier() invocation.

Additional synchronization is provided by a global ->barrier_lock.
Since the ->barrier_lock is only used during rcu_barrier() execution and
during onlining/offlining a CPU, the contention for this lock should
be low.  It might be tempting to make use of a per-CPU lock just on
general principles, but straightforward attempts to do this have the
problems shown below.

Initial state: 3 CPUs present, CPU 0 and CPU1 do not have
any callback and CPU2 has callbacks.

1. CPU0 calls rcu_barrier().

2. CPU1 starts offlining for CPU2. CPU1 calls
   rcutree_migrate_callbacks(). rcu_barrier_entrain() is called
   from rcutree_migrate_callbacks(), with CPU2's rdp->barrier_lock.
   It does not entrain ->barrier_head for CPU2, as rcu_barrier()
   on CPU0 hasn't started the barrier sequence (by calling
   rcu_seq_start(&rcu_state.barrier_sequence)) yet.

3. CPU0 starts new barrier sequence. It iterates over
   CPU0 and CPU1, after acquiring their per-cpu ->barrier_lock
   and finds 0 segcblist length. It updates ->barrier_seq_snap
   for CPU0 and CPU1 and continues loop iteration to CPU2.

    for_each_possible_cpu(cpu) {
        raw_spin_lock_irqsave(&rdp->barrier_lock, flags);
        if (!rcu_segcblist_n_cbs(&rdp->cblist)) {
            WRITE_ONCE(rdp->barrier_seq_snap, gseq);
            raw_spin_unlock_irqrestore(&rdp->barrier_lock, flags);
            rcu_barrier_trace(TPS("NQ"), cpu, rcu_state.barrier_sequence);
            continue;
        }

4. rcutree_migrate_callbacks() completes execution on CPU1.
   Segcblist len for CPU2 becomes 0.

5. The loop iteration on CPU0, checks rcu_segcblist_n_cbs(&rdp->cblist)
   for CPU2 and completes the loop iteration after setting
   ->barrier_seq_snap.

6. As there isn't any ->barrier_head callback entrained; at
   this point, rcu_barrier() in CPU0 returns.

7. The callbacks, which migrated from CPU2 to CPU1, execute.

Straightforward per-CPU locking is also subject to the following race
condition noted by Boqun Feng:

1. CPU0 calls rcu_barrier(), starting a new barrier sequence by invoking
   rcu_seq_start() and init_completion(), but does not yet initialize
   rcu_state.barrier_cpu_count.

2. CPU1 starts offlining for CPU2, calling rcutree_migrate_callbacks(),
   which in turn calls rcu_barrier_entrain() holding CPU2's.
   rdp->barrier_lock.  It then entrains ->barrier_head for CPU2
   and atomically increments rcu_state.barrier_cpu_count, which is
   unfortunately not yet initialized to the value 2.

3. The just-entrained RCU callback is invoked.  It atomically
   decrements rcu_state.barrier_cpu_count and sees that it is
   now zero.  This callback therefore invokes complete().

4. CPU0 continues executing rcu_barrier(), but is not blocked
   by its call to wait_for_completion().  This results in rcu_barrier()
   returning before all pre-existing callbacks have been invoked,
   which is a bug.

Therefore, synchronization is provided by rcu_state.barrier_lock,
which is also held across the initialization sequence, especially the
rcu_seq_start() and the atomic_set() that sets rcu_state.barrier_cpu_count
to the value 2.  In addition, this lock is held when entraining the
rcu_barrier() callback, when deciding whether or not a CPU has callbacks
that rcu_barrier() must wait on, when setting the ->qsmaskinitnext for
incoming CPUs, and when migrating callbacks from a CPU that is going
offline.

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Co-developed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-02-08 10:12:28 -08:00
Paul E. McKenney
a16578dd5e rcu: Rework rcu_barrier() and callback-migration logic
This commit reworks rcu_barrier() and callback-migration logic to
permit allowing rcu_barrier() to run concurrently with CPU-hotplug
operations.  The key trick is for callback migration to check to see if
an rcu_barrier() is in flight, and, if so, enqueue the ->barrier_head
callback on its behalf.

This commit adds synchronization with RCU's CPU-hotplug notifiers.  Taken
together, this will permit a later commit to remove the cpus_read_lock()
and cpus_read_unlock() calls from rcu_barrier().

[ paulmck: Updated per kbuild test robot feedback. ]
[ paulmck: Updated per reviews session with Neeraj, Frederic, Uladzislau, and Boqun. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-02-08 10:12:28 -08:00
Paul E. McKenney
0cabb47af3 rcu: Refactor rcu_barrier() empty-list handling
This commit saves a few lines by checking first for an empty callback
list.  If the callback list is empty, then that CPU is taken care of,
regardless of its online or nocb state.  Also simplify tracing accordingly
and fold a few lines together.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-02-08 10:12:28 -08:00
David Woodhouse
82980b1622 rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion
If we allow architectures to bring APs online in parallel, then we end
up requiring rcu_cpu_starting() to be reentrant. But currently, the
manipulation of rnp->ofl_seq is not thread-safe.

However, rnp->ofl_seq is also fairly much pointless anyway since both
rcu_cpu_starting() and rcu_report_dead() hold rcu_state.ofl_lock for
fairly much the whole time that rnp->ofl_seq is set to an odd number
to indicate that an operation is in progress.

So drop rnp->ofl_seq completely, and use only rcu_state.ofl_lock.

This has a couple of minor complexities: lockdep will complain when we
take rcu_state.ofl_lock, and currently accepts the 'excuse' of having
an odd value in rnp->ofl_seq. So switch it to an arch_spinlock_t to
avoid that false positive complaint. Since we're killing rnp->ofl_seq
of course that 'excuse' has to be changed too, so make it check for
arch_spin_is_locked(rcu_state.ofl_lock).

There's no arch_spin_lock_irqsave() so we have to manually save and
restore local interrupts around the locking.

At Paul's request based on Neeraj's analysis, make rcu_gp_init not just
wait but *exclude* any CPU online/offline activity, which was fairly
much true already by virtue of it holding rcu_state.ofl_lock.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-02-08 10:11:41 -08:00
Zqiang
c951587585 rcu: Add per-CPU rcuc task dumps to RCU CPU stall warnings
When the rcutree.use_softirq kernel boot parameter is set to zero, all
RCU_SOFTIRQ processing is carried out by the per-CPU rcuc kthreads.
If these kthreads are being starved, quiescent states will not be
reported, which in turn means that the grace period will not end, which
can in turn trigger RCU CPU stall warnings.  This commit therefore dumps
stack traces of stalled CPUs' rcuc kthreads, which can help identify
what is preventing those kthreads from running.

Suggested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-02-01 17:22:17 -08:00
Alison Chaiken
c8b16a6526 rcu: Elevate priority of offloaded callback threads
When CONFIG_PREEMPT_RT=y, the rcutree.kthread_prio command-line
parameter signals initialization code to boost the priority of rcuc
callbacks to the designated value.  With the additional
CONFIG_RCU_NOCB_CPU=y configuration and an additional rcu_nocbs
command-line parameter, the callbacks on the listed cores are
offloaded to new rcuop kthreads that are not pinned to the cores whose
post-grace-period work is performed.  While the rcuop kthreads perform
the same function as the rcuc kthreads they offload, the kthread_prio
parameter only boosts the priority of the rcuc kthreads.  Fix this
inconsistency by elevating rcuop kthreads to the same priority as the rcuc
kthreads.

Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-02-01 17:19:25 -08:00
Alison Chaiken
c8db27dd0e rcu: Move kthread_prio bounds-check to a separate function
Move the bounds-check of the kthread_prio cmdline parameter to a new
function in order to faciliate a different callsite.

Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-02-01 17:19:02 -08:00
Zqiang
4b4399b245 rcu: Create per-cpu rcuc kthreads only when rcutree.use_softirq=0
The per-CPU "rcuc" kthreads are used only by kernels booted with
rcutree.use_softirq=0, but they are nevertheless unconditionally created
by kernels built with CONFIG_RCU_BOOST=y.  This results in "rcuc"
kthreads being created that are never actually used.  This commit
therefore refrains from creating these kthreads unless the kernel
is actually booted with rcutree.use_softirq=0.

Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-02-01 17:19:02 -08:00
Paul E. McKenney
f80fe66c38 Merge branches 'doc.2021.11.30c', 'exp.2021.12.07a', 'fastnohz.2021.11.30c', 'fixes.2021.11.30c', 'nocb.2021.12.09a', 'nolibc.2021.11.30c', 'tasks.2021.12.09a', 'torture.2021.12.07a' and 'torturescript.2021.11.30c' into HEAD
doc.2021.11.30c: Documentation updates.
exp.2021.12.07a: Expedited-grace-period fixes.
fastnohz.2021.11.30c: Remove CONFIG_RCU_FAST_NO_HZ.
fixes.2021.11.30c: Miscellaneous fixes.
nocb.2021.12.09a: No-CB CPU updates.
nolibc.2021.11.30c: Tiny in-kernel library updates.
tasks.2021.12.09a: RCU-tasks updates, including update-side scalability.
torture.2021.12.07a: Torture-test in-kernel module updates.
torturescript.2021.11.30c: Torture-test scripting updates.
2021-12-09 11:38:09 -08:00
Frederic Weisbecker
0598a4d442 rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread
rcu_core() tries to ensure that its self-invocation in case of callbacks
overload only happen in softirq/rcuc mode. Indeed it doesn't make sense
to trigger local RCU core from nocb_cb kthread since it can execute
on a CPU different from the target rdp. Also in case of overload, the
nocb_cb kthread simply iterates a new loop of callbacks processing.

However the "offloaded" check that aims at preventing misplaced
rcu_core() invocations is wrong. First of all that state is volatile
and second: softirq/rcuc can execute while the target rdp is offloaded.
As a result rcu_core() can be invoked on the wrong CPU while in the
process of (de-)offloading.

Fix that with moving the rcu_core() self-invocation to rcu_core() itself,
irrespective of the rdp offloaded state.

Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:24:44 -08:00
Frederic Weisbecker
a554ba2888 rcu: Apply callbacks processing time limit only on softirq
Time limit only makes sense when callbacks are serviced in softirq mode
because:

_ In case we need to get back to the scheduler,
  cond_resched_tasks_rcu_qs() is called after each callback.

_ In case some other softirq vector needs the CPU, the call to
  local_bh_enable() before cond_resched_tasks_rcu_qs() takes care about
  them via a call to do_softirq().

Therefore, make sure the time limit only applies to softirq mode.

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:24:44 -08:00
Frederic Weisbecker
3e61e95e2d rcu: Fix callbacks processing time limit retaining cond_resched()
The callbacks processing time limit makes sure we are not exceeding a
given amount of time executing the queue.

However its "continue" clause bypasses the cond_resched() call on
rcuc and NOCB kthreads, delaying it until we reach the limit, which can
be very long...

Make sure the scheduler has a higher priority than the time limit.

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:24:44 -08:00
Frederic Weisbecker
78ad37a2c5 rcu/nocb: Limit number of softirq callbacks only on softirq
The current condition to limit the number of callbacks executed in a
row checks the offloaded state of the rdp. Not only is it volatile
but it is also misleading: the rcu_core() may well be executing
callbacks concurrently with NOCB kthreads, and the offloaded state
would then be verified on both cases. As a result the limit would
spuriously not apply anymore on softirq while in the middle of
(de-)offloading process.

Fix and clarify the condition with those constraints in mind:

_ If callbacks are processed either by rcuc or NOCB kthread, the call
  to cond_resched_tasks_rcu_qs() is enough to take care of the overload.

_ If instead callbacks are processed by softirqs:
  * If need_resched(), exit the callbacks processing
  * Otherwise if CPU is idle we can continue
  * Otherwise exit because a softirq shouldn't interrupt a task for too
    long nor deprive other pending softirq vectors of the CPU.

Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:24:44 -08:00
Frederic Weisbecker
7b65dfa32d rcu/nocb: Use appropriate rcu_nocb_lock_irqsave()
Instead of hardcoding IRQ save and nocb lock, use the consolidated
API (and fix a comment as per Valentin Schneider's suggestion).

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:24:44 -08:00
Frederic Weisbecker
344e219d7d rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check
It's not entirely obvious why rdp->qlen_last_fqs_check is updated before
processing the queue only on offloaded rdp. There can be different
effect to that, either in favour of triggering the force quiescent state
path or not. For example:

1) If the number of callbacks has decreased since the last
   rdp->qlen_last_fqs_check update (because we recently called
   rcu_do_batch() and we executed below qhimark callbacks) and the number
   of processed callbacks on a subsequent do_batch() arranges for
   exceeding qhimark on non-offloaded but not on offloaded setup, then we
   may spare a later run to the force quiescent state
   slow path on __call_rcu_nocb_wake(), as compared to the non-offloaded
   counterpart scenario.

   Here is such an offloaded scenario instance:

    qhimark = 1000
    rdp->last_qlen_last_fqs_check = 3000
    rcu_segcblist_n_cbs(rdp) = 2000

    rcu_do_batch() {
        if (offloaded)
            rdp->last_qlen_fqs_check = rcu_segcblist_n_cbs(rdp) // 2000
        // run 1000 callback
        rcu_segcblist_n_cbs(rdp) = 1000
        // Not updating rdp->qlen_last_fqs_check
        if (count < rdp->qlen_last_fqs_check - qhimark)
            rdp->qlen_last_fqs_check = count;
    }

    call_rcu() * 1001 {
        __call_rcu_nocb_wake() {
            // not taking the fqs slowpath:
            // rcu_segcblist_n_cbs(rdp) == 2001
            // rdp->qlen_last_fqs_check == 2000
            // qhimark == 1000
            if (len > rdp->qlen_last_fqs_check + qhimark)
                ...
    }

    In the case of a non-offloaded scenario, rdp->qlen_last_fqs_check
    would be 1000 and the fqs slowpath would have executed.

2) If the number of callbacks has increased since the last
   rdp->qlen_last_fqs_check update (because we recently queued below
   qhimark callbacks) and the number of callbacks executed in rcu_do_batch()
   doesn't exceed qhimark for either offloaded or non-offloaded setup,
   then it's possible that the offloaded scenario later run the force
   quiescent state slow path on __call_rcu_nocb_wake() while the
   non-offloaded doesn't.

    qhimark = 1000
    rdp->last_qlen_last_fqs_check = 3000
    rcu_segcblist_n_cbs(rdp) = 2000

    rcu_do_batch() {
        if (offloaded)
            rdp->last_qlen_last_fqs_check = rcu_segcblist_n_cbs(rdp) // 2000
        // run 100 callbacks
        // concurrent queued 100
        rcu_segcblist_n_cbs(rdp) = 2000
        // Not updating rdp->qlen_last_fqs_check
        if (count < rdp->qlen_last_fqs_check - qhimark)
            rdp->qlen_last_fqs_check = count;
    }

    call_rcu() * 1001 {
        __call_rcu_nocb_wake() {
            // Taking the fqs slowpath:
            // rcu_segcblist_n_cbs(rdp) == 3001
            // rdp->qlen_last_fqs_check == 2000
            // qhimark == 1000
            if (len > rdp->qlen_last_fqs_check + qhimark)
                ...
    }

    In the case of a non-offloaded scenario, rdp->qlen_last_fqs_check
    would be 3000 and the fqs slowpath would have executed.

The reason for updating rdp->qlen_last_fqs_check when invoking callbacks
for offloaded CPUs is that there is usually no point in waking up either
the rcuog or rcuoc kthreads while in this state.  After all, both threads
are prohibited from indefinite sleeps.

The exception is when some huge number of callbacks are enqueued while
rcu_do_batch() is in the midst of invoking, in which case interrupting
the rcuog kthread's timed sleep might get more callbacks set up for the
next grace period.

Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Original-patch-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:24:44 -08:00
Frederic Weisbecker
b3bb02fe5a rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe
When callbacks are offloaded, the NOCB kthreads handle the callbacks
progression on behalf of rcu_core().

However during the (de-)offloading process, the kthread may not be
entirely up to the task. As a result some callbacks grace period
sequence number may remain stale for a while because rcu_core() won't
take care of them either.

Fix this with forcing callbacks acceleration from rcu_core() as long
as the offloading process isn't complete.

Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:24:44 -08:00
Thomas Gleixner
24ee940d89 rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe
While reporting a quiescent state for a given CPU, rcu_core() takes
advantage of the freshly loaded grace period sequence number and the
locked rnp to accelerate the callbacks whose sequence number have been
assigned a stale value.

This action is only necessary when the rdp isn't offloaded, otherwise
the NOCB kthreads already take care of the callbacks progression.

However the check for the offloaded state is volatile because it is
performed outside the IRQs disabled section. It's possible for the
offloading process to preempt rcu_core() at that point on PREEMPT_RT.

This is dangerous because rcu_core() may end up accelerating callbacks
concurrently with NOCB kthreads without appropriate locking.

Fix this with moving the offloaded check inside the rnp locking section.

Reported-and-tested-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:24:44 -08:00
Frederic Weisbecker
fbb94cbd70 rcu/nocb: Invoke rcu_core() at the start of deoffloading
On PREEMPT_RT, if rcu_core() is preempted by the de-offloading process,
some work, such as callbacks acceleration and invocation, may be left
unattended due to the volatile checks on the offloaded state.

In the worst case this work is postponed until the next rcu_pending()
check that can take a jiffy to reach, which can be a problem in case
of callbacks flooding.

Solve that with invoking rcu_core() early in the de-offloading process.
This way any work dismissed by an ongoing rcu_core() call fooled by
a preempting deoffloading process will be caught up by a nearby future
recall to rcu_core(), this time fully aware of the de-offloading state.

Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:24:44 -08:00
Frederic Weisbecker
213d56bf33 rcu/nocb: Prepare state machine for a new step
Currently SEGCBLIST_SOFTIRQ_ONLY is a bit of an exception among the
segcblist flags because it is an exclusive state that doesn't mix up
with the other flags. Remove it in favour of:

_ A flag specifying that rcu_core() needs to perform callbacks execution
  and acceleration

and

_ A flag specifying we want the nocb lock to be held in any needed
  circumstances

This clarifies the code and is more flexible: It allows to have a state
where rcu_core() runs with locking while offloading hasn't started yet.
This is a necessary step to prepare for triggering rcu_core() at the
very beginning of the de-offloading process so that rcu_core() won't
dismiss work while being preempted by the de-offloading process, at
least not without a pending subsequent rcu_core() that will quickly
catch up.

Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:24:44 -08:00
Paul E. McKenney
614ddad17f rcu: Tighten rcu_advance_cbs_nowake() checks
Currently, rcu_advance_cbs_nowake() checks that a grace period is in
progress, however, that grace period could end just after the check.
This commit rechecks that a grace period is still in progress while
holding the rcu_node structure's lock.  The grace period cannot end while
the current CPU's rcu_node structure's ->lock is held, thus avoiding
false positives from the WARN_ON_ONCE().

As Daniel Vacek noted, it is not necessary for the rcu_node structure
to have a CPU that has not yet passed through its quiescent state.

Tested-by: Guillaume Morin <guillaume@morinfr.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:23:03 -08:00
Paul E. McKenney
790da24897 rcu: Make idle entry report expedited quiescent states
In non-preemptible kernels, an unfortunately timed expedited grace period
can result in the rcu_exp_handler() IPI handler setting the rcu_data
structure's cpu_no_qs.b.exp field just as the target CPU enters idle.
There are situations in which this field will not be checked until after
that CPU exits idle.  The resulting grace-period latency does not qualify
as "expedited".

This commit therefore checks this field upon non-preemptible idle entry in
the rcu_preempt_deferred_qs() function.  It also qualifies the rcu_core()
preempt_count() check with IS_ENABLED(CONFIG_PREEMPT_COUNT) to prevent
false-positive quiescent states from count-free kernels.

Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07 16:22:22 -08:00
Jun Miao
300c0c5e72 rcu: Avoid alloc_pages() when recording stack
The default kasan_record_aux_stack() calls stack_depot_save() with GFP_NOWAIT,
which in turn can then call alloc_pages(GFP_NOWAIT, ...).  In general, however,
it is not even possible to use either GFP_ATOMIC nor GFP_NOWAIT in certain
non-preemptive contexts/RT kernel including raw_spin_locks (see gfp.h and ab00db216c).
Fix it by instructing stackdepot to not expand stack storage via alloc_pages()
in case it runs out by using kasan_record_aux_stack_noalloc().

Jianwei Hu reported:
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:969
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 15319, name: python3
INFO: lockdep is turned off.
irq event stamp: 0
  hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  hardirqs last disabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590
  softirqs last  enabled at (0): [<ffffffff856c8b13>] copy_process+0xaf3/0x2590
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  CPU: 6 PID: 15319 Comm: python3 Tainted: G        W  O 5.15-rc7-preempt-rt #1
  Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1b 12/17/2018
  Call Trace:
    show_stack+0x52/0x58
    dump_stack+0xa1/0xd6
    ___might_sleep.cold+0x11c/0x12d
    rt_spin_lock+0x3f/0xc0
    rmqueue+0x100/0x1460
    rmqueue+0x100/0x1460
    mark_usage+0x1a0/0x1a0
    ftrace_graph_ret_addr+0x2a/0xb0
    rmqueue_pcplist.constprop.0+0x6a0/0x6a0
     __kasan_check_read+0x11/0x20
     __zone_watermark_ok+0x114/0x270
     get_page_from_freelist+0x148/0x630
     is_module_text_address+0x32/0xa0
     __alloc_pages_nodemask+0x2f6/0x790
     __alloc_pages_slowpath.constprop.0+0x12d0/0x12d0
     create_prof_cpu_mask+0x30/0x30
     alloc_pages_current+0xb1/0x150
     stack_depot_save+0x39f/0x490
     kasan_save_stack+0x42/0x50
     kasan_save_stack+0x23/0x50
     kasan_record_aux_stack+0xa9/0xc0
     __call_rcu+0xff/0x9c0
     call_rcu+0xe/0x10
     put_object+0x53/0x70
     __delete_object+0x7b/0x90
     kmemleak_free+0x46/0x70
     slab_free_freelist_hook+0xb4/0x160
     kfree+0xe5/0x420
     kfree_const+0x17/0x30
     kobject_cleanup+0xaa/0x230
     kobject_put+0x76/0x90
     netdev_queue_update_kobjects+0x17d/0x1f0
     ... ...
     ksys_write+0xd9/0x180
     __x64_sys_write+0x42/0x50
     do_syscall_64+0x38/0x50
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

Links: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/kasan.h?id=7cb3007ce2da27ec02a1a3211941e7fe6875b642
Fixes: 84109ab585 ("rcu: Record kvfree_call_rcu() call stack for KASAN")
Fixes: 26e760c9a7 ("rcu: kasan: record and print call_rcu() call stack")
Reported-by: Jianwei Hu <jianwei.hu@windriver.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Marco Elver <elver@google.com>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Jun Miao <jun.miao@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:25:20 -08:00
Changbin Du
2407a64f80 rcu: in_irq() cleanup
This commit replaces the obsolete and ambiguous macro in_irq() with its
shiny new in_hardirq() equivalent.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:25:20 -08:00
Paul E. McKenney
bc849e9192 rcu: Move rcu_needs_cpu() to tree.c
Now that RCU_FAST_NO_HZ is no more, there is but one implementation of
the rcu_needs_cpu() function.  This commit therefore moves this function
from kernel/rcu/tree_plugin.c to kernel/rcu/tree.c.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-11-30 17:24:47 -08:00