The new ->gq_seq grace-period sequence numbers must be shifted down,
which give artifacts when these numbers wrap. This commit therefore
enables rcutorture and rcuperf to handle grace-period sequence numbers
even if they do wrap. It does this by allowing a special subtraction
function to be specified, and this function subtracts before shifting.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
In the old days of ->gpnum and ->completed, the code requesting a new
grace period checked to see if that grace period had already started,
bailing early if so. The new-age ->gp_seq approach instead checks
whether the grace period has already finished. A compensating change
pushed the requested grace period down to the bottom of the tree, thus
reducing lock contention and even eliminating it in some cases. But why
not further reduce contention, especially on large systems, by doing both,
especially given that the cost of doing both is extremely small?
This commit therefore adds a new rcu_seq_started() function that checks
whether a specified grace period has already started. It then uses
this new function in place of rcu_seq_done() in the rcu_start_this_gp()
function's funnel locking code.
Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The "cpustart" trace event shows a stale gp_seq. This is because it uses
rdp->gp_seq, which is updated only at the end of the __note_gp_changes()
function. This commit therefore instead uses rnp->gp_seq.
An alternative fix would be to update rdp->gp_seq earlier, but this would
break RCU's detection of the beginning of a new-to-this-CPU grace period.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Currently Tree RCU's clean-up code emits a "CleanupMore" trace event in
response to late-arriving grace-period requests even if the grace period
was already requested. This makes "CleanupMore" show up an extra time (in
addition to once for each rcu_node structure that was previously marked
with the request), and for no good reason. This commit therefore avoids
emitting this trace message unless the the only request for this next
grace period arrived during or after the cleanup scan of the rcu_node
structures.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The old grace-period start code would acquire only the leaf's rcu_node
structure's ->lock if that structure believed that a grace period was
in progress. The new code advances to the leaf's parent in this case,
needlessly acquiring then leaf's parent's ->lock. This commit therefore
checks the grace-period state after marking the leaf with the need for
the specified grace period, and if the leaf believes that a grace period
is in progress, takes an early exit.
Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Add "Startedleaf" tracing as suggested by Joel Fernandes. ]
Now that the rcu_data structure contains ->gp_seq_needed, create an
rcu_accelerate_cbs_unlocked() helper function that locklessly checks to
see if new callbacks' required grace period has already been requested.
If so, update the callback list locally and again locklessly. (Though
interrupts must be and are disabled to avoid racing with conflicting
updates in interrupt handlers.)
Otherwise, call rcu_accelerate_cbs() as before.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that everything has been converted to use ->gp_seq instead of
->gpnum and ->completed, this commit removes ->gpnum and ->completed.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes the rcu_quiescent_state_report tracepoint use ->gp_seq
instead of ->gpnum.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes the rcu_future_grace_period tracepoint use gp_seq
instead of ->gpnum and ->completed.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes the rcu_grace_period tracepoint use gp_seq instead
of ->gpnum or ->completed. It also introduces a "cpuofl-bgp" string to
less obscurely indicate when a CPU has gone offline while a grace period
is waiting on it.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes rcu_nocb_wait_gp() check rdp->gp_seq_needed to see
if the current CPU already knows about the needed grace period having
already been requested. If so, it avoids acquiring the corresponding
leaf rcu_node structure's ->lock, thus decreasing contention. This
optimization is intended for cases where either multiple leader rcuo
kthreads are running on the same CPU or these kthreads are running on
a non-offloaded (e.g., housekeeping) CPU.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Move lock release past "if" as suggested by Joel Fernandes. ]
[ paulmck: Fix caching of furthest-future requested grace period. ]
One problem with the ->need_future_gp[] array is that the grace-period
assignment of each element changes as the grace periods complete.
This means that it is necessary to hold a lock when checking this
array to learn if a given grace period has already been requested.
This increase lock contention, which is the opposite of helpful.
This commit therefore replaces the ->need_future_gp[] with a single
->gp_seq_needed value and keeps it updated in the rcu_data structure.
This will enable reliable lockless checking of whether or not a given
grace period has already been requested.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
SRCU has long used ->srcu_gp_seq, and now RCU uses ->gp_seq. This
commit therefore moves the rcutorture_get_gp_data() function from
a ->gpnum / ->completed pair to ->gp_seq.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes the RCU CPU stall-warning code in print_other_cpu_stall(),
print_cpu_stall(), and check_cpu_stall() use ->gp_seq instead of ->gpnum
and ->completed.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit converts the grace-period request code paths from ->completed
and ->gpnum to ->gp_seq. The need_future_gp_element() macro encapsulates
the shift operation required to use ->gp_seq as an index to the
->need_future_gp[] array. The rcu_cbs_completed() function is removed
in favor of the rcu_seq_snap() function. The rcu_start_this_gp()
gets some temporary consistency checks and uses rcu_seq_done(),
rcu_seq_current(), rcu_seq_state(), and rcu_gp_in_progress() in place
of the earlier open-coded comparisons of ->gpnum and ->completed.
The rcu_future_gp_cleanup() function replaces use of ->completed
with ->gp_seq. The rcu_accelerate_cbs() function replaces a call to
rcu_cbs_completed() with one to rcu_seq_snap(). The rcu_advance_cbs()
function replaces an access to >completed with one to ->gp_seq and adds
some temporary warnings. The rcu_nocb_wait_gp() function replaces a
call to rcu_cbs_completed() with one to rcu_seq_snap() and an open-coded
comparison with rcu_seq_done().
The temporary warnings will be removed when the various ->gpnum and
->completed fields are removed. Their purpose is to locate code who
might still be using ->gpnum and ->completed. (Much easier that way
than trying to trace down the causes of too-short grace periods and
grace-period hangs!)
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit switches the quiescent-state no-backtracking checks from
->gpnum and ->completed to ->gp_seq.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit switches the interrupt-disabled detection mechanism to
->gp_seq. This mechanism is used as part of RCU CPU stall warnings,
and detects cases where the stall is due to a CPU having interrupts
disabled.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes rcu_gp_in_progress() use ->gp_seq instead of
->completed and ->gpnum. The READ_ONCE() invocations are buried
in rcu_seq_current().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit makes rcu_implicit_dynticks_qs() use ->gp_seq, with the
exception of tracing, which will be converted later.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit converts rcu_gpnum_ovf() to use ->gp_seq instead of ->gpnum.
Same size unsigned long, so same approach.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit moves __note_gp_changes(), note_gp_changes(), and
__rcu_pending() to ->gp_seq, creating new rcu_seq_completed_gp() and
rcu_seq_new_gp() functions for this purpose.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Reinstate "cpuend: trace as suggested by Joel Fernandes. ]
This commit converts get_state_synchronize_rcu(), cond_synchronize_rcu(),
get_state_synchronize_sched(), and cond_synchronize_sched() from ->gpnum
and ->completed to ->gp_seq. Note that this also introduces a full
memory barrier in the already-done paths off cond_synchronize_rcu() and
cond_synchronize_sched(), as work with LKMM indicates that the earlier
smp_load_acquire() were insufficiently strong in some situations where
these two functions were called just as the grace period ended. In such
cases, these two functions would not gain the benefit of memory ordering
at the end of the grace period.
Please note that the performance impact is negligible, as you shouldn't
be using either function anywhere near a fastpath in any case.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit switches the functions reporting quiescent states from
use of ->gpnum to ->gp_seq. In either case, the point is to handle
races where a given grace period ends before a quiescent state can
be reported. Failing to catch these races would result in too-short
grace periods, hence the checking.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit switches rcu_check_gp_kthread_starvation() from printing
->gpnum and ->completed to printing ->gp_seq upon detecting a starving
RCU grace-period kthread during an RCU CPU stall warning.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcutorture test invokes rcu_batches_started(),
rcu_batches_completed(), rcu_batches_started_bh(),
rcu_batches_completed_bh(), rcu_batches_started_sched(), and
rcu_batches_completed_sched() to do grace-period consistency checks,
and rcuperf uses the _completed variants for statistics.
These functions use ->gpnum and ->completed. This commit therefore
replaces them with rcu_get_gp_seq(), rcu_bh_get_gp_seq(), and
rcu_sched_get_gp_seq(), adjusting rcutorture and rcuperf to make
use of them.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit moves rcu_gp_slow() to ->gp_seq. This function only uses
the grace-period number to modulate delay, so rcu_seq_ctr(rsp->gp_seq)
gets the same effect, at least in cases where the delay is to happen
more than four times per wrap of an unsigned long.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds grace-period sequence numbers (->gp_seq) to the
rcu_state, rcu_node, and rcu_data structures, and updates them.
It also checks for consistency between rsp->gpnum and rsp->gp_seq.
These ->gp_seq counters will eventually replace the existing ->gpnum
and ->completed counters, allowing a single memory access to determine
whether or not a grace period is in progress and if so, which one.
This in turn will enable changes that will reduce ->lock contention on
the leaf rcu_node structures.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
At the end of rcu_gp_cleanup(), if another grace period is needed, but
not via rcu_accelerate_cbs(), the ->gp_flags field is written twice,
once when making the new grace-period request, and once when clearing
all other types of requests. This commit therefore adds an else-clause
to avoid this double write.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit causes a splat if RCU is idle and a request for a new grace
period is ignored for more than one second. This splat normally indicates
that some code path asked for a new grace period, but failed to wake up
the RCU grace-period kthread.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix bug located by Dan Carpenter and his static checker. ]
[ paulmck: Fix self-deadlock bug located 0day test robot. ]
[ paulmck: Disable unless CONFIG_PROVE_RCU=y. ]
There is a two-jiffy delay between the time that a CPU will self-report
an RCU CPU stall warning and the time that some other CPU will report a
warning on behalf of the first CPU. This has worked well in the past,
but on busy systems, it is possible for the two warnings to overlap,
which makes interpreting them extremely difficult.
This commit therefore uses a cmpxchg-based timing decision that
allows only one report in a given one-minute period (assuming default
stall-warning Kconfig parameters). This approach will of course fail
if you are seeing minute-long vCPU preemption, but in that case the
overlapping RCU CPU stall warnings are the least of your worries.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp()) in
rcu_gp_cleanup() triggers (inexplicably, of course) every so often.
This commit therefore extracts more information.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The x86/mtrr code does horrific things because hardware. It uses
stop_machine_from_inactive_cpu(), which does a wakeup (of the stopper
thread on another CPU), which uses RCU, all before the CPU is onlined.
RCU complains about this, because wakeups use RCU and RCU does
(rightfully) not consider offline CPUs for grace-periods.
Fix this by initializing RCU way early in the MTRR case.
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Add !SMP support, per 0day Test Robot report. ]
Now that grace-period requests use funnel locking and now that they
set ->gp_flags to RCU_GP_FLAG_INIT even when the RCU grace-period
kthread has not yet started, rcu_gp_kthread() no longer needs to check
need_any_future_gp() at startup time. This commit therefore removes
this check.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Now that RCU no longer relies on failsafe checks, cpu_needs_another_gp()
can be greatly simplified. This simplification eliminates the last
call to rcu_future_needs_gp() and to rcu_segcblist_future_gp_needed(),
both of which which can then be eliminated. And then, because
cpu_needs_another_gp() is called only from __rcu_pending(), it can be
inlined and eliminated.
This commit carries out the simplification, inlining, and elimination
called out above.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
All of the cpu_needs_another_gp() function's checks (except for
newly arrived callbacks) have been subsumed into the rcu_gp_cleanup()
function's scan of the rcu_node tree. This commit therefore drops the
call to cpu_needs_another_gp(). The check for newly arrived callbacks
is supplied by rcu_accelerate_cbs(). Any needed advancing (as in the
earlier rcu_advance_cbs() call) will be supplied when the corresponding
CPU becomes aware of the end of the now-completed grace period.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
If rcu_start_this_gp() is invoked with a requested grace period more
than three in the future, then either the ->need_future_gp[] array
needs to be bigger or the caller needs to be repaired. This commit
therefore adds a WARN_ON_ONCE() checking for this condition.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
The rcu_start_this_gp() function had a simple form of funnel locking that
used only the leaves and root of the rcu_node tree, which is fine for
systems with only a few hundred CPUs, but sub-optimal for systems having
thousands of CPUs. This commit therefore adds full-tree funnel locking.
This variant of funnel locking is unusual in the following ways:
1. The leaf-level rcu_node structure's ->lock is held throughout.
Other funnel-locking implementations drop the leaf-level lock
before progressing to the next level of the tree.
2. Funnel locking can be started at the root, which is convenient
for code that already holds the root rcu_node structure's ->lock.
Other funnel-locking implementations start at the leaves.
3. If an rcu_node structure other than the initial one believes
that a grace period is in progress, it is not necessary to
go further up the tree. This is because grace-period cleanup
scans the full tree, so that marking the need for a subsequent
grace period anywhere in the tree suffices -- but only if
a grace period is currently in progress.
4. It is possible that the RCU grace-period kthread has not yet
started, and this case must be handled appropriately.
However, the general approach of using a tree to control lock contention
is still in place.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
The rcu_accelerate_cbs() function selects a grace-period target, which
it uses to have rcu_segcblist_accelerate() assign numbers to recently
queued callbacks. Then it invokes rcu_start_future_gp(), which selects
a grace-period target again, which is a bit pointless. This commit
therefore changes rcu_start_future_gp() to take the grace-period target as
a parameter, thus avoiding double selection. This commit also changes
the name of rcu_start_future_gp() to rcu_start_this_gp() to reflect
this change in functionality, and also makes a similar change to the
name of trace_rcu_future_gp().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
The rcu_start_gp_advanced() is invoked only from rcu_start_future_gp() and
much of its code is redundant when invoked from that context. This commit
therefore inlines rcu_start_gp_advanced() into rcu_start_future_gp(),
then removes rcu_start_gp_advanced().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Once the grace period has ended, any RCU_GP_FLAG_FQS requests are
irrelevant: The grace period has ended, so there is no longer any
point in forcing quiescent states in order to try to make it end sooner.
This commit therefore causes rcu_gp_cleanup() to clear any bits other
than RCU_GP_FLAG_INIT from ->gp_flags at the end of the grace period.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
It is true that currently only the low-order two bits are used, so
there should be no problem given modern machines and compilers, but
good hygiene and maintainability dictates use of an unsigned long
instead of an int. This commit therefore makes this change.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
The __rcu_process_callbacks() function currently checks to see if
the current CPU needs a grace period and also if there is any other
reason to kick off a new grace period. This is one of the fail-safe
checks that has been rendered unnecessary by the changes that increase
the accuracy of rcu_gp_cleanup()'s estimate as to whether another grace
period is required. Because this particular fail-safe involved acquiring
the root rcu_node structure's ->lock, which has seen excessive contention
in real life, this fail-safe needs to go.
However, one check must remain, namely the check for newly arrived
RCU callbacks that have not yet been associated with a grace period.
One might hope that the checks in __note_gp_changes(), which is invoked
indirectly from rcu_check_quiescent_state(), would suffice, but this
function won't be invoked at all if RCU is idle. It is therefore necessary
to replace the fail-safe checks with a simpler check for newly arrived
callbacks during an RCU idle period, which is exactly what this commit
does. This change removes the final call to rcu_start_gp(), so this
function is removed as well.
Note that lockless use of cpu_needs_another_gp() is racy, but that
these races are harmless in this case. If RCU really is idle, the
values will not change, so the return value from cpu_needs_another_gp()
will be correct. If RCU is not idle, the resulting redundant call to
rcu_accelerate_cbs() will be harmless, and might even have the benefit
of reducing grace-period latency a bit.
This commit also moves interrupt disabling into the "if" statement to
improve real-time response a bit.
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
When __call_rcu_core() notices excessive numbers of callbacks pending
on the current CPU, we know that at least one of them is not yet
classified, namely the one that was just now queued. Therefore, it
is not necessary to invoke rcu_start_gp() and thus not necessary to
acquire the root rcu_node structure's ->lock. This commit therefore
replaces the rcu_start_gp() with rcu_accelerate_cbs(), thus replacing
an acquisition of the root rcu_node structure's ->lock with that of
this CPU's leaf rcu_node structure.
This decreases contention on the root rcu_node structure's ->lock.
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
The rcu_migrate_callbacks() function invokes rcu_advance_cbs()
twice, ignoring the return value. This is OK at pressent because of
failsafe code that does the wakeup when needed. However, this failsafe
code acquires the root rcu_node structure's lock frequently, while
rcu_migrate_callbacks() does so only once per CPU-offline operation.
This commit therefore makes rcu_migrate_callbacks()
wake up the RCU GP kthread when either call to rcu_advance_cbs()
returns true, thus removing need for the failsafe code.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
There is no longer any need for ->need_future_gp[] to count the number of
requests for future grace periods, so this commit converts the additions
to assignments to "true" and reduces the size of each element to one byte.
While we are in the area, fix an obsolete comment.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Currently, the rcu_future_needs_gp() function checks only the current
element of the ->need_future_gps[] array, which might miss elements that
were offset from the expected element, for example, due to races with
the start or the end of a grace period. This commit therefore makes
rcu_future_needs_gp() use the need_any_future_gp() macro to check all
of the elements of this array.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Currently, rcu_gp_cleanup() scans the rcu_node tree in order to reset
state to reflect the end of the grace period. It also checks to see
whether a new grace period is needed, but in a number of cases, rather
than directly cause the new grace period to be immediately started, it
instead leaves the grace-period-needed state where various fail-safes
can find it. This works fine, but results in higher contention on the
root rcu_node structure's ->lock, which is undesirable, and contention
on that lock has recently become noticeable.
This commit therefore makes rcu_gp_cleanup() immediately start a new
grace period if there is any need for one.
It is quite possible that it will later be necessary to throttle the
grace-period rate, but that can be dealt with when and if.
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
The rcu_gp_kthread() function immediately sleeps waiting to be notified
of the need for a new grace period, which currently works because there
are a number of code sequences that will provide the needed wakeup later.
However, some of these code sequences need to acquire the root rcu_node
structure's ->lock, and contention on that lock has started manifesting.
This commit therefore makes rcu_gp_kthread() check for early-boot activity
when it starts up, omitting the initial sleep in that case.
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>