Merge branches 'rcu/staging-core', 'rcu/staging-docs' and 'rcu/staging-kfree', remote-tracking branches 'paul/srcu-cf.2023.04.04a', 'fbq/rcu/lockdep.2023.03.27a' and 'fbq/rcu/rcutorture.2023.03.20a' into rcu/staging

Joel Fernandes (Google) 2023-04-05 13:50:37 +00:00
43 changed files with 839 additions and 305 deletions

View file

@ -277,7 +277,7 @@ the following access functions:
Again, only one request in a given batch need actually carry out a
grace-period operation, which means there must be an efficient way to
identify which of many concurrent reqeusts will initiate the grace
identify which of many concurrent requests will initiate the grace
period, and that there be an efficient way for the remaining requests to
wait for that grace period to complete. However, that is the topic of
the next section.
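
A minimal sketch of that snapshot-and-wait pattern (not the actual
expedited-GP code; it uses the kernel's rcu_seq_snap() and rcu_seq_done()
helpers)::

    static void example_wait_for_expedited_gp(unsigned long *sp)
    {
            unsigned long s = rcu_seq_snap(sp);  /* GP that satisfies us. */

            if (rcu_seq_done(sp, s))
                    return;                      /* That GP already elapsed. */
            /* Funnel locking picks one requester to start the grace
             * period; all others simply wait for rcu_seq_done(sp, s). */
    }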
@ -405,7 +405,7 @@ Use of Workqueues
In earlier implementations, the task requesting the expedited grace
period also drove it to completion. This straightforward approach had
the disadvantage of needing to account for POSIX signals sent to user
tasks, so more recent implemementations use the Linux kernel's
tasks, so more recent implementations use the Linux kernel's
workqueues (see Documentation/core-api/workqueue.rst).
The requesting task still does counter snapshotting and funnel-lock
@ -465,7 +465,7 @@ corresponding disadvantage that workqueues cannot be used until they are
initialized, which does not happen until some time after the scheduler
spawns the first task. Given that there are parts of the kernel that
really do want to execute grace periods during this mid-boot “dead
zone”, expedited grace periods must do something else during thie time.
zone”, expedited grace periods must do something else during this time.
What they do is to fall back to the old practice of requiring that the
requesting task drive the expedited grace period, as was the case before

View file

@ -168,7 +168,7 @@ an ``atomic_add_return()`` of zero) to detect idle CPUs.
+-----------------------------------------------------------------------+
The approach must be extended to handle one final case, that of waking a
task blocked in ``synchronize_rcu()``. This task might be affinitied to
task blocked in ``synchronize_rcu()``. This task might be affined to
a CPU that is not yet aware that the grace period has ended, and thus
might not yet be subject to the grace period's memory ordering.
Therefore, there is an ``smp_mb()`` after the return from

View file

@ -201,7 +201,7 @@ work looked at debugging uses of RCU [Seyster:2011:RFA:2075416.2075425].
In 2012, Josh Triplett received his Ph.D. with his dissertation
covering RCU-protected resizable hash tables and the relationship
between memory barriers and read-side traversal order: If the updater
is making changes in the opposite direction from the read-side traveral
is making changes in the opposite direction from the read-side traversal
order, the updater need only execute a memory-barrier instruction,
but if in the same direction, the updater needs to wait for a grace
period between the individual updates [JoshTriplettPhD]. Also in 2012,
@ -1245,7 +1245,7 @@ Oregon Health and Sciences University"
[Viewed September 5, 2005]"
,annotation={
First posting showing how RCU can be safely adapted for
preemptable RCU read side critical sections.
preemptible RCU read side critical sections.
}
}
@ -1888,7 +1888,7 @@ Revised:
\url{https://lore.kernel.org/r/20070910183004.GA3299@linux.vnet.ibm.com}
[Viewed October 25, 2007]"
,annotation={
Final patch for preemptable RCU to -rt. (Later patches were
Final patch for preemptible RCU to -rt. (Later patches were
to mainline, eventually incorporated.)
}
}
@ -2275,7 +2275,7 @@ lot of {Linux} into your technology!!!"
\url{https://lore.kernel.org/r/20090724001429.GA17374@linux.vnet.ibm.com}
[Viewed August 15, 2009]"
,annotation={
First posting of simple and fast preemptable RCU.
First posting of simple and fast preemptible RCU.
}
}
@ -2639,7 +2639,7 @@ lot of {Linux} into your technology!!!"
RCU-protected hash tables, barriers vs. read-side traversal order.
.
If the updater is making changes in the opposite direction from
the read-side traveral order, the updater need only execute a
the read-side traversal order, the updater need only execute a
memory-barrier instruction, but if in the same direction, the
updater needs to wait for a grace period between the individual
updates.

View file

@ -107,7 +107,7 @@ UP systems, including PREEMPT SMP builds running on UP systems.
Quick Quiz #3:
Why can't synchronize_rcu() return immediately on UP systems running
preemptable RCU?
preemptible RCU?
.. _answer_quick_quiz_up:
@ -143,7 +143,7 @@ Answer to Quick Quiz #2:
Answer to Quick Quiz #3:
Why can't synchronize_rcu() return immediately on UP systems
running preemptable RCU?
running preemptible RCU?
Because some other task might have been preempted in the middle
of an RCU read-side critical section. If synchronize_rcu()

View file

@ -70,7 +70,7 @@ over a rather long period of time, but improvements are always welcome!
can serve as rcu_read_lock_sched(), but is less readable and
prevents lockdep from detecting locking issues.
Please not that you *cannot* rely on code known to be built
Please note that you *cannot* rely on code known to be built
only in non-preemptible kernels. Such code can and will break,
especially in kernels built with CONFIG_PREEMPT_COUNT=y.

View file

@ -65,7 +65,7 @@ checking of rcu_dereference() primitives:
rcu_access_pointer(p):
Return the value of the pointer and omit all barriers,
but retain the compiler constraints that prevent duplicating
or coalescsing. This is useful when testing the
or coalescing. This is useful when testing the
value of the pointer itself, for example, against NULL.
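
For example (hypothetical global pointer ``gp``), a NULL test needs no
memory ordering because the fetched value is only compared, never
dereferenced::

    struct foo __rcu *gp;

    static bool foo_registered(void)
    {
            return rcu_access_pointer(gp) != NULL;
    }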
The rcu_dereference_check() check expression can be any boolean

View file

@ -216,7 +216,7 @@ Kernel boot arguments can also be supplied, for example, to control
rcutorture's module parameters. For example, to test a change to RCU's
CPU stall-warning code, use "--bootargs 'rcutorture.stall_cpu=30'".
This will of course result in the scripting reporting a failure, namely
the resuling RCU CPU stall warning. As noted above, reducing memory may
the resulting RCU CPU stall warning. As noted above, reducing memory may
require disabling rcutorture's callback-flooding tests::
kvm.sh --cpus 448 --configs '56*TREE04' --memory 128M \
@ -370,5 +370,5 @@ You can also re-run a previous remote run in a manner similar to kvm.sh:
tools/testing/selftests/rcutorture/res/2022.11.03-11.26.28-remote \
--duration 24h
In this case, most of the kvm-again.sh parmeters may be supplied following
In this case, most of the kvm-again.sh parameters may be supplied following
the pathname of the old run-results directory.

View file

@ -597,10 +597,10 @@ to avoid having to write your own callback::
If the occasional sleep is permitted, the single-argument form may
be used, omitting the rcu_head structure from struct foo.
kfree_rcu(old_fp);
kfree_rcu_mightsleep(old_fp);
This variant of kfree_rcu() almost never blocks, but might do so by
invoking synchronize_rcu() in response to memory-allocation failure.
This variant almost never blocks, but might do so by invoking
synchronize_rcu() in response to memory-allocation failure.
Again, see checklist.rst for additional rules governing the use of RCU.
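
A minimal sketch (hypothetical ``struct foo`` and update function)
contrasting the two forms::

    struct foo {
            int a;
            struct rcu_head rcu;    /* Needed only by the two-argument form. */
    };

    static void update_foo(struct foo __rcu **gpp, struct foo *new_fp)
    {
            struct foo *old_fp;

            old_fp = rcu_replace_pointer(*gpp, new_fp, true);
            if (!old_fp)
                    return;
            kfree_rcu(old_fp, rcu);         /* Never blocks. */
            /* Or, when sleeping is permitted and no rcu_head is wanted:
             * kfree_rcu_mightsleep(old_fp); */
    }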

View file

@ -1615,7 +1615,7 @@ int drbd_adm_disk_opts(struct sk_buff *skb, struct genl_info *info)
drbd_send_sync_param(peer_device);
}
kvfree_rcu(old_disk_conf);
kvfree_rcu_mightsleep(old_disk_conf);
kfree(old_plan);
mod_timer(&device->request_timer, jiffies + HZ);
goto success;
@ -2446,7 +2446,7 @@ int drbd_adm_net_opts(struct sk_buff *skb, struct genl_info *info)
mutex_unlock(&connection->resource->conf_update);
mutex_unlock(&connection->data.mutex);
kvfree_rcu(old_net_conf);
kvfree_rcu_mightsleep(old_net_conf);
if (connection->cstate >= C_WF_REPORT_PARAMS) {
struct drbd_peer_device *peer_device;
@ -2860,7 +2860,7 @@ int drbd_adm_resize(struct sk_buff *skb, struct genl_info *info)
new_disk_conf->disk_size = (sector_t)rs.resize_size;
rcu_assign_pointer(device->ldev->disk_conf, new_disk_conf);
mutex_unlock(&device->resource->conf_update);
kvfree_rcu(old_disk_conf);
kvfree_rcu_mightsleep(old_disk_conf);
new_disk_conf = NULL;
}

View file

@ -3759,7 +3759,7 @@ static int receive_protocol(struct drbd_connection *connection, struct packet_in
drbd_info(connection, "peer data-integrity-alg: %s\n",
integrity_alg[0] ? integrity_alg : "(none)");
kvfree_rcu(old_net_conf);
kvfree_rcu_mightsleep(old_net_conf);
return 0;
disconnect_rcu_unlock:
@ -4127,7 +4127,7 @@ static int receive_sizes(struct drbd_connection *connection, struct packet_info
rcu_assign_pointer(device->ldev->disk_conf, new_disk_conf);
mutex_unlock(&connection->resource->conf_update);
kvfree_rcu(old_disk_conf);
kvfree_rcu_mightsleep(old_disk_conf);
drbd_info(device, "Peer sets u_size to %lu sectors (old: %lu)\n",
(unsigned long)p_usize, (unsigned long)my_usize);

View file

@ -2071,7 +2071,7 @@ static int w_after_conn_state_ch(struct drbd_work *w, int unused)
conn_free_crypto(connection);
mutex_unlock(&connection->resource->conf_update);
kvfree_rcu(old_conf);
kvfree_rcu_mightsleep(old_conf);
}
if (ns_max.susp_fen) {

View file

@ -687,7 +687,7 @@ int vmci_ctx_remove_notification(u32 context_id, u32 remote_cid)
spin_unlock(&context->lock);
if (notifier)
kvfree_rcu(notifier);
kvfree_rcu_mightsleep(notifier);
vmci_ctx_put(context);

View file

@ -209,7 +209,7 @@ int vmci_event_unsubscribe(u32 sub_id)
if (!s)
return VMCI_ERROR_NOT_FOUND;
kvfree_rcu(s);
kvfree_rcu_mightsleep(s);
return VMCI_SUCCESS;
}

View file

@ -242,7 +242,7 @@ mlx5e_int_port_remove(struct mlx5e_tc_int_port_priv *priv,
mlx5_del_flow_rules(int_port->rx_rule);
mapping_remove(ctx, int_port->mapping);
mlx5e_int_port_metadata_free(priv, int_port->match_metadata);
kfree_rcu(int_port);
kfree_rcu_mightsleep(int_port);
priv->num_ports--;
}

View file

@ -670,7 +670,7 @@ static int mlx5e_macsec_del_txsa(struct macsec_context *ctx)
mlx5e_macsec_cleanup_sa(macsec, tx_sa, true);
mlx5_destroy_encryption_key(macsec->mdev, tx_sa->enc_key_id);
kfree_rcu(tx_sa);
kfree_rcu_mightsleep(tx_sa);
macsec_device->tx_sa[assoc_num] = NULL;
out:
@ -849,7 +849,7 @@ static void macsec_del_rxsc_ctx(struct mlx5e_macsec *macsec, struct mlx5e_macsec
xa_erase(&macsec->sc_xarray, rx_sc->sc_xarray_element->fs_id);
metadata_dst_free(rx_sc->md_dst);
kfree(rx_sc->sc_xarray_element);
kfree_rcu(rx_sc);
kfree_rcu_mightsleep(rx_sc);
}
static int mlx5e_macsec_del_rxsc(struct macsec_context *ctx)

View file

@ -2500,7 +2500,7 @@ static void ext4_apply_quota_options(struct fs_context *fc,
qname = rcu_replace_pointer(sbi->s_qf_names[i], qname,
lockdep_is_held(&sb->s_umount));
if (qname)
kfree_rcu(qname);
kfree_rcu_mightsleep(qname);
}
}

View file

@ -134,7 +134,8 @@ struct held_lock {
unsigned int read:2; /* see lock_acquire() comment */
unsigned int check:1; /* see lock_acquire() comment */
unsigned int hardirqs_off:1;
unsigned int references:12; /* 32 bits */
unsigned int sync:1;
unsigned int references:11; /* 32 bits */
unsigned int pin_count;
};
@ -268,6 +269,10 @@ extern void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
extern void lock_release(struct lockdep_map *lock, unsigned long ip);
extern void lock_sync(struct lockdep_map *lock, unsigned int subclass,
int read, int check, struct lockdep_map *nest_lock,
unsigned long ip);
/* lock_is_held_type() returns */
#define LOCK_STATE_UNKNOWN -1
#define LOCK_STATE_NOT_HELD 0
@ -554,6 +559,7 @@ do { \
#define lock_map_acquire_read(l) lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_)
#define lock_map_acquire_tryread(l) lock_acquire_shared_recursive(l, 0, 1, NULL, _THIS_IP_)
#define lock_map_release(l) lock_release(l, _THIS_IP_)
#define lock_map_sync(l) lock_sync(l, 0, 0, 1, NULL, _THIS_IP_)
#ifdef CONFIG_PROVE_LOCKING
# define might_lock(lock) \

View file

@ -73,6 +73,9 @@ struct raw_notifier_head {
struct srcu_notifier_head {
struct mutex mutex;
#ifdef CONFIG_TREE_SRCU
struct srcu_usage srcuu;
#endif
struct srcu_struct srcu;
struct notifier_block __rcu *head;
};
@ -107,7 +110,7 @@ extern void srcu_init_notifier_head(struct srcu_notifier_head *nh);
{ \
.mutex = __MUTEX_INITIALIZER(name.mutex), \
.head = NULL, \
.srcu = __SRCU_STRUCT_INIT(name.srcu, pcpu), \
.srcu = __SRCU_STRUCT_INIT(name.srcu, name.srcuu, pcpu), \
}
#define ATOMIC_NOTIFIER_HEAD(name) \

View file

@ -102,6 +102,32 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
return lock_is_held(&ssp->dep_map);
}
/*
* Annotations provide deadlock detection for SRCU.
*
* Similar to other lockdep annotations, except there is an additional
* srcu_lock_sync(), which is basically an empty *write*-side critical section;
* see lock_sync() for more information.
*/
/* Annotates a srcu_read_lock() */
static inline void srcu_lock_acquire(struct lockdep_map *map)
{
lock_map_acquire_read(map);
}
/* Annotates a srcu_read_unlock() */
static inline void srcu_lock_release(struct lockdep_map *map)
{
lock_map_release(map);
}
/* Annotates a synchronize_srcu() */
static inline void srcu_lock_sync(struct lockdep_map *map)
{
lock_map_sync(map);
}
#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
@ -109,6 +135,10 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
return 1;
}
#define srcu_lock_acquire(m) do { } while (0)
#define srcu_lock_release(m) do { } while (0)
#define srcu_lock_sync(m) do { } while (0)
#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
#define SRCU_NMI_UNKNOWN 0x0
@ -182,7 +212,7 @@ static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
srcu_check_nmi_safety(ssp, false);
retval = __srcu_read_lock(ssp);
rcu_lock_acquire(&(ssp)->dep_map);
srcu_lock_acquire(&(ssp)->dep_map);
return retval;
}
@ -254,7 +284,7 @@ static inline void srcu_read_unlock(struct srcu_struct *ssp, int idx)
{
WARN_ON_ONCE(idx & ~0x1);
srcu_check_nmi_safety(ssp, false);
rcu_lock_release(&(ssp)->dep_map);
srcu_lock_release(&(ssp)->dep_map);
__srcu_read_unlock(ssp, idx);
}
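
As a hedged illustration (hypothetical srcu_struct), the simplest
self-deadlock these annotations catch is a write-side wait issued from
inside a read-side critical section on the same srcu_struct:

DEFINE_STATIC_SRCU(my_srcu);

static void srcu_self_deadlock(void)
{
	int idx;

	idx = srcu_read_lock(&my_srcu);     /* srcu_lock_acquire() */
	synchronize_srcu(&my_srcu);         /* srcu_lock_sync(): lockdep splat */
	srcu_read_unlock(&my_srcu, idx);    /* srcu_lock_release() */
}

This is the same one-lock cycle that rcutorture's new test_srcu_lockdep
scenarios exercise later in this series.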

View file

@ -31,7 +31,7 @@ struct srcu_struct {
void srcu_drive_gp(struct work_struct *wp);
#define __SRCU_STRUCT_INIT(name, __ignored) \
#define __SRCU_STRUCT_INIT(name, __ignored, ___ignored) \
{ \
.srcu_wq = __SWAIT_QUEUE_HEAD_INITIALIZER(name.srcu_wq), \
.srcu_cb_tail = &name.srcu_cb_head, \
@ -44,9 +44,9 @@ void srcu_drive_gp(struct work_struct *wp);
* Tree SRCU, which needs some per-CPU data.
*/
#define DEFINE_SRCU(name) \
struct srcu_struct name = __SRCU_STRUCT_INIT(name, name)
struct srcu_struct name = __SRCU_STRUCT_INIT(name, name, name)
#define DEFINE_STATIC_SRCU(name) \
static struct srcu_struct name = __SRCU_STRUCT_INIT(name, name)
static struct srcu_struct name = __SRCU_STRUCT_INIT(name, name, name)
void synchronize_srcu(struct srcu_struct *ssp);

View file

@ -58,9 +58,9 @@ struct srcu_node {
};
/*
* Per-SRCU-domain structure, similar in function to rcu_state.
* Per-SRCU-domain structure, update-side data linked from srcu_struct.
*/
struct srcu_struct {
struct srcu_usage {
struct srcu_node *node; /* Combining tree. */
struct srcu_node *level[RCU_NUM_LVLS + 1];
/* First node at each level. */
@ -68,7 +68,6 @@ struct srcu_struct {
struct mutex srcu_cb_mutex; /* Serialize CB preparation. */
spinlock_t __private lock; /* Protect counters and size state. */
struct mutex srcu_gp_mutex; /* Serialize GP work. */
unsigned int srcu_idx; /* Current rdr array element. */
unsigned long srcu_gp_seq; /* Grace-period seq #. */
unsigned long srcu_gp_seq_needed; /* Latest gp_seq needed. */
unsigned long srcu_gp_seq_needed_exp; /* Furthest future exp GP. */
@ -77,7 +76,6 @@ struct srcu_struct {
unsigned long srcu_size_jiffies; /* Current contention-measurement interval. */
unsigned long srcu_n_lock_retries; /* Contention events in current interval. */
unsigned long srcu_n_exp_nodelay; /* # expedited no-delays in current GP phase. */
struct srcu_data __percpu *sda; /* Per-CPU srcu_data array. */
bool sda_is_static; /* May ->sda be passed to free_percpu()? */
unsigned long srcu_barrier_seq; /* srcu_barrier seq #. */
struct mutex srcu_barrier_mutex; /* Serialize barrier ops. */
@ -89,32 +87,68 @@ struct srcu_struct {
unsigned long reschedule_jiffies;
unsigned long reschedule_count;
struct delayed_work work;
struct lockdep_map dep_map;
struct srcu_struct *srcu_ssp;
};
/* Values for size state variable (->srcu_size_state). */
#define SRCU_SIZE_SMALL 0
#define SRCU_SIZE_ALLOC 1
#define SRCU_SIZE_WAIT_BARRIER 2
#define SRCU_SIZE_WAIT_CALL 3
#define SRCU_SIZE_WAIT_CBS1 4
#define SRCU_SIZE_WAIT_CBS2 5
#define SRCU_SIZE_WAIT_CBS3 6
#define SRCU_SIZE_WAIT_CBS4 7
#define SRCU_SIZE_BIG 8
/*
* Per-SRCU-domain structure, similar in function to rcu_state.
*/
struct srcu_struct {
unsigned int srcu_idx; /* Current rdr array element. */
struct srcu_data __percpu *sda; /* Per-CPU srcu_data array. */
struct lockdep_map dep_map;
struct srcu_usage *srcu_sup; /* Update-side data. */
};
// Values for size state variable (->srcu_size_state). Once the state
// has been set to SRCU_SIZE_ALLOC, the grace-period code advances through
// this state machine one step per grace period until the SRCU_SIZE_BIG state
// is reached. Otherwise, the state machine remains in the SRCU_SIZE_SMALL
// state indefinitely.
#define SRCU_SIZE_SMALL 0 // No srcu_node combining tree, ->node == NULL
#define SRCU_SIZE_ALLOC 1 // An srcu_node tree is being allocated, initialized,
// and then referenced by ->node. It is not yet used.
#define SRCU_SIZE_WAIT_BARRIER 2 // The srcu_node tree starts being used by everything
// except call_srcu(), especially by srcu_barrier().
// By the end of this state, all CPUs and threads
// are aware of this tree's existence.
#define SRCU_SIZE_WAIT_CALL 3 // The srcu_node tree starts being used by call_srcu().
// By the end of this state, all of the call_srcu()
// invocations that were running on a non-boot CPU
// and using the boot CPU's callback queue will have
// completed.
#define SRCU_SIZE_WAIT_CBS1 4 // Don't trust the ->srcu_have_cbs[] grace-period
#define SRCU_SIZE_WAIT_CBS2 5 // sequence elements or the ->srcu_data_have_cbs[]
#define SRCU_SIZE_WAIT_CBS3 6 // CPU-bitmask elements until all four elements of
#define SRCU_SIZE_WAIT_CBS4 7 // each array have been initialized.
#define SRCU_SIZE_BIG 8 // The srcu_node combining tree is fully initialized
// and all aspects of it are being put to use.
/* Values for state variable (bottom bits of ->srcu_gp_seq). */
#define SRCU_STATE_IDLE 0
#define SRCU_STATE_SCAN1 1
#define SRCU_STATE_SCAN2 2
#define __SRCU_STRUCT_INIT(name, pcpu_name) \
{ \
.sda = &pcpu_name, \
.lock = __SPIN_LOCK_UNLOCKED(name.lock), \
.srcu_gp_seq_needed = -1UL, \
.work = __DELAYED_WORK_INITIALIZER(name.work, NULL, 0), \
__SRCU_DEP_MAP_INIT(name) \
#define __SRCU_USAGE_INIT(name) \
{ \
.lock = __SPIN_LOCK_UNLOCKED(name.lock), \
.srcu_gp_seq_needed = -1UL, \
.work = __DELAYED_WORK_INITIALIZER(name.work, NULL, 0), \
}
#define __SRCU_STRUCT_INIT_COMMON(name, usage_name) \
.srcu_sup = &usage_name, \
__SRCU_DEP_MAP_INIT(name)
#define __SRCU_STRUCT_INIT_MODULE(name, usage_name) \
{ \
__SRCU_STRUCT_INIT_COMMON(name, usage_name) \
}
#define __SRCU_STRUCT_INIT(name, usage_name, pcpu_name) \
{ \
.sda = &pcpu_name, \
__SRCU_STRUCT_INIT_COMMON(name, usage_name) \
}
/*
@ -137,16 +171,18 @@ struct srcu_struct {
* See include/linux/percpu-defs.h for the rules on per-CPU variables.
*/
#ifdef MODULE
# define __DEFINE_SRCU(name, is_static) \
is_static struct srcu_struct name; \
extern struct srcu_struct * const __srcu_struct_##name; \
struct srcu_struct * const __srcu_struct_##name \
# define __DEFINE_SRCU(name, is_static) \
static struct srcu_usage name##_srcu_usage = __SRCU_USAGE_INIT(name##_srcu_usage); \
is_static struct srcu_struct name = __SRCU_STRUCT_INIT_MODULE(name, name##_srcu_usage); \
extern struct srcu_struct * const __srcu_struct_##name; \
struct srcu_struct * const __srcu_struct_##name \
__section("___srcu_struct_ptrs") = &name
#else
# define __DEFINE_SRCU(name, is_static) \
static DEFINE_PER_CPU(struct srcu_data, name##_srcu_data); \
is_static struct srcu_struct name = \
__SRCU_STRUCT_INIT(name, name##_srcu_data)
# define __DEFINE_SRCU(name, is_static) \
static DEFINE_PER_CPU(struct srcu_data, name##_srcu_data); \
static struct srcu_usage name##_srcu_usage = __SRCU_USAGE_INIT(name##_srcu_usage); \
is_static struct srcu_struct name = \
__SRCU_STRUCT_INIT(name, name##_srcu_usage, name##_srcu_data)
#endif
#define DEFINE_SRCU(name) __DEFINE_SRCU(name, /* not static */)
#define DEFINE_STATIC_SRCU(name) __DEFINE_SRCU(name, static)
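
For reference, the non-module expansion of DEFINE_STATIC_SRCU(foo) under
the new layout (derived directly from the macros above) is:

static DEFINE_PER_CPU(struct srcu_data, foo_srcu_data);
static struct srcu_usage foo_srcu_usage = __SRCU_USAGE_INIT(foo_srcu_usage);
static struct srcu_struct foo =
	__SRCU_STRUCT_INIT(foo, foo_srcu_usage, foo_srcu_data);

so the bulky update-side state lives in foo_srcu_usage, leaving struct
srcu_struct itself small.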

View file

@ -1881,6 +1881,8 @@ print_circular_lock_scenario(struct held_lock *src,
struct lock_class *source = hlock_class(src);
struct lock_class *target = hlock_class(tgt);
struct lock_class *parent = prt->class;
int src_read = src->read;
int tgt_read = tgt->read;
/*
* A direct locking problem where unsafe_class lock is taken
@ -1908,7 +1910,10 @@ print_circular_lock_scenario(struct held_lock *src,
printk(" Possible unsafe locking scenario:\n\n");
printk(" CPU0 CPU1\n");
printk(" ---- ----\n");
printk(" lock(");
if (tgt_read != 0)
printk(" rlock(");
else
printk(" lock(");
__print_lock_name(target);
printk(KERN_CONT ");\n");
printk(" lock(");
@ -1917,7 +1922,12 @@ print_circular_lock_scenario(struct held_lock *src,
printk(" lock(");
__print_lock_name(target);
printk(KERN_CONT ");\n");
printk(" lock(");
if (src_read != 0)
printk(" rlock(");
else if (src->sync)
printk(" sync(");
else
printk(" lock(");
__print_lock_name(source);
printk(KERN_CONT ");\n");
printk("\n *** DEADLOCK ***\n\n");
@ -4531,7 +4541,13 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
return 0;
}
}
if (!hlock->hardirqs_off) {
/*
* For lock_sync(), don't mark the ENABLED usage, since lock_sync()
* creates no critical section and no extra dependency can be introduced
* by interrupts
*/
if (!hlock->hardirqs_off && !hlock->sync) {
if (hlock->read) {
if (!mark_lock(curr, hlock,
LOCK_ENABLED_HARDIRQ_READ))
@ -4910,7 +4926,7 @@ static int __lock_is_held(const struct lockdep_map *lock, int read);
static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
int trylock, int read, int check, int hardirqs_off,
struct lockdep_map *nest_lock, unsigned long ip,
int references, int pin_count)
int references, int pin_count, int sync)
{
struct task_struct *curr = current;
struct lock_class *class = NULL;
@ -4961,7 +4977,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
class_idx = class - lock_classes;
if (depth) { /* we're holding locks */
if (depth && !sync) {
/* we're holding locks and the new held lock is not a sync */
hlock = curr->held_locks + depth - 1;
if (hlock->class_idx == class_idx && nest_lock) {
if (!references)
@ -4995,6 +5012,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
hlock->trylock = trylock;
hlock->read = read;
hlock->check = check;
hlock->sync = !!sync;
hlock->hardirqs_off = !!hardirqs_off;
hlock->references = references;
#ifdef CONFIG_LOCK_STAT
@ -5056,6 +5074,10 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
if (!validate_chain(curr, hlock, chain_head, chain_key))
return 0;
/* For lock_sync(), we are done here since no actual critical section */
if (hlock->sync)
return 1;
curr->curr_chain_key = chain_key;
curr->lockdep_depth++;
check_chain_key(curr);
@ -5197,7 +5219,7 @@ static int reacquire_held_locks(struct task_struct *curr, unsigned int depth,
hlock->read, hlock->check,
hlock->hardirqs_off,
hlock->nest_lock, hlock->acquire_ip,
hlock->references, hlock->pin_count)) {
hlock->references, hlock->pin_count, 0)) {
case 0:
return 1;
case 1:
@ -5667,7 +5689,7 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
lockdep_recursion_inc();
__lock_acquire(lock, subclass, trylock, read, check,
irqs_disabled_flags(flags), nest_lock, ip, 0, 0);
irqs_disabled_flags(flags), nest_lock, ip, 0, 0, 0);
lockdep_recursion_finish();
raw_local_irq_restore(flags);
}
@ -5693,6 +5715,34 @@ void lock_release(struct lockdep_map *lock, unsigned long ip)
}
EXPORT_SYMBOL_GPL(lock_release);
/*
* lock_sync() - A special annotation for synchronize_{s,}rcu()-like API.
*
* No actual critical section is created by the APIs annotated with this: these
* APIs are used to wait for one or multiple critical sections (on other CPUs
* or threads), which means that calling them from inside one of those critical
* sections is a potential deadlock.
*/
void lock_sync(struct lockdep_map *lock, unsigned subclass, int read,
int check, struct lockdep_map *nest_lock, unsigned long ip)
{
unsigned long flags;
if (unlikely(!lockdep_enabled()))
return;
raw_local_irq_save(flags);
check_flags(flags);
lockdep_recursion_inc();
__lock_acquire(lock, subclass, 0, read, check,
irqs_disabled_flags(flags), nest_lock, ip, 0, 0, 1);
check_chain_key(current);
lockdep_recursion_finish();
raw_local_irq_restore(flags);
}
EXPORT_SYMBOL_GPL(lock_sync);
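
A hedged sketch (hypothetical wait-for-readers primitive and lockdep map;
not taken from this patch) of how a non-SRCU API could use the new
annotation, pairing recursive-read acquisitions on the read side with
lock_map_sync() on the wait side:

static struct lockdep_map my_sync_map =
	STATIC_LOCKDEP_MAP_INIT("my_sync_map", &my_sync_map);

static void my_read_lock(void)
{
	lock_map_acquire_read(&my_sync_map);
	/* ... reader section ... */
}

static void my_read_unlock(void)
{
	/* ... end of reader section ... */
	lock_map_release(&my_sync_map);
}

static void my_synchronize(void)
{
	lock_map_sync(&my_sync_map);	/* Deadlock check only, no critical section. */
	/* ... wait for all pre-existing readers to finish ... */
}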
noinstr int lock_is_held_type(const struct lockdep_map *lock, int read)
{
unsigned long flags;

View file

@ -659,7 +659,7 @@ static int __init test_ww_mutex_init(void)
if (ret)
return ret;
ret = stress(4095, hweight32(STRESS_ALL)*ncpus, STRESS_ALL);
ret = stress(2047, hweight32(STRESS_ALL)*ncpus, STRESS_ALL);
if (ret)
return ret;

View file

@ -14,6 +14,43 @@
/*
* Grace-period counter management.
*
* The two least significant bits contain the control flags.
* The most significant bits contain the grace-period sequence counter.
*
* When both control flags are zero, no grace period is in progress.
* When either bit is non-zero, a grace period has started and is in
* progress. When the grace period completes, the control flags are reset
* to 0 and the grace-period sequence counter is incremented.
*
* However, some specific RCU usages make use of custom values.
*
* SRCU special control values:
*
* SRCU_SNP_INIT_SEQ : Invalid/init value set when SRCU node
* is initialized.
*
* SRCU_STATE_IDLE : No SRCU gp is in progress
*
* SRCU_STATE_SCAN1 : State set by rcu_seq_start(). Indicates
* we are scanning the readers on the slot
* defined as inactive (there might well
* be pending readers that will use that
* index, but their number is bounded).
*
* SRCU_STATE_SCAN2 : State set manually via rcu_seq_set_state().
* Indicates we are flipping the readers
* index and then scanning the readers on the
* slot newly designated as inactive (again,
* the number of pending readers that will use
* this inactive index is bounded).
*
* RCU polled GP special control value:
*
* RCU_GET_STATE_COMPLETED : State value indicating that a polled GP
* has already completed. This value covers
* both the state and the counter of the
* grace-period sequence number.
*/
#define RCU_SEQ_CTR_SHIFT 2
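
The companion helpers that decode a sequence number into those two parts
look like this (mirroring kernel/rcu/rcu.h):

#define RCU_SEQ_STATE_MASK	((1 << RCU_SEQ_CTR_SHIFT) - 1)

/* Grace-period count: everything above the low-order state bits. */
static inline unsigned long rcu_seq_ctr(unsigned long s)
{
	return s >> RCU_SEQ_CTR_SHIFT;
}

/* Control flags or SRCU state: the low RCU_SEQ_CTR_SHIFT bits. */
static inline int rcu_seq_state(unsigned long s)
{
	return s & RCU_SEQ_STATE_MASK;
}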
@ -341,11 +378,13 @@ extern void rcu_init_geometry(void);
* specified state structure (for SRCU) or the only rcu_state structure
* (for RCU).
*/
#define srcu_for_each_node_breadth_first(sp, rnp) \
#define _rcu_for_each_node_breadth_first(sp, rnp) \
for ((rnp) = &(sp)->node[0]; \
(rnp) < &(sp)->node[rcu_num_nodes]; (rnp)++)
#define rcu_for_each_node_breadth_first(rnp) \
srcu_for_each_node_breadth_first(&rcu_state, rnp)
_rcu_for_each_node_breadth_first(&rcu_state, rnp)
#define srcu_for_each_node_breadth_first(ssp, rnp) \
_rcu_for_each_node_breadth_first(ssp->srcu_sup, rnp)
/*
* Scan the leaves of the rcu_node hierarchy for the rcu_state structure.

View file

@ -631,8 +631,7 @@ static int compute_real(int n)
static int
rcu_scale_shutdown(void *arg)
{
wait_event(shutdown_wq,
atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters);
wait_event_idle(shutdown_wq, atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters);
smp_mb(); /* Wake before output. */
rcu_scale_cleanup();
kernel_power_off();
@ -716,7 +715,7 @@ kfree_scale_thread(void *arg)
// is tested.
if ((kfree_rcu_test_single && !kfree_rcu_test_double) ||
(kfree_rcu_test_both && torture_random(&tr) & 0x800))
kfree_rcu(alloc_ptr);
kfree_rcu_mightsleep(alloc_ptr);
else
kfree_rcu(alloc_ptr, rh);
}
@ -771,8 +770,8 @@ kfree_scale_cleanup(void)
static int
kfree_scale_shutdown(void *arg)
{
wait_event(shutdown_wq,
atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads);
wait_event_idle(shutdown_wq,
atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads);
smp_mb(); /* Wake before output. */

View file

@ -119,7 +119,9 @@ torture_param(int, stutter, 5, "Number of seconds to run/halt test");
torture_param(int, test_boost, 1, "Test RCU prio boost: 0=no, 1=maybe, 2=yes.");
torture_param(int, test_boost_duration, 4, "Duration of each boost test, seconds.");
torture_param(int, test_boost_interval, 7, "Interval between boost tests, seconds.");
torture_param(int, test_nmis, 0, "End-test NMI tests, 0 to disable.");
torture_param(bool, test_no_idle_hz, true, "Test support for tickless idle CPUs");
torture_param(int, test_srcu_lockdep, 0, "Test specified SRCU deadlock scenario.");
torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
static char *torture_type = "rcu";
@ -179,7 +181,6 @@ static atomic_t n_rcu_torture_mbchk_tries;
static atomic_t n_rcu_torture_error;
static long n_rcu_torture_barrier_error;
static long n_rcu_torture_boost_ktrerror;
static long n_rcu_torture_boost_rterror;
static long n_rcu_torture_boost_failure;
static long n_rcu_torture_boosts;
static atomic_long_t n_rcu_torture_timers;
@ -2194,12 +2195,11 @@ rcu_torture_stats_print(void)
atomic_read(&n_rcu_torture_alloc),
atomic_read(&n_rcu_torture_alloc_fail),
atomic_read(&n_rcu_torture_free));
pr_cont("rtmbe: %d rtmbkf: %d/%d rtbe: %ld rtbke: %ld rtbre: %ld ",
pr_cont("rtmbe: %d rtmbkf: %d/%d rtbe: %ld rtbke: %ld ",
atomic_read(&n_rcu_torture_mberror),
atomic_read(&n_rcu_torture_mbchk_fail), atomic_read(&n_rcu_torture_mbchk_tries),
n_rcu_torture_barrier_error,
n_rcu_torture_boost_ktrerror,
n_rcu_torture_boost_rterror);
n_rcu_torture_boost_ktrerror);
pr_cont("rtbf: %ld rtb: %ld nt: %ld ",
n_rcu_torture_boost_failure,
n_rcu_torture_boosts,
@ -2217,15 +2217,13 @@ rcu_torture_stats_print(void)
if (atomic_read(&n_rcu_torture_mberror) ||
atomic_read(&n_rcu_torture_mbchk_fail) ||
n_rcu_torture_barrier_error || n_rcu_torture_boost_ktrerror ||
n_rcu_torture_boost_rterror || n_rcu_torture_boost_failure ||
i > 1) {
n_rcu_torture_boost_failure || i > 1) {
pr_cont("%s", "!!! ");
atomic_inc(&n_rcu_torture_error);
WARN_ON_ONCE(atomic_read(&n_rcu_torture_mberror));
WARN_ON_ONCE(atomic_read(&n_rcu_torture_mbchk_fail));
WARN_ON_ONCE(n_rcu_torture_barrier_error); // rcu_barrier()
WARN_ON_ONCE(n_rcu_torture_boost_ktrerror); // no boost kthread
WARN_ON_ONCE(n_rcu_torture_boost_rterror); // can't set RT prio
WARN_ON_ONCE(n_rcu_torture_boost_failure); // boost failed (TIMER_SOFTIRQ RT prio?)
WARN_ON_ONCE(i > 1); // Too-short grace period
}
@ -2358,7 +2356,8 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
"n_barrier_cbs=%d "
"onoff_interval=%d onoff_holdoff=%d "
"read_exit_delay=%d read_exit_burst=%d "
"nocbs_nthreads=%d nocbs_toggle=%d\n",
"nocbs_nthreads=%d nocbs_toggle=%d "
"test_nmis=%d\n",
torture_type, tag, nrealreaders, nfakewriters,
stat_interval, verbose, test_no_idle_hz, shuffle_interval,
stutter, irqreader, fqs_duration, fqs_holdoff, fqs_stutter,
@ -2369,7 +2368,8 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
n_barrier_cbs,
onoff_interval, onoff_holdoff,
read_exit_delay, read_exit_burst,
nocbs_nthreads, nocbs_toggle);
nocbs_nthreads, nocbs_toggle,
test_nmis);
}
static int rcutorture_booster_cleanup(unsigned int cpu)
@ -3273,6 +3273,29 @@ static void rcu_torture_read_exit_cleanup(void)
torture_stop_kthread(rcutorture_read_exit, read_exit_task);
}
static void rcutorture_test_nmis(int n)
{
#if IS_BUILTIN(CONFIG_RCU_TORTURE_TEST)
int cpu;
int dumpcpu;
int i;
for (i = 0; i < n; i++) {
preempt_disable();
cpu = smp_processor_id();
dumpcpu = cpu + 1;
if (dumpcpu >= nr_cpu_ids)
dumpcpu = 0;
pr_alert("%s: CPU %d invoking dump_cpu_task(%d)\n", __func__, cpu, dumpcpu);
dump_cpu_task(dumpcpu);
preempt_enable();
schedule_timeout_uninterruptible(15 * HZ);
}
#else // #if IS_BUILTIN(CONFIG_RCU_TORTURE_TEST)
WARN_ONCE(n, "Non-zero rcutorture.test_nmis=%d permitted only when rcutorture is built in.\n", test_nmis);
#endif // #else // #if IS_BUILTIN(CONFIG_RCU_TORTURE_TEST)
}
static enum cpuhp_state rcutor_hp;
static void
@ -3297,6 +3320,8 @@ rcu_torture_cleanup(void)
return;
}
rcutorture_test_nmis(test_nmis);
if (cur_ops->gp_kthread_dbg)
cur_ops->gp_kthread_dbg();
rcu_torture_read_exit_cleanup();
@ -3463,6 +3488,188 @@ static void rcutorture_sync(void)
cur_ops->sync();
}
static DEFINE_MUTEX(mut0);
static DEFINE_MUTEX(mut1);
static DEFINE_MUTEX(mut2);
static DEFINE_MUTEX(mut3);
static DEFINE_MUTEX(mut4);
static DEFINE_MUTEX(mut5);
static DEFINE_MUTEX(mut6);
static DEFINE_MUTEX(mut7);
static DEFINE_MUTEX(mut8);
static DEFINE_MUTEX(mut9);
static DECLARE_RWSEM(rwsem0);
static DECLARE_RWSEM(rwsem1);
static DECLARE_RWSEM(rwsem2);
static DECLARE_RWSEM(rwsem3);
static DECLARE_RWSEM(rwsem4);
static DECLARE_RWSEM(rwsem5);
static DECLARE_RWSEM(rwsem6);
static DECLARE_RWSEM(rwsem7);
static DECLARE_RWSEM(rwsem8);
static DECLARE_RWSEM(rwsem9);
DEFINE_STATIC_SRCU(srcu0);
DEFINE_STATIC_SRCU(srcu1);
DEFINE_STATIC_SRCU(srcu2);
DEFINE_STATIC_SRCU(srcu3);
DEFINE_STATIC_SRCU(srcu4);
DEFINE_STATIC_SRCU(srcu5);
DEFINE_STATIC_SRCU(srcu6);
DEFINE_STATIC_SRCU(srcu7);
DEFINE_STATIC_SRCU(srcu8);
DEFINE_STATIC_SRCU(srcu9);
static int srcu_lockdep_next(const char *f, const char *fl, const char *fs, const char *fu, int i,
int cyclelen, int deadlock)
{
int j = i + 1;
if (j >= cyclelen)
j = deadlock ? 0 : -1;
if (j >= 0)
pr_info("%s: %s(%d), %s(%d), %s(%d)\n", f, fl, i, fs, j, fu, i);
else
pr_info("%s: %s(%d), %s(%d)\n", f, fl, i, fu, i);
return j;
}
// Test lockdep on SRCU-based deadlock scenarios.
static void rcu_torture_init_srcu_lockdep(void)
{
int cyclelen;
int deadlock;
bool err = false;
int i;
int j;
int idx;
struct mutex *muts[] = { &mut0, &mut1, &mut2, &mut3, &mut4,
&mut5, &mut6, &mut7, &mut8, &mut9 };
struct rw_semaphore *rwsems[] = { &rwsem0, &rwsem1, &rwsem2, &rwsem3, &rwsem4,
&rwsem5, &rwsem6, &rwsem7, &rwsem8, &rwsem9 };
struct srcu_struct *srcus[] = { &srcu0, &srcu1, &srcu2, &srcu3, &srcu4,
&srcu5, &srcu6, &srcu7, &srcu8, &srcu9 };
int testtype;
if (!test_srcu_lockdep)
return;
deadlock = test_srcu_lockdep / 1000;
testtype = (test_srcu_lockdep / 10) % 100;
cyclelen = test_srcu_lockdep % 10;
WARN_ON_ONCE(ARRAY_SIZE(muts) != ARRAY_SIZE(srcus));
if (WARN_ONCE(deadlock != !!deadlock,
"%s: test_srcu_lockdep=%d and deadlock digit %d must be zero or one.\n",
__func__, test_srcu_lockdep, deadlock))
err = true;
if (WARN_ONCE(cyclelen <= 0,
"%s: test_srcu_lockdep=%d and cycle-length digit %d must be greater than zero.\n",
__func__, test_srcu_lockdep, cyclelen))
err = true;
if (err)
goto err_out;
if (testtype == 0) {
pr_info("%s: test_srcu_lockdep = %05d: SRCU %d-way %sdeadlock.\n",
__func__, test_srcu_lockdep, cyclelen, deadlock ? "" : "non-");
if (deadlock && cyclelen == 1)
pr_info("%s: Expect hang.\n", __func__);
for (i = 0; i < cyclelen; i++) {
j = srcu_lockdep_next(__func__, "srcu_read_lock", "synchronize_srcu",
"srcu_read_unlock", i, cyclelen, deadlock);
idx = srcu_read_lock(srcus[i]);
if (j >= 0)
synchronize_srcu(srcus[j]);
srcu_read_unlock(srcus[i], idx);
}
return;
}
if (testtype == 1) {
pr_info("%s: test_srcu_lockdep = %05d: SRCU/mutex %d-way %sdeadlock.\n",
__func__, test_srcu_lockdep, cyclelen, deadlock ? "" : "non-");
for (i = 0; i < cyclelen; i++) {
pr_info("%s: srcu_read_lock(%d), mutex_lock(%d), mutex_unlock(%d), srcu_read_unlock(%d)\n",
__func__, i, i, i, i);
idx = srcu_read_lock(srcus[i]);
mutex_lock(muts[i]);
mutex_unlock(muts[i]);
srcu_read_unlock(srcus[i], idx);
j = srcu_lockdep_next(__func__, "mutex_lock", "synchronize_srcu",
"mutex_unlock", i, cyclelen, deadlock);
mutex_lock(muts[i]);
if (j >= 0)
synchronize_srcu(srcus[j]);
mutex_unlock(muts[i]);
}
return;
}
if (testtype == 2) {
pr_info("%s: test_srcu_lockdep = %05d: SRCU/rwsem %d-way %sdeadlock.\n",
__func__, test_srcu_lockdep, cyclelen, deadlock ? "" : "non-");
for (i = 0; i < cyclelen; i++) {
pr_info("%s: srcu_read_lock(%d), down_read(%d), up_read(%d), srcu_read_unlock(%d)\n",
__func__, i, i, i, i);
idx = srcu_read_lock(srcus[i]);
down_read(rwsems[i]);
up_read(rwsems[i]);
srcu_read_unlock(srcus[i], idx);
j = srcu_lockdep_next(__func__, "down_write", "synchronize_srcu",
"up_write", i, cyclelen, deadlock);
down_write(rwsems[i]);
if (j >= 0)
synchronize_srcu(srcus[j]);
up_write(rwsems[i]);
}
return;
}
#ifdef CONFIG_TASKS_TRACE_RCU
if (testtype == 3) {
pr_info("%s: test_srcu_lockdep = %05d: SRCU and Tasks Trace RCU %d-way %sdeadlock.\n",
__func__, test_srcu_lockdep, cyclelen, deadlock ? "" : "non-");
if (deadlock && cyclelen == 1)
pr_info("%s: Expect hang.\n", __func__);
for (i = 0; i < cyclelen; i++) {
char *fl = i == 0 ? "rcu_read_lock_trace" : "srcu_read_lock";
char *fs = i == cyclelen - 1 ? "synchronize_rcu_tasks_trace"
: "synchronize_srcu";
char *fu = i == 0 ? "rcu_read_unlock_trace" : "srcu_read_unlock";
j = srcu_lockdep_next(__func__, fl, fs, fu, i, cyclelen, deadlock);
if (i == 0)
rcu_read_lock_trace();
else
idx = srcu_read_lock(srcus[i]);
if (j >= 0) {
if (i == cyclelen - 1)
synchronize_rcu_tasks_trace();
else
synchronize_srcu(srcus[j]);
}
if (i == 0)
rcu_read_unlock_trace();
else
srcu_read_unlock(srcus[i], idx);
}
return;
}
#endif // #ifdef CONFIG_TASKS_TRACE_RCU
err_out:
pr_info("%s: test_srcu_lockdep = %05d does nothing.\n", __func__, test_srcu_lockdep);
pr_info("%s: test_srcu_lockdep = DNNL.\n", __func__);
pr_info("%s: D: Deadlock if nonzero.\n", __func__);
pr_info("%s: NN: Test number, 0=SRCU, 1=SRCU/mutex, 2=SRCU/rwsem, 3=SRCU/Tasks Trace RCU.\n", __func__);
pr_info("%s: L: Cycle length.\n", __func__);
if (!IS_ENABLED(CONFIG_TASKS_TRACE_RCU))
pr_info("%s: NN=3 disallowed because kernel is built with CONFIG_TASKS_TRACE_RCU=n\n", __func__);
}
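
A worked example of the DNNL encoding (hypothetical boot value):

/* rcutorture.test_srcu_lockdep=1023:
 *   1023 / 1000       == 1  ->  deadlock: expect a lockdep splat (or hang)
 *   (1023 / 10) % 100 == 2  ->  testtype: SRCU/rwsem scenario
 *   1023 % 10         == 3  ->  cyclelen: three-lock cycle
 */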
static int __init
rcu_torture_init(void)
{
@ -3501,9 +3708,17 @@ rcu_torture_init(void)
pr_alert("rcu-torture: ->fqs NULL and non-zero fqs_duration, fqs disabled.\n");
fqs_duration = 0;
}
if (nocbs_nthreads != 0 && (cur_ops != &rcu_ops ||
!IS_ENABLED(CONFIG_RCU_NOCB_CPU))) {
pr_alert("rcu-torture types: %s and CONFIG_RCU_NOCB_CPU=%d, nocb toggle disabled.\n",
cur_ops->name, IS_ENABLED(CONFIG_RCU_NOCB_CPU));
nocbs_nthreads = 0;
}
if (cur_ops->init)
cur_ops->init();
rcu_torture_init_srcu_lockdep();
if (nreaders >= 0) {
nrealreaders = nreaders;
} else {
@ -3540,7 +3755,6 @@ rcu_torture_init(void)
atomic_set(&n_rcu_torture_error, 0);
n_rcu_torture_barrier_error = 0;
n_rcu_torture_boost_ktrerror = 0;
n_rcu_torture_boost_rterror = 0;
n_rcu_torture_boost_failure = 0;
n_rcu_torture_boosts = 0;
for (i = 0; i < RCU_TORTURE_PIPE_LEN + 1; i++)

View file

@ -1031,7 +1031,7 @@ ref_scale_cleanup(void)
static int
ref_scale_shutdown(void *arg)
{
wait_event(shutdown_wq, shutdown_start);
wait_event_idle(shutdown_wq, shutdown_start);
smp_mb(); // Wake before output.
ref_scale_cleanup();

View file

@ -197,6 +197,8 @@ void synchronize_srcu(struct srcu_struct *ssp)
{
struct rcu_synchronize rs;
srcu_lock_sync(&ssp->dep_map);
RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
lock_is_held(&rcu_bh_lock_map) ||
lock_is_held(&rcu_lock_map) ||

View file

@ -103,7 +103,7 @@ do { \
#define spin_trylock_irqsave_rcu_node(p, flags) \
({ \
bool ___locked = spin_trylock_irqsave(&ACCESS_PRIVATE(p, lock), flags); \
bool ___locked = spin_trylock_irqsave(&ACCESS_PRIVATE(p, lock), flags); \
\
if (___locked) \
smp_mb__after_unlock_lock(); \
@ -135,8 +135,8 @@ static void init_srcu_struct_data(struct srcu_struct *ssp)
spin_lock_init(&ACCESS_PRIVATE(sdp, lock));
rcu_segcblist_init(&sdp->srcu_cblist);
sdp->srcu_cblist_invoking = false;
sdp->srcu_gp_seq_needed = ssp->srcu_gp_seq;
sdp->srcu_gp_seq_needed_exp = ssp->srcu_gp_seq;
sdp->srcu_gp_seq_needed = ssp->srcu_sup->srcu_gp_seq;
sdp->srcu_gp_seq_needed_exp = ssp->srcu_sup->srcu_gp_seq;
sdp->mynode = NULL;
sdp->cpu = cpu;
INIT_WORK(&sdp->work, srcu_invoke_callbacks);
@ -173,14 +173,14 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
/* Initialize geometry if it has not already been initialized. */
rcu_init_geometry();
ssp->node = kcalloc(rcu_num_nodes, sizeof(*ssp->node), gfp_flags);
if (!ssp->node)
ssp->srcu_sup->node = kcalloc(rcu_num_nodes, sizeof(*ssp->srcu_sup->node), gfp_flags);
if (!ssp->srcu_sup->node)
return false;
/* Work out the overall tree geometry. */
ssp->level[0] = &ssp->node[0];
ssp->srcu_sup->level[0] = &ssp->srcu_sup->node[0];
for (i = 1; i < rcu_num_lvls; i++)
ssp->level[i] = ssp->level[i - 1] + num_rcu_lvl[i - 1];
ssp->srcu_sup->level[i] = ssp->srcu_sup->level[i - 1] + num_rcu_lvl[i - 1];
rcu_init_levelspread(levelspread, num_rcu_lvl);
/* Each pass through this loop initializes one srcu_node structure. */
@ -195,17 +195,17 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
snp->srcu_gp_seq_needed_exp = SRCU_SNP_INIT_SEQ;
snp->grplo = -1;
snp->grphi = -1;
if (snp == &ssp->node[0]) {
if (snp == &ssp->srcu_sup->node[0]) {
/* Root node, special case. */
snp->srcu_parent = NULL;
continue;
}
/* Non-root node. */
if (snp == ssp->level[level + 1])
if (snp == ssp->srcu_sup->level[level + 1])
level++;
snp->srcu_parent = ssp->level[level - 1] +
(snp - ssp->level[level]) /
snp->srcu_parent = ssp->srcu_sup->level[level - 1] +
(snp - ssp->srcu_sup->level[level]) /
levelspread[level - 1];
}
@ -214,7 +214,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
* leaves of the srcu_node tree.
*/
level = rcu_num_lvls - 1;
snp_first = ssp->level[level];
snp_first = ssp->srcu_sup->level[level];
for_each_possible_cpu(cpu) {
sdp = per_cpu_ptr(ssp->sda, cpu);
sdp->mynode = &snp_first[cpu / levelspread[level]];
@ -225,7 +225,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
}
sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
}
smp_store_release(&ssp->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
return true;
}
@ -236,36 +236,47 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
*/
static int init_srcu_struct_fields(struct srcu_struct *ssp, bool is_static)
{
ssp->srcu_size_state = SRCU_SIZE_SMALL;
ssp->node = NULL;
mutex_init(&ssp->srcu_cb_mutex);
mutex_init(&ssp->srcu_gp_mutex);
if (!is_static)
ssp->srcu_sup = kzalloc(sizeof(*ssp->srcu_sup), GFP_KERNEL);
if (!ssp->srcu_sup)
return -ENOMEM;
if (!is_static)
spin_lock_init(&ACCESS_PRIVATE(ssp->srcu_sup, lock));
ssp->srcu_sup->srcu_size_state = SRCU_SIZE_SMALL;
ssp->srcu_sup->node = NULL;
mutex_init(&ssp->srcu_sup->srcu_cb_mutex);
mutex_init(&ssp->srcu_sup->srcu_gp_mutex);
ssp->srcu_idx = 0;
ssp->srcu_gp_seq = 0;
ssp->srcu_barrier_seq = 0;
mutex_init(&ssp->srcu_barrier_mutex);
atomic_set(&ssp->srcu_barrier_cpu_cnt, 0);
INIT_DELAYED_WORK(&ssp->work, process_srcu);
ssp->sda_is_static = is_static;
ssp->srcu_sup->srcu_gp_seq = 0;
ssp->srcu_sup->srcu_barrier_seq = 0;
mutex_init(&ssp->srcu_sup->srcu_barrier_mutex);
atomic_set(&ssp->srcu_sup->srcu_barrier_cpu_cnt, 0);
INIT_DELAYED_WORK(&ssp->srcu_sup->work, process_srcu);
ssp->srcu_sup->sda_is_static = is_static;
if (!is_static)
ssp->sda = alloc_percpu(struct srcu_data);
if (!ssp->sda)
if (!ssp->sda) {
if (!is_static)
kfree(ssp->srcu_sup);
return -ENOMEM;
}
init_srcu_struct_data(ssp);
ssp->srcu_gp_seq_needed_exp = 0;
ssp->srcu_last_gp_end = ktime_get_mono_fast_ns();
if (READ_ONCE(ssp->srcu_size_state) == SRCU_SIZE_SMALL && SRCU_SIZING_IS_INIT()) {
ssp->srcu_sup->srcu_gp_seq_needed_exp = 0;
ssp->srcu_sup->srcu_last_gp_end = ktime_get_mono_fast_ns();
if (READ_ONCE(ssp->srcu_sup->srcu_size_state) == SRCU_SIZE_SMALL && SRCU_SIZING_IS_INIT()) {
if (!init_srcu_struct_nodes(ssp, GFP_ATOMIC)) {
if (!ssp->sda_is_static) {
if (!ssp->srcu_sup->sda_is_static) {
free_percpu(ssp->sda);
ssp->sda = NULL;
kfree(ssp->srcu_sup);
return -ENOMEM;
}
} else {
WRITE_ONCE(ssp->srcu_size_state, SRCU_SIZE_BIG);
WRITE_ONCE(ssp->srcu_sup->srcu_size_state, SRCU_SIZE_BIG);
}
}
smp_store_release(&ssp->srcu_gp_seq_needed, 0); /* Init done. */
ssp->srcu_sup->srcu_ssp = ssp;
smp_store_release(&ssp->srcu_sup->srcu_gp_seq_needed, 0); /* Init done. */
return 0;
}
@ -277,7 +288,6 @@ int __init_srcu_struct(struct srcu_struct *ssp, const char *name,
/* Don't re-initialize a lock while it is held. */
debug_check_no_locks_freed((void *)ssp, sizeof(*ssp));
lockdep_init_map(&ssp->dep_map, name, key, 0);
spin_lock_init(&ACCESS_PRIVATE(ssp, lock));
return init_srcu_struct_fields(ssp, false);
}
EXPORT_SYMBOL_GPL(__init_srcu_struct);
@ -294,7 +304,6 @@ EXPORT_SYMBOL_GPL(__init_srcu_struct);
*/
int init_srcu_struct(struct srcu_struct *ssp)
{
spin_lock_init(&ACCESS_PRIVATE(ssp, lock));
return init_srcu_struct_fields(ssp, false);
}
EXPORT_SYMBOL_GPL(init_srcu_struct);
@ -306,8 +315,8 @@ EXPORT_SYMBOL_GPL(init_srcu_struct);
*/
static void __srcu_transition_to_big(struct srcu_struct *ssp)
{
lockdep_assert_held(&ACCESS_PRIVATE(ssp, lock));
smp_store_release(&ssp->srcu_size_state, SRCU_SIZE_ALLOC);
lockdep_assert_held(&ACCESS_PRIVATE(ssp->srcu_sup, lock));
smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_ALLOC);
}
/*
@ -318,15 +327,15 @@ static void srcu_transition_to_big(struct srcu_struct *ssp)
unsigned long flags;
/* Double-checked locking on ->srcu_size-state. */
if (smp_load_acquire(&ssp->srcu_size_state) != SRCU_SIZE_SMALL)
if (smp_load_acquire(&ssp->srcu_sup->srcu_size_state) != SRCU_SIZE_SMALL)
return;
spin_lock_irqsave_rcu_node(ssp, flags);
if (smp_load_acquire(&ssp->srcu_size_state) != SRCU_SIZE_SMALL) {
spin_unlock_irqrestore_rcu_node(ssp, flags);
spin_lock_irqsave_rcu_node(ssp->srcu_sup, flags);
if (smp_load_acquire(&ssp->srcu_sup->srcu_size_state) != SRCU_SIZE_SMALL) {
spin_unlock_irqrestore_rcu_node(ssp->srcu_sup, flags);
return;
}
__srcu_transition_to_big(ssp);
spin_unlock_irqrestore_rcu_node(ssp, flags);
spin_unlock_irqrestore_rcu_node(ssp->srcu_sup, flags);
}
/*
@ -337,14 +346,14 @@ static void spin_lock_irqsave_check_contention(struct srcu_struct *ssp)
{
unsigned long j;
if (!SRCU_SIZING_IS_CONTEND() || ssp->srcu_size_state)
if (!SRCU_SIZING_IS_CONTEND() || ssp->srcu_sup->srcu_size_state)
return;
j = jiffies;
if (ssp->srcu_size_jiffies != j) {
ssp->srcu_size_jiffies = j;
ssp->srcu_n_lock_retries = 0;
if (ssp->srcu_sup->srcu_size_jiffies != j) {
ssp->srcu_sup->srcu_size_jiffies = j;
ssp->srcu_sup->srcu_n_lock_retries = 0;
}
if (++ssp->srcu_n_lock_retries <= small_contention_lim)
if (++ssp->srcu_sup->srcu_n_lock_retries <= small_contention_lim)
return;
__srcu_transition_to_big(ssp);
}
@ -361,9 +370,9 @@ static void spin_lock_irqsave_sdp_contention(struct srcu_data *sdp, unsigned lon
if (spin_trylock_irqsave_rcu_node(sdp, *flags))
return;
spin_lock_irqsave_rcu_node(ssp, *flags);
spin_lock_irqsave_rcu_node(ssp->srcu_sup, *flags);
spin_lock_irqsave_check_contention(ssp);
spin_unlock_irqrestore_rcu_node(ssp, *flags);
spin_unlock_irqrestore_rcu_node(ssp->srcu_sup, *flags);
spin_lock_irqsave_rcu_node(sdp, *flags);
}
@ -375,9 +384,9 @@ static void spin_lock_irqsave_sdp_contention(struct srcu_data *sdp, unsigned lon
*/
static void spin_lock_irqsave_ssp_contention(struct srcu_struct *ssp, unsigned long *flags)
{
if (spin_trylock_irqsave_rcu_node(ssp, *flags))
if (spin_trylock_irqsave_rcu_node(ssp->srcu_sup, *flags))
return;
spin_lock_irqsave_rcu_node(ssp, *flags);
spin_lock_irqsave_rcu_node(ssp->srcu_sup, *flags);
spin_lock_irqsave_check_contention(ssp);
}
@ -394,15 +403,15 @@ static void check_init_srcu_struct(struct srcu_struct *ssp)
unsigned long flags;
/* The smp_load_acquire() pairs with the smp_store_release(). */
if (!rcu_seq_state(smp_load_acquire(&ssp->srcu_gp_seq_needed))) /*^^^*/
if (!rcu_seq_state(smp_load_acquire(&ssp->srcu_sup->srcu_gp_seq_needed))) /*^^^*/
return; /* Already initialized. */
spin_lock_irqsave_rcu_node(ssp, flags);
if (!rcu_seq_state(ssp->srcu_gp_seq_needed)) {
spin_unlock_irqrestore_rcu_node(ssp, flags);
spin_lock_irqsave_rcu_node(ssp->srcu_sup, flags);
if (!rcu_seq_state(ssp->srcu_sup->srcu_gp_seq_needed)) {
spin_unlock_irqrestore_rcu_node(ssp->srcu_sup, flags);
return;
}
init_srcu_struct_fields(ssp, true);
spin_unlock_irqrestore_rcu_node(ssp, flags);
spin_unlock_irqrestore_rcu_node(ssp->srcu_sup, flags);
}
/*
@ -607,17 +616,18 @@ static unsigned long srcu_get_delay(struct srcu_struct *ssp)
unsigned long gpstart;
unsigned long j;
unsigned long jbase = SRCU_INTERVAL;
struct srcu_usage *sup = ssp->srcu_sup;
if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
if (ULONG_CMP_LT(READ_ONCE(sup->srcu_gp_seq), READ_ONCE(sup->srcu_gp_seq_needed_exp)))
jbase = 0;
if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) {
if (rcu_seq_state(READ_ONCE(sup->srcu_gp_seq))) {
j = jiffies - 1;
gpstart = READ_ONCE(ssp->srcu_gp_start);
gpstart = READ_ONCE(sup->srcu_gp_start);
if (time_after(j, gpstart))
jbase += j - gpstart;
if (!jbase) {
WRITE_ONCE(ssp->srcu_n_exp_nodelay, READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
if (READ_ONCE(ssp->srcu_n_exp_nodelay) > srcu_max_nodelay_phase)
WRITE_ONCE(sup->srcu_n_exp_nodelay, READ_ONCE(sup->srcu_n_exp_nodelay) + 1);
if (READ_ONCE(sup->srcu_n_exp_nodelay) > srcu_max_nodelay_phase)
jbase = 1;
}
}
@ -634,12 +644,13 @@ static unsigned long srcu_get_delay(struct srcu_struct *ssp)
void cleanup_srcu_struct(struct srcu_struct *ssp)
{
int cpu;
struct srcu_usage *sup = ssp->srcu_sup;
if (WARN_ON(!srcu_get_delay(ssp)))
return; /* Just leak it! */
if (WARN_ON(srcu_readers_active(ssp)))
return; /* Just leak it! */
flush_delayed_work(&ssp->work);
flush_delayed_work(&sup->work);
for_each_possible_cpu(cpu) {
struct srcu_data *sdp = per_cpu_ptr(ssp->sda, cpu);
@ -648,21 +659,23 @@ void cleanup_srcu_struct(struct srcu_struct *ssp)
if (WARN_ON(rcu_segcblist_n_cbs(&sdp->srcu_cblist)))
return; /* Forgot srcu_barrier(), so just leak it! */
}
if (WARN_ON(rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) != SRCU_STATE_IDLE) ||
WARN_ON(rcu_seq_current(&ssp->srcu_gp_seq) != ssp->srcu_gp_seq_needed) ||
if (WARN_ON(rcu_seq_state(READ_ONCE(sup->srcu_gp_seq)) != SRCU_STATE_IDLE) ||
WARN_ON(rcu_seq_current(&sup->srcu_gp_seq) != sup->srcu_gp_seq_needed) ||
WARN_ON(srcu_readers_active(ssp))) {
pr_info("%s: Active srcu_struct %p read state: %d gp state: %lu/%lu\n",
__func__, ssp, rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)),
rcu_seq_current(&ssp->srcu_gp_seq), ssp->srcu_gp_seq_needed);
__func__, ssp, rcu_seq_state(READ_ONCE(sup->srcu_gp_seq)),
rcu_seq_current(&sup->srcu_gp_seq), sup->srcu_gp_seq_needed);
return; /* Caller forgot to stop doing call_srcu()? */
}
if (!ssp->sda_is_static) {
kfree(sup->node);
sup->node = NULL;
sup->srcu_size_state = SRCU_SIZE_SMALL;
if (!sup->sda_is_static) {
free_percpu(ssp->sda);
ssp->sda = NULL;
kfree(sup);
ssp->srcu_sup = NULL;
}
kfree(ssp->node);
ssp->node = NULL;
ssp->srcu_size_state = SRCU_SIZE_SMALL;
}
EXPORT_SYMBOL_GPL(cleanup_srcu_struct);
@ -760,23 +773,23 @@ static void srcu_gp_start(struct srcu_struct *ssp)
struct srcu_data *sdp;
int state;
if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
if (smp_load_acquire(&ssp->srcu_sup->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
else
sdp = this_cpu_ptr(ssp->sda);
lockdep_assert_held(&ACCESS_PRIVATE(ssp, lock));
WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed));
lockdep_assert_held(&ACCESS_PRIVATE(ssp->srcu_sup, lock));
WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_sup->srcu_gp_seq, ssp->srcu_sup->srcu_gp_seq_needed));
spin_lock_rcu_node(sdp); /* Interrupts already disabled. */
rcu_segcblist_advance(&sdp->srcu_cblist,
rcu_seq_current(&ssp->srcu_gp_seq));
rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
rcu_seq_snap(&ssp->srcu_gp_seq));
rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq));
spin_unlock_rcu_node(sdp); /* Interrupts remain disabled. */
WRITE_ONCE(ssp->srcu_gp_start, jiffies);
WRITE_ONCE(ssp->srcu_n_exp_nodelay, 0);
WRITE_ONCE(ssp->srcu_sup->srcu_gp_start, jiffies);
WRITE_ONCE(ssp->srcu_sup->srcu_n_exp_nodelay, 0);
smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
rcu_seq_start(&ssp->srcu_gp_seq);
state = rcu_seq_state(ssp->srcu_gp_seq);
rcu_seq_start(&ssp->srcu_sup->srcu_gp_seq);
state = rcu_seq_state(ssp->srcu_sup->srcu_gp_seq);
WARN_ON_ONCE(state != SRCU_STATE_SCAN1);
}
@ -849,28 +862,29 @@ static void srcu_gp_end(struct srcu_struct *ssp)
unsigned long sgsne;
struct srcu_node *snp;
int ss_state;
struct srcu_usage *sup = ssp->srcu_sup;
/* Prevent more than one additional grace period. */
mutex_lock(&ssp->srcu_cb_mutex);
mutex_lock(&sup->srcu_cb_mutex);
/* End the current grace period. */
spin_lock_irq_rcu_node(ssp);
idx = rcu_seq_state(ssp->srcu_gp_seq);
spin_lock_irq_rcu_node(sup);
idx = rcu_seq_state(sup->srcu_gp_seq);
WARN_ON_ONCE(idx != SRCU_STATE_SCAN2);
if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
if (ULONG_CMP_LT(READ_ONCE(sup->srcu_gp_seq), READ_ONCE(sup->srcu_gp_seq_needed_exp)))
cbdelay = 0;
WRITE_ONCE(ssp->srcu_last_gp_end, ktime_get_mono_fast_ns());
rcu_seq_end(&ssp->srcu_gp_seq);
gpseq = rcu_seq_current(&ssp->srcu_gp_seq);
if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, gpseq))
WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, gpseq);
spin_unlock_irq_rcu_node(ssp);
mutex_unlock(&ssp->srcu_gp_mutex);
WRITE_ONCE(sup->srcu_last_gp_end, ktime_get_mono_fast_ns());
rcu_seq_end(&sup->srcu_gp_seq);
gpseq = rcu_seq_current(&sup->srcu_gp_seq);
if (ULONG_CMP_LT(sup->srcu_gp_seq_needed_exp, gpseq))
WRITE_ONCE(sup->srcu_gp_seq_needed_exp, gpseq);
spin_unlock_irq_rcu_node(sup);
mutex_unlock(&sup->srcu_gp_mutex);
/* A new grace period can start at this point. But only one. */
/* Initiate callback invocation as needed. */
ss_state = smp_load_acquire(&ssp->srcu_size_state);
ss_state = smp_load_acquire(&sup->srcu_size_state);
if (ss_state < SRCU_SIZE_WAIT_BARRIER) {
srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, get_boot_cpu_id()),
cbdelay);
@ -879,7 +893,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
srcu_for_each_node_breadth_first(ssp, snp) {
spin_lock_irq_rcu_node(snp);
cbs = false;
last_lvl = snp >= ssp->level[rcu_num_lvls - 1];
last_lvl = snp >= sup->level[rcu_num_lvls - 1];
if (last_lvl)
cbs = ss_state < SRCU_SIZE_BIG || snp->srcu_have_cbs[idx] == gpseq;
snp->srcu_have_cbs[idx] = gpseq;
@ -911,18 +925,18 @@ static void srcu_gp_end(struct srcu_struct *ssp)
}
/* Callback initiation done, allow grace periods after next. */
mutex_unlock(&ssp->srcu_cb_mutex);
mutex_unlock(&sup->srcu_cb_mutex);
/* Start a new grace period if needed. */
spin_lock_irq_rcu_node(ssp);
gpseq = rcu_seq_current(&ssp->srcu_gp_seq);
spin_lock_irq_rcu_node(sup);
gpseq = rcu_seq_current(&sup->srcu_gp_seq);
if (!rcu_seq_state(gpseq) &&
ULONG_CMP_LT(gpseq, ssp->srcu_gp_seq_needed)) {
ULONG_CMP_LT(gpseq, sup->srcu_gp_seq_needed)) {
srcu_gp_start(ssp);
spin_unlock_irq_rcu_node(ssp);
spin_unlock_irq_rcu_node(sup);
srcu_reschedule(ssp, 0);
} else {
spin_unlock_irq_rcu_node(ssp);
spin_unlock_irq_rcu_node(sup);
}
/* Transition to big if needed. */
@ -930,7 +944,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
if (ss_state == SRCU_SIZE_ALLOC)
init_srcu_struct_nodes(ssp, GFP_KERNEL);
else
smp_store_release(&ssp->srcu_size_state, ss_state + 1);
smp_store_release(&sup->srcu_size_state, ss_state + 1);
}
}
@ -950,7 +964,7 @@ static void srcu_funnel_exp_start(struct srcu_struct *ssp, struct srcu_node *snp
if (snp)
for (; snp != NULL; snp = snp->srcu_parent) {
sgsne = READ_ONCE(snp->srcu_gp_seq_needed_exp);
if (WARN_ON_ONCE(rcu_seq_done(&ssp->srcu_gp_seq, s)) ||
if (WARN_ON_ONCE(rcu_seq_done(&ssp->srcu_sup->srcu_gp_seq, s)) ||
(!srcu_invl_snp_seq(sgsne) && ULONG_CMP_GE(sgsne, s)))
return;
spin_lock_irqsave_rcu_node(snp, flags);
@ -963,9 +977,9 @@ static void srcu_funnel_exp_start(struct srcu_struct *ssp, struct srcu_node *snp
spin_unlock_irqrestore_rcu_node(snp, flags);
}
spin_lock_irqsave_ssp_contention(ssp, &flags);
if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, s))
WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, s);
spin_unlock_irqrestore_rcu_node(ssp, flags);
if (ULONG_CMP_LT(ssp->srcu_sup->srcu_gp_seq_needed_exp, s))
WRITE_ONCE(ssp->srcu_sup->srcu_gp_seq_needed_exp, s);
spin_unlock_irqrestore_rcu_node(ssp->srcu_sup, flags);
}
/*
@ -990,9 +1004,10 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
struct srcu_node *snp;
struct srcu_node *snp_leaf;
unsigned long snp_seq;
struct srcu_usage *sup = ssp->srcu_sup;
/* Ensure that snp node tree is fully initialized before traversing it */
if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
if (smp_load_acquire(&sup->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
snp_leaf = NULL;
else
snp_leaf = sdp->mynode;
@ -1000,7 +1015,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
if (snp_leaf)
/* Each pass through the loop does one level of the srcu_node tree. */
for (snp = snp_leaf; snp != NULL; snp = snp->srcu_parent) {
if (WARN_ON_ONCE(rcu_seq_done(&ssp->srcu_gp_seq, s)) && snp != snp_leaf)
if (WARN_ON_ONCE(rcu_seq_done(&sup->srcu_gp_seq, s)) && snp != snp_leaf)
return; /* GP already done and CBs recorded. */
spin_lock_irqsave_rcu_node(snp, flags);
snp_seq = snp->srcu_have_cbs[idx];
@ -1027,20 +1042,20 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
/* Top of tree, must ensure the grace period will be started. */
spin_lock_irqsave_ssp_contention(ssp, &flags);
if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed, s)) {
if (ULONG_CMP_LT(sup->srcu_gp_seq_needed, s)) {
/*
* Record need for grace period s. Pair with load
* acquire setting up for initialization.
*/
smp_store_release(&ssp->srcu_gp_seq_needed, s); /*^^^*/
smp_store_release(&sup->srcu_gp_seq_needed, s); /*^^^*/
}
if (!do_norm && ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, s))
WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, s);
if (!do_norm && ULONG_CMP_LT(sup->srcu_gp_seq_needed_exp, s))
WRITE_ONCE(sup->srcu_gp_seq_needed_exp, s);
/* If grace period not already in progress, start it. */
if (!WARN_ON_ONCE(rcu_seq_done(&ssp->srcu_gp_seq, s)) &&
rcu_seq_state(ssp->srcu_gp_seq) == SRCU_STATE_IDLE) {
WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed));
if (!WARN_ON_ONCE(rcu_seq_done(&sup->srcu_gp_seq, s)) &&
rcu_seq_state(sup->srcu_gp_seq) == SRCU_STATE_IDLE) {
WARN_ON_ONCE(ULONG_CMP_GE(sup->srcu_gp_seq, sup->srcu_gp_seq_needed));
srcu_gp_start(ssp);
// And how can that list_add() in the "else" clause
@ -1049,12 +1064,12 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
// can only be executed during early boot when there is only
// the one boot CPU running with interrupts still disabled.
if (likely(srcu_init_done))
queue_delayed_work(rcu_gp_wq, &ssp->work,
queue_delayed_work(rcu_gp_wq, &sup->work,
!!srcu_get_delay(ssp));
else if (list_empty(&ssp->work.work.entry))
list_add(&ssp->work.work.entry, &srcu_boot_list);
else if (list_empty(&sup->work.work.entry))
list_add(&sup->work.work.entry, &srcu_boot_list);
}
spin_unlock_irqrestore_rcu_node(ssp, flags);
spin_unlock_irqrestore_rcu_node(sup, flags);
}
/*
@ -1085,16 +1100,36 @@ static bool try_check_zero(struct srcu_struct *ssp, int idx, int trycount)
static void srcu_flip(struct srcu_struct *ssp)
{
/*
* Ensure that if this updater saw a given reader's increment
* from __srcu_read_lock(), that reader was using an old value
* of ->srcu_idx. Also ensure that if a given reader sees the
* new value of ->srcu_idx, this updater's earlier scans cannot
* have seen that reader's increments (which is OK, because this
* grace period need not wait on that reader).
* Because the flip of ->srcu_idx is executed only if the
* preceding call to srcu_readers_active_idx_check() found that
* the ->srcu_unlock_count[] and ->srcu_lock_count[] sums matched
* and because that summing uses atomic_long_read(), there is
* ordering due to a control dependency between that summing and
* the WRITE_ONCE() in this call to srcu_flip(). This ordering
* ensures that if this updater saw a given reader's increment from
* __srcu_read_lock(), that reader was using a value of ->srcu_idx
* from before the previous call to srcu_flip(), which should be
* quite rare. This ordering thus helps forward progress because
* the grace period could otherwise be delayed by additional
* calls to __srcu_read_lock() using that old (soon to be new)
* value of ->srcu_idx.
*
* This sum-equality check and ordering also ensures that if
* a given call to __srcu_read_lock() uses the new value of
* ->srcu_idx, this updater's earlier scans cannot have seen
* that reader's increments, which is all to the good, because
* this grace period need not wait on that reader. After all,
* if those earlier scans had seen that reader, there would have
* been a sum mismatch and this code would not be reached.
*
* This means that the following smp_mb() is redundant, but
* it stays until either (1) Compilers learn about this sort of
* control dependency or (2) Some production workload running on
* a production system is unduly delayed by this slowpath smp_mb().
*/
smp_mb(); /* E */ /* Pairs with B and C. */
WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1); // Flip the counter.
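
The comment above argues ordering by control dependency. As a much-simplified userspace sketch of that shape of reasoning (not kernel code; all toy_* names are invented, and C11 seq_cst fences stand in for smp_mb()), note how the index store below can be reached only when the unlock/lock sums for the current index match:

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	static atomic_ulong toy_lock_count[2];
	static atomic_ulong toy_unlock_count[2];
	static atomic_uint toy_idx;

	/* Roughly srcu_readers_active_idx_check(): sum unlocks, then locks. */
	static bool toy_readers_done(unsigned int idx)
	{
		unsigned long u = atomic_load(&toy_unlock_count[idx]);

		atomic_thread_fence(memory_order_seq_cst); /* Stand-in for smp_mb(). */
		return u == atomic_load(&toy_lock_count[idx]);
	}

	/* Roughly srcu_flip(): the store is control-dependent on the sum check. */
	static void toy_flip(void)
	{
		unsigned int idx = atomic_load(&toy_idx) & 1;

		if (!toy_readers_done(idx))
			return;		/* The kernel instead retries later. */
		atomic_store(&toy_idx, atomic_load(&toy_idx) + 1); /* Flip. */
	}

	int main(void)
	{
		toy_flip();
		printf("toy_idx is now %u\n", atomic_load(&toy_idx));
		return 0;
	}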
/*
* Ensure that if the updater misses an __srcu_read_unlock()
@ -1154,18 +1189,18 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
/* First, see if enough time has passed since the last GP. */
t = ktime_get_mono_fast_ns();
tlast = READ_ONCE(ssp->srcu_last_gp_end);
tlast = READ_ONCE(ssp->srcu_sup->srcu_last_gp_end);
if (exp_holdoff == 0 ||
time_in_range_open(t, tlast, tlast + exp_holdoff))
return false; /* Too soon after last GP. */
/* Next, check for probable idleness. */
curseq = rcu_seq_current(&ssp->srcu_gp_seq);
curseq = rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq);
smp_mb(); /* Order ->srcu_gp_seq with ->srcu_gp_seq_needed. */
if (ULONG_CMP_LT(curseq, READ_ONCE(ssp->srcu_gp_seq_needed)))
if (ULONG_CMP_LT(curseq, READ_ONCE(ssp->srcu_sup->srcu_gp_seq_needed)))
return false; /* Grace period in progress, so not idle. */
smp_mb(); /* Order ->srcu_gp_seq with prior access. */
if (curseq != rcu_seq_current(&ssp->srcu_gp_seq))
if (curseq != rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq))
return false; /* GP # changed, so not idle. */
return true; /* With reasonable probability, idle! */
}
@ -1199,7 +1234,7 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
* sequence number cannot wrap around in the meantime.
*/
idx = __srcu_read_lock_nmisafe(ssp);
ss_state = smp_load_acquire(&ssp->srcu_size_state);
ss_state = smp_load_acquire(&ssp->srcu_sup->srcu_size_state);
if (ss_state < SRCU_SIZE_WAIT_CALL)
sdp = per_cpu_ptr(ssp->sda, get_boot_cpu_id());
else
@ -1208,8 +1243,8 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
if (rhp)
rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
rcu_segcblist_advance(&sdp->srcu_cblist,
rcu_seq_current(&ssp->srcu_gp_seq));
s = rcu_seq_snap(&ssp->srcu_gp_seq);
rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
s = rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq);
(void)rcu_segcblist_accelerate(&sdp->srcu_cblist, s);
if (ULONG_CMP_LT(sdp->srcu_gp_seq_needed, s)) {
sdp->srcu_gp_seq_needed = s;
@ -1307,6 +1342,8 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
{
struct rcu_synchronize rcu;
srcu_lock_sync(&ssp->dep_map);
RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
lock_is_held(&rcu_bh_lock_map) ||
lock_is_held(&rcu_lock_map) ||
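
The srcu_lock_sync() call added above feeds the srcu_struct's dep_map into lockdep, so a synchronize_srcu() issued inside another domain's read-side critical section becomes a trackable dependency. A hypothetical sketch of the cycle this can now report (srcu_a, srcu_b, and both functions are invented names; the SRCU calls themselves are the real API):

	#include <linux/srcu.h>

	DEFINE_SRCU(srcu_a);
	DEFINE_SRCU(srcu_b);

	static void path_one(void)
	{
		int idx = srcu_read_lock(&srcu_a);

		synchronize_srcu(&srcu_b);	/* Records srcu_a -> srcu_b. */
		srcu_read_unlock(&srcu_a, idx);
	}

	static void path_two(void)
	{
		int idx = srcu_read_lock(&srcu_b);

		synchronize_srcu(&srcu_a);	/* srcu_b -> srcu_a closes the cycle. */
		srcu_read_unlock(&srcu_b, idx);
	}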
@ -1420,7 +1457,7 @@ unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp)
// Any prior manipulation of SRCU-protected data must happen
// before the load from ->srcu_gp_seq.
smp_mb();
return rcu_seq_snap(&ssp->srcu_gp_seq);
return rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq);
}
EXPORT_SYMBOL_GPL(get_state_synchronize_srcu);
@ -1467,7 +1504,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_srcu);
*/
bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long cookie)
{
if (!rcu_seq_done(&ssp->srcu_gp_seq, cookie))
if (!rcu_seq_done(&ssp->srcu_sup->srcu_gp_seq, cookie))
return false;
// Ensure that the end of the SRCU grace period happens before
// any subsequent code that the caller might execute.
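
One plausible caller-side pattern for this cookie-based polling (my_srcu, struct foo, and retire_foo are invented names; get_state_synchronize_srcu() and poll_state_synchronize_srcu() are the real interfaces):

	#include <linux/slab.h>
	#include <linux/srcu.h>

	DEFINE_SRCU(my_srcu);

	struct foo {
		int data;
	};

	static void retire_foo(struct foo *old_p)
	{
		unsigned long cookie = get_state_synchronize_srcu(&my_srcu);

		/* ... other work; readers might still reference old_p ... */

		if (!poll_state_synchronize_srcu(&my_srcu, cookie))
			synchronize_srcu(&my_srcu);	/* Wait out the remainder. */
		kfree(old_p);			/* No reader can still see old_p. */
	}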
@ -1486,8 +1523,8 @@ static void srcu_barrier_cb(struct rcu_head *rhp)
sdp = container_of(rhp, struct srcu_data, srcu_barrier_head);
ssp = sdp->ssp;
if (atomic_dec_and_test(&ssp->srcu_barrier_cpu_cnt))
complete(&ssp->srcu_barrier_completion);
if (atomic_dec_and_test(&ssp->srcu_sup->srcu_barrier_cpu_cnt))
complete(&ssp->srcu_sup->srcu_barrier_completion);
}
/*
@ -1501,13 +1538,13 @@ static void srcu_barrier_cb(struct rcu_head *rhp)
static void srcu_barrier_one_cpu(struct srcu_struct *ssp, struct srcu_data *sdp)
{
spin_lock_irq_rcu_node(sdp);
atomic_inc(&ssp->srcu_barrier_cpu_cnt);
atomic_inc(&ssp->srcu_sup->srcu_barrier_cpu_cnt);
sdp->srcu_barrier_head.func = srcu_barrier_cb;
debug_rcu_head_queue(&sdp->srcu_barrier_head);
if (!rcu_segcblist_entrain(&sdp->srcu_cblist,
&sdp->srcu_barrier_head)) {
debug_rcu_head_unqueue(&sdp->srcu_barrier_head);
atomic_dec(&ssp->srcu_barrier_cpu_cnt);
atomic_dec(&ssp->srcu_sup->srcu_barrier_cpu_cnt);
}
spin_unlock_irq_rcu_node(sdp);
}
@ -1520,23 +1557,23 @@ void srcu_barrier(struct srcu_struct *ssp)
{
int cpu;
int idx;
unsigned long s = rcu_seq_snap(&ssp->srcu_barrier_seq);
unsigned long s = rcu_seq_snap(&ssp->srcu_sup->srcu_barrier_seq);
check_init_srcu_struct(ssp);
mutex_lock(&ssp->srcu_barrier_mutex);
if (rcu_seq_done(&ssp->srcu_barrier_seq, s)) {
mutex_lock(&ssp->srcu_sup->srcu_barrier_mutex);
if (rcu_seq_done(&ssp->srcu_sup->srcu_barrier_seq, s)) {
smp_mb(); /* Force ordering following return. */
mutex_unlock(&ssp->srcu_barrier_mutex);
mutex_unlock(&ssp->srcu_sup->srcu_barrier_mutex);
return; /* Someone else did our work for us. */
}
rcu_seq_start(&ssp->srcu_barrier_seq);
init_completion(&ssp->srcu_barrier_completion);
rcu_seq_start(&ssp->srcu_sup->srcu_barrier_seq);
init_completion(&ssp->srcu_sup->srcu_barrier_completion);
/* Initial count prevents reaching zero until all CBs are posted. */
atomic_set(&ssp->srcu_barrier_cpu_cnt, 1);
atomic_set(&ssp->srcu_sup->srcu_barrier_cpu_cnt, 1);
idx = __srcu_read_lock_nmisafe(ssp);
if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
if (smp_load_acquire(&ssp->srcu_sup->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER)
srcu_barrier_one_cpu(ssp, per_cpu_ptr(ssp->sda, get_boot_cpu_id()));
else
for_each_possible_cpu(cpu)
@ -1544,12 +1581,12 @@ void srcu_barrier(struct srcu_struct *ssp)
__srcu_read_unlock_nmisafe(ssp, idx);
/* Remove the initial count, at which point reaching zero can happen. */
if (atomic_dec_and_test(&ssp->srcu_barrier_cpu_cnt))
complete(&ssp->srcu_barrier_completion);
wait_for_completion(&ssp->srcu_barrier_completion);
if (atomic_dec_and_test(&ssp->srcu_sup->srcu_barrier_cpu_cnt))
complete(&ssp->srcu_sup->srcu_barrier_completion);
wait_for_completion(&ssp->srcu_sup->srcu_barrier_completion);
rcu_seq_end(&ssp->srcu_barrier_seq);
mutex_unlock(&ssp->srcu_barrier_mutex);
rcu_seq_end(&ssp->srcu_sup->srcu_barrier_seq);
mutex_unlock(&ssp->srcu_sup->srcu_barrier_mutex);
}
EXPORT_SYMBOL_GPL(srcu_barrier);
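
As a reminder of the intended use: srcu_barrier() waits only for already-queued call_srcu() callbacks, so teardown code must first stop posting new ones. A minimal sketch with invented names:

	#include <linux/srcu.h>

	static struct srcu_struct my_srcu;	/* Assume init_srcu_struct() ran. */

	static void my_teardown(void)
	{
		/* Caller guarantees no further call_srcu(&my_srcu, ...) here. */
		srcu_barrier(&my_srcu);		/* All queued callbacks have run. */
		cleanup_srcu_struct(&my_srcu);	/* Now safe to free the domain. */
	}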
@ -1575,7 +1612,7 @@ static void srcu_advance_state(struct srcu_struct *ssp)
{
int idx;
mutex_lock(&ssp->srcu_gp_mutex);
mutex_lock(&ssp->srcu_sup->srcu_gp_mutex);
/*
* Because readers might be delayed for an extended period after
@ -1587,39 +1624,39 @@ static void srcu_advance_state(struct srcu_struct *ssp)
* The load-acquire ensures that we see the accesses performed
* by the prior grace period.
*/
idx = rcu_seq_state(smp_load_acquire(&ssp->srcu_gp_seq)); /* ^^^ */
idx = rcu_seq_state(smp_load_acquire(&ssp->srcu_sup->srcu_gp_seq)); /* ^^^ */
if (idx == SRCU_STATE_IDLE) {
spin_lock_irq_rcu_node(ssp);
if (ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed)) {
WARN_ON_ONCE(rcu_seq_state(ssp->srcu_gp_seq));
spin_unlock_irq_rcu_node(ssp);
mutex_unlock(&ssp->srcu_gp_mutex);
spin_lock_irq_rcu_node(ssp->srcu_sup);
if (ULONG_CMP_GE(ssp->srcu_sup->srcu_gp_seq, ssp->srcu_sup->srcu_gp_seq_needed)) {
WARN_ON_ONCE(rcu_seq_state(ssp->srcu_sup->srcu_gp_seq));
spin_unlock_irq_rcu_node(ssp->srcu_sup);
mutex_unlock(&ssp->srcu_sup->srcu_gp_mutex);
return;
}
idx = rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq));
idx = rcu_seq_state(READ_ONCE(ssp->srcu_sup->srcu_gp_seq));
if (idx == SRCU_STATE_IDLE)
srcu_gp_start(ssp);
spin_unlock_irq_rcu_node(ssp);
spin_unlock_irq_rcu_node(ssp->srcu_sup);
if (idx != SRCU_STATE_IDLE) {
mutex_unlock(&ssp->srcu_gp_mutex);
mutex_unlock(&ssp->srcu_sup->srcu_gp_mutex);
return; /* Someone else started the grace period. */
}
}
if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) == SRCU_STATE_SCAN1) {
if (rcu_seq_state(READ_ONCE(ssp->srcu_sup->srcu_gp_seq)) == SRCU_STATE_SCAN1) {
idx = 1 ^ (ssp->srcu_idx & 1);
if (!try_check_zero(ssp, idx, 1)) {
mutex_unlock(&ssp->srcu_gp_mutex);
mutex_unlock(&ssp->srcu_sup->srcu_gp_mutex);
return; /* readers present, retry later. */
}
srcu_flip(ssp);
spin_lock_irq_rcu_node(ssp);
rcu_seq_set_state(&ssp->srcu_gp_seq, SRCU_STATE_SCAN2);
ssp->srcu_n_exp_nodelay = 0;
spin_unlock_irq_rcu_node(ssp);
spin_lock_irq_rcu_node(ssp->srcu_sup);
rcu_seq_set_state(&ssp->srcu_sup->srcu_gp_seq, SRCU_STATE_SCAN2);
ssp->srcu_sup->srcu_n_exp_nodelay = 0;
spin_unlock_irq_rcu_node(ssp->srcu_sup);
}
if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) == SRCU_STATE_SCAN2) {
if (rcu_seq_state(READ_ONCE(ssp->srcu_sup->srcu_gp_seq)) == SRCU_STATE_SCAN2) {
/*
* SRCU read-side critical sections are normally short,
@ -1627,10 +1664,10 @@ static void srcu_advance_state(struct srcu_struct *ssp)
*/
idx = 1 ^ (ssp->srcu_idx & 1);
if (!try_check_zero(ssp, idx, 2)) {
mutex_unlock(&ssp->srcu_gp_mutex);
mutex_unlock(&ssp->srcu_sup->srcu_gp_mutex);
return; /* readers present, retry later. */
}
ssp->srcu_n_exp_nodelay = 0;
ssp->srcu_sup->srcu_n_exp_nodelay = 0;
srcu_gp_end(ssp); /* Releases ->srcu_gp_mutex. */
}
}
@ -1656,7 +1693,7 @@ static void srcu_invoke_callbacks(struct work_struct *work)
rcu_cblist_init(&ready_cbs);
spin_lock_irq_rcu_node(sdp);
rcu_segcblist_advance(&sdp->srcu_cblist,
rcu_seq_current(&ssp->srcu_gp_seq));
rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
if (sdp->srcu_cblist_invoking ||
!rcu_segcblist_ready_cbs(&sdp->srcu_cblist)) {
spin_unlock_irq_rcu_node(sdp);
@ -1684,7 +1721,7 @@ static void srcu_invoke_callbacks(struct work_struct *work)
spin_lock_irq_rcu_node(sdp);
rcu_segcblist_add_len(&sdp->srcu_cblist, -len);
(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
rcu_seq_snap(&ssp->srcu_gp_seq));
rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq));
sdp->srcu_cblist_invoking = false;
more = rcu_segcblist_ready_cbs(&sdp->srcu_cblist);
spin_unlock_irq_rcu_node(sdp);
@ -1700,20 +1737,20 @@ static void srcu_reschedule(struct srcu_struct *ssp, unsigned long delay)
{
bool pushgp = true;
spin_lock_irq_rcu_node(ssp);
if (ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed)) {
if (!WARN_ON_ONCE(rcu_seq_state(ssp->srcu_gp_seq))) {
spin_lock_irq_rcu_node(ssp->srcu_sup);
if (ULONG_CMP_GE(ssp->srcu_sup->srcu_gp_seq, ssp->srcu_sup->srcu_gp_seq_needed)) {
if (!WARN_ON_ONCE(rcu_seq_state(ssp->srcu_sup->srcu_gp_seq))) {
/* All requests fulfilled, time to go idle. */
pushgp = false;
}
} else if (!rcu_seq_state(ssp->srcu_gp_seq)) {
} else if (!rcu_seq_state(ssp->srcu_sup->srcu_gp_seq)) {
/* Outstanding request and no GP. Start one. */
srcu_gp_start(ssp);
}
spin_unlock_irq_rcu_node(ssp);
spin_unlock_irq_rcu_node(ssp->srcu_sup);
if (pushgp)
queue_delayed_work(rcu_gp_wq, &ssp->work, delay);
queue_delayed_work(rcu_gp_wq, &ssp->srcu_sup->work, delay);
}
/*
@ -1724,22 +1761,24 @@ static void process_srcu(struct work_struct *work)
unsigned long curdelay;
unsigned long j;
struct srcu_struct *ssp;
struct srcu_usage *sup;
ssp = container_of(work, struct srcu_struct, work.work);
sup = container_of(work, struct srcu_usage, work.work);
ssp = sup->srcu_ssp;
srcu_advance_state(ssp);
curdelay = srcu_get_delay(ssp);
if (curdelay) {
WRITE_ONCE(ssp->reschedule_count, 0);
WRITE_ONCE(sup->reschedule_count, 0);
} else {
j = jiffies;
if (READ_ONCE(ssp->reschedule_jiffies) == j) {
WRITE_ONCE(ssp->reschedule_count, READ_ONCE(ssp->reschedule_count) + 1);
if (READ_ONCE(ssp->reschedule_count) > srcu_max_nodelay)
if (READ_ONCE(sup->reschedule_jiffies) == j) {
WRITE_ONCE(sup->reschedule_count, READ_ONCE(sup->reschedule_count) + 1);
if (READ_ONCE(sup->reschedule_count) > srcu_max_nodelay)
curdelay = 1;
} else {
WRITE_ONCE(ssp->reschedule_count, 1);
WRITE_ONCE(ssp->reschedule_jiffies, j);
WRITE_ONCE(sup->reschedule_count, 1);
WRITE_ONCE(sup->reschedule_jiffies, j);
}
}
srcu_reschedule(ssp, curdelay);
@ -1752,7 +1791,7 @@ void srcutorture_get_gp_data(enum rcutorture_type test_type,
if (test_type != SRCU_FLAVOR)
return;
*flags = 0;
*gp_seq = rcu_seq_current(&ssp->srcu_gp_seq);
*gp_seq = rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq);
}
EXPORT_SYMBOL_GPL(srcutorture_get_gp_data);
@ -1774,14 +1813,14 @@ void srcu_torture_stats_print(struct srcu_struct *ssp, char *tt, char *tf)
int cpu;
int idx;
unsigned long s0 = 0, s1 = 0;
int ss_state = READ_ONCE(ssp->srcu_size_state);
int ss_state = READ_ONCE(ssp->srcu_sup->srcu_size_state);
int ss_state_idx = ss_state;
idx = ssp->srcu_idx & 0x1;
if (ss_state < 0 || ss_state >= ARRAY_SIZE(srcu_size_state_name))
ss_state_idx = ARRAY_SIZE(srcu_size_state_name) - 1;
pr_alert("%s%s Tree SRCU g%ld state %d (%s)",
tt, tf, rcu_seq_current(&ssp->srcu_gp_seq), ss_state,
tt, tf, rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq), ss_state,
srcu_size_state_name[ss_state_idx]);
if (!ssp->sda) {
// Called after cleanup_srcu_struct(), perhaps.
@ -1838,7 +1877,7 @@ early_initcall(srcu_bootup_announce);
void __init srcu_init(void)
{
struct srcu_struct *ssp;
struct srcu_usage *sup;
/* Decide on srcu_struct-size strategy. */
if (SRCU_SIZING_IS(SRCU_SIZING_AUTO)) {
@ -1858,12 +1897,13 @@ void __init srcu_init(void)
*/
srcu_init_done = true;
while (!list_empty(&srcu_boot_list)) {
ssp = list_first_entry(&srcu_boot_list, struct srcu_struct,
sup = list_first_entry(&srcu_boot_list, struct srcu_usage,
work.work.entry);
list_del_init(&ssp->work.work.entry);
if (SRCU_SIZING_IS(SRCU_SIZING_INIT) && ssp->srcu_size_state == SRCU_SIZE_SMALL)
ssp->srcu_size_state = SRCU_SIZE_ALLOC;
queue_work(rcu_gp_wq, &ssp->work.work);
list_del_init(&sup->work.work.entry);
if (SRCU_SIZING_IS(SRCU_SIZING_INIT) &&
sup->srcu_size_state == SRCU_SIZE_SMALL)
sup->srcu_size_state = SRCU_SIZE_ALLOC;
queue_work(rcu_gp_wq, &sup->work.work);
}
}
@ -1873,13 +1913,14 @@ void __init srcu_init(void)
static int srcu_module_coming(struct module *mod)
{
int i;
struct srcu_struct *ssp;
struct srcu_struct **sspp = mod->srcu_struct_ptrs;
int ret;
for (i = 0; i < mod->num_srcu_structs; i++) {
ret = init_srcu_struct(*(sspp++));
if (WARN_ON_ONCE(ret))
return ret;
ssp = *(sspp++);
ssp->sda = alloc_percpu(struct srcu_data);
if (WARN_ON_ONCE(!ssp->sda))
return -ENOMEM;
}
return 0;
}
@ -1888,10 +1929,17 @@ static int srcu_module_coming(struct module *mod)
static void srcu_module_going(struct module *mod)
{
int i;
struct srcu_struct *ssp;
struct srcu_struct **sspp = mod->srcu_struct_ptrs;
for (i = 0; i < mod->num_srcu_structs; i++)
cleanup_srcu_struct(*(sspp++));
for (i = 0; i < mod->num_srcu_structs; i++) {
ssp = *(sspp++);
if (!rcu_seq_state(smp_load_acquire(&ssp->srcu_sup->srcu_gp_seq_needed)) &&
!WARN_ON_ONCE(!ssp->srcu_sup->sda_is_static))
cleanup_srcu_struct(ssp);
if (!WARN_ON(srcu_readers_active(ssp)))
free_percpu(ssp->sda);
}
}
/* Handle one module, either coming or going. */

View file

@ -136,8 +136,10 @@ static struct rcu_tasks rt_name = \
.kname = #rt_name, \
}
#ifdef CONFIG_TASKS_RCU
/* Track exiting tasks in order to allow them to be waited for. */
DEFINE_STATIC_SRCU(tasks_rcu_exit_srcu);
#endif
#ifdef CONFIG_TASKS_RCU
/* Report delay in synchronize_srcu() completion in rcu_tasks_postscan(). */

View file

@ -2134,6 +2134,8 @@ static void rcu_do_batch(struct rcu_data *rdp)
break;
}
} else {
// In rcuoc context, so no worries about depriving
// other softirq vectors of CPU cycles.
local_bh_enable();
lockdep_assert_irqs_enabled();
cond_resched_tasks_rcu_qs();

View file

@ -159,7 +159,7 @@ static void osnoise_unregister_instance(struct trace_array *tr)
if (!found)
return;
kvfree_rcu(inst);
kvfree_rcu_mightsleep(inst);
}
/*

View file

@ -1172,7 +1172,7 @@ int trace_probe_remove_file(struct trace_probe *tp,
return -ENOENT;
list_del_rcu(&link->list);
kvfree_rcu(link);
kvfree_rcu_mightsleep(link);
if (list_empty(&tp->event->files))
trace_probe_clear_flag(tp, TP_FLAG_TRACE);

View file

@ -334,7 +334,7 @@ kvfree_rcu_1_arg_vmalloc_test(void)
return -1;
p->array[0] = 'a';
kvfree_rcu(p);
kvfree_rcu_mightsleep(p);
}
return 0;

View file

@ -177,7 +177,7 @@ static int rps_sock_flow_sysctl(struct ctl_table *table, int write,
if (orig_sock_table) {
static_branch_dec(&rps_needed);
static_branch_dec(&rfs_needed);
kvfree_rcu(orig_sock_table);
kvfree_rcu_mightsleep(orig_sock_table);
}
}
}
@ -215,7 +215,7 @@ static int flow_limit_cpu_sysctl(struct ctl_table *table, int write,
lockdep_is_held(&flow_limit_update_mutex));
if (cur && !cpumask_test_cpu(i, mask)) {
RCU_INIT_POINTER(sd->flow_limit, NULL);
kfree_rcu(cur);
kfree_rcu_mightsleep(cur);
} else if (!cur && cpumask_test_cpu(i, mask)) {
cur = kzalloc_node(len, GFP_KERNEL,
cpu_to_node(i));

View file

@ -52,7 +52,7 @@ static int mac802154_scan_cleanup_locked(struct ieee802154_local *local,
request = rcu_replace_pointer(local->scan_req, NULL, 1);
if (!request)
return 0;
kfree_rcu(request);
kvfree_rcu_mightsleep(request);
/* Advertise first, while we know the devices cannot be removed */
if (aborted)
@ -403,7 +403,7 @@ int mac802154_stop_beacons_locked(struct ieee802154_local *local,
request = rcu_replace_pointer(local->beacon_req, NULL, 1);
if (!request)
return 0;
kfree_rcu(request);
kvfree_rcu_mightsleep(request);
nl802154_beaconing_done(wpan_dev);

View file

@ -6388,6 +6388,15 @@ sub process {
}
}
# check for soon-to-be-deprecated single-argument k[v]free_rcu() API
if ($line =~ /\bk[v]?free_rcu\s*\([^(]+\)/) {
if ($line =~ /\bk[v]?free_rcu\s*\([^,]+\)/) {
ERROR("DEPRECATED_API",
"Single-argument k[v]free_rcu() API is deprecated, please pass rcu_head object or call k[v]free_rcu_mightsleep()." . $herecurr);
}
}
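
For reference, the API split this checkpatch rule steers people toward, sketched with invented names (kfree_rcu() and kfree_rcu_mightsleep() are the real interfaces):

	#include <linux/rcupdate.h>

	struct foo {
		int data;
		struct rcu_head rh;
	};

	static void retire_atomic_ok(struct foo *fp)
	{
		kfree_rcu(fp, rh);	/* Two-argument form: safe in atomic context. */
	}

	static void retire_process_ctx(struct foo *fp)
	{
		/* Renamed single-argument form: may block on memory allocation
		 * or a grace period, so process context only. */
		kfree_rcu_mightsleep(fp);
	}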
# check for unnecessary "Out of Memory" messages
if ($line =~ /^\+.*\b$logFunctions\s*\(/ &&
$prevline =~ /^[ \+]\s*if\s*\(\s*(\!\s*|NULL\s*==\s*)?($Lval)(\s*==\s*NULL\s*)?\s*\)/ &&

tools/rcu/extract-stall.sh Normal file → Executable file
View file

@ -1,11 +1,25 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0+
#
# Extract any RCU CPU stall warnings present in specified file.
# Filter out clocksource lines. Note that preceding-lines excludes the
# initial line of the stall warning but trailing-lines includes it.
#
# Usage: extract-stall.sh dmesg-file [ preceding-lines [ trailing-lines ] ]
usage() {
echo Extract any RCU CPU stall warnings present in specified file.
echo Filter out clocksource lines. Note that preceding-lines excludes the
echo initial line of the stall warning but trailing-lines includes it.
echo
echo Usage: $(basename $0) dmesg-file [ preceding-lines [ trailing-lines ] ]
echo
echo Error: $1
}
# Terminate the script if the console log file is missing or unreadable
if test -f "$1" && test -r "$1"
then
:
else
usage "Console log file \"$1\" missing or unreadable."
exit 1
fi
echo $1
preceding_lines="${2-3}"

View file

@ -193,7 +193,7 @@ do
qemu_cmd_dir="`dirname "$i"`"
kernel_dir="`echo $qemu_cmd_dir | sed -e 's/\.[0-9]\+$//'`"
jitter_dir="`dirname "$kernel_dir"`"
kvm-transform.sh "$kernel_dir/bzImage" "$qemu_cmd_dir/console.log" "$jitter_dir" $dur "$bootargs" < $T/qemu-cmd > $i
kvm-transform.sh "$kernel_dir/bzImage" "$qemu_cmd_dir/console.log" "$jitter_dir" "$dur" "$bootargs" < $T/qemu-cmd > $i
if test -n "$arg_remote"
then
echo "# TORTURE_KCONFIG_GDB_ARG=''" >> $i

View file

@ -0,0 +1,78 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
#
# Run SRCU-lockdep tests and report any that fail to meet expectations.
#
# Copyright (C) 2021 Meta Platforms, Inc.
#
# Authors: Paul E. McKenney <paulmck@kernel.org>
usage () {
echo "Usage: $scriptname optional arguments:"
echo " --datestamp string"
exit 1
}
ds=`date +%Y.%m.%d-%H.%M.%S`-srcu_lockdep
scriptname="$0"
T="`mktemp -d ${TMPDIR-/tmp}/srcu_lockdep.sh.XXXXXX`"
trap 'rm -rf $T' 0
RCUTORTURE="`pwd`/tools/testing/selftests/rcutorture"; export RCUTORTURE
PATH=${RCUTORTURE}/bin:$PATH; export PATH
. functions.sh
while test $# -gt 0
do
case "$1" in
--datestamp)
checkarg --datestamp "(relative pathname)" "$#" "$2" '^[a-zA-Z0-9._/-]*$' '^--'
ds=$2
shift
;;
*)
echo Unknown argument $1
usage
;;
esac
shift
done
err=
nerrs=0
for d in 0 1
do
for t in 0 1 2
do
for c in 1 2 3
do
err=
val=$((d*1000+t*10+c))
tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5s --configs "SRCU-P" --bootargs "rcutorture.test_srcu_lockdep=$val" --trust-make --datestamp "$ds/$val" > "$T/kvm.sh.out" 2>&1
ret=$?
mv "$T/kvm.sh.out" "$RCUTORTURE/res/$ds/$val"
if test "$d" -ne 0 && test "$ret" -eq 0
then
err=1
echo -n Unexpected success for > "$RCUTORTURE/res/$ds/$val/kvm.sh.err"
fi
if test "$d" -eq 0 && test "$ret" -ne 0
then
err=1
echo -n Unexpected failure for > "$RCUTORTURE/res/$ds/$val/kvm.sh.err"
fi
if test -n "$err"
then
grep "rcu_torture_init_srcu_lockdep: test_srcu_lockdep = " "$RCUTORTURE/res/$ds/$val/SRCU-P/console.log" | sed -e 's/^.*rcu_torture_init_srcu_lockdep://' >> "$RCUTORTURE/res/$ds/$val/kvm.sh.err"
cat "$RCUTORTURE/res/$ds/$val/kvm.sh.err"
nerrs=$((nerrs+1))
fi
done
done
done
if test "$nerrs" -ne 0
then
exit 1
fi
exit 0

View file

@ -497,16 +497,16 @@ fi
if test "$do_clocksourcewd" = "yes"
then
torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000"
torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 tsc=watchdog"
torture_set "clocksourcewd-1" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 45s --configs TREE03 --kconfig "CONFIG_TEST_CLOCKSOURCE_WATCHDOG=y" --trust-make
torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 clocksource.max_cswd_read_retries=1"
torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 clocksource.max_cswd_read_retries=1 tsc=watchdog"
torture_set "clocksourcewd-2" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 45s --configs TREE03 --kconfig "CONFIG_TEST_CLOCKSOURCE_WATCHDOG=y" --trust-make
# In case our work is already done...
if test "$do_rcutorture" != "yes"
then
torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000"
torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 tsc=watchdog"
torture_set "clocksourcewd-3" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 45s --configs TREE03 --trust-make
fi
fi

View file

@ -15,3 +15,4 @@ CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_RCU_EXPERT=y
CONFIG_BOOTPARAM_HOTPLUG_CPU0=y

View file

@ -15,3 +15,4 @@ CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_RCU_EXPERT=y
CONFIG_RCU_EQS_DEBUG=y
CONFIG_RCU_LAZY=y