======================================
Sequence counters and sequential locks
======================================

Introduction
============

Sequence counters are a reader-writer consistency mechanism with
lockless readers (read-only retry loops), and no writer starvation. They
are used for data that's rarely written to (e.g. system time), where the
reader wants a consistent set of information and is willing to retry if
that information changes.

A data set is consistent when the sequence count at the beginning of the
read side critical section is even and the same sequence count value is
read again at the end of the critical section. The data in the set must
be copied out inside the read side critical section. If the sequence
count has changed between the start and the end of the critical section,
the reader must retry.

Writers increment the sequence count at the start and the end of their
critical section. After starting the critical section the sequence count
is odd and indicates to the readers that an update is in progress. At
the end of the write side critical section the sequence count becomes
even again which lets readers make progress.
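
The protocol above can be sketched in user space with C11 atomics. This is
only an illustrative model under stated assumptions: every ``model_*`` name
is hypothetical, the fences merely approximate the ordering the kernel
obtains from its own barriers, and writers are assumed to be serialized by
the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names are kernel API. */
struct model_seqcount {
	atomic_uint sequence;
};

struct model_clock {
	struct model_seqcount seq;
	_Atomic uint64_t sec;
	_Atomic uint64_t nsec;
};

/* Writer: make the count odd before touching the data. */
static void model_write_begin(struct model_seqcount *s)
{
	unsigned int seq = atomic_load_explicit(&s->sequence, memory_order_relaxed);

	atomic_store_explicit(&s->sequence, seq + 1, memory_order_relaxed);
	/* Keep the following data stores after the odd count store. */
	atomic_thread_fence(memory_order_release);
}

/* Writer: make the count even again once the data is fully updated. */
static void model_write_end(struct model_seqcount *s)
{
	unsigned int seq = atomic_load_explicit(&s->sequence, memory_order_relaxed);

	assert(seq & 1);	/* must pair with model_write_begin() */
	atomic_store_explicit(&s->sequence, seq + 1, memory_order_release);
}

/* Reader: wait for an even count, then remember it. */
static unsigned int model_read_begin(struct model_seqcount *s)
{
	unsigned int seq;

	do {
		seq = atomic_load_explicit(&s->sequence, memory_order_acquire);
	} while (seq & 1);

	return seq;
}

/* Reader: true if the count changed since model_read_begin(). */
static bool model_read_retry(struct model_seqcount *s, unsigned int start)
{
	/* Keep the preceding data loads before the count re-read. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->sequence, memory_order_relaxed) != start;
}

/* Writers must already be serialized by the caller. */
static void model_update(struct model_clock *c, uint64_t sec, uint64_t nsec)
{
	model_write_begin(&c->seq);
	atomic_store_explicit(&c->sec, sec, memory_order_relaxed);
	atomic_store_explicit(&c->nsec, nsec, memory_order_relaxed);
	model_write_end(&c->seq);
}

/* Copy the data out inside the read section; retry on any change. */
static void model_snapshot(struct model_clock *c, uint64_t *sec, uint64_t *nsec)
{
	unsigned int seq;

	do {
		seq = model_read_begin(&c->seq);
		*sec = atomic_load_explicit(&c->sec, memory_order_relaxed);
		*nsec = atomic_load_explicit(&c->nsec, memory_order_relaxed);
	} while (model_read_retry(&c->seq, seq));
}
```

Note that the reader copies both fields into caller-provided storage before
checking the count, so a torn copy is simply discarded and re-read.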

A sequence counter write side critical section must never be preempted
or interrupted by read side sections. Otherwise the reader will spin for
the entire scheduler tick due to the odd sequence count value and the
interrupted writer. If that reader belongs to a real-time scheduling
class, it can spin forever and the kernel will livelock.

This mechanism cannot be used if the protected data contains pointers,
as the writer can invalidate a pointer that the reader is following.


.. _seqcount_t:

Sequence counters (``seqcount_t``)
==================================

This is the raw counting mechanism, which does not protect against
multiple writers.  Write side critical sections must thus be serialized
by an external lock.

If the write serialization primitive is not implicitly disabling
preemption, preemption must be explicitly disabled before entering the
write side section. If the read section can be invoked from hardirq or
softirq contexts, interrupts or bottom halves must also be respectively
disabled before entering the write section.

If it's desired to automatically handle the sequence counter
requirements of writer serialization and non-preemptibility, use
:ref:`seqlock_t` instead.

Initialization::

	/* dynamic */
	seqcount_t foo_seqcount;
	seqcount_init(&foo_seqcount);

	/* static */
	static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);

	/* C99 struct init */
	struct {
		seqcount_t seq;
	} foo = {
		.seq   = SEQCNT_ZERO(foo.seq),
	};

Write path::

	/* Serialized context with disabled preemption */

	write_seqcount_begin(&foo_seqcount);

	/* ... [[write-side critical section]] ... */

	write_seqcount_end(&foo_seqcount);

Read path::

	do {
		seq = read_seqcount_begin(&foo_seqcount);

		/* ... [[read-side critical section]] ... */

	} while (read_seqcount_retry(&foo_seqcount, seq));


.. _seqcount_locktype_t:

Sequence counters with associated locks (``seqcount_LOCKTYPE_t``)
-----------------------------------------------------------------

As discussed at :ref:`seqcount_t`, sequence count write side critical
sections must be serialized and non-preemptible. This variant of
sequence counters associates the lock used for writer serialization at
initialization time, which enables lockdep to validate that the write
side critical sections are properly serialized.

This lock association is a NOOP if lockdep is disabled and has neither
storage nor runtime overhead. If lockdep is enabled, the lock pointer is
stored in struct seqcount and lockdep's "lock is held" assertions are
injected at the beginning of the write side critical section to validate
that it is properly protected.

For lock types which do not implicitly disable preemption, preemption
protection is enforced in the write side function.

The following sequence counters with associated locks are defined:

  - ``seqcount_spinlock_t``
  - ``seqcount_raw_spinlock_t``
  - ``seqcount_rwlock_t``
  - ``seqcount_mutex_t``
  - ``seqcount_ww_mutex_t``

The plain seqcount read and write APIs branch out to the specific
seqcount_LOCKTYPE_t implementation at compile-time. This avoids a kernel
API explosion for each new seqcount_LOCKTYPE_t.
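
As a rough illustration of compile-time dispatch on the counter type, the
following user-space sketch uses C11 ``_Generic``. It is not the kernel
implementation: all ``model_*`` names are hypothetical, preemption is
modelled by a plain integer, and the "preemptible" property values are
assumptions chosen for the example.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for lock types; not the kernel definitions. */
struct model_spinlock { int unused; };
struct model_mutex { int unused; };

struct model_seqcount { unsigned int sequence; };

struct model_seqcount_spinlock {
	struct model_seqcount seqcount;
	struct model_spinlock *lock;
};

struct model_seqcount_mutex {
	struct model_seqcount seqcount;
	struct model_mutex *lock;
};

/* Stand-in for the preemption-disable nesting depth. */
static int model_preempt_count;

/* Per-type accessors returning the embedded raw counter. */
static inline struct model_seqcount *model_plain_ptr(struct model_seqcount *s)
{
	return s;
}

static inline struct model_seqcount *model_spinlock_ptr(struct model_seqcount_spinlock *s)
{
	return &s->seqcount;
}

static inline struct model_seqcount *model_mutex_ptr(struct model_seqcount_mutex *s)
{
	return &s->seqcount;
}

/* Per-type property: must the write side disable preemption itself? */
static inline bool model_plain_preemptible(struct model_seqcount *s)
{
	(void)s;
	return false;	/* caller is responsible for the plain type */
}

static inline bool model_spinlock_preemptible(struct model_seqcount_spinlock *s)
{
	(void)s;
	return false;	/* assumed: holding the lock already prevents preemption */
}

static inline bool model_mutex_preemptible(struct model_seqcount_mutex *s)
{
	(void)s;
	return true;	/* assumed: a mutex holder can be preempted */
}

/* Select the helper for the static type of *s, then call it. */
#define model_seqprop(s, prop) _Generic(*(s),				\
	struct model_seqcount:		model_plain_##prop,		\
	struct model_seqcount_spinlock:	model_spinlock_##prop,		\
	struct model_seqcount_mutex:	model_mutex_##prop)(s)

/* One write API for every counter type. */
#define model_write_begin(s)						\
	do {								\
		if (model_seqprop(s, preemptible))			\
			model_preempt_count++;				\
		model_seqprop(s, ptr)->sequence++;			\
	} while (0)

#define model_write_end(s)						\
	do {								\
		model_seqprop(s, ptr)->sequence++;			\
		if (model_seqprop(s, preemptible))			\
			model_preempt_count--;				\
	} while (0)
```

Adding another lock type to this model only requires two more helpers and
one more ``_Generic`` association; callers keep using the same two macros.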

Initialization (replace "LOCKTYPE" with one of the supported locks)::

	/* dynamic */
	seqcount_LOCKTYPE_t foo_seqcount;
	seqcount_LOCKTYPE_init(&foo_seqcount, &lock);

	/* static */
	static seqcount_LOCKTYPE_t foo_seqcount =
		SEQCNT_LOCKTYPE_ZERO(foo_seqcount, &lock);

	/* C99 struct init */
	struct {
		seqcount_LOCKTYPE_t seq;
	} foo = {
		.seq   = SEQCNT_LOCKTYPE_ZERO(foo.seq, &lock),
	};

Write path: same as in :ref:`seqcount_t`, while running from a context
with the associated LOCKTYPE lock acquired.

Read path: same as in :ref:`seqcount_t`.

.. _seqlock_t:

Sequential locks (``seqlock_t``)
================================

This contains the :ref:`seqcount_t` mechanism discussed earlier, plus an
embedded spinlock for writer serialization and non-preemptibility.

If the read side section can be invoked from hardirq or softirq context,
use the write side function variants which disable interrupts or bottom
halves respectively.

Initialization::

	/* dynamic */
	seqlock_t foo_seqlock;
	seqlock_init(&foo_seqlock);

	/* static */
	static DEFINE_SEQLOCK(foo_seqlock);

	/* C99 struct init */
	struct {
		seqlock_t seql;
	} foo = {
		.seql   = __SEQLOCK_UNLOCKED(foo.seql),
	};

Write path::

	write_seqlock(&foo_seqlock);

	/* ... [[write-side critical section]] ... */

	write_sequnlock(&foo_seqlock);

Read path, three categories:

1. Normal sequence readers, which never block a writer but must retry
   if a writer is in progress, as detected by a change in the sequence
   number.  Writers do not wait for a sequence reader::

	do {
		seq = read_seqbegin(&foo_seqlock);

		/* ... [[read-side critical section]] ... */

	} while (read_seqretry(&foo_seqlock, seq));

2. Locking readers which will wait if a writer or another locking reader
   is in progress. A locking reader in progress will also block a writer
   from entering its critical section. This read lock is
   exclusive. Unlike rwlock_t, only one locking reader can acquire it::

	read_seqlock_excl(&foo_seqlock);

	/* ... [[read-side critical section]] ... */

	read_sequnlock_excl(&foo_seqlock);

3. Conditional lockless reader (as in 1), or locking reader (as in 2),
   according to a passed marker. This is used to avoid lockless reader
   starvation (too many retry loops) in case of a sharp spike in write
   activity. First, a lockless read is tried (even marker passed). If
   that trial fails (odd sequence counter is returned, which is used as
   the next iteration marker), the lockless read is transformed to a
   full locking read and no retry loop is necessary::

	/* marker; even initialization */
	int seq = 0;
	do {
		read_seqbegin_or_lock(&foo_seqlock, &seq);

		/* ... [[read-side critical section]] ... */

	} while (need_seqretry(&foo_seqlock, seq));
	done_seqretry(&foo_seqlock, seq);
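
How the first two reader categories interact with writers can be modelled
in user space as a counter paired with a spinlock. The sketch below is an
assumption-laden illustration, not the kernel implementation: all
``model_*`` names are hypothetical, an ``atomic_flag`` stands in for the
embedded spinlock, and the conditional reader (category 3) is not modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of a sequential lock. */
struct model_seqlock {
	atomic_uint sequence;	/* even: stable, odd: update in progress */
	atomic_flag lock;	/* serializes writers and locking readers */
};

#define MODEL_SEQLOCK_INIT { 0, ATOMIC_FLAG_INIT }

static void model_spin_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;	/* busy-wait until the flag is released */
}

static void model_spin_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Writer: take the lock, then make the count odd. */
static void model_write_seqlock(struct model_seqlock *sl)
{
	model_spin_lock(&sl->lock);
	atomic_fetch_add_explicit(&sl->sequence, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

/* Writer: make the count even again, then drop the lock. */
static void model_write_sequnlock(struct model_seqlock *sl)
{
	atomic_fetch_add_explicit(&sl->sequence, 1, memory_order_release);
	model_spin_unlock(&sl->lock);
}

/* Category 1: lockless reader, waits for an even count. */
static unsigned int model_read_seqbegin(struct model_seqlock *sl)
{
	unsigned int seq;

	do {
		seq = atomic_load_explicit(&sl->sequence, memory_order_acquire);
	} while (seq & 1);

	return seq;
}

static bool model_read_seqretry(struct model_seqlock *sl, unsigned int start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&sl->sequence, memory_order_relaxed) != start;
}

/* Category 2: locking reader, takes the lock but leaves the count alone. */
static void model_read_seqlock_excl(struct model_seqlock *sl)
{
	model_spin_lock(&sl->lock);
}

static void model_read_sequnlock_excl(struct model_seqlock *sl)
{
	model_spin_unlock(&sl->lock);
}
```

In this model a locking reader excludes writers and other locking readers
through the lock, yet it never changes the count, so concurrent lockless
readers are not forced to retry by it.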


API documentation
=================

.. kernel-doc:: include/linux/seqlock.h