mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
LKMM pull request for v6.1
This pull request includes several documentation updates.

Merge tag 'lkmm.2022.09.30a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull LKMM (Linux Kernel Memory Model) updates from Paul McKenney:
 "Several documentation updates"

* tag 'lkmm.2022.09.30a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  tools/memory-model: Clarify LKMM's limitations in litmus-tests.txt
  docs/memory-barriers.txt: Fixup long lines
  docs/memory-barriers.txt: Fix confusing name of 'data dependency barrier'
commit
b8fb65e1d3
2 changed files with 122 additions and 92 deletions
|
@ -52,7 +52,7 @@ CONTENTS
|
||||||
|
|
||||||
- Varieties of memory barrier.
|
- Varieties of memory barrier.
|
||||||
- What may not be assumed about memory barriers?
|
- What may not be assumed about memory barriers?
|
||||||
- Data dependency barriers (historical).
|
- Address-dependency barriers (historical).
|
||||||
- Control dependencies.
|
- Control dependencies.
|
||||||
- SMP barrier pairing.
|
- SMP barrier pairing.
|
||||||
- Examples of memory barrier sequences.
|
- Examples of memory barrier sequences.
|
||||||
|
@ -187,9 +187,9 @@ As a further example, consider this sequence of events:
|
||||||
B = 4; Q = P;
|
B = 4; Q = P;
|
||||||
P = &B; D = *Q;
|
P = &B; D = *Q;
|
||||||
|
|
||||||
There is an obvious data dependency here, as the value loaded into D depends on
|
There is an obvious address dependency here, as the value loaded into D depends
|
||||||
the address retrieved from P by CPU 2. At the end of the sequence, any of the
|
on the address retrieved from P by CPU 2. At the end of the sequence, any of
|
||||||
following results are possible:
|
the following results are possible:
|
||||||
|
|
||||||
(Q == &A) and (D == 1)
|
(Q == &A) and (D == 1)
|
||||||
(Q == &B) and (D == 2)
|
(Q == &B) and (D == 2)
|
||||||
|
@ -391,58 +391,62 @@ Memory barriers come in four basic varieties:
|
||||||
memory system as time progresses. All stores _before_ a write barrier
|
memory system as time progresses. All stores _before_ a write barrier
|
||||||
will occur _before_ all the stores after the write barrier.
|
will occur _before_ all the stores after the write barrier.
|
||||||
|
|
||||||
[!] Note that write barriers should normally be paired with read or data
|
[!] Note that write barriers should normally be paired with read or
|
||||||
dependency barriers; see the "SMP barrier pairing" subsection.
|
address-dependency barriers; see the "SMP barrier pairing" subsection.
|
||||||
|
|
||||||
|
|
||||||
(2) Data dependency barriers.
|
(2) Address-dependency barriers (historical).
|
||||||
|
|
||||||
A data dependency barrier is a weaker form of read barrier. In the case
|
An address-dependency barrier is a weaker form of read barrier. In the
|
||||||
where two loads are performed such that the second depends on the result
|
case where two loads are performed such that the second depends on the
|
||||||
of the first (eg: the first load retrieves the address to which the second
|
result of the first (eg: the first load retrieves the address to which
|
||||||
load will be directed), a data dependency barrier would be required to
|
the second load will be directed), an address-dependency barrier would
|
||||||
make sure that the target of the second load is updated after the address
|
be required to make sure that the target of the second load is updated
|
||||||
obtained by the first load is accessed.
|
after the address obtained by the first load is accessed.
|
||||||
|
|
||||||
A data dependency barrier is a partial ordering on interdependent loads
|
An address-dependency barrier is a partial ordering on interdependent
|
||||||
only; it is not required to have any effect on stores, independent loads
|
loads only; it is not required to have any effect on stores, independent
|
||||||
or overlapping loads.
|
loads or overlapping loads.
|
||||||
|
|
||||||
As mentioned in (1), the other CPUs in the system can be viewed as
|
As mentioned in (1), the other CPUs in the system can be viewed as
|
||||||
committing sequences of stores to the memory system that the CPU being
|
committing sequences of stores to the memory system that the CPU being
|
||||||
considered can then perceive. A data dependency barrier issued by the CPU
|
considered can then perceive. An address-dependency barrier issued by
|
||||||
under consideration guarantees that for any load preceding it, if that
|
the CPU under consideration guarantees that for any load preceding it,
|
||||||
load touches one of a sequence of stores from another CPU, then by the
|
if that load touches one of a sequence of stores from another CPU, then
|
||||||
time the barrier completes, the effects of all the stores prior to that
|
by the time the barrier completes, the effects of all the stores prior to
|
||||||
touched by the load will be perceptible to any loads issued after the data
|
that touched by the load will be perceptible to any loads issued after
|
||||||
dependency barrier.
|
the address-dependency barrier.
|
||||||
|
|
||||||
See the "Examples of memory barrier sequences" subsection for diagrams
|
See the "Examples of memory barrier sequences" subsection for diagrams
|
||||||
showing the ordering constraints.
|
showing the ordering constraints.
|
||||||
|
|
||||||
[!] Note that the first load really has to have a _data_ dependency and
|
[!] Note that the first load really has to have an _address_ dependency and
|
||||||
not a control dependency. If the address for the second load is dependent
|
not a control dependency. If the address for the second load is dependent
|
||||||
on the first load, but the dependency is through a conditional rather than
|
on the first load, but the dependency is through a conditional rather than
|
||||||
actually loading the address itself, then it's a _control_ dependency and
|
actually loading the address itself, then it's a _control_ dependency and
|
||||||
a full read barrier or better is required. See the "Control dependencies"
|
a full read barrier or better is required. See the "Control dependencies"
|
||||||
subsection for more information.
|
subsection for more information.
|
||||||
|
|
||||||
[!] Note that data dependency barriers should normally be paired with
|
[!] Note that address-dependency barriers should normally be paired with
|
||||||
write barriers; see the "SMP barrier pairing" subsection.
|
write barriers; see the "SMP barrier pairing" subsection.
|
||||||
|
|
||||||
|
[!] Kernel release v5.9 removed kernel APIs for explicit address-
|
||||||
|
dependency barriers. Nowadays, APIs for marking loads from shared
|
||||||
|
variables such as READ_ONCE() and rcu_dereference() provide implicit
|
||||||
|
address-dependency barriers.
|
||||||
|
|
||||||
(3) Read (or load) memory barriers.
|
(3) Read (or load) memory barriers.
|
||||||
|
|
||||||
A read barrier is a data dependency barrier plus a guarantee that all the
|
A read barrier is an address-dependency barrier plus a guarantee that all
|
||||||
LOAD operations specified before the barrier will appear to happen before
|
the LOAD operations specified before the barrier will appear to happen
|
||||||
all the LOAD operations specified after the barrier with respect to the
|
before all the LOAD operations specified after the barrier with respect to
|
||||||
other components of the system.
|
the other components of the system.
|
||||||
|
|
||||||
A read barrier is a partial ordering on loads only; it is not required to
|
A read barrier is a partial ordering on loads only; it is not required to
|
||||||
have any effect on stores.
|
have any effect on stores.
|
||||||
|
|
||||||
Read memory barriers imply data dependency barriers, and so can substitute
|
Read memory barriers imply address-dependency barriers, and so can
|
||||||
for them.
|
substitute for them.
|
||||||
|
|
||||||
[!] Note that read barriers should normally be paired with write barriers;
|
[!] Note that read barriers should normally be paired with write barriers;
|
||||||
see the "SMP barrier pairing" subsection.
|
see the "SMP barrier pairing" subsection.
|
||||||
|
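The pairing described in (2) and (3), where a write barrier on the storing side meets an address-dependent load on the reading side, can be sketched in userspace C11. This is only an analogy and not kernel code: the names publish_b() and read_through_p() are made up for illustration, release ordering stands in for <write barrier> followed by WRITE_ONCE(), and memory_order_acquire stands in for READ_ONCE() with its implicit address-dependency barrier. Acquire is stronger than an address dependency strictly needs, but compilers generally treat memory_order_consume as acquire anyway.

```c
/* Userspace C11 analogy of the P/Q/D example; not kernel code. */
#include <stdatomic.h>

static int A = 1, B = 2;
static _Atomic(int *) P = &A;

/* Writer side (hypothetical helper): update B, then publish its address. */
static void publish_b(void)
{
	B = 4;
	/* Release ordering approximates <write barrier> + WRITE_ONCE(P, &B). */
	atomic_store_explicit(&P, &B, memory_order_release);
}

/* Reader side (hypothetical helper): load the pointer, then dereference it. */
static int read_through_p(void)
{
	/* Acquire approximates READ_ONCE(P) plus the implicit barrier. */
	int *q = atomic_load_explicit(&P, memory_order_acquire);

	/* If q == &B, the store B = 4 is guaranteed to be visible here. */
	return *q;
}
```

With this ordering, a reader that observes &B in P can only observe 4 in B, matching the implication (Q == &B) implies (D == 4) discussed in the historical section.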
@ -550,17 +554,21 @@ There are certain things that the Linux kernel memory barriers do not guarantee:
|
||||||
Documentation/core-api/dma-api.rst
|
Documentation/core-api/dma-api.rst
|
||||||
|
|
||||||
|
|
||||||
DATA DEPENDENCY BARRIERS (HISTORICAL)
|
ADDRESS-DEPENDENCY BARRIERS (HISTORICAL)
|
||||||
-------------------------------------
|
----------------------------------------
|
||||||
|
|
||||||
As of v4.15 of the Linux kernel, an smp_mb() was added to READ_ONCE() for
|
As of v4.15 of the Linux kernel, an smp_mb() was added to READ_ONCE() for
|
||||||
DEC Alpha, which means that about the only people who need to pay attention
|
DEC Alpha, which means that about the only people who need to pay attention
|
||||||
to this section are those working on DEC Alpha architecture-specific code
|
to this section are those working on DEC Alpha architecture-specific code
|
||||||
and those working on READ_ONCE() itself. For those who need it, and for
|
and those working on READ_ONCE() itself. For those who need it, and for
|
||||||
those who are interested in the history, here is the story of
|
those who are interested in the history, here is the story of
|
||||||
data-dependency barriers.
|
address-dependency barriers.
|
||||||
|
|
||||||
The usage requirements of data dependency barriers are a little subtle, and
|
[!] While address dependencies are observed in both load-to-load and
|
||||||
|
load-to-store relations, address-dependency barriers are not necessary
|
||||||
|
for load-to-store situations.
|
||||||
|
|
||||||
|
The requirement of address-dependency barriers is a little subtle, and
|
||||||
it's not always obvious that they're needed. To illustrate, consider the
|
it's not always obvious that they're needed. To illustrate, consider the
|
||||||
following sequence of events:
|
following sequence of events:
|
||||||
|
|
||||||
|
@ -570,11 +578,14 @@ following sequence of events:
|
||||||
B = 4;
|
B = 4;
|
||||||
<write barrier>
|
<write barrier>
|
||||||
WRITE_ONCE(P, &B);
|
WRITE_ONCE(P, &B);
|
||||||
Q = READ_ONCE(P);
|
Q = READ_ONCE_OLD(P);
|
||||||
D = *Q;
|
D = *Q;
|
||||||
|
|
||||||
There's a clear data dependency here, and it would seem that by the end of the
|
[!] READ_ONCE_OLD() corresponds to READ_ONCE() of pre-4.15 kernel, which
|
||||||
sequence, Q must be either &A or &B, and that:
|
doesn't imply an address-dependency barrier.
|
||||||
|
|
||||||
|
There's a clear address dependency here, and it would seem that by the end of
|
||||||
|
the sequence, Q must be either &A or &B, and that:
|
||||||
|
|
||||||
(Q == &A) implies (D == 1)
|
(Q == &A) implies (D == 1)
|
||||||
(Q == &B) implies (D == 4)
|
(Q == &B) implies (D == 4)
|
||||||
|
@ -588,8 +599,8 @@ While this may seem like a failure of coherency or causality maintenance, it
|
||||||
isn't, and this behaviour can be observed on certain real CPUs (such as the DEC
|
isn't, and this behaviour can be observed on certain real CPUs (such as the DEC
|
||||||
Alpha).
|
Alpha).
|
||||||
|
|
||||||
To deal with this, a data dependency barrier or better must be inserted
|
To deal with this, READ_ONCE() provides an implicit address-dependency barrier
|
||||||
between the address load and the data load:
|
since kernel release v4.15:
|
||||||
|
|
||||||
CPU 1 CPU 2
|
CPU 1 CPU 2
|
||||||
=============== ===============
|
=============== ===============
|
||||||
|
@ -598,7 +609,7 @@ between the address load and the data load:
|
||||||
<write barrier>
|
<write barrier>
|
||||||
WRITE_ONCE(P, &B);
|
WRITE_ONCE(P, &B);
|
||||||
Q = READ_ONCE(P);
|
Q = READ_ONCE(P);
|
||||||
<data dependency barrier>
|
<implicit address-dependency barrier>
|
||||||
D = *Q;
|
D = *Q;
|
||||||
|
|
||||||
This enforces the occurrence of one of the two implications, and prevents the
|
This enforces the occurrence of one of the two implications, and prevents the
|
||||||
|
@ -615,13 +626,13 @@ odd-numbered bank is idle, one can see the new value of the pointer P (&B),
|
||||||
but the old value of the variable B (2).
|
but the old value of the variable B (2).
|
||||||
|
|
||||||
|
|
||||||
A data-dependency barrier is not required to order dependent writes
|
An address-dependency barrier is not required to order dependent writes
|
||||||
because the CPUs that the Linux kernel supports don't do writes
|
because the CPUs that the Linux kernel supports don't do writes until they
|
||||||
until they are certain (1) that the write will actually happen, (2)
|
are certain (1) that the write will actually happen, (2) of the location of
|
||||||
of the location of the write, and (3) of the value to be written.
|
the write, and (3) of the value to be written.
|
||||||
But please carefully read the "CONTROL DEPENDENCIES" section and the
|
But please carefully read the "CONTROL DEPENDENCIES" section and the
|
||||||
Documentation/RCU/rcu_dereference.rst file: The compiler can and does
|
Documentation/RCU/rcu_dereference.rst file: The compiler can and does break
|
||||||
break dependencies in a great many highly creative ways.
|
dependencies in a great many highly creative ways.
|
||||||
|
|
||||||
CPU 1 CPU 2
|
CPU 1 CPU 2
|
||||||
=============== ===============
|
=============== ===============
|
||||||
|
@ -629,12 +640,12 @@ break dependencies in a great many highly creative ways.
|
||||||
B = 4;
|
B = 4;
|
||||||
<write barrier>
|
<write barrier>
|
||||||
WRITE_ONCE(P, &B);
|
WRITE_ONCE(P, &B);
|
||||||
Q = READ_ONCE(P);
|
Q = READ_ONCE_OLD(P);
|
||||||
WRITE_ONCE(*Q, 5);
|
WRITE_ONCE(*Q, 5);
|
||||||
|
|
||||||
Therefore, no data-dependency barrier is required to order the read into
|
Therefore, no address-dependency barrier is required to order the read into
|
||||||
Q with the store into *Q. In other words, this outcome is prohibited,
|
Q with the store into *Q. In other words, this outcome is prohibited,
|
||||||
even without a data-dependency barrier:
|
even without an implicit address-dependency barrier of modern READ_ONCE():
|
||||||
|
|
||||||
(Q == &B) && (B == 4)
|
(Q == &B) && (B == 4)
|
||||||
|
|
||||||
|
@ -645,12 +656,12 @@ can be used to record rare error conditions and the like, and the CPUs'
|
||||||
naturally occurring ordering prevents such records from being lost.
|
naturally occurring ordering prevents such records from being lost.
|
||||||
|
|
||||||
|
|
||||||
Note well that the ordering provided by a data dependency is local to
|
Note well that the ordering provided by an address dependency is local to
|
||||||
the CPU containing it. See the section on "Multicopy atomicity" for
|
the CPU containing it. See the section on "Multicopy atomicity" for
|
||||||
more information.
|
more information.
|
||||||
|
|
||||||
|
|
||||||
The data dependency barrier is very important to the RCU system,
|
The address-dependency barrier is very important to the RCU system,
|
||||||
for example. See rcu_assign_pointer() and rcu_dereference() in
|
for example. See rcu_assign_pointer() and rcu_dereference() in
|
||||||
include/linux/rcupdate.h. This permits the current target of an RCU'd
|
include/linux/rcupdate.h. This permits the current target of an RCU'd
|
||||||
pointer to be replaced with a new modified target, without the replacement
|
pointer to be replaced with a new modified target, without the replacement
|
||||||
|
@ -667,20 +678,21 @@ not understand them. The purpose of this section is to help you prevent
|
||||||
the compiler's ignorance from breaking your code.
|
the compiler's ignorance from breaking your code.
|
||||||
|
|
||||||
A load-load control dependency requires a full read memory barrier, not
|
A load-load control dependency requires a full read memory barrier, not
|
||||||
simply a data dependency barrier to make it work correctly. Consider the
|
simply an (implicit) address-dependency barrier to make it work correctly.
|
||||||
following bit of code:
|
Consider the following bit of code:
|
||||||
|
|
||||||
q = READ_ONCE(a);
|
q = READ_ONCE(a);
|
||||||
|
<implicit address-dependency barrier>
|
||||||
if (q) {
|
if (q) {
|
||||||
<data dependency barrier> /* BUG: No data dependency!!! */
|
/* BUG: No address dependency!!! */
|
||||||
p = READ_ONCE(b);
|
p = READ_ONCE(b);
|
||||||
}
|
}
|
||||||
|
|
||||||
This will not have the desired effect because there is no actual data
|
This will not have the desired effect because there is no actual address
|
||||||
dependency, but rather a control dependency that the CPU may short-circuit
|
dependency, but rather a control dependency that the CPU may short-circuit
|
||||||
by attempting to predict the outcome in advance, so that other CPUs see
|
by attempting to predict the outcome in advance, so that other CPUs see
|
||||||
the load from b as having happened before the load from a. In such a
|
the load from b as having happened before the load from a. In such a case
|
||||||
case what's actually required is:
|
what's actually required is:
|
||||||
|
|
||||||
q = READ_ONCE(a);
|
q = READ_ONCE(a);
|
||||||
if (q) {
|
if (q) {
|
||||||
|
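As a rough userspace sketch of the point made above, that a load-load control dependency needs a real read barrier, the following uses C11 atomics. It is an analogy under stated assumptions, not the kernel's implementation: load_b_if_a_set() is a hypothetical name, the relaxed loads play the role of READ_ONCE(), and the acquire fence plays the role of the read barrier.

```c
/* Userspace C11 analogy of a load-load control dependency; not kernel code. */
#include <stdatomic.h>

static atomic_int a, b;

/*
 * Hypothetical helper: returns the value of b if a was nonzero,
 * or -1 if the branch was not taken.
 */
static int load_b_if_a_set(void)
{
	int p = -1;
	int q = atomic_load_explicit(&a, memory_order_relaxed);

	if (q) {
		/*
		 * Without this fence, the load from b may be satisfied
		 * before the load from a, because the branch can be
		 * predicted. The fence orders the two loads.
		 */
		atomic_thread_fence(memory_order_acquire);
		p = atomic_load_explicit(&b, memory_order_relaxed);
	}
	return p;
}
```

The fence sits inside the branch, between the two loads, which is the position the text requires for the barrier.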
@ -927,9 +939,9 @@ General barriers pair with each other, though they also pair with most
|
||||||
other types of barriers, albeit without multicopy atomicity. An acquire
|
other types of barriers, albeit without multicopy atomicity. An acquire
|
||||||
barrier pairs with a release barrier, but both may also pair with other
|
barrier pairs with a release barrier, but both may also pair with other
|
||||||
barriers, including of course general barriers. A write barrier pairs
|
barriers, including of course general barriers. A write barrier pairs
|
||||||
with a data dependency barrier, a control dependency, an acquire barrier,
|
with an address-dependency barrier, a control dependency, an acquire barrier,
|
||||||
a release barrier, a read barrier, or a general barrier. Similarly a
|
a release barrier, a read barrier, or a general barrier. Similarly a
|
||||||
read barrier, control dependency, or a data dependency barrier pairs
|
read barrier, control dependency, or an address-dependency barrier pairs
|
||||||
with a write barrier, an acquire barrier, a release barrier, or a
|
with a write barrier, an acquire barrier, a release barrier, or a
|
||||||
general barrier:
|
general barrier:
|
||||||
|
|
||||||
|
@ -948,7 +960,7 @@ Or:
|
||||||
a = 1;
|
a = 1;
|
||||||
<write barrier>
|
<write barrier>
|
||||||
WRITE_ONCE(b, &a); x = READ_ONCE(b);
|
WRITE_ONCE(b, &a); x = READ_ONCE(b);
|
||||||
<data dependency barrier>
|
<implicit address-dependency barrier>
|
||||||
y = *x;
|
y = *x;
|
||||||
|
|
||||||
Or even:
|
Or even:
|
||||||
|
@ -968,8 +980,8 @@ Basically, the read barrier always has to be there, even though it can be of
|
||||||
the "weaker" type.
|
the "weaker" type.
|
||||||
|
|
||||||
[!] Note that the stores before the write barrier would normally be expected to
|
[!] Note that the stores before the write barrier would normally be expected to
|
||||||
match the loads after the read barrier or the data dependency barrier, and vice
|
match the loads after the read barrier or the address-dependency barrier, and
|
||||||
versa:
|
vice versa:
|
||||||
|
|
||||||
CPU 1 CPU 2
|
CPU 1 CPU 2
|
||||||
=================== ===================
|
=================== ===================
|
||||||
|
@ -1021,8 +1033,8 @@ STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
|
||||||
V
|
V
|
||||||
|
|
||||||
|
|
||||||
Secondly, data dependency barriers act as partial orderings on data-dependent
|
Secondly, address-dependency barriers act as partial orderings on address-
|
||||||
loads. Consider the following sequence of events:
|
dependent loads. Consider the following sequence of events:
|
||||||
|
|
||||||
CPU 1 CPU 2
|
CPU 1 CPU 2
|
||||||
======================= =======================
|
======================= =======================
|
||||||
|
@ -1067,8 +1079,8 @@ effectively random order, despite the write barrier issued by CPU 1:
|
||||||
In the above example, CPU 2 perceives that B is 7, despite the load of *C
|
In the above example, CPU 2 perceives that B is 7, despite the load of *C
|
||||||
(which would be B) coming after the LOAD of C.
|
(which would be B) coming after the LOAD of C.
|
||||||
|
|
||||||
If, however, a data dependency barrier were to be placed between the load of C
|
If, however, an address-dependency barrier were to be placed between the load
|
||||||
and the load of *C (ie: B) on CPU 2:
|
of C and the load of *C (ie: B) on CPU 2:
|
||||||
|
|
||||||
CPU 1 CPU 2
|
CPU 1 CPU 2
|
||||||
======================= =======================
|
======================= =======================
|
||||||
|
@ -1078,7 +1090,7 @@ and the load of *C (ie: B) on CPU 2:
|
||||||
<write barrier>
|
<write barrier>
|
||||||
STORE C = &B LOAD X
|
STORE C = &B LOAD X
|
||||||
STORE D = 4 LOAD C (gets &B)
|
STORE D = 4 LOAD C (gets &B)
|
||||||
<data dependency barrier>
|
<address-dependency barrier>
|
||||||
LOAD *C (reads B)
|
LOAD *C (reads B)
|
||||||
|
|
||||||
then the following will occur:
|
then the following will occur:
|
||||||
|
@ -1101,7 +1113,7 @@ then the following will occur:
|
||||||
| +-------+ | |
|
| +-------+ | |
|
||||||
| | X->9 |------>| |
|
| | X->9 |------>| |
|
||||||
| +-------+ | |
|
| +-------+ | |
|
||||||
Makes sure all effects ---> \ ddddddddddddddddd | |
|
Makes sure all effects ---> \ aaaaaaaaaaaaaaaaa | |
|
||||||
prior to the store of C \ +-------+ | |
|
prior to the store of C \ +-------+ | |
|
||||||
are perceptible to ----->| B->2 |------>| |
|
are perceptible to ----->| B->2 |------>| |
|
||||||
subsequent loads +-------+ | |
|
subsequent loads +-------+ | |
|
||||||
|
@ -1292,7 +1304,7 @@ Which might appear as this:
|
||||||
LOAD with immediate effect : : +-------+
|
LOAD with immediate effect : : +-------+
|
||||||
|
|
||||||
|
|
||||||
Placing a read barrier or a data dependency barrier just before the second
|
Placing a read barrier or an address-dependency barrier just before the second
|
||||||
load:
|
load:
|
||||||
|
|
||||||
CPU 1 CPU 2
|
CPU 1 CPU 2
|
||||||
|
@ -1816,20 +1828,20 @@ which may then reorder things however it wishes.
|
||||||
CPU MEMORY BARRIERS
|
CPU MEMORY BARRIERS
|
||||||
-------------------
|
-------------------
|
||||||
|
|
||||||
The Linux kernel has eight basic CPU memory barriers:
|
The Linux kernel has seven basic CPU memory barriers:
|
||||||
|
|
||||||
TYPE MANDATORY SMP CONDITIONAL
|
TYPE MANDATORY SMP CONDITIONAL
|
||||||
=============== ======================= ===========================
|
======================= =============== ===============
|
||||||
GENERAL mb() smp_mb()
|
GENERAL mb() smp_mb()
|
||||||
WRITE wmb() smp_wmb()
|
WRITE wmb() smp_wmb()
|
||||||
READ rmb() smp_rmb()
|
READ rmb() smp_rmb()
|
||||||
DATA DEPENDENCY READ_ONCE()
|
ADDRESS DEPENDENCY READ_ONCE()
|
||||||
|
|
||||||
|
|
||||||
All memory barriers except the data dependency barriers imply a compiler
|
All memory barriers except the address-dependency barriers imply a compiler
|
||||||
barrier. Data dependencies do not impose any additional compiler ordering.
|
barrier. Address dependencies do not impose any additional compiler ordering.
|
||||||
|
|
||||||
Aside: In the case of data dependencies, the compiler would be expected
|
Aside: In the case of address dependencies, the compiler would be expected
|
||||||
to issue the loads in the correct order (eg. `a[b]` would have to load
|
to issue the loads in the correct order (eg. `a[b]` would have to load
|
||||||
the value of b before loading a[b]), however there is no guarantee in
|
the value of b before loading a[b]), however there is no guarantee in
|
||||||
the C specification that the compiler may not speculate the value of b
|
the C specification that the compiler may not speculate the value of b
|
||||||
|
@ -2749,7 +2761,8 @@ is discarded from the CPU's cache and reloaded. To deal with this, the
|
||||||
appropriate part of the kernel must invalidate the overlapping bits of the
|
appropriate part of the kernel must invalidate the overlapping bits of the
|
||||||
cache on each CPU.
|
cache on each CPU.
|
||||||
|
|
||||||
See Documentation/core-api/cachetlb.rst for more information on cache management.
|
See Documentation/core-api/cachetlb.rst for more information on cache
|
||||||
|
management.
|
||||||
|
|
||||||
|
|
||||||
CACHE COHERENCY VS MMIO
|
CACHE COHERENCY VS MMIO
|
||||||
|
@ -2889,8 +2902,8 @@ AND THEN THERE'S THE ALPHA
|
||||||
The DEC Alpha CPU is one of the most relaxed CPUs there is. Not only that,
|
The DEC Alpha CPU is one of the most relaxed CPUs there is. Not only that,
|
||||||
some versions of the Alpha CPU have a split data cache, permitting them to have
|
some versions of the Alpha CPU have a split data cache, permitting them to have
|
||||||
two semantically-related cache lines updated at separate times. This is where
|
two semantically-related cache lines updated at separate times. This is where
|
||||||
the data dependency barrier really becomes necessary as this synchronises both
|
the address-dependency barrier really becomes necessary as this synchronises
|
||||||
caches with the memory coherence system, thus making it seem like pointer
|
both caches with the memory coherence system, thus making it seem like pointer
|
||||||
changes vs new data occur in the right order.
|
changes vs new data occur in the right order.
|
||||||
|
|
||||||
The Alpha defines the Linux kernel's memory model, although as of v4.15
|
The Alpha defines the Linux kernel's memory model, although as of v4.15
|
||||||
|
|
|
@ -946,22 +946,39 @@ Limitations of the Linux-kernel memory model (LKMM) include:
|
||||||
carrying a dependency, then the compiler can break that dependency
|
carrying a dependency, then the compiler can break that dependency
|
||||||
by substituting a constant of that value.
|
by substituting a constant of that value.
|
||||||
|
|
||||||
Conversely, LKMM sometimes doesn't recognize that a particular
|
Conversely, LKMM will sometimes overestimate the amount of
|
||||||
optimization is not allowed, and as a result, thinks that a
|
reordering compilers and CPUs can carry out, leading it to miss
|
||||||
dependency is not present (because the optimization would break it).
|
some pretty obvious cases of ordering. A simple example is:
|
||||||
The memory model misses some pretty obvious control dependencies
|
|
||||||
because of this limitation. A simple example is:
|
|
||||||
|
|
||||||
r1 = READ_ONCE(x);
|
r1 = READ_ONCE(x);
|
||||||
if (r1 == 0)
|
if (r1 == 0)
|
||||||
smp_mb();
|
smp_mb();
|
||||||
WRITE_ONCE(y, 1);
|
WRITE_ONCE(y, 1);
|
||||||
|
|
||||||
There is a control dependency from the READ_ONCE to the WRITE_ONCE,
|
The WRITE_ONCE() does not depend on the READ_ONCE(), and as a
|
||||||
even when r1 is nonzero, but LKMM doesn't realize this and thinks
|
result, LKMM does not claim ordering. However, even though no
|
||||||
that the write may execute before the read if r1 != 0. (Yes, that
|
dependency is present, the WRITE_ONCE() will not be executed before
|
||||||
doesn't make sense if you think about it, but the memory model's
|
the READ_ONCE(). There are two reasons for this:
|
||||||
intelligence is limited.)
|
|
||||||
|
The presence of the smp_mb() in one of the branches
|
||||||
|
prevents the compiler from moving the WRITE_ONCE()
|
||||||
|
up before the "if" statement, since the compiler has
|
||||||
|
to assume that r1 will sometimes be 0 (but see the
|
||||||
|
comment below);
|
||||||
|
|
||||||
|
CPUs do not execute stores before po-earlier conditional
|
||||||
|
branches, even in cases where the store occurs after the
|
||||||
|
two arms of the branch have recombined.
|
||||||
|
|
||||||
|
It is clear that it is not dangerous in the slightest for LKMM to
|
||||||
|
make weaker guarantees than architectures. In fact, it is
|
||||||
|
desirable, as it gives compilers room for making optimizations.
|
||||||
|
For instance, suppose that a 0 value in r1 would trigger undefined
|
||||||
|
behavior elsewhere. Then a clever compiler might deduce that r1
|
||||||
|
can never be 0 in the if condition. As a result, said clever
|
||||||
|
compiler might deem it safe to optimize away the smp_mb(),
|
||||||
|
eliminating the branch and any ordering an architecture would
|
||||||
|
guarantee otherwise.
|
||||||
|
|
||||||
2. Multiple access sizes for a single variable are not supported,
|
2. Multiple access sizes for a single variable are not supported,
|
||||||
and neither are misaligned or partially overlapping accesses.
|
and neither are misaligned or partially overlapping accesses.
|
||||||
|
|
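The smp_mb()-in-one-branch example from limitation 1 can be mirrored in userspace C11 to make its shape concrete. This is a sketch under assumptions rather than a litmus test: read_x_then_write_y() is a hypothetical name, relaxed atomics stand in for READ_ONCE() and WRITE_ONCE(), and a sequentially consistent fence stands in for smp_mb(). The C11 memory model, like LKMM, promises no ordering between the load and the store when r1 is nonzero, even though real CPUs do not execute the store before the earlier conditional branch.

```c
/* Userspace C11 analogy of the LKMM limitation example; not kernel code. */
#include <stdatomic.h>

static atomic_int x, y;

/* Hypothetical helper: returns the value read from x. */
static int read_x_then_write_y(void)
{
	int r1 = atomic_load_explicit(&x, memory_order_relaxed);

	/* The full fence is executed on only one arm of the branch. */
	if (r1 == 0)
		atomic_thread_fence(memory_order_seq_cst);

	/* The store does not depend on r1 and runs on both paths. */
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	return r1;
}
```

Code that needs the load ordered before the store on every path should not rely on this shape. It should use an unconditional barrier or a release store instead.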