mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git (synced 2024-09-29 05:44:11 +00:00)
docs: filesystems: convert xfs-delayed-logging-design.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/2233c248f12e7b465cd27ee30a86f96eb632946a.1588021877.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

parent ec4551f450
commit c3d2f6cb4c
3 changed files with 40 additions and 28 deletions

@@ -117,4 +117,5 @@ Documentation for filesystem implementations.
 udf
 virtiofs
 vfat
+xfs-delayed-logging-design
 zonefs

@@ -1,8 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
 XFS Delayed Logging Design
---------------------------
+==========================
 
 Introduction to Re-logging in XFS
----------------------------------
+=================================
 
 XFS logging is a combination of logical and physical logging. Some objects,
 such as inodes and dquots, are logged in logical format where the details
@@ -25,7 +28,7 @@ changes in the new transaction that is written to the log.
 That is, if we have a sequence of changes A through to F, and the object was
 written to disk after change D, we would see in the log the following series
 of transactions, their contents and the log sequence number (LSN) of the
-transaction:
+transaction::
 
 Transaction Contents LSN
 A A X
@@ -85,7 +88,7 @@ IO permanently. Hence the XFS journalling subsystem can be considered to be IO
 bound.
 
 Delayed Logging: Concepts
--------------------------
+=========================
 
 The key thing to note about the asynchronous logging combined with the
 relogging technique XFS uses is that we can be relogging changed objects
@@ -154,9 +157,10 @@ The fundamental requirements for delayed logging in XFS are simple:
 6. No performance regressions for synchronous transaction workloads.
 
 Delayed Logging: Design
------------------------
+=======================
 
 Storing Changes
+---------------
 
 The problem with accumulating changes at a logical level (i.e. just using the
 existing log item dirty region tracking) is that when it comes to writing the
@@ -194,30 +198,30 @@ asynchronous transactions to the log. The differences between the existing
 formatting method and the delayed logging formatting can be seen in the
 diagram below.
 
-Current format log vector:
+Current format log vector::
 
 Object +---------------------------------------------+
 Vector 1 +----+
 Vector 2 +----+
 Vector 3 +----------+
 
-After formatting:
+After formatting::
 
 Log Buffer +-V1-+-V2-+----V3----+
 
-Delayed logging vector:
+Delayed logging vector::
 
 Object +---------------------------------------------+
 Vector 1 +----+
 Vector 2 +----+
 Vector 3 +----------+
 
-After formatting:
+After formatting::
 
 Memory Buffer +-V1-+-V2-+----V3----+
 Vector 1 +----+
 Vector 2 +----+
 Vector 3 +----------+
 
 The memory buffer and associated vector need to be passed as a single object,
 but still need to be associated with the parent object so if the object is
@@ -242,6 +246,7 @@ relogged in memory.
 
 
 Tracking Changes
+----------------
 
 Now that we can record transactional changes in memory in a form that allows
 them to be used without limitations, we need to be able to track and accumulate
@@ -278,6 +283,7 @@ done for convenience/sanity of the developers.
 
 
 Delayed Logging: Checkpoints
+----------------------------
 
 When we have a log synchronisation event, commonly known as a "log force",
 all the items in the CIL must be written into the log via the log buffers.
@@ -341,7 +347,7 @@ Hence log vectors need to be able to be chained together to allow them to be
 detached from the log items. That is, when the CIL is flushed the memory
 buffer and log vector attached to each log item needs to be attached to the
 checkpoint context so that the log item can be released. In diagrammatic form,
-the CIL would look like this before the flush:
+the CIL would look like this before the flush::
 
 CIL Head
 |
@@ -362,7 +368,7 @@ the CIL would look like this before the flush:
 -> vector array
 
 And after the flush the CIL head is empty, and the checkpoint context log
-vector list would look like:
+vector list would look like::
 
 Checkpoint Context
 |
@@ -411,6 +417,7 @@ compare" situation that can be done after a working and reviewed implementation
 is in the dev tree....
 
 Delayed Logging: Checkpoint Sequencing
+--------------------------------------
 
 One of the key aspects of the XFS transaction subsystem is that it tags
 committed transactions with the log sequence number of the transaction commit.
@@ -474,6 +481,7 @@ force the log at the LSN of that transaction) and so the higher level code
 behaves the same regardless of whether delayed logging is being used or not.
 
 Delayed Logging: Checkpoint Log Space Accounting
+------------------------------------------------
 
 The big issue for a checkpoint transaction is the log space reservation for the
 transaction. We don't know how big a checkpoint transaction is going to be
@@ -491,7 +499,7 @@ the size of the transaction and the number of regions being logged (the number
 of log vectors in the transaction).
 
 An example of the differences would be logging directory changes versus logging
-inode changes. If you modify lots of inode cores (e.g. chmod -R g+w *), then
+inode changes. If you modify lots of inode cores (e.g. ``chmod -R g+w *``), then
 there are lots of transactions that only contain an inode core and an inode log
 format structure. That is, two vectors totaling roughly 150 bytes. If we modify
 10,000 inodes, we have about 1.5MB of metadata to write in 20,000 vectors. Each
@@ -565,6 +573,7 @@ which is once every 30s.
 
 
 Delayed Logging: Log Item Pinning
+---------------------------------
 
 Currently log items are pinned during transaction commit while the items are
 still locked. This happens just after the items are formatted, though it could
@@ -605,6 +614,7 @@ object, we have a race with CIL being flushed between the check and the pin
 lock to guarantee that we pin the items correctly.
 
 Delayed Logging: Concurrent Scalability
+---------------------------------------
 
 A fundamental requirement for the CIL is that accesses through transaction
 commits must scale to many concurrent commits. The current transaction commit
@@ -683,8 +693,9 @@ woken by the wrong event.
 
 
 Lifecycle Changes
+-----------------
 
-The existing log item life cycle is as follows:
+The existing log item life cycle is as follows::
 
 1. Transaction allocate
 2. Transaction reserve
@@ -729,7 +740,7 @@ at the same time. If the log item is in the AIL or between steps 6 and 7
 and steps 1-6 are re-entered, then the item is relogged. Only when steps 8-9
 are entered and completed is the object considered clean.
 
-With delayed logging, there are new steps inserted into the life cycle:
+With delayed logging, there are new steps inserted into the life cycle::
 
 1. Transaction allocate
 2. Transaction reserve

@@ -18533,7 +18533,7 @@ W: http://xfs.org/
 T: git git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
 F: Documentation/ABI/testing/sysfs-fs-xfs
 F: Documentation/admin-guide/xfs.rst
-F: Documentation/filesystems/xfs-delayed-logging-design.txt
+F: Documentation/filesystems/xfs-delayed-logging-design.rst
 F: Documentation/filesystems/xfs-self-describing-metadata.txt
 F: fs/xfs/
 F: include/uapi/linux/dqblk_xfs.h