ext4: Orphan file documentation

Add documentation about the orphan file feature.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210816095713.16537-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This commit is contained in:
Jan Kara 2021-08-16 11:57:07 +02:00 committed by Theodore Ts'o
parent 02f310fcf4
commit 3a6541e97c
5 changed files with 89 additions and 6 deletions

View file

@ -11,3 +11,4 @@ have static metadata at fixed locations.
.. include:: bitmaps.rst
.. include:: mmp.rst
.. include:: journal.rst
.. include:: orphan.rst

View file

@ -498,11 +498,11 @@ structure -- inode change time (ctime), access time (atime), data
modification time (mtime), and deletion time (dtime). The four fields
are 32-bit signed integers that represent seconds since the Unix epoch
(1970-01-01 00:00:00 GMT), which means that the fields will overflow in
January 2038. For inodes that are not linked from any directory but are
still open (orphan inodes), the dtime field is overloaded for use with
the orphan list. The superblock field ``s_last_orphan`` points to the
first inode in the orphan list; dtime is then the number of the next
orphaned inode, or zero if there are no more orphans.
January 2038. If the filesystem does not have orphan_file feature, inodes
that are not linked from any directory but are still open (orphan inodes) have
the dtime field overloaded for use with the orphan list. The superblock field
``s_last_orphan`` points to the first inode in the orphan list; dtime is then
the number of the next orphaned inode, or zero if there are no more orphans.
If the inode structure size ``sb->s_inode_size`` is larger than 128
bytes and the ``i_inode_extra`` field is large enough to encompass the

View file

@ -0,0 +1,52 @@
.. SPDX-License-Identifier: GPL-2.0
Orphan file
-----------
In unix there can inodes that are unlinked from directory hierarchy but that
are still alive because they are open. In case of crash the filesystem has to
clean up these inodes as otherwise they (and the blocks referenced from them)
would leak. Similarly if we truncate or extend the file, we need not be able
to perform the operation in a single journalling transaction. In such case we
track the inode as orphan so that in case of crash extra blocks allocated to
the file get truncated.
Traditionally ext4 tracks orphan inodes in a form of single linked list where
superblock contains the inode number of the last orphan inode (s\_last\_orphan
field) and then each inode contains inode number of the previously orphaned
inode (we overload i\_dtime inode field for this). However this filesystem
global single linked list is a scalability bottleneck for workloads that result
in heavy creation of orphan inodes. When orphan file feature
(COMPAT\_ORPHAN\_FILE) is enabled, the filesystem has a special inode
(referenced from the superblock through s\_orphan_file_inum) with several
blocks. Each of these blocks has a structure:
.. list-table::
:widths: 8 8 24 40
:header-rows: 1
* - Offset
- Type
- Name
- Description
* - 0x0
- Array of \_\_le32 entries
- Orphan inode entries
- Each \_\_le32 entry is either empty (0) or it contains inode number of
an orphan inode.
* - blocksize - 8
- \_\_le32
- ob\_magic
- Magic value stored in orphan block tail (0x0b10ca04)
* - blocksize - 4
- \_\_le32
- ob\_checksum
- Checksum of the orphan block.
When a filesystem with orphan file feature is writeably mounted, we set
RO\_COMPAT\_ORPHAN\_PRESENT feature in the superblock to indicate there may
be valid orphan entries. In case we see this feature when mounting the
filesystem, we read the whole orphan file and process all orphan inodes found
there as usual. When cleanly unmounting the filesystem we remove the
RO\_COMPAT\_ORPHAN\_PRESENT feature to avoid unnecessary scanning of the orphan
file and also make the filesystem fully compatible with older kernels.

View file

@ -36,3 +36,20 @@ ext4 reserves some inode for special features, as follows:
* - 11
- Traditional first non-reserved inode. Usually this is the lost+found directory. See s\_first\_ino in the superblock.
Note that there are also some inodes allocated from non-reserved inode numbers
for other filesystem features which are not referenced from standard directory
hierarchy. These are generally reference from the superblock. They are:
.. list-table::
:widths: 20 50
:header-rows: 1
* - Superblock field
- Description
* - s\_lpf\_ino
- Inode number of lost+found directory.
* - s\_prj\_quota\_inum
- Inode number of quota file tracking project quotas
* - s\_orphan\_file\_inum
- Inode number of file tracking orphan inodes.

View file

@ -479,7 +479,11 @@ The ext4 superblock is laid out as follows in
- Filename charset encoding flags.
* - 0x280
- \_\_le32
- s\_reserved[95]
- s\_orphan\_file\_inum
- Orphan file inode number.
* - 0x284
- \_\_le32
- s\_reserved[94]
- Padding to the end of the block.
* - 0x3FC
- \_\_le32
@ -603,6 +607,11 @@ following:
the journal, JBD2 incompat feature
(JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT) gets
set (COMPAT\_FAST\_COMMIT).
* - 0x1000
- Orphan file allocated. This is the special file for more efficient
tracking of unlinked but still open inodes. When there may be any
entries in the file, we additionally set proper rocompat feature
(RO\_COMPAT\_ORPHAN\_PRESENT).
.. _super_incompat:
@ -713,6 +722,10 @@ the following:
- Filesystem tracks project quotas. (RO\_COMPAT\_PROJECT)
* - 0x8000
- Verity inodes may be present on the filesystem. (RO\_COMPAT\_VERITY)
* - 0x10000
- Indicates orphan file may have valid orphan entries and thus we need
to clean them up when mounting the filesystem
(RO\_COMPAT\_ORPHAN\_PRESENT).
.. _super_def_hash: