Commit graph

4073 commits

Author SHA1 Message Date
Theodore Ts'o
a54c4613da ext4: fix race writing to an inline_data file while its xattrs are changing
The location of the system.data extended attribute can change whenever
xattr_sem is not taken.  So we need to recalculate the i_inline_off
field since it mgiht have changed between ext4_write_begin() and
ext4_write_end().

This means that caching i_inline_off is probably not helpful, so in
the long run we should probably get rid of it and shrink the in-memory
ext4 inode slightly, but let's fix the race the simple way for now.

Cc: stable@kernel.org
Fixes: f19d5870cb ("ext4: add normal write support for inline data")
Reported-by: syzbot+13146364637c7363a7de@syzkaller.appspotmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-30 23:36:50 -04:00
Theodore Ts'o
a5fda11338 ext4: fix sparse warnings
Add sparse annotations to suppress false positive context imbalance
warnings, and use NULL instead of 0 in EXT_MAX_{EXTENT,INDEX}.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-30 23:36:50 -04:00
Jan Kara
bd2c38cf17 ext4: Make sure quota files are not grabbed accidentally
If ext4 filesystem is corrupted so that quota files are linked from
directory hirerarchy, bad things can happen. E.g. quota files can get
corrupted or deleted. Make sure we are not grabbing quota file inodes
when we expect normal inodes.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210812133122.26360-1-jack@suse.cz
2021-08-30 23:36:50 -04:00
Jan Kara
b2bbb92f70 ext4: fix e2fsprogs checksum failure for mounted filesystem
Commit 81414b4dd4 ("ext4: remove redundant sb checksum
recomputation") removed checksum recalculation after updating
superblock free space / inode counters in ext4_fill_super() based on
the fact that we will recalculate the checksum on superblock
writeout.

That is correct assumption but until the writeout happens (which can
take a long time) the checksum is incorrect in the buffer cache and if
programs such as tune2fs or resize2fs is called shortly after a file
system is mounted can fail.  So return back the checksum recalculation
and add a comment explaining why.

Fixes: 81414b4dd4 ("ext4: remove redundant sb checksum recomputation")
Cc: stable@kernel.org
Reported-by: Boyang Xue <bxue@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210812124737.21981-1-jack@suse.cz
2021-08-30 23:36:50 -04:00
Theodore Ts'o
308c57ccf4 ext4: if zeroout fails fall back to splitting the extent node
If the underlying storage device is using thin-provisioning, it's
possible for a zeroout operation to return ENOSPC.

Commit df22291ff0 ("ext4: Retry block allocation if we have free blocks
left") added logic to retry block allocation since we might get free block
after we commit a transaction. But the ENOSPC from thin-provisioning
will confuse ext4, and lead to an infinite loop.

Since using zeroout instead of splitting the extent node is an
optimization, if it fails, we might as well fall back to splitting the
extent node.

Reported-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-30 23:36:50 -04:00
Guoqing Jiang
facec450a8 ext4: reduce arguments of ext4_fc_add_dentry_tlv
Let's pass fc_dentry directly since those arguments (tag, parent_ino and
ino etc) can be deferenced from it.

Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210727080708.3708814-1-guoqing.jiang@linux.dev
2021-08-30 23:36:50 -04:00
Wang Jianchao
5036ab8df2 ext4: flush background discard kwork when retry allocation
The background discard kwork tries to mark blocks used and issue
discard. This can make filesystem suffer from NOSPC error, xfstest
generic/371 can fail due to it. Fix it by flushing discard kwork
in ext4_should_retry_alloc. At the same time, give up discard at
the moment.

Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
Link: https://lore.kernel.org/r/20210830075246.12516-6-jianchao.wan9@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-30 23:35:53 -04:00
Wang Jianchao
55cdd0af2b ext4: get discard out of jbd2 commit kthread contex
Right now, discard is issued and waited to be completed in jbd2
commit kthread context after the logs are committed. When large
amount of files are deleted and discard is flooding, jbd2 commit
kthread can be blocked for long time. Then all of the metadata
operations can be blocked to wait the log space.

One case is the page fault path with read mm->mmap_sem held, which
wants to update the file time but has to wait for the log space.
When other threads in the task wants to do mmap, then write mmap_sem
is blocked. Finally all of the following read mmap_sem requirements
are blocked, even the ps command which need to read the /proc/pid/
-cmdline. Our monitor service which needs to read /proc/pid/cmdline
used to be blocked for 5 mins.

This patch frees the blocks back to buddy after commit and then do
discard in a async kworker context in fstrim fashion, namely,
 - mark blocks to be discarded as used if they have not been allocated
 - do discard
 - mark them free
After this, jbd2 commit kthread won't be blocked any more by discard
and we won't get NOSPC even if the discard is slow or throttled.

Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
Link: https://lore.kernel.org/r/20210830075246.12516-5-jianchao.wan9@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-30 23:34:52 -04:00
Wang Jianchao
b6f5558c30 ext4: remove the repeated comment of ext4_trim_all_free
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210724074124.25731-4-jianchao.wan9@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-12 13:11:26 -04:00
Wang Jianchao
6920b39132 ext4: add new helper interface ext4_try_to_trim_range()
There is no functional change in this patch but just split the
codes, which serachs free block and does trim, into a new function
ext4_try_to_trim_range. This is preparing for the following async
backgroup discard.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210724074124.25731-3-jianchao.wan9@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-12 13:11:25 -04:00
Wang Jianchao
bd2eea8d0a ext4: remove the 'group' parameter of ext4_trim_extent
Get rid of the 'group' parameter of ext4_trim_extent as we can get
it from the 'e4b'.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210724074124.25731-2-jianchao.wan9@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-12 13:11:25 -04:00
Theodore Ts'o
877ba3f729 ext4: fix potential htree corruption when growing large_dir directories
Commit b5776e7524 ("ext4: fix potential htree index checksum
corruption) removed a required restart when multiple levels of index
nodes need to be split.  Fix this to avoid directory htree corruptions
when using the large_dir feature.

Cc: stable@kernel.org # v5.11
Cc: Благодаренко Артём <artem.blagodarenko@gmail.com>
Fixes: b5776e7524 ("ext4: fix potential htree index checksum corruption)
Reported-by: Denis <denis@voxelsoft.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-06 13:00:49 -04:00
Guoqing Jiang
73dc707161 ext4: remove conflicting comment from __ext4_forget
We do a bforget and return for no journal case, so let's remove this
conflict comment.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Link: https://lore.kernel.org/r/20210714055940.1553705-1-guoqing.jiang@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-07-23 07:33:10 -04:00
Ye Bin
b665414228 ext4: fix potential uninitialized access to retval in kmmpd
if (!ext4_has_feature_mmp(sb)) then retval can be unitialized before
we jump to the wait_to_exit label.

Fixes: 61bb4a1c41 ("ext4: fix possible UAF when remounting r/o a mmp-protected file system")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20210713022728.2533770-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-07-23 07:31:29 -04:00
Linus Torvalds
e49d68ce7c Ext4 regression and bug fixes for v5.14-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmDnU2AACgkQ8vlZVpUN
 gaOBIAgApIAIeGbppf7aFjRN4h4wxfRpr7w6lux3GVmz7D+6djRi21X5dT5xq01m
 u6DkLAcKrCATIidyP6qHlvBbxxcPt2PX1FcQbruj9WcnSng1Ngl7RW8BEqp/eIRo
 Nb7MY0pg8HIJVMEniWQcdEjFWKDL3ksWR9+X3V3nhSzp+0kXFF1ySjk+TWi/ZGSn
 T/Q1sEyeUOiVfV75cIW5JbKoJEgvCvrclFvGJLYVcIAYeqJfQKQ0+tlkhDeYnWfQ
 nZgh1UU350bO629LGIhbRAkLbAloEb0d57mOQCrATo0JFrAZ52+0ZCkrTXtIyoOF
 TUILVf3zsqgdO8HLDkbH1G+lGn9WOA==
 =qU+W
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Ext4 regression and bug fixes"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: inline jbd2_journal_[un]register_shrinker()
  ext4: fix flags validity checking for EXT4_IOC_CHECKPOINT
  ext4: fix possible UAF when remounting r/o a mmp-protected file system
  ext4: use ext4_grp_locked_error in mb_find_extent
  ext4: fix WARN_ON_ONCE(!buffer_uptodate) after an error writing the superblock
  Revert "ext4: consolidate checks for resize of bigalloc into ext4_resize_begin"
2021-07-09 09:57:27 -07:00
Theodore Ts'o
0705e8d1e2 ext4: inline jbd2_journal_[un]register_shrinker()
The function jbd2_journal_unregister_shrinker() was getting called
twice when the file system was getting unmounted.  On Power and ARM
platforms this was causing kernel crash when unmounting the file
system, when a percpu_counter was destroyed twice.

Fix this by removing jbd2_journal_[un]register_shrinker() functions,
and inlining the shrinker setup and teardown into
journal_init_common() and jbd2_journal_destroy().  This means that
ext4 and ocfs2 now no longer need to know about registering and
unregistering jbd2's shrinker.

Also, while we're at it, rename the percpu counter from
j_jh_shrink_count to j_checkpoint_jh_count, since this makes it
clearer what this counter is intended to track.

Link: https://lore.kernel.org/r/20210705145025.3363130-1-tytso@mit.edu
Fixes: 4ba3fcdde7 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-07-08 08:37:31 -04:00
Theodore Ts'o
0955901908 ext4: fix flags validity checking for EXT4_IOC_CHECKPOINT
Use the correct bitmask when checking for any not-yet-supported flags.

Link: https://lore.kernel.org/r/20210702173425.1276158-1-tytso@mit.edu
Fixes: 351a0a3fbc ("ext4: add ioctl EXT4_IOC_CHECKPOINT")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Leah Rumancik <leah.rumancik@gmail.com>
2021-07-08 08:37:31 -04:00
Theodore Ts'o
61bb4a1c41 ext4: fix possible UAF when remounting r/o a mmp-protected file system
After commit 618f003199 ("ext4: fix memory leak in
ext4_fill_super"), after the file system is remounted read-only, there
is a race where the kmmpd thread can exit, causing sbi->s_mmp_tsk to
point at freed memory, which the call to ext4_stop_mmpd() can trip
over.

Fix this by only allowing kmmpd() to exit when it is stopped via
ext4_stop_mmpd().

Link: https://lore.kernel.org/r/20210707002433.3719773-1-tytso@mit.edu
Reported-by: Ye Bin <yebin10@huawei.com>
Bug-Report-Link: <20210629143603.2166962-1-yebin10@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2021-07-08 08:36:09 -04:00
Stephen Brennan
cd84bbbac1 ext4: use ext4_grp_locked_error in mb_find_extent
Commit 5d1b1b3f49 ("ext4: fix BUG when calling ext4_error with locked
block group") introduces ext4_grp_locked_error to handle unlocking a
group in error cases. Otherwise, there is a possibility of a sleep while
atomic. However, since 43c73221b3 ("ext4: replace BUG_ON with WARN_ON
in mb_find_extent()"), mb_find_extent() has contained a ext4_error()
call while a group spinlock is held. Replace this with
ext4_grp_locked_error.

Fixes: 43c73221b3 ("ext4: replace BUG_ON with WARN_ON in mb_find_extent()")
Cc: <stable@vger.kernel.org> # 4.14+
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Link: https://lore.kernel.org/r/20210623232114.34457-1-stephen.s.brennan@oracle.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-07-01 17:14:07 -04:00
Ye Bin
558d6450c7 ext4: fix WARN_ON_ONCE(!buffer_uptodate) after an error writing the superblock
If a writeback of the superblock fails with an I/O error, the buffer
is marked not uptodate.  However, this can cause a WARN_ON to trigger
when we attempt to write superblock a second time.  (Which might
succeed this time, for cerrtain types of block devices such as iSCSI
devices over a flaky network.)

Try to detect this case in flush_stashed_error_work(), and also change
__ext4_handle_dirty_metadata() so we always set the uptodate flag, not
just in the nojournal case.

Before this commit, this problem can be repliciated via:

1. dmsetup  create dust1 --table  '0 2097152 dust /dev/sdc 0 4096'
2. mount  /dev/mapper/dust1  /home/test
3. dmsetup message dust1 0 addbadblock 0 10
4. cd /home/test
5. echo "XXXXXXX" > t

After a few seconds, we got following warning:

[   80.654487] end_buffer_async_write: bh=0xffff88842f18bdd0
[   80.656134] Buffer I/O error on dev dm-0, logical block 0, lost async page write
[   85.774450] EXT4-fs error (device dm-0): ext4_check_bdev_write_error:193: comm kworker/u16:8: Error while async write back metadata
[   91.415513] mark_buffer_dirty: bh=0xffff88842f18bdd0
[   91.417038] ------------[ cut here ]------------
[   91.418450] WARNING: CPU: 1 PID: 1944 at fs/buffer.c:1092 mark_buffer_dirty.cold+0x1c/0x5e
[   91.440322] Call Trace:
[   91.440652]  __jbd2_journal_temp_unlink_buffer+0x135/0x220
[   91.441354]  __jbd2_journal_unfile_buffer+0x24/0x90
[   91.441981]  __jbd2_journal_refile_buffer+0x134/0x1d0
[   91.442628]  jbd2_journal_commit_transaction+0x249a/0x3240
[   91.443336]  ? put_prev_entity+0x2a/0x200
[   91.443856]  ? kjournald2+0x12e/0x510
[   91.444324]  kjournald2+0x12e/0x510
[   91.444773]  ? woken_wake_function+0x30/0x30
[   91.445326]  kthread+0x150/0x1b0
[   91.445739]  ? commit_timeout+0x20/0x20
[   91.446258]  ? kthread_flush_worker+0xb0/0xb0
[   91.446818]  ret_from_fork+0x1f/0x30
[   91.447293] ---[ end trace 66f0b6bf3d1abade ]---

Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20210615090537.3423231-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-07-01 10:58:04 -04:00
Linus Torvalds
a6ecc2a491 In addition to bug fixes and cleanups, there are two new features for
ext4 in 5.14:
  - Allow applications to poll on changes to /sys/fs/ext4/*/errors_count
  - Add the ioctl EXT4_IOC_CHECKPOINT which allows the journal to be
    checkpointed, truncated and discarded or zero'ed.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmDcjRgACgkQ8vlZVpUN
 gaMAMQgAjRYUQ+tdJVZzInFwukudhgLyuCP9AdCx76fisaH22yNCakQ7M2XGz59i
 /YbJerLaueYpHZzpA9p5+sSjVhMwILO3scBSJbOwdsbrFAsFLzcgQKQhGGqK2KvX
 IAOEArC8/hm1wnVb7sfQYdBHlWyeJpI8hd/8WZPlYtySlRnP1TZCd+X7y7lmNs1H
 QU1KECwstI2t8Lug0QeKx2B9PI9AWcCs0lTJ4LfcANZAh3HIJi9aUCk4SFDRkf3/
 8AazvMqTHJD9yc+BNyZOro2ykDFCStkNqf0cDYTzvKrr66CHScPUtyI0oAEdspxN
 +SNNARPGZgNOuR3ZRbGivtwgEB+GpQ==
 =jSd4
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "In addition to bug fixes and cleanups, there are two new features for
  ext4 in 5.14:

   - Allow applications to poll on changes to
     /sys/fs/ext4/*/errors_count

   - Add the ioctl EXT4_IOC_CHECKPOINT which allows the journal to be
     checkpointed, truncated and discarded or zero'ed"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits)
  jbd2: export jbd2_journal_[un]register_shrinker()
  ext4: notify sysfs on errors_count value change
  fs: remove bdev_try_to_free_page callback
  ext4: remove bdev_try_to_free_page() callback
  jbd2: simplify journal_clean_one_cp_list()
  jbd2,ext4: add a shrinker to release checkpointed buffers
  jbd2: remove redundant buffer io error checks
  jbd2: don't abort the journal when freeing buffers
  jbd2: ensure abort the journal if detect IO error when writing original buffer back
  jbd2: remove the out label in __jbd2_journal_remove_checkpoint()
  ext4: no need to verify new add extent block
  jbd2: clean up misleading comments for jbd2_fc_release_bufs
  ext4: add check to prevent attempting to resize an fs with sparse_super2
  ext4: consolidate checks for resize of bigalloc into ext4_resize_begin
  ext4: remove duplicate definition of ext4_xattr_ibody_inline_set()
  ext4: fsmap: fix the block/inode bitmap comment
  ext4: fix comment for s_hash_unsigned
  ext4: use local variable ei instead of EXT4_I() macro
  ext4: fix avefreec in find_group_orlov
  ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit
  ...
2021-06-30 19:37:39 -07:00
Theodore Ts'o
8813587a99 Revert "ext4: consolidate checks for resize of bigalloc into ext4_resize_begin"
The function ext4_resize_begin() gets called from three different
places, and online resize for bigalloc file systems is disallowed from
the old-style online resize (EXT4_IOC_GROUP_ADD and
EXT4_IOC_GROUP_EXTEND), but it *is* supposed to be allowed via
EXT4_IOC_RESIZE_FS.

This reverts commit e9f9f61d0c.
2021-06-30 20:54:22 -04:00
Jonathan Davies
d578b99443 ext4: notify sysfs on errors_count value change
After s_error_count is incremented, signal the change in the
corresponding sysfs attribute via sysfs_notify. This allows userspace to
poll() on changes to /sys/fs/ext4/*/errors_count.

[ Moved call of ext4_notify_error_sysfs() to flush_stashed_error_work()
  to avoid BUG's caused by calling sysfs_notify trying to sleep after
  being called from an invalid context. -- TYT ]

Signed-off-by: Jonathan Davies <jonathan.davies@nutanix.com>
Link: https://lore.kernel.org/r/20210611140209.28903-1-jonathan.davies@nutanix.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-29 21:06:02 -04:00
Matthew Wilcox (Oracle)
b82a96c925 fs: remove noop_set_page_dirty()
Use __set_page_dirty_no_writeback() instead.  This will set the dirty bit
on the page, which will be used to avoid calling set_page_dirty() in the
future.  It will have no effect on actually writing the page back, as the
pages are not on any LRU lists.

[akpm@linux-foundation.org: export __set_page_dirty_no_writeback() to modules]

Link: https://lkml.kernel.org/r/20210615162342.1669332-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:48 -07:00
Zhang Yi
3b672e3aed ext4: remove bdev_try_to_free_page() callback
After we introduce a jbd2 shrinker to release checkpointed buffer's
journal head, we could free buffer without bdev_try_to_free_page()
under memory pressure. So this patch remove the whole
bdev_try_to_free_page() callback directly. It also remove many
use-after-free issues relate to it together.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210610112440.3438139-8-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:55:41 -04:00
Zhang Yi
4ba3fcdde7 jbd2,ext4: add a shrinker to release checkpointed buffers
Current metadata buffer release logic in bdev_try_to_free_page() have
a lot of use-after-free issues when umount filesystem concurrently, and
it is difficult to fix directly because ext4 is the only user of
s_op->bdev_try_to_free_page callback and we may have to add more special
refcount or lock that is only used by ext4 into the common vfs layer,
which is unacceptable.

One better solution is remove the bdev_try_to_free_page callback, but
the real problem is we cannot easily release journal_head on the
checkpointed buffer, so try_to_free_buffers() cannot release buffers and
page under memory pressure, which is more likely to trigger
out-of-memory. So we cannot remove the callback directly before we find
another way to release journal_head.

This patch introduce a shrinker to free journal_head on the checkpointed
transaction. After the journal_head got freed, try_to_free_buffers()
could free buffer properly.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210610112440.3438139-6-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:54:49 -04:00
yangerkun
0caaefbaf2 ext4: no need to verify new add extent block
ext4_ext_grow_indepth will add a new extent block which has init the
expected content. We can mark this buffer as verified so to stop a
useless check in __read_extent_tree_block.

Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210609075545.1442160-1-yangerkun@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:29:13 -04:00
Josh Triplett
b1489186cc ext4: add check to prevent attempting to resize an fs with sparse_super2
The in-kernel ext4 resize code doesn't support filesystem with the
sparse_super2 feature. It fails with errors like this and doesn't finish
the resize:
EXT4-fs (loop0): resizing filesystem from 16640 to 7864320 blocks
EXT4-fs warning (device loop0): verify_reserved_gdb:760: reserved GDT 2 missing grp 1 (32770)
EXT4-fs warning (device loop0): ext4_resize_fs:2111: error (-22) occurred during file system resize
EXT4-fs (loop0): resized filesystem to 2097152

To reproduce:
mkfs.ext4 -b 4096 -I 256 -J size=32 -E resize=$((256*1024*1024)) -O sparse_super2 ext4.img 65M
truncate -s 30G ext4.img
mount ext4.img /mnt
python3 -c 'import fcntl, os, struct ; fd = os.open("/mnt", os.O_RDONLY | os.O_DIRECTORY) ; fcntl.ioctl(fd, 0x40086610, struct.pack("Q", 30 * 1024 * 1024 * 1024 // 4096), False) ; os.close(fd)'
dmesg | tail
e2fsck ext4.img

The userspace resize2fs tool has a check for this case: it checks if the
filesystem has sparse_super2 set and if the kernel provides
/sys/fs/ext4/features/sparse_super2. However, the former check requires
manually reading and parsing the filesystem superblock.

Detect this case in ext4_resize_begin and error out early with a clear
error message.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Link: https://lore.kernel.org/r/74b8ae78405270211943cd7393e65586c5faeed1.1623093259.git.josh@joshtriplett.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:22:36 -04:00
Josh Triplett
e9f9f61d0c ext4: consolidate checks for resize of bigalloc into ext4_resize_begin
Two different places checked for attempts to resize a filesystem with
the bigalloc feature. Move the check into ext4_resize_begin, which both
places already call.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Link: https://lore.kernel.org/r/bee03303d999225ecb3bfa5be8576b2f4c6edbe6.1623093259.git.josh@joshtriplett.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:22:36 -04:00
Ritesh Harjani
310c097c2b ext4: remove duplicate definition of ext4_xattr_ibody_inline_set()
ext4_xattr_ibody_inline_set() & ext4_xattr_ibody_set() have the exact
same definition.  Hence remove ext4_xattr_ibody_inline_set() and all
its call references. Convert the callers of it to call
ext4_xattr_ibody_set() instead.

[ Modified to preserve ext4_xattr_ibody_set() and remove
  ext4_xattr_ibody_inline_set() instead. -- TYT ]

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/fd566b799bbbbe9b668eb5eecde5b5e319e3694f.1622685482.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 10:09:39 -04:00
Ritesh Harjani
ee00d6b3c7 ext4: fsmap: fix the block/inode bitmap comment
While debugging fstest ext4/027 failure, found below comment to be wrong and
confusing. Hence fix it while we are at it.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/e79134132db7ea42f15747b5c669ee91cc1aacdf.1622432690.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 09:48:29 -04:00
Eric Biggers
6d2424a845 ext4: fix comment for s_hash_unsigned
Fix the comment for s_hash_unsigned to not be the opposite of what it
actually is.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20210527235557.2377525-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-24 09:42:41 -04:00
chenyichong
f9505c72b2 ext4: use local variable ei instead of EXT4_I() macro
Signed-off-by: chenyichong <chenyichong@uniontech.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20210526052930.11278-1-chenyichong@uniontech.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-22 21:34:17 -04:00
Pan Dong
c89849cc02 ext4: fix avefreec in find_group_orlov
The avefreec should be average free clusters instead
of average free blocks, otherwize Orlov's allocator
will not work properly when bigalloc enabled.

Cc: stable@kernel.org
Signed-off-by: Pan Dong <pandong.peter@bytedance.com>
Link: https://lore.kernel.org/r/20210525073656.31594-1-pandong.peter@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-22 21:34:17 -04:00
Zhang Yi
4fb7c70a88 ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit
The cache_cnt parameter of tracepoint ext4_es_shrink_exit means the
remaining cache count after shrink, but now it is the cache count before
shrink, fix it by read sbi->s_extent_cache_cnt again.

Fixes: 1ab6c4997e ("fs: convert fs shrinkers to new scan/count API")
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210522103045.690103-3-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-22 21:34:17 -04:00
Zhang Yi
e5e7010e54 ext4: remove check for zero nr_to_scan in ext4_es_scan()
After converting fs shrinkers to new scan/count API, we are no longer
pass zero nr_to_scan parameter to detect the number of objects to free,
just remove this check.

Fixes: 1ab6c4997e ("fs: convert fs shrinkers to new scan/count API")
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210522103045.690103-2-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-22 21:34:17 -04:00
Tian Tao
b2d2e75735 ext4: remove set but rewrite variables
In the ext4_dx_add_entry function, the at variable is assigned but will
reset just after “again:” label. So delete the unnecessary assignment.
this will not chang the logic.

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Link: https://lore.kernel.org/r/1621493752-36890-1-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-22 21:34:17 -04:00
Leah Rumancik
351a0a3fbc ext4: add ioctl EXT4_IOC_CHECKPOINT
ioctl EXT4_IOC_CHECKPOINT checkpoints and flushes the journal. This
includes forcing all the transactions to the log, checkpointing the
transactions, and flushing the log to disk. This ioctl takes u32 "flags"
as an argument. Three flags are supported. EXT4_IOC_CHECKPOINT_FLAG_DRY_RUN
can be used to verify input to the ioctl. It returns error if there is any
invalid input, otherwise it returns success without performing
any checkpointing. The other two flags, EXT4_IOC_CHECKPOINT_FLAG_DISCARD
and EXT4_IOC_CHECKPOINT_FLAG_ZEROOUT, can be used to issue requests to
discard or zeroout the journal logs blocks, respectively. At this
point, EXT4_IOC_CHECKPOINT_FLAG_ZEROOUT is primarily added to enable
testing of this codepath on devices that don't support discard.
EXT4_IOC_CHECKPOINT_FLAG_DISCARD and EXT4_IOC_CHECKPOINT_FLAG_ZEROOUT
cannot both be set.

Systems that wish to achieve content deletion SLO can set up a daemon
that calls this ioctl at a regular interval such that it matches with the
SLO requirement. Thus, with this patch, the ext4_dir_entry2 wipeout
patch[1], and the Ext4 "-o discard" mount option set, Ext4 can now
guarantee that all file contents, file metatdata, and filenames will not
be accessible through the filesystem and will have had discard or
zeroout requests issued for corresponding device blocks.

The __jbd2_journal_erase function could also be used to discard or
zero-fill the journal during journal load after recovery. This would
provide a potential solution to a journal replay bug reported earlier this
year[2]. After a successful journal recovery, e2fsck can call this ioctl to
discard the journal as well.

[1] https://lore.kernel.org/linux-ext4/YIHknqxngB1sUdie@mit.edu/
[2] https://lore.kernel.org/linux-ext4/YDZoaacIYStFQT8g@mit.edu/

Link: https://lore.kernel.org/r/20210518151327.130198-2-leah.rumancik@gmail.com
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-22 21:34:08 -04:00
Leah Rumancik
01d5d96542 ext4: add discard/zeroout flags to journal flush
Add a flags argument to jbd2_journal_flush to enable discarding or
zero-filling the journal blocks while flushing the journal.

Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Link: https://lore.kernel.org/r/20210518151327.130198-1-leah.rumancik@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-22 19:27:10 -04:00
Yang Yingliang
8f6840c4fd ext4: return error code when ext4_fill_flex_info() fails
After commit c89128a008 ("ext4: handle errors on
ext4_commit_super"), 'ret' may be set to 0 before calling
ext4_fill_flex_info(), if ext4_fill_flex_info() fails ext4_mount()
doesn't return error code, it makes 'root' is null which causes crash
in legacy_get_tree().

Fixes: c89128a008 ("ext4: handle errors on ext4_commit_super")
Reported-by: Hulk Robot <hulkci@huawei.com>
Cc: <stable@vger.kernel.org> # v4.18+
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20210510111051.55650-1-yangyingliang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-17 10:53:20 -04:00
Zhang Yi
b9a037b7f3 ext4: cleanup in-core orphan list if ext4_truncate() failed to get a transaction handle
In ext4_orphan_cleanup(), if ext4_truncate() failed to get a transaction
handle, it didn't remove the inode from the in-core orphan list, which
may probably trigger below error dump in ext4_destroy_inode() during the
final iput() and could lead to memory corruption on the later orphan
list changes.

 EXT4-fs (sda): Inode 6291467 (00000000b8247c67): orphan list check failed!
 00000000b8247c67: 0001f30a 00000004 00000000 00000023  ............#...
 00000000e24cde71: 00000006 014082a3 00000000 00000000  ......@.........
 0000000072c6a5ee: 00000000 00000000 00000000 00000000  ................
 ...

This patch fix this by cleanup in-core orphan list manually if
ext4_truncate() return error.

Cc: stable@kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210507071904.160808-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-17 10:53:19 -04:00
Anirudh Rayabharam
ce3aba4359 ext4: fix kernel infoleak via ext4_extent_header
Initialize eh_generation of struct ext4_extent_header to prevent leaking
info to userspace. Fixes KMSAN kernel-infoleak bug reported by syzbot at:
http://syzkaller.appspot.com/bug?id=78e9ad0e6952a3ca16e8234724b2fa92d041b9b8

Cc: stable@kernel.org
Reported-by: syzbot+2dcfeaf8cb49b05e8f1a@syzkaller.appspotmail.com
Fixes: a86c618126 ("[PATCH] ext3: add extent map support")
Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com>
Link: https://lore.kernel.org/r/20210506185655.7118-1-mail@anirudhrb.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-17 10:53:19 -04:00
Pavel Skripkin
618f003199 ext4: fix memory leak in ext4_fill_super
static int kthread(void *_create) will return -ENOMEM
or -EINTR in case of internal failure or
kthread_stop() call happens before threadfn call.

To prevent fancy error checking and make code
more straightforward we moved all cleanup code out
of kmmpd threadfn.

Also, dropped struct mmpd_data at all. Now struct super_block
is a threadfn data and struct buffer_head embedded into
struct ext4_sb_info.

Reported-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Link: https://lore.kernel.org/r/20210430185046.15742-1-paskripkin@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-17 10:53:19 -04:00
Jiapeng Chong
1fc57ca5a2 ext4: remove redundant assignment to error
Variable error is set to zero but this value is never read as it's not
used later on, hence it is a redundant assignment and can be removed.

Cleans up the following clang-analyzer warning:

fs/ext4/ioctl.c:657:3: warning: Value stored to 'error' is never read
[clang-analyzer-deadcode.DeadStores].

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/1619691409-83160-1-git-send-email-jiapeng.chong@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-17 10:53:19 -04:00
Joseph Qi
5c680150d7 ext4: remove redundant check buffer_uptodate()
Now set_buffer_uptodate() will test first and then set, so we don't have
to check buffer_uptodate() first, remove it to simplify code.

Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://lore.kernel.org/r/1619418587-5580-1-git-send-email-joseph.qi@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-17 10:53:19 -04:00
Jan Kara
d0b040f5f2 ext4: fix overflow in ext4_iomap_alloc()
A code in iomap alloc may overflow block number when converting it to
byte offset. Luckily this is mostly harmless as we will just use more
expensive method of writing using unwritten extents even though we are
writing beyond i_size.

Cc: stable@kernel.org
Fixes: 378f32bab3 ("ext4: introduce direct I/O write using iomap infrastructure")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210412102333.2676-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-17 10:53:19 -04:00
Linus Torvalds
20e41d9bc8 Miscellaneous ext4 bug fixes for v5.13
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmC82AQACgkQ8vlZVpUN
 gaOkAgf+KH57P/P0sB6aVBHpAzqa9jTKJWMA5kpCqYUDkYlfF7n2hwsjMzWpJ5MY
 ZvFpKAflmRnve/ULUZQX6+zrcbieNs3e+6VFZrZ0PmxN0dupyISLY7jnvCRDleA7
 BFO34AcH+QEst9zXJmgta9eoy3LA8sawhQ/d7ujVY+IRFk40m26fuAMiaGznlQJ5
 dmrx7pHZWKFIDFIg2TdFlP+Voqbxs2VTT16gmWpGBdTyWYHKjbSOLKJFc9DwYeE9
 aANf6iIzwXz7y9pZiOnTrGuKDEJcIZNESkbIqw62YgqsoObLbsbCZNmNcqxyHpYQ
 Mh3L59KtmjANW3iOxQfyxkNTugxchw==
 =BSnf
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Miscellaneous ext4 bug fixes"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Only advertise encrypted_casefold when encryption and unicode are enabled
  ext4: fix no-key deletion for encrypt+casefold
  ext4: fix memory leak in ext4_fill_super
  ext4: fix fast commit alignment issues
  ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
  ext4: fix accessing uninit percpu counter variable with fast_commit
  ext4: fix memory leak in ext4_mb_init_backend on error path.
2021-06-06 14:24:13 -07:00
Daniel Rosenberg
e71f99f2df ext4: Only advertise encrypted_casefold when encryption and unicode are enabled
Encrypted casefolding is only supported when both encryption and
casefolding are both enabled in the config.

Fixes: 471fbbea7f ("ext4: handle casefolding with encryption")
Cc: stable@vger.kernel.org # 5.13+
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20210603094849.314342-1-drosen@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-06 10:10:23 -04:00
Daniel Rosenberg
63e7f12893 ext4: fix no-key deletion for encrypt+casefold
commit 471fbbea7f ("ext4: handle casefolding with encryption") is
missing a few checks for the encryption key which are needed to
support deleting enrypted casefolded files when the key is not
present.

This bug made it impossible to delete encrypted+casefolded directories
without the encryption key, due to errors like:

    W         : EXT4-fs warning (device vdc): __ext4fs_dirhash:270: inode #49202: comm Binder:378_4: Siphash requires key

Repro steps in kvm-xfstests test appliance:
      mkfs.ext4 -F -E encoding=utf8 -O encrypt /dev/vdc
      mount /vdc
      mkdir /vdc/dir
      chattr +F /vdc/dir
      keyid=$(head -c 64 /dev/zero | xfs_io -c add_enckey /vdc | awk '{print $NF}')
      xfs_io -c "set_encpolicy $keyid" /vdc/dir
      for i in `seq 1 100`; do
          mkdir /vdc/dir/$i
      done
      xfs_io -c "rm_enckey $keyid" /vdc
      rm -rf /vdc/dir # fails with the bug

Fixes: 471fbbea7f ("ext4: handle casefolding with encryption")
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Link: https://lore.kernel.org/r/20210522004132.2142563-1-drosen@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-06 10:10:23 -04:00
Alexey Makhalov
afd09b617d ext4: fix memory leak in ext4_fill_super
Buffer head references must be released before calling kill_bdev();
otherwise the buffer head (and its page referenced by b_data) will not
be freed by kill_bdev, and subsequently that bh will be leaked.

If blocksizes differ, sb_set_blocksize() will kill current buffers and
page cache by using kill_bdev(). And then super block will be reread
again but using correct blocksize this time. sb_set_blocksize() didn't
fully free superblock page and buffer head, and being busy, they were
not freed and instead leaked.

This can easily be reproduced by calling an infinite loop of:

  systemctl start <ext4_on_lvm>.mount, and
  systemctl stop <ext4_on_lvm>.mount

... since systemd creates a cgroup for each slice which it mounts, and
the bh leak get amplified by a dying memory cgroup that also never
gets freed, and memory consumption is much more easily noticed.

Fixes: ce40733ce9 ("ext4: Check for return value from sb_set_blocksize")
Fixes: ac27a0ec11 ("ext4: initial copy of files from ext3")
Link: https://lore.kernel.org/r/20210521075533.95732-1-amakhalov@vmware.com
Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2021-06-06 10:10:23 -04:00