Commit graph

81800 commits

Author SHA1 Message Date
Joel Granados
b8cbc0855a sysctl: Remove register_sysctl_table
This is part of the general push to deprecate register_sysctl_paths and
register_sysctl_table. After removing all the calling functions, we
remove both the register_sysctl_table function and the documentation
check that appeared in check-sysctl-docs awk script.

We save 595 bytes with this change:

./scripts/bloat-o-meter vmlinux.1.refactor-base-paths vmlinux.2.remove-sysctl-table
add/remove: 2/8 grow/shrink: 1/0 up/down: 1154/-1749 (-595)
Function                                     old     new   delta
count_subheaders                               -     983    +983
unregister_sysctl_table                       29     184    +155
__pfx_count_subheaders                         -      16     +16
__pfx_unregister_sysctl_table.part            16       -     -16
__pfx_register_leaf_sysctl_tables.constprop   16       -     -16
__pfx_count_subheaders.part                   16       -     -16
__pfx___register_sysctl_base                  16       -     -16
unregister_sysctl_table.part                 136       -    -136
__register_sysctl_base                       478       -    -478
register_leaf_sysctl_tables.constprop        524       -    -524
count_subheaders.part                        547       -    -547
Total: Before=21257652, After=21257057, chg -0.00%

[mcgrof: remove register_leaf_sysctl_tables and append_path too and
 add bloat-o-meter stats]

Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
2023-05-23 21:43:26 -07:00
Joel Granados
2f5edd03ca sysctl: Refactor base paths registrations
This is part of the general push to deprecate register_sysctl_paths and
register_sysctl_table. The old way of doing this through
register_sysctl_base and DECLARE_SYSCTL_BASE macro is replaced with a
call to register_sysctl_init. The 5 base paths affected are: "kernel",
"vm", "debug", "dev" and "fs".

We remove the register_sysctl_base function and the DECLARE_SYSCTL_BASE
macro since they are no longer needed.

In order to quickly acertain that the paths did not actually change I
executed `find /proc/sys/ | sha1sum` and made sure that the sha was the
same before and after the commit.

We end up saving 563 bytes with this change:

./scripts/bloat-o-meter vmlinux.0.base vmlinux.1.refactor-base-paths
add/remove: 0/5 grow/shrink: 2/0 up/down: 77/-640 (-563)
Function                                     old     new   delta
sysctl_init_bases                             55     111     +56
init_fs_sysctls                               12      33     +21
vm_base_table                                128       -    -128
kernel_base_table                            128       -    -128
fs_base_table                                128       -    -128
dev_base_table                               128       -    -128
debug_base_table                             128       -    -128
Total: Before=21258215, After=21257652, chg -0.00%

[mcgrof: modified to use register_sysctl_init() over register_sysctl()
 and add bloat-o-meter stats]

Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Christian Brauner <brauner@kernel.org>
2023-05-23 21:43:26 -07:00
Joel Granados
19c4e618a1 sysctl: stop exporting register_sysctl_table
We make register_sysctl_table static because the only function calling
it is in fs/proc/proc_sysctl.c (__register_sysctl_base). We remove it
from the sysctl.h header and modify the documentation in both the header
and proc_sysctl.c files to mention "register_sysctl" instead of
"register_sysctl_table".

This plus the commits that remove register_sysctl_table from parport
save 217 bytes:

./scripts/bloat-o-meter .bsysctl/vmlinux.old .bsysctl/vmlinux.new
add/remove: 0/1 grow/shrink: 5/1 up/down: 458/-675 (-217)
Function                                     old     new   delta
__register_sysctl_base                         8     286    +278
parport_proc_register                        268     379    +111
parport_device_proc_register                 195     247     +52
kzalloc.constprop                            598     608     +10
parport_default_proc_register                 62      69      +7
register_sysctl_table                        291       -    -291
parport_sysctl_template                     1288     904    -384
Total: Before=8603076, After=8602859, chg -0.00%

Signed-off-by: Joel Granados <j.granados@samsung.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-05-23 21:43:26 -07:00
Linus Torvalds
bb7c241fae Some ext4 bug fixes (mostly to address Syzbot reports) for v6.4-rc2.
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmRgCfAACgkQ8vlZVpUN
 gaOaOgf5AbFUBsjb95Aq2Y6SKvlyO2xFd2OqJXu6+bGaJScQ8qeoW2byihN4vD/e
 i5V5vivpk764k1uOUe9fq5BlkaTuvFJI8d81eEnJC3LW4s7r6Gv586dwbE5lr0Bq
 cZKCVMYdgwz3admGtPXrN0CVgg+Y/wHb1ZmGtt2nAqZfNqYfpX0waDyGr6JebhkO
 04VE8QQCvMkO6oOIR9ZfbJmVm5vrGqQVLW4T0hXVTj9r3gUu/61qAkt2XYAu5tKJ
 ENIoMv2ix0asAgFSbcIzY6YnCzSY9hiV/K6Twtusf63r22T+r6+LXBqUe+8hMx4E
 Vh8L+5wkeNkCXD8HwnHizPx5r0nLqw==
 =ouFA
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Some ext4 bug fixes (mostly to address Syzbot reports)"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: bail out of ext4_xattr_ibody_get() fails for any reason
  ext4: add bounds checking in get_max_inline_xattr_value_size()
  ext4: add indication of ro vs r/w mounts in the mount message
  ext4: fix deadlock when converting an inline directory in nojournal mode
  ext4: improve error recovery code paths in __ext4_remount()
  ext4: improve error handling from ext4_dirhash()
  ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled
  ext4: check iomap type only if ext4_iomap_begin() does not fail
  ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csum
  ext4: fix data races when using cached status extents
  ext4: avoid deadlock in fs reclaim with page writeback
  ext4: fix invalid free tracking in ext4_xattr_move_to_block()
  ext4: remove a BUG_ON in ext4_mb_release_group_pa()
  ext4: allow ext4_get_group_info() to fail
  ext4: fix lockdep warning when enabling MMP
  ext4: fix WARNING in mb_find_extent
2023-05-13 17:45:39 -07:00
Theodore Ts'o
2a534e1d0d ext4: bail out of ext4_xattr_ibody_get() fails for any reason
In ext4_update_inline_data(), if ext4_xattr_ibody_get() fails for any
reason, it's best if we just fail as opposed to stumbling on,
especially if the failure is EFSCORRUPTED.

Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:05 -04:00
Theodore Ts'o
2220eaf909 ext4: add bounds checking in get_max_inline_xattr_value_size()
Normally the extended attributes in the inode body would have been
checked when the inode is first opened, but if someone is writing to
the block device while the file system is mounted, it's possible for
the inode table to get corrupted.  Add bounds checking to avoid
reading beyond the end of allocated memory if this happens.

Reported-by: syzbot+1966db24521e5f6e23f7@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=1966db24521e5f6e23f7
Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:05 -04:00
Theodore Ts'o
6dcc98fbc4 ext4: add indication of ro vs r/w mounts in the mount message
Whether the file system is mounted read-only or read/write is more
important than the quota mode, which we are already printing.  Add the
ro vs r/w indication since this can be helpful in debugging problems
from the console log.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:05 -04:00
Theodore Ts'o
f4ce24f54d ext4: fix deadlock when converting an inline directory in nojournal mode
In no journal mode, ext4_finish_convert_inline_dir() can self-deadlock
by calling ext4_handle_dirty_dirblock() when it already has taken the
directory lock.  There is a similar self-deadlock in
ext4_incvert_inline_data_nolock() for data files which we'll fix at
the same time.

A simple reproducer demonstrating the problem:

    mke2fs -Fq -t ext2 -O inline_data -b 4k /dev/vdc 64
    mount -t ext4 -o dirsync /dev/vdc /vdc
    cd /vdc
    mkdir file0
    cd file0
    touch file0
    touch file1
    attr -s BurnSpaceInEA -V abcde .
    touch supercalifragilisticexpialidocious

Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230507021608.1290720-1-tytso@mit.edu
Reported-by: syzbot+91dccab7c64e2850a4e5@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=ba84cc80a9491d65416bc7877e1650c87530fe8a
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:05 -04:00
Theodore Ts'o
4c0b4818b1 ext4: improve error recovery code paths in __ext4_remount()
If there are failures while changing the mount options in
__ext4_remount(), we need to restore the old mount options.

This commit fixes two problem.  The first is there is a chance that we
will free the old quota file names before a potential failure leading
to a use-after-free.  The second problem addressed in this commit is
if there is a failed read/write to read-only transition, if the quota
has already been suspended, we need to renable quota handling.

Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-2-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:05 -04:00
Theodore Ts'o
4b3cb1d108 ext4: improve error handling from ext4_dirhash()
The ext4_dirhash() will *almost* never fail, especially when the hash
tree feature was first introduced.  However, with the addition of
support of encrypted, casefolded file names, that function can most
certainly fail today.

So make sure the callers of ext4_dirhash() properly check for
failures, and reflect the errors back up to their callers.

Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu
Reported-by: syzbot+394aa8a792cb99dbc837@syzkaller.appspotmail.com
Reported-by: syzbot+344aaa8697ebd232bfc8@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=db56459ea4ac4a676ae4b4678f633e55da005a9b
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:05 -04:00
Theodore Ts'o
a44be64bbe ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled
When a file system currently mounted read/only is remounted
read/write, if we clear the SB_RDONLY flag too early, before the quota
is initialized, and there is another process/thread constantly
attempting to create a directory, it's possible to trigger the

	WARN_ON_ONCE(dquot_initialize_needed(inode));

in ext4_xattr_block_set(), with the following stack trace:

   WARNING: CPU: 0 PID: 5338 at fs/ext4/xattr.c:2141 ext4_xattr_block_set+0x2ef2/0x3680
   RIP: 0010:ext4_xattr_block_set+0x2ef2/0x3680 fs/ext4/xattr.c:2141
   Call Trace:
    ext4_xattr_set_handle+0xcd4/0x15c0 fs/ext4/xattr.c:2458
    ext4_initxattrs+0xa3/0x110 fs/ext4/xattr_security.c:44
    security_inode_init_security+0x2df/0x3f0 security/security.c:1147
    __ext4_new_inode+0x347e/0x43d0 fs/ext4/ialloc.c:1324
    ext4_mkdir+0x425/0xce0 fs/ext4/namei.c:2992
    vfs_mkdir+0x29d/0x450 fs/namei.c:4038
    do_mkdirat+0x264/0x520 fs/namei.c:4061
    __do_sys_mkdirat fs/namei.c:4076 [inline]
    __se_sys_mkdirat fs/namei.c:4074 [inline]
    __x64_sys_mkdirat+0x89/0xa0 fs/namei.c:4074

Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu
Reported-by: syzbot+6385d7d3065524c5ca6d@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=6513f6cb5cd6b5fc9f37e3bb70d273b94be9c34c
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:05 -04:00
Baokun Li
fa83c34e3e ext4: check iomap type only if ext4_iomap_begin() does not fail
When ext4_iomap_overwrite_begin() calls ext4_iomap_begin() map blocks may
fail for some reason (e.g. memory allocation failure, bare disk write), and
later because "iomap->type ! = IOMAP_MAPPED" triggers WARN_ON(). When ext4
iomap_begin() returns an error, it is normal that the type of iomap->type
may not match the expectation. Therefore, we only determine if iomap->type
is as expected when ext4_iomap_begin() is executed successfully.

Cc: stable@kernel.org
Reported-by: syzbot+08106c4b7d60702dbc14@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/00000000000015760b05f9b4eee9@google.com
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230505132429.714648-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:04 -04:00
Tudor Ambarus
4f04351888 ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csum
When modifying the block device while it is mounted by the filesystem,
syzbot reported the following:

BUG: KASAN: slab-out-of-bounds in crc16+0x206/0x280 lib/crc16.c:58
Read of size 1 at addr ffff888075f5c0a8 by task syz-executor.2/15586

CPU: 1 PID: 15586 Comm: syz-executor.2 Not tainted 6.2.0-rc5-syzkaller-00205-gc96618275234 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106
 print_address_description+0x74/0x340 mm/kasan/report.c:306
 print_report+0x107/0x1f0 mm/kasan/report.c:417
 kasan_report+0xcd/0x100 mm/kasan/report.c:517
 crc16+0x206/0x280 lib/crc16.c:58
 ext4_group_desc_csum+0x81b/0xb20 fs/ext4/super.c:3187
 ext4_group_desc_csum_set+0x195/0x230 fs/ext4/super.c:3210
 ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
 ext4_free_blocks+0x191a/0x2810 fs/ext4/mballoc.c:6173
 ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
 ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
 ext4_ext_remove_space+0x24ef/0x46a0 fs/ext4/extents.c:2958
 ext4_ext_truncate+0x177/0x220 fs/ext4/extents.c:4416
 ext4_truncate+0xa6a/0xea0 fs/ext4/inode.c:4342
 ext4_setattr+0x10c8/0x1930 fs/ext4/inode.c:5622
 notify_change+0xe50/0x1100 fs/attr.c:482
 do_truncate+0x200/0x2f0 fs/open.c:65
 handle_truncate fs/namei.c:3216 [inline]
 do_open fs/namei.c:3561 [inline]
 path_openat+0x272b/0x2dd0 fs/namei.c:3714
 do_filp_open+0x264/0x4f0 fs/namei.c:3741
 do_sys_openat2+0x124/0x4e0 fs/open.c:1310
 do_sys_open fs/open.c:1326 [inline]
 __do_sys_creat fs/open.c:1402 [inline]
 __se_sys_creat fs/open.c:1396 [inline]
 __x64_sys_creat+0x11f/0x160 fs/open.c:1396
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f72f8a8c0c9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f72f97e3168 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
RAX: ffffffffffffffda RBX: 00007f72f8bac050 RCX: 00007f72f8a8c0c9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280
RBP: 00007f72f8ae7ae9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd165348bf R14: 00007f72f97e3300 R15: 0000000000022000

Replace
	le16_to_cpu(sbi->s_es->s_desc_size)
with
	sbi->s_desc_size

It reduces ext4's compiled text size, and makes the code more efficient
(we remove an extra indirect reference and a potential byte
swap on big endian systems), and there is no downside. It also avoids the
potential KASAN / syzkaller failure, as a bonus.

Reported-by: syzbot+fc51227e7100c9294894@syzkaller.appspotmail.com
Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=70d28d11ab14bd7938f3e088365252aa923cff42
Link: https://syzkaller.appspot.com/bug?id=b85721b38583ecc6b5e72ff524c67302abbc30f3
Link: https://lore.kernel.org/all/000000000000ece18705f3b20934@google.com/
Fixes: 717d50e497 ("Ext4: Uninitialized Block Groups")
Cc: stable@vger.kernel.org
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Link: https://lore.kernel.org/r/20230504121525.3275886-1-tudor.ambarus@linaro.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:04 -04:00
Jan Kara
492888df0c ext4: fix data races when using cached status extents
When using cached extent stored in extent status tree in tree->cache_es
another process holding ei->i_es_lock for reading can be racing with us
setting new value of tree->cache_es. If the compiler would decide to
refetch tree->cache_es at an unfortunate moment, it could result in a
bogus in_range() check. Fix the possible race by using READ_ONCE() when
using tree->cache_es only under ei->i_es_lock for reading.

Cc: stable@kernel.org
Reported-by: syzbot+4a03518df1e31b537066@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000d3b33905fa0fd4a6@google.com
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230504125524.10802-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:04 -04:00
Jan Kara
00d873c17e ext4: avoid deadlock in fs reclaim with page writeback
Ext4 has a filesystem wide lock protecting ext4_writepages() calls to
avoid races with switching of journalled data flag or inode format. This
lock can however cause a deadlock like:

CPU0                            CPU1

ext4_writepages()
  percpu_down_read(sbi->s_writepages_rwsem);
                                ext4_change_inode_journal_flag()
                                  percpu_down_write(sbi->s_writepages_rwsem);
                                    - blocks, all readers block from now on
  ext4_do_writepages()
    ext4_init_io_end()
      kmem_cache_zalloc(io_end_cachep, GFP_KERNEL)
        fs_reclaim frees dentry...
          dentry_unlink_inode()
            iput() - last ref =>
              iput_final() - inode dirty =>
                write_inode_now()...
                  ext4_writepages() tries to acquire sbi->s_writepages_rwsem
                    and blocks forever

Make sure we cannot recurse into filesystem reclaim from writeback code
to avoid the deadlock.

Reported-by: syzbot+6898da502aef574c5f8a@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/0000000000004c66b405fa108e27@google.com
Fixes: c8585c6fca ("ext4: fix races between changing inode journal mode and ext4_writepages")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230504124723.20205-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:04 -04:00
Theodore Ts'o
b87c7cdf2b ext4: fix invalid free tracking in ext4_xattr_move_to_block()
In ext4_xattr_move_to_block(), the value of the extended attribute
which we need to move to an external block may be allocated by
kvmalloc() if the value is stored in an external inode.  So at the end
of the function the code tried to check if this was the case by
testing entry->e_value_inum.

However, at this point, the pointer to the xattr entry is no longer
valid, because it was removed from the original location where it had
been stored.  So we could end up calling kvfree() on a pointer which
was not allocated by kvmalloc(); or we could also potentially leak
memory by not freeing the buffer when it should be freed.  Fix this by
storing whether it should be freed in a separate variable.

Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230430160426.581366-1-tytso@mit.edu
Link: https://syzkaller.appspot.com/bug?id=5c2aee8256e30b55ccf57312c16d88417adbd5e1
Link: https://syzkaller.appspot.com/bug?id=41a6b5d4917c0412eb3b3c3c604965bed7d7420b
Reported-by: syzbot+64b645917ce07d89bde5@syzkaller.appspotmail.com
Reported-by: syzbot+0d042627c4f2ad332195@syzkaller.appspotmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:04 -04:00
Theodore Ts'o
463808f237 ext4: remove a BUG_ON in ext4_mb_release_group_pa()
If a malicious fuzzer overwrites the ext4 superblock while it is
mounted such that the s_first_data_block is set to a very large
number, the calculation of the block group can underflow, and trigger
a BUG_ON check.  Change this to be an ext4_warning so that we don't
crash the kernel.

Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230430154311.579720-3-tytso@mit.edu
Reported-by: syzbot+e2efa3efc15a1c9e95c3@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=69b28112e098b070f639efb356393af3ffec4220
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:04 -04:00
Theodore Ts'o
5354b2af34 ext4: allow ext4_get_group_info() to fail
Previously, ext4_get_group_info() would treat an invalid group number
as BUG(), since in theory it should never happen.  However, if a
malicious attaker (or fuzzer) modifies the superblock via the block
device while it is the file system is mounted, it is possible for
s_first_data_block to get set to a very large number.  In that case,
when calculating the block group of some block number (such as the
starting block of a preallocation region), could result in an
underflow and very large block group number.  Then the BUG_ON check in
ext4_get_group_info() would fire, resutling in a denial of service
attack that can be triggered by root or someone with write access to
the block device.

For a quality of implementation perspective, it's best that even if
the system administrator does something that they shouldn't, that it
will not trigger a BUG.  So instead of BUG'ing, ext4_get_group_info()
will call ext4_error and return NULL.  We also add fallback code in
all of the callers of ext4_get_group_info() that it might NULL.

Also, since ext4_get_group_info() was already borderline to be an
inline function, un-inline it.  The results in a next reduction of the
compiled text size of ext4 by roughly 2k.

Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230430154311.579720-2-tytso@mit.edu
Reported-by: syzbot+e2efa3efc15a1c9e95c3@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=69b28112e098b070f639efb356393af3ffec4220
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2023-05-13 18:02:46 -04:00
Linus Torvalds
76c7f8873a for-6.4-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmRebDIACgkQxWXV+ddt
 WDu3vA//RNyRGjEz0HgfhTc1119DXJLwK6j544waYLrzRcMtBK4xKByiaFkAA4tL
 PQidGX+nAQPm+pZl0jcK30cBMObik5GXJwoSOZGl7/ectx4O7aFfXqiSfwPTyqZU
 3fTavoqoJxbxJCVbifcXOPNhsUxMlEGYJmA3CVRsllLviXY+3HMpX2ZpWZ7vch+N
 MLENNBfUo1HVdWaxOYfQif/qT5iR9G7D8dBjX9DUK0kVwrbwBB0rolJy4fPrY6z5
 gBLED9Ks3FBgyU3mYq4qrfPmbfF8mPiaU0+1j+B46vw3PdPtIwjIForR+91GsZ1v
 iHojbykf6VWTQV+gO78mgv4O4vRtn3C+UJaGxLL86OMOaiQQHFYdSETn9arPmoho
 p1wCBidI82tvfIOGYXgrTGorLN27hhyPJinHe/2Bqo+1wUL8/J8mwCWunIox7a8z
 rxO5QhDIDFX7gamsvYjkW3tBkYuGiGvBjx+Ic2cBHTkVp9wSPL9PCvqNNru2qexA
 t0BpAL9DxvN+T1xO1thC3qsm2Ogx0QEmgdDfRglbEVASnRZKZZsJEMO90FzFbkFg
 vLbs0KnT7yS7mTwq4NklDrgHZ0eiiJLZVCb8bR8xkzVW+ADrUmZuDM8WOcCgJAUp
 fUoMmFsJZi5zsdAOygDWr1bBHorLV5szrY0bSB5L2eHwJjYZ6KE=
 =uWUN
 -----END PGP SIGNATURE-----

Merge tag 'for-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull more btrfs fixes from David Sterba:

 - fix incorrect number of bitmap entries for space cache if loading is
   interrupted by some error

 - fix backref walking, this breaks a mode of LOGICAL_INO_V2 ioctl that
   is used in deduplication tools

 - zoned mode fixes:
      - properly finish zone reserved for relocation
      - correctly calculate super block zone end on ZNS
      - properly initialize new extent buffer for redirty

 - make mount option clear_cache work with block-group-tree, to rebuild
   free-space-tree instead of temporarily disabling it that would lead
   to a forced read-only mount

 - fix alignment check for offset when printing extent item

* tag 'for-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: make clear_cache mount option to rebuild FST without disabling it
  btrfs: zero the buffer before marking it dirty in btrfs_redirty_list_add
  btrfs: zoned: fix full zone super block reading on ZNS
  btrfs: zoned: zone finish data relocation BG with last IO
  btrfs: fix backref walking not returning all inode refs
  btrfs: fix space cache inconsistency after error loading it from disk
  btrfs: print-tree: parent bytenr must be aligned to sector size
2023-05-12 17:10:32 -05:00
Linus Torvalds
fd88f147cb 6 smb3 client fixes, most also for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmRd3pcACgkQiiy9cAdy
 T1FrEAwAj6M0DnI13diLdjZVs9WAVRIiu5Fk1l8oA6bco74UepbJjZmTf8K8G0pM
 fQDTXZWDa5g11tbYuXFb4Ue9Dv7hu4RuUthWZCxoDnKcvH00VtAtYplE1LJstWBh
 mZwHm8iEBRPs2oB1YZ7KnQ8kFSjs2VgwV2PQAI24QDP0QWTT4jtw/nPqLxKLxyef
 Rr9YkOK0LZPhhGEAZ5ABwkPH97U5CUMMwRTqhGDO/mm91FZ8sD+vAsqM7GdLU73w
 wGHp3M1RKBVBubNCgNT0uRMlnlqBcvoThqhYgJ6PYnS4uE23a2uLDNI8vj9LlMRo
 csoYdM+Kltpvki8hFIWiFxfHFfECHD6h/QOpRMSv9InU0ly6o7zwnl9VtuWcRyrM
 vyz3xnlI6wG6TCy1HGezq3HSaEbEbVpwLCxyG/xFiAHV/VdkmjmRUb0RJV4X3X8M
 /KUT+1wj0MAMmVUTIpIHuotRikScAPnP/NDMorIsgEUp/2TFh2m+FQZ15o401sGv
 96G8v+9I
 =ax5r
 -----END PGP SIGNATURE-----

Merge tag '6.4-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs client fixes from Steve French:

 - fix for copy_file_range bug for very large files that are multiples
   of rsize

 - do not ignore "isolated transport" flag if set on share

 - set rasize default better

 - three fixes related to shutdown and freezing (fixes 4 xfstests, and
   closes deferred handles faster in some places that were missed)

* tag '6.4-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: release leases for deferred close handles when freezing
  smb3: fix problem remounting a share after shutdown
  SMB3: force unmount was failing to close deferred close files
  smb3: improve parallel reads of large files
  do not reuse connection if share marked as isolated
  cifs: fix pcchunk length type in smb2_copychunk_range
2023-05-12 17:01:36 -05:00
Linus Torvalds
df8c2d13e2 vfs/v6.4-rc1/pipe
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZF5YpQAKCRCRxhvAZXjc
 oqZpAQCuXA/XDf4Cyhfk87C0gbzy1Akjf3Xpia8q1mWtvN4I+AD+PE54krWgxGRi
 mu5Ae2X95tQ1v+MUAcmWOMdITK8HewE=
 =8+GL
 -----END PGP SIGNATURE-----

Merge tag 'vfs/v6.4-rc1/pipe' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fix from Christian Brauner:
 "During the pipe nonblock rework the check for both O_NONBLOCK and
  IOCB_NOWAIT was dropped. Both checks need to be performed to ensure
  that files without O_NONBLOCK but IOCB_NOWAIT don't block when writing
  to or reading from a pipe.

  This just contains the fix adding the check for IOCB_NOWAIT back in"

* tag 'vfs/v6.4-rc1/pipe' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
  pipe: check for IOCB_NOWAIT alongside O_NONBLOCK
2023-05-12 16:56:09 -05:00
Jens Axboe
c04fe8e32f
pipe: check for IOCB_NOWAIT alongside O_NONBLOCK
Pipe reads or writes need to enable nonblocking attempts, if either
O_NONBLOCK is set on the file, or IOCB_NOWAIT is set in the iocb being
passed in. The latter isn't currently true, ensure we check for both
before waiting on data or space.

Fixes: afed6271f5 ("pipe: set FMODE_NOWAIT on pipes")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Message-Id: <e5946d67-4e5e-b056-ba80-656bab12d9f6@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-05-12 17:17:27 +02:00
Linus Torvalds
849a4f0973 xfs: bug fixes for 6.4-rc2
o fixes for inode garbage collection shutdown racing with work queue
   updates
 o ensure inodegc workers run on the CPU they are supposed to
 o disable counter scrubbing until we can exclusively freeze the
   filesystem from the kernel
 o Regression fixes for new allocation related bugs
 o a couple of minor cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEmJOoJ8GffZYWSjj/regpR/R1+h0FAmRcSIsUHGRhdmlkQGZy
 b21vcmJpdC5jb20ACgkQregpR/R1+h2Y8xAAxtsTdOx71XtDuNyfBOiqzZgTCq6b
 6LsckJIDQa1AXjUNq9G3zWcUcWBcRWcw+CWbkqjqQ9W47K/ijLuoKnjRsQ+5B4DU
 TBUctVq+/Zk2lBlb6HKuKdzqDGnIFWGVKVd7u8KlowqnXuzUeQ0vFkT7ZHTepUKG
 P+midgGNVT4+tykq7oH0H8WxoTyNPZhKiAUcZjneBgA60IAoQWHA2iUt+SKpbrkL
 1HyK+/edVMTXiDXtyHfXmDaH9Pgy6NCpw3TNkPDhuL1UDpLhg/zgT39rFZGBsAUt
 gaDM3wN5jBrot/mvJE3rH9bdZhkcf+NQKPx/1DDg3DL8plS/1/LUC4cImdolBJ3w
 RNmgJv1lK+AlE4MUJ/bUDlEpHUmwAjnnsxBXwEvnYNfj+9V6/mDB+HqKiY7/XxVK
 vF77s6z+CWvefdnZavJ4/72pVVJNkcDYCYmvh/donRP6vtnwZyzocFUeBeNMInV1
 /s3WMrF9hwmJqAClKG7p1fnszWp658yFIuw/TXVs+NrjTtQgXwMpl2cEYYvUZEJN
 Trq2p0xH/JSwcnOPSPJO6WHb8UPoqrM6lgGFaJVWJx1AWt1i1CFLf5eA5X+XisDV
 AJKgpqlnDg02bBMQ0tMFGZUaNx/1S1mwtxcZsyEFTutpUNxqJKDaMohpxxrWb0WC
 ppSqDvyJN4wtlFI=
 =qok2
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.4-rc1-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs bug fixes from Dave Chinner:
 "Largely minor bug fixes and cleanups, th emost important of which are
  probably the fixes for regressions in the extent allocation code:

   - fixes for inode garbage collection shutdown racing with work queue
     updates

   - ensure inodegc workers run on the CPU they are supposed to

   - disable counter scrubbing until we can exclusively freeze the
     filesystem from the kernel

   - regression fixes for new allocation related bugs

   - a couple of minor cleanups"

* tag 'xfs-6.4-rc1-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix xfs_inodegc_stop racing with mod_delayed_work
  xfs: disable reaping in fscounters scrub
  xfs: check that per-cpu inodegc workers actually run on that cpu
  xfs: explicitly specify cpu when forcing inodegc delayed work to run immediately
  xfs: fix negative array access in xfs_getbmap
  xfs: don't allocate into the data fork for an unshare request
  xfs: flush dirty data and drain directios before scrubbing cow fork
  xfs: set bnobt/cntbt numrecs correctly when formatting new AGs
  xfs: don't unconditionally null args->pag in xfs_bmap_btalloc_at_eof
2023-05-11 16:51:11 -05:00
Steve French
d39fc592ef cifs: release leases for deferred close handles when freezing
We should not be caching closed files when freeze is invoked on an fs
(so we can release resources more gracefully).

Fixes xfstests generic/068 generic/390 generic/491

Reviewed-by: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-10 17:48:30 -05:00
Linus Torvalds
d295b66a7b \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmRcFLAACgkQnJ2qBz9k
 QNmSuQf+P+hFOHVpCb83S4TrNsfPC9M7Pv/M1Rw926p1gESNLez1ElAx2B35OIBZ
 y4ijqrSHgXbCsC3PvbxR9YfBPJZwIhtmKwjPC+9KlhAFB536Bvt/nt34fjRCWJiS
 nABnexegkirk7EQufAGVZQ1gtm1p0qSCA1B/l5AFl4KeLIin31U/08F2eE1OAo3K
 TmxhXzlC+JRXjf/X/X8mzJ8nz/hY5039BDvs7EpVdOsZcLFRQcCUygUCrOdV94uI
 1wyzMoGNaiIsZHpHPf3NtphUoE9IFpZQJGXMTR4ehBgFYkkOHYxsjW+wn8YOmaks
 qzRRXb9w0WdgAXChDGu1xYXV6Wpjiw==
 =kqSe
 -----END PGP SIGNATURE-----

Merge tag 'fsnotify_for_v6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull inotify fix from Jan Kara:
 "A fix for possibly reporting invalid watch descriptor with inotify
  event"

* tag 'fsnotify_for_v6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  inotify: Avoid reporting event with invalid wd
2023-05-10 17:07:42 -05:00
Linus Torvalds
2a78769da3 gfs2 fix
- Fix a NULL pointer dereference when mounting corrupted filesystems.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmRbtSYUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTrQSg/6A7xD25DpU1CSL18CEKMxnViXPNLP
 IYiTtkByCsRkgQKR4KMtVgPD/lEyWbhpuZJzSF4mZkslpnngjL/o+Rj4Sds9GvPL
 +66CSJAqjpJUjgGDvAoiJcZYOejRyqZLCHzKF54h0oJ3k9JCd3xN86EX8vKGoc2e
 F/As+E1Wnf0PP4jiERDyFhhg95MNxFuNVJ/JJ7mwtWpumx+60SpGzzUp575AHcoe
 4zHkVHlj6FAz2G8l8NADYAc0wMuhQ05g19psHzRDr2bB9zHBy9kWzXN0MYrhRvuY
 HuQ3CNDbXCJPXFjrNEmLV0f7Y8yCk/YNrYPbo4zV+AW0l7NzesqzrwkP/qgNfySD
 pnQJ/mQ76P/h728fln0pYvO8GnFUFoKKoCR27jH4F/Mntr4hgR4FZNDhvK6BMQNc
 PpipJcXWT8AUY45Sg3TLKPEhjp37K+abiMKfyTEFQRvLT83O6WyCUx2HYpZI7l87
 /aTIsidlYkb2ZSeLeXEfBplRXwF7NQNvvRJCGQKHAN7UbTB1o2RVpC5bn6UIghb1
 eQQZObs4AUfmzHthF2OaeJy08AiFidbOVLmbpZuj2iOWbJzmjJea8YT/GdntxhMT
 swmkHJqdmZEYoSKKvDmemT6l7qtOwijg7sTQ/d2D77lk8QPPIHdUw6g1OVvMqoNE
 oESpIA4O2fshlpU=
 =/s3O
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v6.3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:

 - Fix a NULL pointer dereference when mounting corrupted filesystems

* tag 'gfs2-v6.3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Don't deref jdesc in evict
2023-05-10 16:48:35 -05:00
Bob Peterson
504a10d9e4 gfs2: Don't deref jdesc in evict
On corrupt gfs2 file systems the evict code can try to reference the
journal descriptor structure, jdesc, after it has been freed and set to
NULL. The sequence of events is:

init_journal()
...
fail_jindex:
   gfs2_jindex_free(sdp); <------frees journals, sets jdesc = NULL
      if (gfs2_holder_initialized(&ji_gh))
         gfs2_glock_dq_uninit(&ji_gh);
fail:
   iput(sdp->sd_jindex); <--references jdesc in evict_linked_inode
      evict()
         gfs2_evict_inode()
            evict_linked_inode()
               ret = gfs2_trans_begin(sdp, 0, sdp->sd_jdesc->jd_blocks);
<------references the now freed/zeroed sd_jdesc pointer.

The call to gfs2_trans_begin is done because the truncate_inode_pages
call can cause gfs2 events that require a transaction, such as removing
journaled data (jdata) blocks from the journal.

This patch fixes the problem by adding a check for sdp->sd_jdesc to
function gfs2_evict_inode. In theory, this should only happen to corrupt
gfs2 file systems, when gfs2 detects the problem, reports it, then tries
to evict all the system inodes it has read in up to that point.

Reported-by: Yang Lan <lanyang0908@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-05-10 17:15:18 +02:00
Qu Wenruo
1d6a4fc857 btrfs: make clear_cache mount option to rebuild FST without disabling it
Previously clear_cache mount option would simply disable free-space-tree
feature temporarily then re-enable it to rebuild the whole free space
tree.

But this is problematic for block-group-tree feature, as we have an
artificial dependency on free-space-tree feature.

If we go the existing method, after clearing the free-space-tree
feature, we would flip the filesystem to read-only mode, as we detect a
super block write with block-group-tree but no free-space-tree feature.

This patch would change the behavior by properly rebuilding the free
space tree without disabling this feature, thus allowing clear_cache
mount option to work with block group tree.

Now we can mount a filesystem with block-group-tree feature and
clear_mount option:

  $ mkfs.btrfs  -O block-group-tree /dev/test/scratch1  -f
  $ sudo mount /dev/test/scratch1 /mnt/btrfs -o clear_cache
  $ sudo dmesg -t | head -n 5
  BTRFS info (device dm-1): force clearing of disk cache
  BTRFS info (device dm-1): using free space tree
  BTRFS info (device dm-1): auto enabling async discard
  BTRFS info (device dm-1): rebuilding free space tree
  BTRFS info (device dm-1): checking UUID tree

CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-10 14:51:27 +02:00
Christoph Hellwig
c83b56d1dd btrfs: zero the buffer before marking it dirty in btrfs_redirty_list_add
btrfs_redirty_list_add zeroes the buffer data and sets the
EXTENT_BUFFER_NO_CHECK to make sure writeback is fine with a bogus
header.  But it does that after already marking the buffer dirty, which
means that writeback could already be looking at the buffer.

Switch the order of operations around so that the buffer is only marked
dirty when we're ready to write it.

Fixes: d3575156f6 ("btrfs: zoned: redirty released extent buffers")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-10 14:50:29 +02:00
Naohiro Aota
02ca9e6fb5 btrfs: zoned: fix full zone super block reading on ZNS
When both of the superblock zones are full, we need to check which
superblock is newer. The calculation of last superblock position is wrong
as it does not consider zone_capacity and uses the length.

Fixes: 9658b72ef3 ("btrfs: zoned: locate superblock position using zone capacity")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-10 14:50:22 +02:00
Naohiro Aota
f84353c7c2 btrfs: zoned: zone finish data relocation BG with last IO
For data block groups, we zone finish a zone (or, just deactivate it) when
seeing the last IO in btrfs_finish_ordered_io(). That is only called for
IOs using ZONE_APPEND, but we use a regular WRITE command for data
relocation IOs. Detect it and call btrfs_zone_finish_endio() properly.

Fixes: be1a1d7a5d ("btrfs: zoned: finish fully written block group")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-10 14:50:12 +02:00
Filipe Manana
0cad8f14d7 btrfs: fix backref walking not returning all inode refs
When using the logical to ino ioctl v2, if the flag to ignore offsets of
file extent items (BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET) is given, the
backref walking code ends up not returning references for all file offsets
of an inode that point to the given logical bytenr. This happens since
kernel 6.2, commit 6ce6ba5344 ("btrfs: use a single argument for extent
offset in backref walking functions") because:

1) It mistakenly skipped the search for file extent items in a leaf that
   point to the target extent if that flag is given. Instead it should
   only skip the filtering done by check_extent_in_eb() - that is, it
   should not avoid the calls to that function (or find_extent_in_eb(),
   which uses it).

2) It was also not building a list of inode extent elements (struct
   extent_inode_elem) if we have multiple inode references for an extent
   when the ignore offset flag is given to the logical to ino ioctl - it
   would leave a single element, only the last one that was found.

These stem from the confusing old interface for backref walking functions
where we had an extent item offset argument that was a pointer to a u64
and another boolean argument that indicated if the offset should be
ignored, but the pointer could be NULL. That NULL case is used by
relocation, qgroup extent accounting and fiemap, simply to avoid building
the inode extent list for each reference, as it's not necessary for those
use cases and therefore avoids memory allocations and some computations.

Fix this by adding a boolean argument to the backref walk context
structure to indicate that the inode extent list should not be built,
make relocation set that argument to true and fix the backref walking
logic to skip the calls to check_extent_in_eb() and find_extent_in_eb()
only if this new argument is true, instead of 'ignore_extent_item_pos'
being true.

A test case for fstests will be added soon, to provide cover not only
for these cases but to the logical to ino ioctl in general as well, as
currently we do not have a test case for it.

Reported-by: Vladimir Panteleev <git@vladimir.panteleev.md>
Link: https://lore.kernel.org/linux-btrfs/CAHhfkvwo=nmzrJSqZ2qMfF-rZB-ab6ahHnCD_sq9h4o8v+M7QQ@mail.gmail.com/
Fixes: 6ce6ba5344 ("btrfs: use a single argument for extent offset in backref walking functions")
CC: stable@vger.kernel.org # 6.2+
Tested-by: Vladimir Panteleev <git@vladimir.panteleev.md>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-09 22:09:11 +02:00
Filipe Manana
0004ff15ea btrfs: fix space cache inconsistency after error loading it from disk
When loading a free space cache from disk, at __load_free_space_cache(),
if we fail to insert a bitmap entry, we still increment the number of
total bitmaps in the btrfs_free_space_ctl structure, which is incorrect
since we failed to add the bitmap entry. On error we then empty the
cache by calling __btrfs_remove_free_space_cache(), which will result
in getting the total bitmaps counter set to 1.

A failure to load a free space cache is not critical, so if a failure
happens we just rebuild the cache by scanning the extent tree, which
happens at block-group.c:caching_thread(). Yet the failure will result
in having the total bitmaps of the btrfs_free_space_ctl always bigger
by 1 then the number of bitmap entries we have. So fix this by having
the total bitmaps counter be incremented only if we successfully added
the bitmap entry.

Fixes: a67509c300 ("Btrfs: add a io_ctl struct and helpers for dealing with the space cache")
Reviewed-by: Anand Jain <anand.jain@oracle.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-09 22:08:05 +02:00
Anastasia Belova
c87f318e6f btrfs: print-tree: parent bytenr must be aligned to sector size
Check nodesize to sectorsize in alignment check in print_extent_item.
The comment states that and this is correct, similar check is done
elsewhere in the functions.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: ea57788eb7 ("btrfs: require only sector size alignment for parent eb bytenr")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Anastasia Belova <abelova@astralinux.ru>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-09 22:07:40 +02:00
Linus Torvalds
16a8829130 nfs: fix another case of NULL/IS_ERR confusion wrt folio pointers
Dan has been improving on the smatch error pointer checks, and pointed
at another case where the __filemap_get_folio() conversion to error
pointers had been overlooked.  This time because it was hidden behind
the filemap_grab_folio() helper function that is a wrapper around it.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-09 10:22:13 -07:00
Linus Torvalds
1dc3731daf for-6.4-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmRaYhYACgkQxWXV+ddt
 WDvCRQ/+MjRuInALh+N34mMneF8jjPOlUQBZbaC43XYJ0ss9drSzvE3STmPVrjdK
 IHyzRYipKI6vdTtzYbyGwxJ9oazsuXTQXC3w/qMW1hO1EAQ0a9tbnTSIQ+BDbU63
 BW7rJ3JuM6hKxKK+e9Dserhks0lOgQc+xKT1CUELvAHp3UykD4OrNczguaIT2lGR
 YXL+9B3ex2SooCqrQStkqEtjD/kxbaYUkK7yWA2FssXWqU5SjZwUOsuY3ZPOWrm1
 ULNI67gIxkMkSynV3aYka7nY3xc9oGIfk9WPeylWcOcH3+pWabeptjk617XbA0KI
 4biz1zZ/qTRXWlCLDv3ukUa5EIVAWQ1kxVE/hAt3SzqJvoqB/ymML/2LeQNdyx2i
 adMTZQ95JkhQNU9Lp9QOtpgfZonhhjxnL9KE7eMVo28zJFdYjge3egINjimY+mLz
 qzrzUBI3bqCNYG0LRR1EvuN0feBd/9nNMFjLBi2mkDqsWtzvTxxzWvVlV5EEcoJe
 xrozGh00Y5ioP6ZanKuZRib+u2ligbD66dYhKSU74D6B5kuZPic3Kkn9qICjRByM
 uBGBze/7GT/3ouhPOwxVPtGZstiFhbAxE7mApROrIxAx8I9rZjBdHgFJQklolXNy
 HSKNf3u98XZBVVcku/O1hyoeTLnApPfApxD4lv3qlRmgdEnAp6I=
 =K25T
 -----END PGP SIGNATURE-----

Merge tag 'for-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fix backward leaf iteration which could possibly return the same key

 - fix assertion when device add and balance race for exclusive
   operation

 - fix regression when freeing device, state tree would leak after
   device replace

 - fix attempt to clear space cache v1 when block-group-tree is enabled

 - fix potential i_size corruption when encoded write races with send v2
   and enabled no-holes (the race is hard to hit though, the window is a
   few instructions wide)

 - fix wrong bitmap API use when checking empty zones, parameters were
   swapped but not causing a bug due to other code

 - prevent potential qgroup leak if subvolume create does not commit
   transaction (which is pending in the development queue)

 - error handling and reporting:
     - abort transaction when sibling keys check fails for leaves
     - print extent buffers when sibling keys check fails

* tag 'for-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: don't free qgroup space unless specified
  btrfs: fix encoded write i_size corruption with no-holes
  btrfs: zoned: fix wrong use of bitops API in btrfs_ensure_empty_zones
  btrfs: properly reject clear_cache and v1 cache for block-group-tree
  btrfs: print extent buffers when sibling keys check fails
  btrfs: abort transaction when sibling keys check fails for leaves
  btrfs: fix leak of source device allocation state after device replace
  btrfs: fix assertion of exclop condition when starting balance
  btrfs: fix btrfs_prev_leaf() to not return the same key twice
2023-05-09 09:53:41 -07:00
Steve French
716a3cf317 smb3: fix problem remounting a share after shutdown
xfstests generic/392 showed a problem where even after a
shutdown call was made on a mount, we would still attempt
to use the (now inaccessible) superblock if another mount
was attempted for the same share.

Reported-by: David Howells <dhowells@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Fixes: 087f757b01 ("cifs: add shutdown support")
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-09 09:54:58 -05:00
Steve French
2cb6f96877 SMB3: force unmount was failing to close deferred close files
In investigating a failure with xfstest generic/392 it
was noticed that mounts were reusing a superblock that should
already have been freed. This turned out to be related to
deferred close files keeping a reference count until the
closetimeo expired.

Currently the only way an fs knows that mount is beginning is
when force unmount is called, but when this, ie umount_begin(),
is called all deferred close files on the share (tree
connection) should be closed immediately (unless shared by
another mount) to avoid using excess resources on the server
and to avoid reusing a superblock which should already be freed.

In umount_begin, close all deferred close handles for that
share if this is the last mount using that share on this
client (ie send the SMB3 close request over the wire for those
that have been already closed by the app but that we have
kept a handle lease open for and have not sent closes to the
server for yet).

Reported-by: David Howells <dhowells@redhat.com>
Acked-by: Bharath SM <bharathsm@microsoft.com>
Cc: <stable@vger.kernel.org>
Fixes: 78c09634f7 ("Cifs: Fix kernel oops caused by deferred close for files.")
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-09 09:53:59 -05:00
Steve French
ba8c2b75b0 smb3: improve parallel reads of large files
rasize (ra_pages) should be set higher than read size by default
to allow parallel reads when reading large files in order to
improve performance (otherwise there is much dead time on the
network when doing readahead of large files).  Default rasize
to twice readsize.

Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-09 00:59:48 -05:00
Steve French
cbd4cbabef do not reuse connection if share marked as isolated
"SHAREFLAG_ISOLATED_TRANSPORT" indicates that we should not reuse the socket
for this share (for future mounts).  Mark the socket as server->nosharesock if
share flags returned include SHAREFLAG_ISOLATED_TRANSPORT.

See MS-SMB2 MS-SMB2 2.2.10 and 3.2.5.5

Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-08 12:00:47 -05:00
Pawel Witek
d66cde50c3 cifs: fix pcchunk length type in smb2_copychunk_range
Change type of pcchunk->Length from u32 to u64 to match
smb2_copychunk_range arguments type. Fixes the problem where performing
server-side copy with CIFS_IOC_COPYCHUNK_FILE ioctl resulted in incomplete
copy of large files while returning -EINVAL.

Fixes: 9bf0c9cd43 ("CIFS: Fix SMB2/SMB3 Copy offload support (refcopy) for large files")
Cc: <stable@vger.kernel.org>
Signed-off-by: Pawel Witek <pawel.ireneusz.witek@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-08 12:00:47 -05:00
Jan Kara
949f95ff39 ext4: fix lockdep warning when enabling MMP
When we enable MMP in ext4_multi_mount_protect() during mount or
remount, we end up calling sb_start_write() from write_mmp_block(). This
triggers lockdep warning because freeze protection ranks above s_umount
semaphore we are holding during mount / remount. The problem is harmless
because we are guaranteed the filesystem is not frozen during mount /
remount but still let's fix the warning by not grabbing freeze
protection from ext4_multi_mount_protect().

Cc: stable@kernel.org
Reported-by: syzbot+6b7df7d5506b32467149@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=ab7e5b6f400b7778d46f01841422e5718fb81843
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230411121019.21940-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-07 21:11:18 -04:00
Ye Bin
fa08a7b61d ext4: fix WARNING in mb_find_extent
Syzbot found the following issue:

EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, O_DIRECT and fast_commit support!
EXT4-fs (loop0): orphan cleanup on readonly fs
------------[ cut here ]------------
WARNING: CPU: 1 PID: 5067 at fs/ext4/mballoc.c:1869 mb_find_extent+0x8a1/0xe30
Modules linked in:
CPU: 1 PID: 5067 Comm: syz-executor307 Not tainted 6.2.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:mb_find_extent+0x8a1/0xe30 fs/ext4/mballoc.c:1869
RSP: 0018:ffffc90003c9e098 EFLAGS: 00010293
RAX: ffffffff82405731 RBX: 0000000000000041 RCX: ffff8880783457c0
RDX: 0000000000000000 RSI: 0000000000000041 RDI: 0000000000000040
RBP: 0000000000000040 R08: ffffffff82405723 R09: ffffed10053c9402
R10: ffffed10053c9402 R11: 1ffff110053c9401 R12: 0000000000000000
R13: ffffc90003c9e538 R14: dffffc0000000000 R15: ffffc90003c9e2cc
FS:  0000555556665300(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056312f6796f8 CR3: 0000000022437000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ext4_mb_complex_scan_group+0x353/0x1100 fs/ext4/mballoc.c:2307
 ext4_mb_regular_allocator+0x1533/0x3860 fs/ext4/mballoc.c:2735
 ext4_mb_new_blocks+0xddf/0x3db0 fs/ext4/mballoc.c:5605
 ext4_ext_map_blocks+0x1868/0x6880 fs/ext4/extents.c:4286
 ext4_map_blocks+0xa49/0x1cc0 fs/ext4/inode.c:651
 ext4_getblk+0x1b9/0x770 fs/ext4/inode.c:864
 ext4_bread+0x2a/0x170 fs/ext4/inode.c:920
 ext4_quota_write+0x225/0x570 fs/ext4/super.c:7105
 write_blk fs/quota/quota_tree.c:64 [inline]
 get_free_dqblk+0x34a/0x6d0 fs/quota/quota_tree.c:130
 do_insert_tree+0x26b/0x1aa0 fs/quota/quota_tree.c:340
 do_insert_tree+0x722/0x1aa0 fs/quota/quota_tree.c:375
 do_insert_tree+0x722/0x1aa0 fs/quota/quota_tree.c:375
 do_insert_tree+0x722/0x1aa0 fs/quota/quota_tree.c:375
 dq_insert_tree fs/quota/quota_tree.c:401 [inline]
 qtree_write_dquot+0x3b6/0x530 fs/quota/quota_tree.c:420
 v2_write_dquot+0x11b/0x190 fs/quota/quota_v2.c:358
 dquot_acquire+0x348/0x670 fs/quota/dquot.c:444
 ext4_acquire_dquot+0x2dc/0x400 fs/ext4/super.c:6740
 dqget+0x999/0xdc0 fs/quota/dquot.c:914
 __dquot_initialize+0x3d0/0xcf0 fs/quota/dquot.c:1492
 ext4_process_orphan+0x57/0x2d0 fs/ext4/orphan.c:329
 ext4_orphan_cleanup+0xb60/0x1340 fs/ext4/orphan.c:474
 __ext4_fill_super fs/ext4/super.c:5516 [inline]
 ext4_fill_super+0x81cd/0x8700 fs/ext4/super.c:5644
 get_tree_bdev+0x400/0x620 fs/super.c:1282
 vfs_get_tree+0x88/0x270 fs/super.c:1489
 do_new_mount+0x289/0xad0 fs/namespace.c:3145
 do_mount fs/namespace.c:3488 [inline]
 __do_sys_mount fs/namespace.c:3697 [inline]
 __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3674
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Add some debug information:
mb_find_extent: mb_find_extent block=41, order=0 needed=64 next=0 ex=0/41/1@3735929054 64 64 7
block_bitmap: ff 3f 0c 00 fc 01 00 00 d2 3d 00 00 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

Acctually, blocks per group is 64, but block bitmap indicate at least has
128 blocks. Now, ext4_validate_block_bitmap() didn't check invalid block's
bitmap if set.
To resolve above issue, add check like fsck "Padding at end of block bitmap is
not set".

Cc: stable@kernel.org
Reported-by: syzbot+68223fe9f6c95ad43bed@syzkaller.appspotmail.com
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230116020015.1506120-1-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-07 21:03:24 -04:00
Linus Torvalds
63342b1dd5 9 smb3 client fixes, mostly DFS or reconnect related
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmRWpDAACgkQiiy9cAdy
 T1Gm1gv+IZhwnTNWJ5OLaFKVjKsM5flAaj4To15hFxHDUgfkTlG+c0T2jbrjQZq5
 OEdnXn/zgVrq/JTU6TIrMA6netmkNKEvEW/MPb5sKgbXdkM8dM8AkCYjs2WvaHLV
 IBrmo9URUr0/I0rBMCvEFHGQvvggAizKrkvqVZq1DmQ+8Y7WXMWBuINux6i2+NmJ
 uDgBx/FcEIqpc0i5STD7hQ3kZ/MnBOVcd4lxUAZk6GtpiKZe/u6veWpyt9FqD8N7
 AXQhhfoLXEPNda1N5P1tfCIsWO+70XzgF5eetr95/71f3MIWPNbdOyJ9MFyr8DzV
 dUA8uReV8RVF7uCEC6AnXmOlteZ855trzpk8EK54ey5fmcnelzMLsIKNbzxgcJZP
 y4EDNbP57b+G28SnWTLsdXfrBxicO7ExBPOiys2hfJ3VCC5xSkN04IDlRnsgZKiE
 Tbwo/eaZy31Z6EN5BiqjFI/RgNZJaEdl9GeGZJFum3AGB+wKDSWl5Es77i11PBuI
 5qZgahwT
 =ZsoO
 -----END PGP SIGNATURE-----

Merge tag '6.4-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "smb3 client fixes, mostly DFS or reconnect related:

   - Two DFS connection sharing fixes

   - DFS refresh fix

   - Reconnect fix

   - Two potential use after free fixes

   - Also print prefix patch in mount debug msg

   - Two small cleanup fixes"

* tag '6.4-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Remove unneeded semicolon
  cifs: fix sharing of DFS connections
  cifs: avoid potential races when handling multiple dfs tcons
  cifs: protect access of TCP_Server_Info::{origin,leaf}_fullpath
  cifs: fix potential race when tree connecting ipc
  cifs: fix potential use-after-free bugs in TCP_Server_Info::hostname
  cifs: print smb3_fs_context::source when mounting
  cifs: protect session status check in smb2_reconnect()
  SMB3.1.1: correct definition for app_instance_id create contexts
2023-05-07 10:46:21 -07:00
Linus Torvalds
706ce3caea Five hotfixes. Three are cc:stable, two pertain to post-6.3 merge window
changes.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZFaSnAAKCRDdBJ7gKXxA
 jskTAPwIkL6xVMIEdtWfQNIy751o0onzfOQAIWgM3dSL83cMJQEA5OXxXt4LjrJw
 4q0zf4GLYXjMjhnfYr8zXF39y05iyQA=
 =nIhV
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2023-05-06-10-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull hotfixes from Andrew Morton:
 "Five hotfixes.

  Three are cc:stable, two pertain to merge window changes"

* tag 'mm-hotfixes-stable-2023-05-06-10-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  afs: fix the afs_dir_get_folio return value
  nilfs2: do not write dirty data after degenerating to read-only
  mm: do not reclaim private data from pinned page
  nilfs2: fix infinite loop in nilfs_mdt_get_block()
  mm/mmap/vma_merge: always check invariants
2023-05-06 11:25:03 -07:00
Linus Torvalds
994e2419f1 nfs: fix mis-merged __filemap_get_folio() error check
Fix another case of an incorrect check for the returned 'folio' value
from __filemap_get_folio().

The failure case used to return NULL, but was changed by commit
66dabbb65d ("mm: return an ERR_PTR from __filemap_get_folio").

But in the meantime, commit ec108d3cc7 ("NFS: Convert readdir page
array functions to use a folio") added a new user of that function.

And my merge of the two did not fix this up correctly.

The ext4 merge had the same issue, but that one had been caught in
linux-next and got properly fixed while merging.

Fixes: 0127f25b5d ("Merge tag 'nfs-for-6.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs")
Cc: Anna Schumaker <anna@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-06 10:16:19 -07:00
Christoph Hellwig
58f5f6698a afs: fix the afs_dir_get_folio return value
Keep returning NULL on failure instead of letting an ERR_PTR escape to
callers that don't expect it.

Link: https://lkml.kernel.org/r/20230503154526.1223095-2-hch@lst.de
Fixes: 66dabbb65d ("mm: return an ERR_PTR from __filemap_get_folio")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-06 10:10:08 -07:00
Ryusuke Konishi
28a65b49eb nilfs2: do not write dirty data after degenerating to read-only
According to syzbot's report, mark_buffer_dirty() called from
nilfs_segctor_do_construct() outputs a warning with some patterns after
nilfs2 detects metadata corruption and degrades to read-only mode.

After such read-only degeneration, page cache data may be cleared through
nilfs_clear_dirty_page() which may also clear the uptodate flag for their
buffer heads.  However, even after the degeneration, log writes are still
performed by unmount processing etc., which causes mark_buffer_dirty() to
be called for buffer heads without the "uptodate" flag and causes the
warning.

Since any writes should not be done to a read-only file system in the
first place, this fixes the warning in mark_buffer_dirty() by letting
nilfs_segctor_do_construct() abort early if in read-only mode.

This also changes the retry check of nilfs_segctor_write_out() to avoid
unnecessary log write retries if it detects -EROFS that
nilfs_segctor_do_construct() returned.

Link: https://lkml.kernel.org/r/20230427011526.13457-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+2af3bc9585be7f23f290@syzkaller.appspotmail.com
  Link: https://syzkaller.appspot.com/bug?extid=2af3bc9585be7f23f290
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-06 10:10:07 -07:00
Ryusuke Konishi
a6a491c048 nilfs2: fix infinite loop in nilfs_mdt_get_block()
If the disk image that nilfs2 mounts is corrupted and a virtual block
address obtained by block lookup for a metadata file is invalid,
nilfs_bmap_lookup_at_level() may return the same internal return code as
-ENOENT, meaning the block does not exist in the metadata file.

This duplication of return codes confuses nilfs_mdt_get_block(), causing
it to read and create a metadata block indefinitely.

In particular, if this happens to the inode metadata file, ifile,
semaphore i_rwsem can be left held, causing task hangs in lock_mount.

Fix this issue by making nilfs_bmap_lookup_at_level() treat virtual block
address translation failures with -ENOENT as metadata corruption instead
of returning the error code.

Link: https://lkml.kernel.org/r/20230430193046.6769-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+221d75710bde87fa0e97@syzkaller.appspotmail.com
  Link: https://syzkaller.appspot.com/bug?extid=221d75710bde87fa0e97
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-06 10:10:07 -07:00
Linus Torvalds
a3b111b046 for-6.4/block-2023-05-06
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmRWLQYQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpnwhD/4xRYfAY4O0oGZCITiKEEFxiSfDHCPMEBji
 zTEtideDRjcrpFmjmZ411C9prW32MxYQ3wQTf4O7w4t906xTYVr9FQy8g3Et4izI
 zglPcsa2jPzYCadQ0Ye4n9dRuYOH9FDJzDDLC+smu8zQKKmAqEAN/1ftpnADdVrY
 qX1sHyfz4RQRAgTHg2WqgKOi2O9VwGSMwOfBocmzAnruv9oLUlypcGnFPRCSZH/a
 OKpUZlvQhCBZTKScvVxQeJMg2Tl5yokQ0TH+gkQsdav9XcPJktqXiuD+c4h6q5ux
 oTlysEqcrwcaAafOcV0w9u80SFNlFACUYsNmEnFJaPXFTqAdNHvo1DJNsmxiHJDU
 bGo5ktlo5b/VZ51niOoWvGxursavq16G4yIYlGGHc7f4wGs12oc5ZP/yM3GRUY+C
 PdezwEvvQufxP7sFokfpgAS4SuH+tBlrhFXMsYaI4NukZQW4TK1zzbMrzOkdxFhW
 BOx17VFUKWtUnRmxinFGIA8Vj+FXN+E+ND+FoDsbrMyJD4maKDdJapPchG0J0Vbs
 pDcsB4c0pBC6H2xrobKiA1CuSq2t2qvyvwe1Zl2Xd+RVW9vBB5SI6HXYrC+UtxwY
 7LfX8F13cFD1E6iJ9Nta6x8fOunGnOVBdW5O0k4hDWEuZduvHItEDn2c3Ehqp4Jw
 P8dFBbk8SQ==
 =gAYf
 -----END PGP SIGNATURE-----

Merge tag 'for-6.4/block-2023-05-06' of git://git.kernel.dk/linux

Pull more block updates from Jens Axboe:

 - MD pull request via Song:
      - Improve raid5 sequential IO performance on spinning disks, which
        fixes a regression since v6.0 (Jan Kara)
      - Fix bitmap offset types, which fixes an issue introduced in this
        merge window (Jonathan Derrick)

 - Cleanup of hweight type used for cgroup writeback (Maxim)

 - Fix a regression with the "has_submit_bio" changes across partitions
   (Ming)

 - Cleanup of QUEUE_FLAG_ADD_RANDOM clearing.

   We used to set this flag on queues non blk-mq queues, and hence some
   drivers clear it unconditionally. Since all of these have since been
   converted to true blk-mq drivers, drop the useless clear as the bit
   is not set (Chaitanya)

 - Fix the flags being set in a bio for a flush for drbd (Christoph)

 - Cleanup and deduplication of the code handling setting block device
   capacity (Damien)

 - Fix for ublk handling IO timeouts (Ming)

 - Fix for a regression in blk-cgroup teardown (Tao)

 - NBD documentation and code fixes (Eric)

 - Convert blk-integrity to using device_attributes rather than a second
   kobject to manage lifetimes (Thomas)

* tag 'for-6.4/block-2023-05-06' of git://git.kernel.dk/linux:
  ublk: add timeout handler
  drbd: correctly submit flush bio on barrier
  mailmap: add mailmap entries for Jens Axboe
  block: Skip destroyed blkg when restart in blkg_destroy_all()
  writeback: fix call of incorrect macro
  md: Fix bitmap offset type in sb writer
  md/raid5: Improve performance for sequential IO
  docs nbd: userspace NBD now favors github over sourceforge
  block nbd: use req.cookie instead of req.handle
  uapi nbd: add cookie alias to handle
  uapi nbd: improve doc links to userspace spec
  blk-integrity: register sysfs attributes on struct device
  blk-integrity: convert to struct device_attribute
  blk-integrity: use sysfs_emit
  block/drivers: remove dead clear of random flag
  block: sync part's ->bd_has_submit_bio with disk's
  block: Cleanup set_capacity()/bdev_set_nr_sectors()
2023-05-06 08:28:58 -07:00