Commit graph

72893 commits

Author SHA1 Message Date
Chen Jingwen
9b2f72cc0a elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings
In commit b212921b13 ("elf: don't use MAP_FIXED_NOREPLACE for elf
executable mappings") we still leave MAP_FIXED_NOREPLACE in place for
load_elf_interp.

Unfortunately, this will cause kernel to fail to start with:

    1 (init): Uhuuh, elf segment at 00003ffff7ffd000 requested but the memory is mapped already
    Failed to execute /init (error -17)

The reason is that the elf interpreter (ld.so) has overlapping segments.

  readelf -l ld-2.31.so
  Program Headers:
    Type           Offset             VirtAddr           PhysAddr
                   FileSiz            MemSiz              Flags  Align
    LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                   0x000000000002c94c 0x000000000002c94c  R E    0x10000
    LOAD           0x000000000002dae0 0x000000000003dae0 0x000000000003dae0
                   0x00000000000021e8 0x0000000000002320  RW     0x10000
    LOAD           0x000000000002fe00 0x000000000003fe00 0x000000000003fe00
                   0x00000000000011ac 0x0000000000001328  RW     0x10000

The reason for this problem is the same as described in commit
ad55eac74f ("elf: enforce MAP_FIXED on overlaying elf segments").

Not only executable binaries, elf interpreters (e.g. ld.so) can have
overlapping elf segments, so we better drop MAP_FIXED_NOREPLACE and go
back to MAP_FIXED in load_elf_interp.

Fixes: 4ed2863951 ("fs, elf: drop MAP_FIXED usage from elf_map")
Cc: <stable@vger.kernel.org> # v4.19
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-03 14:02:58 -07:00
Linus Torvalds
ca3cef466f Fix a number of ext4 bugs in fast_commit, inline data, and delayed
allocation.  Also fix error handling code paths in ext4_dx_readdir()
 and ext4_fill_super().  Finally, avoid a grabbing a journal head in
 the delayed allocation write in the common cases where we are
 overwriting an pre-existing block or appending to an inode.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmFZ2SsACgkQ8vlZVpUN
 gaN6DAgAkIeisL1EfQT0VwshEs8y7N6IoX8dydLSRLpNf5oWiJOv2CTY9Qpi6X/C
 qNfuLsbJ2NXChvhIAM2hD82hvX21rYc6iqPxgho02VF4eYIP7NzLjwTFKnKbHPB5
 TiF498nJTnkcmSrJUEXmSAEdLoCwa5THH9+9HVHXZrkLXPULBtOOJ85mDAcIzVhV
 Zqb7yfbpWl0gnF0S0YjNATPtbhcC9EiC4MOVYVesRlgT9B3+k5q4fmVU0euTU9OH
 F2H6TNG+Mg/19gTnDP5acB9+eXHvYEqMpe+CaDifR9iFE9PTG/Edhxr6z9roXhHr
 kBvEVHSFH+YTEJXghnpS9YDd9Lwc9w==
 =WKzd
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix a number of ext4 bugs in fast_commit, inline data, and delayed
  allocation.

  Also fix error handling code paths in ext4_dx_readdir() and
  ext4_fill_super().

  Finally, avoid a grabbing a journal head in the delayed allocation
  write in the common cases where we are overwriting a pre-existing
  block or appending to an inode"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: recheck buffer uptodate bit under buffer lock
  ext4: fix potential infinite loop in ext4_dx_readdir()
  ext4: flush s_error_work before journal destroy in ext4_fill_super
  ext4: fix loff_t overflow in ext4_max_bitmap_size()
  ext4: fix reserved space counter leakage
  ext4: limit the number of blocks in one ADD_RANGE TLV
  ext4: enforce buffer head state assertion in ext4_da_map_blocks
  ext4: remove extent cache entries when truncating inline data
  ext4: drop unnecessary journal handle in delalloc write
  ext4: factor out write end code of inline file
  ext4: correct the error path of ext4_write_inline_data_end()
  ext4: check and update i_disksize properly
  ext4: add error checking to ext4_ext_replay_set_iblocks()
2021-10-03 13:56:53 -07:00
Linus Torvalds
84928ce3bb Driver core fixes for 5.15-rc4
Here are some driver core and kernfs fixes for reported issues for
 5.15-rc4.  These fixes include:
 	- kernfs positive dentry bugfix
 	- debugfs_create_file_size error path fix
 	- cpumask sysfs file bugfix to preserve the user/kernel abi (has
 	  been reported multiple times.)
 	- devlink fixes for mdiobus devices as reported by the subsystem
 	  maintainers.
 
 Also included in here are some devlink debugging changes to make it
 easier for people to report problems when asked.  They have already
 helped with the mdiobus and other subsystems reporting issues.
 
 All of these have been linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYVmC1A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykfCQCgtQ0kdPoPdUG6mPCn45rrbJQLBY4AnRL5JyhO
 zB60l6C2EHHnLnRxMnQq
 =gygB
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are some driver core and kernfs fixes for reported issues for
  5.15-rc4. These fixes include:

   - kernfs positive dentry bugfix

   - debugfs_create_file_size error path fix

   - cpumask sysfs file bugfix to preserve the user/kernel abi (has been
     reported multiple times.)

   - devlink fixes for mdiobus devices as reported by the subsystem
     maintainers.

  Also included in here are some devlink debugging changes to make it
  easier for people to report problems when asked. They have already
  helped with the mdiobus and other subsystems reporting issues.

  All of these have been linux-next for a while with no reported issues"

* tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kernfs: also call kernfs_set_rev() for positive dentry
  driver core: Add debug logs when fwnode links are added/deleted
  driver core: Create __fwnode_link_del() helper function
  driver core: Set deferred probe reason when deferred by driver core
  net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus parents
  driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  driver core: fw_devlink: Improve handling of cyclic dependencies
  cpumask: Omit terminating null byte in cpumap_print_{list,bitmask}_to_buf
  debugfs: debugfs_create_file_size(): use IS_ERR to check for error
2021-10-03 11:10:09 -07:00
Linus Torvalds
e25ca045c3 Eleven fixes for the ksmbd kernel server, including an important fix disabling weak NTLMv1 authentication, and seven security (improved buffer overflow checks) fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmFXwIYACgkQiiy9cAdy
 T1EWfQv/fSSoymQpFZrnzd0ELS7J14IvPbjpL5wWLWUdtrHLIk5Fcg1rxNXDZVsY
 o932TGAo/X3qEgMXWbD812Q4MoB+Rupj0NmHReLL+UxwrgCHexFVnzr0SH0YQfWA
 59xa+2BVzInqnejika1H4HewJqGKt6npGiAg0Rzx+nJiFlX0CAPupW8oC90UM5Co
 3vJNG4orZILlGLRIdMpSashW8Z5dbXY95k/VqF/vYqHgfy37L1m+pDCjRjEbXtFY
 fuqFGeAcsnRWnu6ECvuujTyh+hQMSdwb/5F6uovVUrdChdvbfWi+rjtHDx0HpD2t
 UKMnRQPk/BWD1/6zFriObCt4QDpufZdvLlVNyir4BdIT2OhzkwkZ4qXz/du+IWKm
 4/4nYEaD2lFN4pEAy73NGGt9eJrAjbnaswNDPTZIDpJ7IyZiakDFjOD8iLFncBS7
 xL6hUcMvc4njaqxcB9LHFZ8w67cjwR5aw0+wr8DKfCh13lJgSvxoXEP/D+4fxINv
 QULcIhF/
 =X733
 -----END PGP SIGNATURE-----

Merge tag '5.15-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull ksmbd server fixes from Steve French:
 "Eleven fixes for the ksmbd kernel server, mostly security related:

   - an important fix for disabling weak NTLMv1 authentication

   - seven security (improved buffer overflow checks) fixes

   - fix for wrong infolevel struct used in some getattr/setattr paths

   - two small documentation fixes"

* tag '5.15-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: missing check for NULL in convert_to_nt_pathname()
  ksmbd: fix transform header validation
  ksmbd: add buffer validation for SMB2_CREATE_CONTEXT
  ksmbd: add validation in smb2 negotiate
  ksmbd: add request buffer validation in smb2_set_info
  ksmbd: use correct basic info level in set_file_basic_info()
  ksmbd: remove NTLMv1 authentication
  ksmbd: fix documentation for 2 functions
  MAINTAINERS: rename cifs_common to smbfs_common in cifs and ksmbd entry
  ksmbd: fix invalid request buffer access in compound
  ksmbd: remove RFC1002 check in smb2 request
2021-10-02 17:43:54 -07:00
Linus Torvalds
65893b49d8 io_uring-5.15-2021-10-01
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmFXvEwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjALD/4/SZyJD+VcPM5Zx7kp2JMLm5GM6ZFnRTpI
 ZZoRsWnKZ39AAXS4GvGWGb+lEqI7LkLUy0SeUWbK2G3TIdKDU1Sza15knie3uzQ0
 Np/No3kv/2vl49gY64A/IVhcqDaHCVXZjRLe2cIK6kULmEH5AwNuTMvYS84g0+38
 XCDaWqzGbwTvtJ1RBjxlvXwl9EpmLwY+fr57KOxRgjIvfe68vZcSQBKPVJro3uUS
 2toSfiZuRIBk8pIwZ+WCnEv0zbvicjaQVFL6kGpANO4zMsBfkgjeB2Bz+sS8vtfB
 pQ1RR24cl3vtoGwYOLOBeVMxyaekkBxruXYmOogRNLq0Uq+EHJn2Wu0ifvk5dw9u
 yMXFUEC2TPPV4ah8lFRkVmU6bVeOVmaTn/Yp2nQRR830PkZVNST4vcXeJTNpHa2k
 b2uKPCJlkQzpq1ea29FbJ6RN/IrTOcTog+RWKXF4A3PMEkdUPP64a/lFtmMbIu9S
 mg00P1d3qA6rmrOX+Igw6zbtxyEn1pVEjexFfi49dQMM1wphpQPOHq3EgcQi/UDM
 SbKo6f6RsYDhlte8YVz70KSjW1HwTbepm4z6Zuts7Lbbbu1LUJrO9yf/LMEIJxaH
 no2fv82BPTSHUqWh0aqiVW+2DthaX3oZtQe6HpK2+SpV+Rre/RktnEFTN2JuSNYI
 DzCNRAJa+A==
 =qP3G
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.15-2021-10-01' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Two fixes in here:

   - The signal issue that was discussed start of this week (me).

   - Kill dead fasync support in io_uring. Looks like it was broken
     since io_uring was initially merged, and given that nobody has ever
     complained about it, let's just kill it (Pavel)"

* tag 'io_uring-5.15-2021-10-01' of git://git.kernel.dk/linux-block:
  io_uring: kill fasync
  io-wq: exclusively gate signal based exit on get_signal() return
2021-10-02 10:26:19 -07:00
Pavel Begunkov
3f008385d4 io_uring: kill fasync
We have never supported fasync properly, it would only fire when there
is something polling io_uring making it useless. The original support came
in through the initial io_uring merge for 5.1. Since it's broken and
nobody has reported it, get rid of the fasync bits.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2f7ca3d344d406d34fa6713824198915c41cea86.1633080236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-01 11:16:02 -06:00
Zhang Yi
f2c7797350 ext4: recheck buffer uptodate bit under buffer lock
Commit 8e33fadf94 ("ext4: remove an unnecessary if statement in
__ext4_get_inode_loc()") forget to recheck buffer's uptodate bit again
under buffer lock, which may overwrite the buffer if someone else have
already brought it uptodate and changed it.

Fixes: 8e33fadf94 ("ext4: remove an unnecessary if statement in __ext4_get_inode_loc()")
Cc: stable@kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210910080316.70421-1-yi.zhang@huawei.com
2021-10-01 00:10:28 -04:00
yangerkun
42cb447410 ext4: fix potential infinite loop in ext4_dx_readdir()
When ext4_htree_fill_tree() fails, ext4_dx_readdir() can run into an
infinite loop since if info->last_pos != ctx->pos this will reset the
directory scan and reread the failing entry.  For example:

1. a dx_dir which has 3 block, block 0 as dx_root block, block 1/2 as
   leaf block which own the ext4_dir_entry_2
2. block 1 read ok and call_filldir which will fill the dirent and update
   the ctx->pos
3. block 2 read fail, but we has already fill some dirent, so we will
   return back to userspace will a positive return val(see ksys_getdents64)
4. the second ext4_dx_readdir will reset the world since info->last_pos
   != ctx->pos, and will also init the curr_hash which pos to block 1
5. So we will read block1 too, and once block2 still read fail, we can
   only fill one dirent because the hash of the entry in block1(besides
   the last one) won't greater than curr_hash
6. this time, we forget update last_pos too since the read for block2
   will fail, and since we has got the one entry, ksys_getdents64 can
   return success
7. Latter we will trapped in a loop with step 4~6

Cc: stable@kernel.org
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210914111415.3921954-1-yangerkun@huawei.com
2021-10-01 00:05:09 -04:00
yangerkun
bb9464e083 ext4: flush s_error_work before journal destroy in ext4_fill_super
The error path in ext4_fill_super forget to flush s_error_work before
journal destroy, and it may trigger the follow bug since
flush_stashed_error_work can run concurrently with journal destroy
without any protection for sbi->s_journal.

[32031.740193] EXT4-fs (loop66): get root inode failed
[32031.740484] EXT4-fs (loop66): mount failed
[32031.759805] ------------[ cut here ]------------
[32031.759807] kernel BUG at fs/jbd2/transaction.c:373!
[32031.760075] invalid opcode: 0000 [#1] SMP PTI
[32031.760336] CPU: 5 PID: 1029268 Comm: kworker/5:1 Kdump: loaded
4.18.0
[32031.765112] Call Trace:
[32031.765375]  ? __switch_to_asm+0x35/0x70
[32031.765635]  ? __switch_to_asm+0x41/0x70
[32031.765893]  ? __switch_to_asm+0x35/0x70
[32031.766148]  ? __switch_to_asm+0x41/0x70
[32031.766405]  ? _cond_resched+0x15/0x40
[32031.766665]  jbd2__journal_start+0xf1/0x1f0 [jbd2]
[32031.766934]  jbd2_journal_start+0x19/0x20 [jbd2]
[32031.767218]  flush_stashed_error_work+0x30/0x90 [ext4]
[32031.767487]  process_one_work+0x195/0x390
[32031.767747]  worker_thread+0x30/0x390
[32031.768007]  ? process_one_work+0x390/0x390
[32031.768265]  kthread+0x10d/0x130
[32031.768521]  ? kthread_flush_work_fn+0x10/0x10
[32031.768778]  ret_from_fork+0x35/0x40

static int start_this_handle(...)
    BUG_ON(journal->j_flags & JBD2_UNMOUNT); <---- Trigger this

Besides, after we enable fast commit, ext4_fc_replay can add work to
s_error_work but return success, so the latter journal destroy in
ext4_load_journal can trigger this problem too.

Fix this problem with two steps:
1. Call ext4_commit_super directly in ext4_handle_error for the case
   that called from ext4_fc_replay
2. Since it's hard to pair the init and flush for s_error_work, we'd
   better add a extras flush_work before journal destroy in
   ext4_fill_super

Besides, this patch will call ext4_commit_super in ext4_handle_error for
any nojournal case too. But it seems safe since the reason we call
schedule_work was that we should save error info to sb through journal
if available. Conversely, for the nojournal case, it seems useless delay
commit superblock to s_error_work.

Fixes: c92dc85684 ("ext4: defer saving error info from atomic context")
Fixes: 2d01ddc866 ("ext4: save error info to sb through journal if available")
Cc: stable@kernel.org
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210924093917.1953239-1-yangerkun@huawei.com
2021-10-01 00:04:01 -04:00
Ritesh Harjani
75ca6ad408 ext4: fix loff_t overflow in ext4_max_bitmap_size()
We should use unsigned long long rather than loff_t to avoid
overflow in ext4_max_bitmap_size() for comparison before returning.
w/o this patch sbi->s_bitmap_maxbytes was becoming a negative
value due to overflow of upper_limit (with has_huge_files as true)

Below is a quick test to trigger it on a 64KB pagesize system.

sudo mkfs.ext4 -b 65536 -O ^has_extents,^64bit /dev/loop2
sudo mount /dev/loop2 /mnt
sudo echo "hello" > /mnt/hello 	-> This will error out with
				"echo: write error: File too large"

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/594f409e2c543e90fd836b78188dfa5c575065ba.1622867594.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-10-01 00:03:51 -04:00
Jeffle Xu
6fed83957f ext4: fix reserved space counter leakage
When ext4_insert_delayed block receives and recovers from an error from
ext4_es_insert_delayed_block(), e.g., ENOMEM, it does not release the
space it has reserved for that block insertion as it should. One effect
of this bug is that s_dirtyclusters_counter is not decremented and
remains incorrectly elevated until the file system has been unmounted.
This can result in premature ENOSPC returns and apparent loss of free
space.

Another effect of this bug is that
/sys/fs/ext4/<dev>/delayed_allocation_blocks can remain non-zero even
after syncfs has been executed on the filesystem.

Besides, add check for s_dirtyclusters_counter when inode is going to be
evicted and freed. s_dirtyclusters_counter can still keep non-zero until
inode is written back in .evict_inode(), and thus the check is delayed
to .destroy_inode().

Fixes: 51865fda28 ("ext4: let ext4 maintain extent status tree")
Cc: stable@kernel.org
Suggested-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210823061358.84473-1-jefflexu@linux.alibaba.com
2021-10-01 00:03:41 -04:00
Hou Tao
a2c2f0826e ext4: limit the number of blocks in one ADD_RANGE TLV
Now EXT4_FC_TAG_ADD_RANGE uses ext4_extent to track the
newly-added blocks, but the limit on the max value of
ee_len field is ignored, and it can lead to BUG_ON as
shown below when running command "fallocate -l 128M file"
on a fast_commit-enabled fs:

  kernel BUG at fs/ext4/ext4_extents.h:199!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 3 PID: 624 Comm: fallocate Not tainted 5.14.0-rc6+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  RIP: 0010:ext4_fc_write_inode_data+0x1f3/0x200
  Call Trace:
   ? ext4_fc_write_inode+0xf2/0x150
   ext4_fc_commit+0x93b/0xa00
   ? ext4_fallocate+0x1ad/0x10d0
   ext4_sync_file+0x157/0x340
   ? ext4_sync_file+0x157/0x340
   vfs_fsync_range+0x49/0x80
   do_fsync+0x3d/0x70
   __x64_sys_fsync+0x14/0x20
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Simply fixing it by limiting the number of blocks
in one EXT4_FC_TAG_ADD_RANGE TLV.

Fixes: aa75f4d3da ("ext4: main fast-commit commit path")
Cc: stable@kernel.org
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210820044505.474318-1-houtao1@huawei.com
2021-10-01 00:03:25 -04:00
Dan Carpenter
87ffb310d5 ksmbd: missing check for NULL in convert_to_nt_pathname()
The kmalloc() does not have a NULL check.  This code can be re-written
slightly cleaner to just use the kstrdup().

Fixes: 265fd1991c ("ksmbd: use LOOKUP_BENEATH to prevent the out of share access")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-30 20:00:05 -05:00
Namjae Jeon
4227f811cd ksmbd: fix transform header validation
Validate that the transform and smb request headers are present
before checking OriginalMessageSize and SessionId fields.

Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Böhme <slow@samba.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Tom Talpey <tom@talpey.com>
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-30 09:58:07 -05:00
Hyunchul Lee
8f77150c15 ksmbd: add buffer validation for SMB2_CREATE_CONTEXT
Add buffer validation for SMB2_CREATE_CONTEXT.

Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Reviewed-by: Ralph Boehme <slow@samba.org>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-30 09:58:07 -05:00
Namjae Jeon
442ff9ebeb ksmbd: add validation in smb2 negotiate
This patch add validation to check request buffer check in smb2
negotiate and fix null pointer deferencing oops in smb3_preauth_hash_rsp()
that found from manual test.

Cc: Tom Talpey <tom@talpey.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Böhme <slow@samba.org>
Cc: Hyunchul Lee <hyc.lee@gmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-30 09:58:07 -05:00
Namjae Jeon
9496e268e3 ksmbd: add request buffer validation in smb2_set_info
Add buffer validation in smb2_set_info, and remove unused variable
in set_file_basic_info. and smb2_set_info infolevel functions take
structure pointer argument.

Cc: Tom Talpey <tom@talpey.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Böhme <slow@samba.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Reviewed-by: Ralph Boehme <slow@samba.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-30 09:58:06 -05:00
Namjae Jeon
88d300522c ksmbd: use correct basic info level in set_file_basic_info()
Use correct basic info level in set/get_file_basic_info().

Reviewed-by: Ralph Boehme <slow@samba.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-30 09:58:06 -05:00
Namjae Jeon
ce812992f2 ksmbd: remove NTLMv1 authentication
Remove insecure NTLMv1 authentication.

Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Böhme <slow@samba.org>
Reviewed-by: Tom Talpey <tom@talpey.com>
Acked-by: Steve French <smfrench@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-29 16:17:34 -05:00
Enzo Matsumiya
1018bf2455 ksmbd: fix documentation for 2 functions
ksmbd_kthread_fn() and create_socket() returns 0 or error code, and not
task_struct/ERR_PTR.

Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-28 20:51:32 -05:00
Hou Tao
df38d852c6 kernfs: also call kernfs_set_rev() for positive dentry
A KMSAN warning is reported by Alexander Potapenko:

BUG: KMSAN: uninit-value in kernfs_dop_revalidate+0x61f/0x840
fs/kernfs/dir.c:1053
 kernfs_dop_revalidate+0x61f/0x840 fs/kernfs/dir.c:1053
 d_revalidate fs/namei.c:854
 lookup_dcache fs/namei.c:1522
 __lookup_hash+0x3a6/0x590 fs/namei.c:1543
 filename_create+0x312/0x7c0 fs/namei.c:3657
 do_mkdirat+0x103/0x930 fs/namei.c:3900
 __do_sys_mkdir fs/namei.c:3931
 __se_sys_mkdir fs/namei.c:3929
 __x64_sys_mkdir+0xda/0x120 fs/namei.c:3929
 do_syscall_x64 arch/x86/entry/common.c:51

It seems a positive dentry in kernfs becomes a negative dentry directly
through d_delete() in vfs_rmdir(). dentry->d_time is uninitialized
when accessing it in kernfs_dop_revalidate(), because it is only
initialized when created as negative dentry in kernfs_iop_lookup().

The problem can be reproduced by the following command:

  cd /sys/fs/cgroup/pids && mkdir hi && stat hi && rmdir hi && stat hi

A simple fixes seems to be initializing d->d_time for positive dentry
in kernfs_iop_lookup() as well. The downside is the negative dentry
will be revalidated again after it becomes negative in d_delete(),
because the revison of its parent must have been increased due to
its removal.

Alternative solution is implement .d_iput for kernfs, and assign d_time
for the newly-generated negative dentry in it. But we may need to
take kernfs_rwsem to protect again the concurrent kernfs_link_sibling()
on the parent directory, it is a little over-killing. Now the simple
fix is chosen.

Link: https://marc.info/?l=linux-fsdevel&m=163249838610499
Fixes: c7e7c04274 ("kernfs: use VFS negative dentry caching")
Reported-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20210928140750.1274441-1-houtao1@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-28 18:18:15 +02:00
Linus Torvalds
6fd3ec5c7a fsverity fix for 5.15-rc4
Fix an integer overflow when computing the Merkle tree layout of
 extremely large files, exposed by btrfs adding support for fs-verity.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCYVKxQBQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK0q7AQCRVYl9e6gOPduntU6zNfYxYiJAmGRQ
 9jekhtPwFnuhLgEAnFxW3B51bG5c+Yv3xBBbDRpflk+39gd39eUOqRtlPQ4=
 =tPA9
 -----END PGP SIGNATURE-----

Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt

Pull fsverity fix from Eric Biggers:
 "Fix an integer overflow when computing the Merkle tree layout of
  extremely large files, exposed by btrfs adding support for fs-verity"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fs-verity: fix signed integer overflow with i_size near S64_MAX
2021-09-28 07:53:53 -07:00
Linus Torvalds
9b3b353ef3 vboxfs: fix broken legacy mount signature checking
Commit 9d682ea6bc ("vboxsf: Fix the check for the old binary
mount-arguments struct") was meant to fix a build error due to sign
mismatch in 'char' and the use of character constants, but it just moved
the error elsewhere, in that on some architectures characters and signed
and on others they are unsigned, and that's just how the C standard
works.

The proper fix is a simple "don't do that then".  The code was just
being silly and odd, and it should never have cared about signed vs
unsigned characters in the first place, since what it is testing is not
four "characters", but four bytes.

And the way to compare four bytes is by using "memcmp()".

Which compilers will know to just turn into a single 32-bit compare with
a constant, as long as you don't have crazy debug options enabled.

Link: https://lore.kernel.org/lkml/20210927094123.576521-1-arnd@kernel.org/
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-27 11:26:21 -07:00
Jens Axboe
78f8876c2d io-wq: exclusively gate signal based exit on get_signal() return
io-wq threads block all signals, except SIGKILL and SIGSTOP. We should not
need any extra checking of signal_pending or fatal_signal_pending, rely
exclusively on whether or not get_signal() tells us to exit.

The original debugging of this issue led to the false positive that we
were exiting on non-fatal signals, but that is not the case. The issue
was around races with nr_workers accounting.

Fixes: 87c1696655 ("io-wq: ensure we exit if thread group is exiting")
Fixes: 15e20db2e0 ("io-wq: only exit on fatal signals")
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-27 11:03:43 -06:00
Namjae Jeon
d72a9c1588 ksmbd: fix invalid request buffer access in compound
Ronnie reported invalid request buffer access in chained command when
inserting garbage value to NextCommand of compound request.
This patch add validation check to avoid this issue.

Cc: Tom Talpey <tom@talpey.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Böhme <slow@samba.org>
Tested-by: Steve French <smfrench@gmail.com>
Reviewed-by: Steve French <smfrench@gmail.com>
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-26 16:47:14 -05:00
Ronnie Sahlberg
18d46769d5 ksmbd: remove RFC1002 check in smb2 request
In smb_common.c you have this function :   ksmbd_smb_request() which
is called from connection.c once you have read the initial 4 bytes for
the next length+smb2 blob.

It checks the first byte of this 4 byte preamble for valid values,
i.e. a NETBIOSoverTCP SESSION_MESSAGE or a SESSION_KEEP_ALIVE.

We don't need to check this for ksmbd since it only implements SMB2
over TCP port 445.
The netbios stuff was only used in very old servers when SMB ran over
TCP port 139.
Now that we run over TCP port 445, this is actually not a NB header anymore
and you can just treat it as a 4 byte length field that must be less
than 16Mbyte. and remove the references to the RFC1002 constants that no
longer applies.

Cc: Tom Talpey <tom@talpey.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Böhme <slow@samba.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-26 16:47:14 -05:00
Linus Torvalds
5e5d759763 Five fixes for the ksmbd kernel server, including three security fixes: removing follow symlinks support and converting to use LOOKUP_BENEATH to prevent out of share access, and a compounding security fix, also includes a fix for FILE_STREAM_INFORMATION fixing a bug when writing ppt or doc files from some clients
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmFPfPIACgkQiiy9cAdy
 T1FXSAv/Y/iJPj48F/rO2FX5aoUgK2All0dmQ2VOOJ96HuKOVH+9oTv60fbVoAjI
 16zOXRhMGQHjLNHWuJp+n4OOM3pgOyg3MaLReIaIVnHe7z8G5WpmMID+QfIJhFWM
 PfS+nr9S6sHwfspOIX4AwVzsYozj8vNWsUJKGLcG/d1u67ipemAOmiXMrD+4P+1Q
 sO0v0rVcuOy3tYMXYpDhOQBWxK2R8DeWCs+NTradntnWWsfHUUfuhj7G66AphjDh
 HzH9TuR+M5gOrLKpHDCWlE1u3MoTGURBXZC6xB4zHMZy+W5hM6PVNrO/89GdLD89
 nXa1ZxvNSwrJsc6JArExwKcz+sKM172ui/BY5iR7YEuchVUd8u3L0yeThx/2H7Ri
 9C/k9oNnql3hB8vX9yeqrt8Rr6G0IGJ/nlS+aBdbTKqB/t3cNGFImL3X0BeBsxjD
 umkeY92wVP0ve7hLSOPSQnNoWo4p+eN215tRRSCu+1DjAqJA1HyD+PqibzeWTAdK
 FT+CF/Ko
 =wDpw
 -----END PGP SIGNATURE-----

Merge tag '5.15-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull ksmbd fixes from Steve French:
 "Five fixes for the ksmbd kernel server, including three security
  fixes:

   - remove follow symlinks support

   - use LOOKUP_BENEATH to prevent out of share access

   - SMB3 compounding security fix

   - fix for returning the default streams correctly, fixing a bug when
     writing ppt or doc files from some clients

   - logging more clearly that ksmbd is experimental (at module load
     time)"

* tag '5.15-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: use LOOKUP_BENEATH to prevent the out of share access
  ksmbd: remove follow symlinks support
  ksmbd: check protocol id in ksmbd_verify_smb_message()
  ksmbd: add default data stream name in FILE_STREAM_INFORMATION
  ksmbd: log that server is experimental at module load
2021-09-26 12:46:45 -07:00
Linus Torvalds
a3b397b4ff Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "16 patches.

  Subsystems affected by this patch series: xtensa, sh, ocfs2, scripts,
  lib, and mm (memory-failure, kasan, damon, shmem, tools, pagecache,
  debug, and pagemap)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: fix uninitialized use in overcommit_policy_handler
  mm/memory_failure: fix the missing pte_unmap() call
  kasan: always respect CONFIG_KASAN_STACK
  sh: pgtable-3level: fix cast to pointer from integer of different size
  mm/debug: sync up latest migrate_reason to migrate_reason_names
  mm/debug: sync up MR_CONTIG_RANGE and MR_LONGTERM_PIN
  mm: fs: invalidate bh_lrus for only cold path
  lib/zlib_inflate/inffast: check config in C to avoid unused function warning
  tools/vm/page-types: remove dependency on opt_file for idle page tracking
  scripts/sorttable: riscv: fix undeclared identifier 'EM_RISCV' error
  ocfs2: drop acl cache for directories too
  mm/shmem.c: fix judgment error in shmem_is_huge()
  xtensa: increase size of gcc stack frame check
  mm/damon: don't use strnlen() with known-bogus source length
  kasan: fix Kconfig check of CC_HAS_WORKING_NOSANITIZE_ADDRESS
  mm, hwpoison: add is_free_buddy_page() in HWPoisonHandlable()
2021-09-25 16:20:34 -07:00
Linus Torvalds
f6f360aef0 io_uring-5.15-2021-09-25
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmFPKzYQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplzrD/9eAeKdjpehV+o8HuLC9EvaBmp8jjaSEwJq
 6S1CrCuIeCY9C0BfyZTaRjdS9Oo6VwCfuRhzEOq99QqTx1kPS/2fGqhymKAzMCGy
 zPBLMkcUsKtRVAJEA7qGHJl3CaCn7NrHOb9cAM150/vTS96GYjuaFWGbBKsGbbXu
 ORSsaeVLcwFyGwWSf5+v+tkORoSiKEfwgUy+y5xwkiFxStcVHsLNxQ/1HtfEIjdl
 j3KBsW9eqM59lAOmv0YPGZ5gCN7VXelqaI/SmZxH4Y5XDDk86vtl0A7l3B5HSURX
 6njZakejtvclqv7p8eb+oB2EDqgsjgPZE9uJduLqtRcLqdE7koy096drzfMsACfu
 NKCnltY56iDEKWCOg1DL2O4D1/qpcZE9rj14WJtwxeauHQd7uVwTOHdOHw3OLz3X
 xpEasZ2IG3h6qzXrAuAeBIbpp0KIwc03p/vBQTOFqXTxqV+x2c8J3oM4QHeU6nU/
 +5clqOuCdKMxWMmBti7cTA9HHupWZnB47P7BHRRL4z/19ysH29Fc7xVLnaB2WPeG
 1WmvD0DpeDgLnUfzohSh+vTU4XtR4UEjHp8S4HfrzQM7xxmweQNX3Ph/KCPZOe8F
 T1vSFFBiae0IHx3aX+vGvoTmznrEJ279jNArcA693VB9utANjVzvwMtlmnzN8DtQ
 wxhF4oXqXw==
 =jnhx
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.15-2021-09-25' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "This one looks a bit bigger than it is, but that's mainly because 2/3
  of it is enabling IORING_OP_CLOSE to close direct file descriptors.

  We've had a few folks using them and finding it confusing that the way
  to close them is through using -1 for file update, this just brings
  API symmetry for direct descriptors. Hence I think we should just do
  this now and have a better API for 5.15 release. There's some room for
  de-duplicating the close code, but we're leaving that for the next
  merge window.

  Outside of that, just small fixes:

   - Poll race fixes (Hao)

   - io-wq core dump exit fix (me)

   - Reschedule around potentially intensive tctx and buffer iterators
     on teardown (me)

   - Fix for always ending up punting files update to io-wq (me)

   - Put the provided buffer meta data under memcg accounting (me)

   - Tweak for io_write(), removing dead code that was added with the
     iterator changes in this release (Pavel)"

* tag 'io_uring-5.15-2021-09-25' of git://git.kernel.dk/linux-block:
  io_uring: make OP_CLOSE consistent with direct open
  io_uring: kill extra checks in io_write()
  io_uring: don't punt files update to io-wq unconditionally
  io_uring: put provided buffer meta data under memcg accounting
  io_uring: allow conditional reschedule for intensive iterators
  io_uring: fix potential req refcount underflow
  io_uring: fix missing set of EPOLLONESHOT for CQ ring overflow
  io_uring: fix race between poll completion and cancel_hash insertion
  io-wq: ensure we exit if thread group is exiting
2021-09-25 15:51:08 -07:00
Linus Torvalds
a5e0aceabe Changes since last update:
- fix the dangling pointer use in erofs_lookup tracepoint;
  - fix unsupported chunk format check;
  - zero out compacted_2b if compacted_4b_initial > totalidx.
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYU9CBxEceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBDgBAQDaj1NWjIleK4Q7hoerl++6MMhzEJrmpxSE
 EENs9NPuiQEAp7dN0T05a2J+Szp5xJeLYg67LoYbAnDmbmzGH/jQQg0=
 =6e/B
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-5.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
 "Two bugfixes to fix the 4KiB blockmap chunk format availability and a
  dangling pointer usage. There is also a trivial cleanup to clarify
  compacted_2b if compacted_4b_initial > totalidx.

  Summary:

   - fix the dangling pointer use in erofs_lookup tracepoint

   - fix unsupported chunk format check

   - zero out compacted_2b if compacted_4b_initial > totalidx"

* tag 'erofs-for-5.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: clear compacted_2b if compacted_4b_initial > totalidx
  erofs: fix misbehavior of unsupported chunk format check
  erofs: fix up erofs_lookup tracepoint
2021-09-25 11:31:48 -07:00
Linus Torvalds
b8f4296560 Six small cifs/smb3 fixes, 2 for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmFOOyoACgkQiiy9cAdy
 T1FntAv/YGILEUdiKs364//L15K9rInUKo/FPLAg9THBvG+vGJd6HXD30RZNQBRl
 030BPcA3AfchrPPTW2Lf3LyEHYHarRI3RE6E5o0++9XodxmjoukhG0ogUnevLZYl
 IyQ0VwDBb0BhzY9CQHPc1cqbCqeabGHjF8+aPs2bN1G4wspW4EdUoc6xxZM1xKzi
 XCbCSgRQKMA4qOsmExX5Jfl3nwjIuiBzcV5HAvn9F6WCdILH9/ltjwXunZBWhI5Z
 aLRrEpMwFyCrNr3B59XPGmGXe5mw0guQzCZDcgMDHGXIVbY/IRZPTIw0pDJqkwvo
 XfHljjba9FlYDNxc4fgY1J+ez7zRmS0eJzP2wZw9UkoF0Sb0nYmlZmGPnGADqIJ6
 YBH7rKz+L6cVVa+zFRMEWcNguOADzZNMo3xuHriXy5lzKQpgT6dwxN6JUNfhvnfq
 vHzm8tjaFFlQQ47cmTiW6QDiMcyQtJGvP36v8PvxEilH0KUGozZGrsG69aI+sFoB
 a9XjbLhT
 =SK4S
 -----END PGP SIGNATURE-----

Merge tag '5.15-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Six small cifs/smb3 fixes, two for stable:

   - important fix for deferred close (found by a git functional test)
     related to attribute caching on close.

   - four (two cosmetic, two more serious) small fixes for problems
     pointed out by smatch via Dan Carpenter

   - fix for comment formatting problems pointed out by W=1"

* tag '5.15-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix incorrect check for null pointer in header_assemble
  smb3: correct server pointer dereferencing check to be more consistent
  smb3: correct smb3 ACL security descriptor
  cifs: Clear modified attribute bit from inode flags
  cifs: Deal with some warnings from W=1
  cifs: fix a sign extension bug
2021-09-25 11:08:12 -07:00
Hyunchul Lee
265fd1991c ksmbd: use LOOKUP_BENEATH to prevent the out of share access
instead of removing '..' in a given path, call
kern_path with LOOKUP_BENEATH flag to prevent
the out of share access.

ran various test on this:
smb2-cat-async smb://127.0.0.1/homes/../out_of_share
smb2-cat-async smb://127.0.0.1/homes/foo/../../out_of_share
smbclient //127.0.0.1/homes -c "mkdir ../foo2"
smbclient //127.0.0.1/homes -c "rename bar ../bar"

Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Ralph Boehme <slow@samba.org>
Tested-by: Steve French <smfrench@gmail.com>
Tested-by: Namjae Jeon <linkinjeon@kernel.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-24 21:25:23 -05:00
Minchan Kim
243418e392 mm: fs: invalidate bh_lrus for only cold path
The kernel test robot reported the regression of fio.write_iops[1] with
commit 8cc621d2f4 ("mm: fs: invalidate BH LRU during page migration").

Since lru_add_drain is called frequently, invalidate bh_lrus there could
increase bh_lrus cache miss ratio, which needs more IO in the end.

This patch moves the bh_lrus invalidation from the hot path( e.g.,
zap_page_range, pagevec_release) to cold path(i.e., lru_add_drain_all,
lru_cache_disable).

Zhengjun Xing confirmed
 "I test the patch, the regression reduced to -2.9%"

[1] https://lore.kernel.org/lkml/20210520083144.GD14190@xsang-OptiPlex-9020/
[2] 8cc621d2f4, mm: fs: invalidate BH LRU during page migration

Link: https://lkml.kernel.org/r/20210907212347.1977686-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Chris Goldsworthy <cgoldswo@codeaurora.org>
Tested-by: "Xing, Zhengjun" <zhengjun.xing@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-24 16:13:35 -07:00
Wengang Wang
9c0f0a03e3 ocfs2: drop acl cache for directories too
ocfs2_data_convert_worker() is currently dropping any cached acl info
for FILE before down-converting meta lock.  It should also drop for
DIRECTORY.  Otherwise the second acl lookup returns the cached one (from
VFS layer) which could be already stale.

The problem we are seeing is that the acl changes on one node doesn't
get refreshed on other nodes in the following case:

  Node 1                    Node 2
  --------------            ----------------
  getfacl dir1

                            getfacl dir1    <-- this is OK

  setfacl -m u:user1:rwX dir1
  getfacl dir1   <-- see the change for user1

                            getfacl dir1    <-- can't see change for user1

Link: https://lkml.kernel.org/r/20210903012631.6099-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-24 16:13:34 -07:00
Pavel Begunkov
7df778be2f io_uring: make OP_CLOSE consistent with direct open
From recently open/accept are now able to manipulate fixed file table,
but it's inconsistent that close can't. Close the gap, keep API same as
with open/accept, i.e. via sqe->file_slot.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24 14:07:54 -06:00
Linus Torvalds
4c4f0c2bf3 A fix for a potential array out of bounds access from Dan.
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmFNudoTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi93DCACjNadeFDipw3oAkzdAo+wo0MSYgByU
 4eu3XN77yTHphc+xoCU/9cCeSkthfNc2XZDktb22lbAX2QxCgvQsWFgck+i0d245
 9uQ68IycrUza9PGjLL3okZiLGzqsk97ZDt7vqXT51zN6dgEATVJ5YaXIhVIwAIM0
 F6VVcoHruoOPLhPXctQUZusWS+XPzPvU34n2sNodREv9mYeoFmaZJ15wB6UZn2Ps
 qL6Usoq15EH5cbW60XbqVWFDAhWmCN/6HG3b6w3WtD+lVb2NWTFJOAZGxMVvNo46
 1TacgQUVs7iIh+bol3modOT94yVZ4Cvrb0uUMY/oE4AgjVPGxSjHSeMB
 =7D7F
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.15-rc3' of git://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
 "A fix for a potential array out of bounds access from Dan"

* tag 'ceph-for-5.15-rc3' of git://github.com/ceph/ceph-client:
  ceph: fix off by one bugs in unsafe_request_wait()
2021-09-24 10:28:18 -07:00
Linus Torvalds
e655c81ade \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmFNp2UACgkQnJ2qBz9k
 QNlubwf/Zv5XJccDBGxn0pB7ew1fN4HowTbWdaS0ELDuLZ2KHhZgEbtUu0V2oZ7I
 pkUMO97llPk0KHWWjcooIaBMGbBQ78Hqq3xFWWboxEu5hMhJyN1cR2uJlrELvxp1
 HsKKaREaUl8jHNQyIuREl/SqLaHmW4LWgrVCZKUTBEc4BeRz2E0C4LTymAZEpTXt
 UCRwAU8itK8Z9/Da1xJ6b04/ZMamgoc0a8Rzq3YwCIcUyFLJATB1/XHR6j0c6B6x
 gH/y7m/y1m3BFsMbMwGNMwiyJWcBXy2RB4I3B4HB3U6ifBL3ebFQlPQIu5PzRbiG
 bGaWRx5o8c5mY/eHpFV4kZS30HM25A==
 =Wep3
 -----END PGP SIGNATURE-----

Merge tag 'fixes_for_v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull misc filesystem fixes from Jan Kara:
 "A for ext2 sleep in atomic context in case of some fs problems and a
  cleanup of an invalidate_lock initialization"

* tag 'fixes_for_v5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: fix sleeping in atomic bugs on error
  mm: Fully initialize invalidate_lock, amend lock class later
2021-09-24 10:22:35 -07:00
Pavel Begunkov
9f3a2cb228 io_uring: kill extra checks in io_write()
We don't retry short writes and so we would never get to async setup in
io_write() in that case. Thus ret2 > 0 is always false and
iov_iter_advance() is never used. Apparently, the same is found by
Coverity, which complains on the code.

Fixes: cd65869512 ("io_uring: use iov_iter state save/restore helpers")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5b33e61034748ef1022766efc0fb8854cfcf749c.1632500058.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24 10:26:11 -06:00
Jens Axboe
cdb31c29d3 io_uring: don't punt files update to io-wq unconditionally
There's no reason to punt it unconditionally, we just need to ensure that
the submit lock grabbing is conditional.

Fixes: 05f3fb3c53 ("io_uring: avoid ring quiesce for fixed file set unregister and update")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24 10:24:34 -06:00
Jens Axboe
9990da93d2 io_uring: put provided buffer meta data under memcg accounting
For each provided buffer, we allocate a struct io_buffer to hold the
data associated with it. As a large number of buffers can be provided,
account that data with memcg.

Fixes: ddf0322db7 ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24 10:24:34 -06:00
Jens Axboe
8bab4c09f2 io_uring: allow conditional reschedule for intensive iterators
If we have a lot of threads and rings, the tctx list can get quite big.
This is especially true if we keep creating new threads and rings.
Likewise for the provided buffers list. Be nice and insert a conditional
reschedule point while iterating the nodes for deletion.

Link: https://lore.kernel.org/io-uring/00000000000064b6b405ccb41113@google.com/
Reported-by: syzbot+111d2a03f51f5ae73775@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24 10:24:34 -06:00
Hao Xu
5b7aa38d86 io_uring: fix potential req refcount underflow
For multishot mode, there may be cases like:

iowq                                 original context
io_poll_add
  _arm_poll()
  mask = vfs_poll() is not 0
  if mask
(2)  io_poll_complete()
  compl_unlock
   (interruption happens
    tw queued to original
    context)
                                     io_poll_task_func()
                                     compl_lock
                                 (3) done = io_poll_complete() is true
                                     compl_unlock
                                     put req ref
(1) if (poll->flags & EPOLLONESHOT)
      put req ref

EPOLLONESHOT flag in (1) may be from (2) or (3), so there are multiple
combinations that can cause ref underfow.
Let's address it by:
- check the return value in (2) as done
- change (1) to if (done)
    in this way, we only do ref put in (1) if 'oneshot flag' is from
    (2)
- do poll.done check in io_poll_task_func(), so that we won't put ref
  for the second time.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101238.7177-4-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24 10:24:34 -06:00
Hao Xu
a62682f92e io_uring: fix missing set of EPOLLONESHOT for CQ ring overflow
We should set EPOLLONESHOT if cqring_fill_event() returns false since
io_poll_add() decides to put req or not by it.

Fixes: 5082620fb2 ("io_uring: terminate multishot poll for CQ ring overflow")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101238.7177-3-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24 10:24:34 -06:00
Hao Xu
bd99c71bd1 io_uring: fix race between poll completion and cancel_hash insertion
If poll arming and poll completion runs in parallel, there maybe races.
For instance, run io_poll_add in iowq and io_poll_task_func in original
context, then:

  iowq                                      original context
  io_poll_add
    vfs_poll
     (interruption happens
      tw queued to original
      context)                              io_poll_task_func
                                              generate cqe
                                              del from cancel_hash[]
    if !poll.done
      insert to cancel_hash[]

The entry left in cancel_hash[], similar case for fast poll.
Fix it by set poll.done = true when del from cancel_hash[].

Fixes: 5082620fb2 ("io_uring: terminate multishot poll for CQ ring overflow")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101238.7177-2-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24 10:24:34 -06:00
Jens Axboe
87c1696655 io-wq: ensure we exit if thread group is exiting
Dave reports that a coredumping workload gets stuck in 5.15-rc2, and
identified the culprit in the Fixes line below. The problem is that
relying solely on fatal_signal_pending() to gate whether to exit or not
fails miserably if a process gets eg SIGILL sent. Don't exclusively
rely on fatal signals, also check if the thread group is exiting.

Fixes: 15e20db2e0 ("io-wq: only exit on fatal signals")
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24 10:24:34 -06:00
Steve French
9ed38fd4a1 cifs: fix incorrect check for null pointer in header_assemble
Although very unlikely that the tlink pointer would be null in this case,
get_next_mid function can in theory return null (but not an error)
so need to check for null (not for IS_ERR, which can not be returned
here).

Address warning:

        fs/smbfs_client/connect.c:2392 cifs_match_super()
        warn: 'tlink' isn't an ERR_PTR

Pointed out by Dan Carpenter via smatch code analysis tool

CC: stable@vger.kernel.org
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-23 21:12:53 -05:00
Steve French
1db1aa9887 smb3: correct server pointer dereferencing check to be more consistent
Address warning:

    fs/smbfs_client/misc.c:273 header_assemble()
    warn: variable dereferenced before check 'treeCon->ses->server'

Pointed out by Dan Carpenter via smatch code analysis tool

Although the check is likely unneeded, adding it makes the code
more consistent and easier to read, as the same check is
done elsewhere in the function.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-23 21:12:23 -05:00
Linus Torvalds
f9e36107ec for-5.15-rc2-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmFM79wACgkQxWXV+ddt
 WDtdZQ/+K7NNEutg4JEH7n2KiXxwj8P23NwVK66a+XwH6/ZBe9xz5TQpnTJQ+D13
 +3mhthTJG7Wbcv+FlUVbfTSp5q8m2IH7CKox43o4JCZEEGtFfRPHrBGHLlGKMk3P
 ap2TZ3rvo0Sb21rx978HCQY824wdJvhv0SmWSScmvWzlTQEKaJHz1OFJgFxhUsMp
 Cy9y7mtIy+Ei4qJglU88iFNXhNL6YvwqXxDFY5LwN9rlCaV+rLk476aPfIBvvyf8
 4f34FHJOe1w9Jlk3KfydIwWefRBbq2dm0zNqNrMHNjl8zXbvfn8+ETOvf54HbjIz
 GGgKiZlBgNh2Na+p0SLoloEvBUdD5lSUCXis8099oUWZ+MporIwsyy4jAvtAeWR/
 QxBkZyxvTNFlXLamSo6oS58K9BNuxFYO7nLGSXQFEoYvb8/fu18rRt/A/rmNS8TU
 2vxpYacNKbggoULiGDzB74JY7MHdHRcMcAhmfDeG1bvNESPHfyLnpHfWamBVoUO6
 0eQOr78f1UpBlqJAGAGtfBefN1kMDnORyX0npGkGLFrKYiZbMgsxdjkNhiHnsufl
 9gsNVJ6baCeB1d5qS2vpZXeOLw0ln5iYZa5Yqz0eh/yc/9Wlj/YCsKRuAbaPMR1i
 i2ppHo3/na4K6L0EgSi6SU3xaUT+4LLzEEcBlJJuWZEwUTeYwiM=
 =VJFC
 -----END PGP SIGNATURE-----

Merge tag 'for-5.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - regression fix for leak of transaction handle after verity rollback
   failure

 - properly reset device last error between mounts

 - improve one error handling case when checksumming bios

 - fixup confusing displayed size of space info free space

* tag 'for-5.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: prevent __btrfs_dump_space_info() to underflow its free space
  btrfs: fix mount failure due to past and transient device flush error
  btrfs: fix transaction handle leak after verity rollback failure
  btrfs: replace BUG_ON() in btrfs_csum_one_bio() with proper error handling
2021-09-23 14:39:41 -07:00
Steve French
b06d893ef2 smb3: correct smb3 ACL security descriptor
Address warning:

        fs/smbfs_client/smb2pdu.c:2425 create_sd_buf()
        warn: struct type mismatch 'smb3_acl vs cifs_acl'

Pointed out by Dan Carpenter via smatch code analysis tool

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-23 16:17:07 -05:00
Steve French
4f22262280 cifs: Clear modified attribute bit from inode flags
Clear CIFS_INO_MODIFIED_ATTR bit from inode flags after
updating mtime and ctime

Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Cc: stable@vger.kernel.org # 5.13+
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-23 16:16:19 -05:00