linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-15 07:04:44 +00:00

Author	SHA1	Message	Date
Dongliang Mu	24ab2d4ef5	ntfs: add sanity check on allocation size [ Upstream commit `714fbf2647` ] ntfs_read_inode_mount invokes ntfs_malloc_nofs with zero allocation size. It triggers one BUG in the __ntfs_malloc function. Fix this by adding sanity check on ni->attr_list_size. Link: https://lkml.kernel.org/r/20220120094914.47736-1-dzm91@hust.edu.cn Reported-by: syzbot+3c765c5248797356edaa@syzkaller.appspotmail.com Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Acked-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:58 +02:00
Rohith Surabattula	9dd6bb11df	Adjust cifssb maximum read size [ Upstream commit `06a466565d` ] When session gets reconnected during mount then read size in super block fs context gets set to zero and after negotiate, rsize is not modified which results in incorrect read with requested bytes as zero. Fixes intermittent failure of xfstest generic/240 Note that stable requires a different version of this patch which will be sent to the stable mailing list. Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:58 +02:00
Chao Yu	1a55c48bba	f2fs: compress: fix to print raw data size in error path of lz4 decompression [ Upstream commit `d284af43f7` ] In lz4_decompress_pages(), if size of decompressed data is not equal to expected one, we should print the size rather than size of target buffer for decompressed data, fix it. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:58 +02:00
Jaegeuk Kim	2eff60346e	f2fs: use spin_lock to avoid hang [ Upstream commit `98237fcda4` ] [14696.634553] task:cat state:D stack: 0 pid:1613738 ppid:1613735 flags:0x00000004 [14696.638285] Call Trace: [14696.639038] <TASK> [14696.640032] __schedule+0x302/0x930 [14696.640969] schedule+0x58/0xd0 [14696.641799] schedule_preempt_disabled+0x18/0x30 [14696.642890] __mutex_lock.constprop.0+0x2fb/0x4f0 [14696.644035] ? mod_objcg_state+0x10c/0x310 [14696.645040] ? obj_cgroup_charge+0xe1/0x170 [14696.646067] __mutex_lock_slowpath+0x13/0x20 [14696.647126] mutex_lock+0x34/0x40 [14696.648070] stat_show+0x25/0x17c0 [f2fs] [14696.649218] seq_read_iter+0x120/0x4b0 [14696.650289] ? aa_file_perm+0x12a/0x500 [14696.651357] ? lru_cache_add+0x1c/0x20 [14696.652470] seq_read+0xfd/0x140 [14696.653445] full_proxy_read+0x5c/0x80 [14696.654535] vfs_read+0xa0/0x1a0 [14696.655497] ksys_read+0x67/0xe0 [14696.656502] __x64_sys_read+0x1a/0x20 [14696.657580] do_syscall_64+0x3b/0xc0 [14696.658671] entry_SYSCALL_64_after_hwframe+0x44/0xae [14696.660068] RIP: 0033:0x7efe39df1cb2 [14696.661133] RSP: 002b:00007ffc8badd948 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [14696.662958] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007efe39df1cb2 [14696.664757] RDX: 0000000000020000 RSI: 00007efe399df000 RDI: 0000000000000003 [14696.666542] RBP: 00007efe399df000 R08: 00007efe399de010 R09: 00007efe399de010 [14696.668363] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000000 [14696.670155] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 [14696.671965] </TASK> [14696.672826] task:umount state:D stack: 0 pid:1614985 ppid:1614984 flags:0x00004000 [14696.674930] Call Trace: [14696.675903] <TASK> [14696.676780] __schedule+0x302/0x930 [14696.677927] schedule+0x58/0xd0 [14696.679019] schedule_preempt_disabled+0x18/0x30 [14696.680412] __mutex_lock.constprop.0+0x2fb/0x4f0 [14696.681783] ? destroy_inode+0x65/0x80 [14696.683006] __mutex_lock_slowpath+0x13/0x20 [14696.684305] mutex_lock+0x34/0x40 [14696.685442] f2fs_destroy_stats+0x1e/0x60 [f2fs] [14696.686803] f2fs_put_super+0x158/0x390 [f2fs] [14696.688238] generic_shutdown_super+0x7a/0x120 [14696.689621] kill_block_super+0x27/0x50 [14696.690894] kill_f2fs_super+0x7f/0x100 [f2fs] [14696.692311] deactivate_locked_super+0x35/0xa0 [14696.693698] deactivate_super+0x40/0x50 [14696.694985] cleanup_mnt+0x139/0x190 [14696.696209] __cleanup_mnt+0x12/0x20 [14696.697390] task_work_run+0x64/0xa0 [14696.698587] exit_to_user_mode_prepare+0x1b7/0x1c0 [14696.700053] syscall_exit_to_user_mode+0x27/0x50 [14696.701418] do_syscall_64+0x48/0xc0 [14696.702630] entry_SYSCALL_64_after_hwframe+0x44/0xae Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:58 +02:00
Josef Bacik	c78bada18a	btrfs: make search_csum_tree return 0 if we get -EFBIG [ Upstream commit `03ddb19d2e` ] We can either fail to find a csum entry at all and return -ENOENT, or we can find a range that is close, but return -EFBIG. In essence these both mean the same thing when we are doing a lookup for a csum in an existing range, we didn't find a csum. We want to treat both of these errors the same way, complain loudly that there wasn't a csum. This currently happens anyway because we do count = search_csum_tree(); if (count <= 0) { // reloc and error handling } However it forces us to incorrectly treat EIO or ENOMEM errors as on disk corruption. Fix this by returning 0 if we get either -ENOENT or -EFBIG from btrfs_lookup_csum() so we can do proper error handling. Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:58 +02:00
Anand Jain	40d006dfed	btrfs: harden identification of a stale device [ Upstream commit `770c79fb65` ] Identifying and removing the stale device from the fs_uuids list is done by btrfs_free_stale_devices(). btrfs_free_stale_devices() in turn depends on device_path_matched() to check if the device appears in more than one btrfs_device structure. The matching of the device happens by its path, the device path. However, when device mapper is in use, the dm device paths are nothing but a link to the actual block device, which leads to the device_path_matched() failing to match. Fix this by matching the dev_t as provided by lookup_bdev() instead of plain string compare of the device paths. Reported-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:58 +02:00
Jaegeuk Kim	58d3aa672d	f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs [ Upstream commit `ba900534f8` ] Let's purge inode cache in order to avoid the below deadlock. [freeze test] shrinkder freeze_super - pwercpu_down_write(SB_FREEZE_FS) - super_cache_scan - down_read(&sb->s_umount) - prune_icache_sb - dispose_list - evict - f2fs_evict_inode thaw_super - down_write(&sb->s_umount); - __percpu_down_read(SB_FREEZE_FS) Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:58 +02:00
Chuck Lever	7260793c13	NFSD: Fix nfsd_breaker_owns_lease() return values [ Upstream commit `50719bf344` ] These have been incorrect since the function was introduced. A proper kerneldoc comment is added since this function, though static, is part of an external interface. Reported-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:57 +02:00
Chao Yu	f68caedf26	f2fs: fix to do sanity check on curseg->alloc_type [ Upstream commit `f41ee8b91c` ] As Wenqing Liu reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215657 - Overview UBSAN: array-index-out-of-bounds in fs/f2fs/segment.c:3460:2 when mount and operate a corrupted image - Reproduce tested on kernel 5.17-rc4, 5.17-rc6 1. mkdir test_crash 2. cd test_crash 3. unzip tmp2.zip 4. mkdir mnt 5. ./single_test.sh f2fs 2 - Kernel dump [ 46.434454] loop0: detected capacity change from 0 to 131072 [ 46.529839] F2FS-fs (loop0): Mounted with checkpoint version = 7548c2d9 [ 46.738319] ================================================================================ [ 46.738412] UBSAN: array-index-out-of-bounds in fs/f2fs/segment.c:3460:2 [ 46.738475] index 231 is out of range for type 'unsigned int [2]' [ 46.738539] CPU: 2 PID: 939 Comm: umount Not tainted 5.17.0-rc6 #1 [ 46.738547] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 46.738551] Call Trace: [ 46.738556] <TASK> [ 46.738563] dump_stack_lvl+0x47/0x5c [ 46.738581] ubsan_epilogue+0x5/0x50 [ 46.738592] __ubsan_handle_out_of_bounds+0x68/0x80 [ 46.738604] f2fs_allocate_data_block+0xdff/0xe60 [f2fs] [ 46.738819] do_write_page+0xef/0x210 [f2fs] [ 46.738934] f2fs_do_write_node_page+0x3f/0x80 [f2fs] [ 46.739038] __write_node_page+0x2b7/0x920 [f2fs] [ 46.739162] f2fs_sync_node_pages+0x943/0xb00 [f2fs] [ 46.739293] f2fs_write_checkpoint+0x7bb/0x1030 [f2fs] [ 46.739405] kill_f2fs_super+0x125/0x150 [f2fs] [ 46.739507] deactivate_locked_super+0x60/0xc0 [ 46.739517] deactivate_super+0x70/0xb0 [ 46.739524] cleanup_mnt+0x11a/0x200 [ 46.739532] __cleanup_mnt+0x16/0x20 [ 46.739538] task_work_run+0x67/0xa0 [ 46.739547] exit_to_user_mode_prepare+0x18c/0x1a0 [ 46.739559] syscall_exit_to_user_mode+0x26/0x40 [ 46.739568] do_syscall_64+0x46/0xb0 [ 46.739584] entry_SYSCALL_64_after_hwframe+0x44/0xae The root cause is we missed to do sanity check on curseg->alloc_type, result in out-of-bound accessing on sbi->block_count[] array, fix it. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:57 +02:00
Theodore Ts'o	a0856764dc	ext4: don't BUG if someone dirty pages without asking ext4 first [ Upstream commit `cc5095747e` ] [un]pin_user_pages_remote is dirtying pages without properly warning the file system in advance. A related race was noted by Jan Kara in 2018[1]; however, more recently instead of it being a very hard-to-hit race, it could be reliably triggered by process_vm_writev(2) which was discovered by Syzbot[2]. This is technically a bug in mm/gup.c, but arguably ext4 is fragile in that if some other kernel subsystem dirty pages without properly notifying the file system using page_mkwrite(), ext4 will BUG, while other file systems will not BUG (although data will still be lost). So instead of crashing with a BUG, issue a warning (since there may be potential data loss) and just mark the page as clean to avoid unprivileged denial of service attacks until the problem can be properly fixed. More discussion and background can be found in the thread starting at [2]. [1] https://lore.kernel.org/linux-mm/20180103100430.GE4911@quack2.suse.cz [2] https://lore.kernel.org/r/Yg0m6IjcNmfaSokM@google.com Reported-by: syzbot+d59332e2db681cf18f0318a06e994ebbb529a8db@syzkaller.appspotmail.com Reported-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/YiDS9wVfq4mM2jGK@mit.edu Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:57 +02:00
Ritesh Harjani	6a6beb0741	ext4: fix ext4_mb_mark_bb() with flex_bg with fast_commit [ Upstream commit `bfdc502a4a` ] In case of flex_bg feature (which is by default enabled), extents for any given inode might span across blocks from two different block group. ext4_mb_mark_bb() only reads the buffer_head of block bitmap once for the starting block group, but it fails to read it again when the extent length boundary overflows to another block group. Then in this below loop it accesses memory beyond the block group bitmap buffer_head and results into a data abort. for (i = 0; i < clen; i++) if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) == !state) already++; This patch adds this functionality for checking block group boundary in ext4_mb_mark_bb() and update the buffer_head(bitmap_bh) for every different block group. w/o this patch, I was easily able to hit a data access abort using Power platform. <...> [ 74.327662] EXT4-fs error (device loop3): ext4_mb_generate_buddy:1141: group 11, block bitmap and bg descriptor inconsistent: 21248 vs 23294 free clusters [ 74.533214] EXT4-fs (loop3): shut down requested (2) [ 74.536705] Aborting journal on device loop3-8. [ 74.702705] BUG: Unable to handle kernel data access on read at 0xc00000005e980000 [ 74.703727] Faulting instruction address: 0xc0000000007bffb8 cpu 0xd: Vector: 300 (Data Access) at [c000000015db7060] pc: c0000000007bffb8: ext4_mb_mark_bb+0x198/0x5a0 lr: c0000000007bfeec: ext4_mb_mark_bb+0xcc/0x5a0 sp: c000000015db7300 msr: 800000000280b033 dar: c00000005e980000 dsisr: 40000000 current = 0xc000000027af6880 paca = 0xc00000003ffd5200 irqmask: 0x03 irq_happened: 0x01 pid = 5167, comm = mount <...> enter ? for help [c000000015db7380] c000000000782708 ext4_ext_clear_bb+0x378/0x410 [c000000015db7400] c000000000813f14 ext4_fc_replay+0x1794/0x2000 [c000000015db7580] c000000000833f7c do_one_pass+0xe9c/0x12a0 [c000000015db7710] c000000000834504 jbd2_journal_recover+0x184/0x2d0 [c000000015db77c0] c000000000841398 jbd2_journal_load+0x188/0x4a0 [c000000015db7880] c000000000804de8 ext4_fill_super+0x2638/0x3e10 [c000000015db7a40] c0000000005f8404 get_tree_bdev+0x2b4/0x350 [c000000015db7ae0] c0000000007ef058 ext4_get_tree+0x28/0x40 [c000000015db7b00] c0000000005f6344 vfs_get_tree+0x44/0x100 [c000000015db7b70] c00000000063c408 path_mount+0xdd8/0xe70 [c000000015db7c40] c00000000063c8f0 sys_mount+0x450/0x550 [c000000015db7d50] c000000000035770 system_call_exception+0x4a0/0x4e0 [c000000015db7e10] c00000000000c74c system_call_common+0xec/0x250 Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/2609bc8f66fc15870616ee416a18a3d392a209c4.1644992609.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:57 +02:00
Ritesh Harjani	572d14e6ce	ext4: correct cluster len and clusters changed accounting in ext4_mb_mark_bb [ Upstream commit `a5c0e2fdf7` ] ext4_mb_mark_bb() currently wrongly calculates cluster len (clen) and flex_group->free_clusters. This patch fixes that. Identified based on code review of ext4_mb_mark_bb() function. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/a0b035d536bafa88110b74456853774b64c8ac40.1644992609.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:57 +02:00
Akira Kawata	e0943c456b	fs/binfmt_elf: Fix AT_PHDR for unusual ELF files [ Upstream commit `0da1d50027` ] BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=197921 As pointed out in the discussion of buglink, we cannot calculate AT_PHDR as the sum of load_addr and exec->e_phoff. : The AT_PHDR of ELF auxiliary vectors should point to the memory address : of program header. But binfmt_elf.c calculates this address as follows: : : NEW_AUX_ENT(AT_PHDR, load_addr + exec->e_phoff); : : which is wrong since e_phoff is the file offset of program header and : load_addr is the memory base address from PT_LOAD entry. : : The ld.so uses AT_PHDR as the memory address of program header. In normal : case, since the e_phoff is usually 64 and in the first PT_LOAD region, it : is the correct program header address. : : But if the address of program header isn't equal to the first PT_LOAD : address + e_phoff (e.g. Put the program header in other non-consecutive : PT_LOAD region), ld.so will try to read program header from wrong address : then crash or use incorrect program header. This is because exec->e_phoff is the offset of PHDRs in the file and the address of PHDRs in the memory may differ from it. This patch fixes the bug by calculating the address of program headers from PT_LOADs directly. Signed-off-by: Akira Kawata <akirakawata1@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127124014.338760-2-akirakawata1@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:56 +02:00
Linus Torvalds	c331c9d1d2	fs: fix fd table size alignment properly [ Upstream commit `d888c83fce` ] Jason Donenfeld reports that my commit `1c24a18639` ("fs: fd tables have to be multiples of BITS_PER_LONG") doesn't work, and the reason is an embarrassing brown-paper-bag bug. Yes, we want to align the number of fds to BITS_PER_LONG, and yes, the reason they might not be aligned is because the incoming 'max_fd' argument might not be aligned. But aligining the argument - while simple - will cause a "infinitely big" maxfd (eg NR_OPEN_MAX) to just overflow to zero. Which most definitely isn't what we want either. The obvious fix was always just to do the alignment last, but I had moved it earlier just to make the patch smaller and the code look simpler. Duh. It certainly made _me_ look simple. Fixes: `1c24a18639` ("fs: fd tables have to be multiples of BITS_PER_LONG") Reported-and-tested-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Fedor Pchelkin <aissur0002@gmail.com> Cc: Alexey Khoroshilov <khoroshilov@ispras.ru> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:54 +02:00
Linus Torvalds	136736abcd	fs: fd tables have to be multiples of BITS_PER_LONG [ Upstream commit `1c24a18639` ] This has always been the rule: fdtables have several bitmaps in them, and as a result they have to be sized properly for bitmaps. We walk those bitmaps in chunks of 'unsigned long' in serveral cases, but even when we don't, we use the regular kernel bitops that are defined to work on arrays of 'unsigned long', not on some byte array. Now, the distinction between arrays of bytes and 'unsigned long' normally only really ends up being noticeable on big-endian systems, but Fedor Pchelkin and Alexey Khoroshilov reported that copy_fd_bitmaps() could be called with an argument that wasn't even a multiple of BITS_PER_BYTE. And then it fails to do the proper copy even on little-endian machines. The bug wasn't in copy_fd_bitmap(), but in sane_fdtable_size(), which didn't actually sanitize the fdtable size sufficiently, and never made sure it had the proper BITS_PER_LONG alignment. That's partly because the alignment historically came not from having to explicitly align things, but simply from previous fdtable sizes, and from count_open_files(), which counts the file descriptors by walking them one 'unsigned long' word at a time and thus naturally ends up doing sizing in the proper 'chunks of unsigned long'. But with the introduction of close_range(), we now have an external source of "this is how many files we want to have", and so sane_fdtable_size() needs to do a better job. This also adds that explicit alignment to alloc_fdtable(), although there it is mainly just for documentation at a source code level. The arithmetic we do there to pick a reasonable fdtable size already aligns the result sufficiently. In fact,clang notices that the added ALIGN() in that function doesn't actually do anything, and does not generate any extra code for it. It turns out that gcc ends up confusing itself by combining a previous constant-sized shift operation with the variable-sized shift operations in roundup_pow_of_two(). And probably due to that doesn't notice that the ALIGN() is a no-op. But that's a (tiny) gcc misfeature that doesn't matter. Having the explicit alignment makes sense, and would actually matter on a 128-bit architecture if we ever go there. This also adds big comments above both functions about how fdtable sizes have to have that BITS_PER_LONG alignment. Fixes: `60997c3d45` ("close_range: add CLOSE_RANGE_UNSHARE") Reported-by: Fedor Pchelkin <aissur0002@gmail.com> Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Link: https://lore.kernel.org/all/20220326114009.1690-1-aissur0002@gmail.com/ Tested-and-acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:54 +02:00
Trond Myklebust	a738ff8143	NFSv4/pNFS: Fix another issue with a list iterator pointing to the head [ Upstream commit `7c9d845f06` ] In nfs4_callback_devicenotify(), if we don't find a matching entry for the deviceid, we're left with a pointer to 'struct nfs_server' that actually points to the list of super blocks associated with our struct nfs_client. Furthermore, even if we have a valid pointer, nothing pins the super block, and so the struct nfs_server could end up getting freed while we're using it. Since all we want is a pointer to the struct pnfs_layoutdriver_type, let's skip all the iteration over super blocks, and just use APIs to find the layout driver directly. Reported-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Fixes: `1be5683b03` ("pnfs: CB_NOTIFY_DEVICEID") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:54 +02:00
Trond Myklebust	c955782358	NFS: Don't loop forever in nfs_do_recoalesce() [ Upstream commit `d02d81efc7` ] If __nfs_pageio_add_request() fails to add the request, it will return with either desc->pg_error < 0, or mirror->pg_recoalesce will be set, so we are guaranteed either to exit the function altogether, or to loop. However if there is nothing left in mirror->pg_list to coalesce, we must exit, so make sure that we clear mirror->pg_recoalesce every time we loop. Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: `70536bf4eb` ("NFS: Clean up reset of the mirror accounting variables") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:53 +02:00
Olga Kornievskaia	0445609a7a	NFSv4.1: don't retry BIND_CONN_TO_SESSION on session error [ Upstream commit `1d15d121cc` ] There is no reason to retry the operation if a session error had occurred in such case result structure isn't filled out. Fixes: `dff58530c4` ("NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:51 +02:00
Pavel Skripkin	6bbfe9a715	jfs: fix divide error in dbNextAG [ Upstream commit `2cc7cc01c1` ] Syzbot reported divide error in dbNextAG(). The problem was in missing validation check for malicious image. Syzbot crafted an image with bmp->db_numag equal to 0. There wasn't any validation checks, but dbNextAG() blindly use bmp->db_numag in divide expression Fix it by validating bmp->db_numag in dbMount() and return an error if image is malicious Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-and-tested-by: syzbot+46f5c25af73eb8330eb6@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:51 +02:00
Alexey Khoroshilov	75ee75cc36	NFS: remove unneeded check in decode_devicenotify_args() [ Upstream commit `cb8fac6d27` ] [You don't often get email from khoroshilov@ispras.ru. Learn why this is important at http://aka.ms/LearnAboutSenderIdentification.] Overflow check in not needed anymore after we switch to kmalloc_array(). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Fixes: `a4f743a6bb` ("NFSv4.1: Convert open-coded array allocation calls to kmalloc_array()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:48 +02:00
Trond Myklebust	18dc195712	NFS: Return valid errors from nfs2/3_decode_dirent() [ Upstream commit `64cfca85ba` ] Valid return values for decode_dirent() callback functions are: 0: Success -EBADCOOKIE: End of directory -EAGAIN: End of xdr_stream All errors need to map into one of those three values. Fixes: `573c4e1ef5` ("NFS: Simplify ->decode_dirent() calling sequence") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:46 +02:00
Trond Myklebust	ba3a3390c9	NFS: Use of mapping_set_error() results in spurious errors [ Upstream commit `6c984083ec` ] The use of mapping_set_error() in conjunction with calls to filemap_check_errors() is problematic because every error gets reported as either an EIO or an ENOSPC by filemap_check_errors() in functions such as filemap_write_and_wait() or filemap_write_and_wait_range(). In almost all cases, we prefer to use the more nuanced wb errors. Fixes: `b8946d7bfb` ("NFS: Revalidate the file mapping on all fatal writeback errors") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:46 +02:00
Zhang Yi	3813591bc0	ext2: correct max file size computing [ Upstream commit `50b3a81899` ] We need to calculate the max file size accurately if the total blocks that can address by block tree exceed the upper_limit. But this check is not correct now, it only compute the total data blocks but missing metadata blocks are needed. So in the case of "data blocks < upper_limit && total blocks > upper_limit", we will get wrong result. Fortunately, this case could not happen in reality, but it's confused and better to correct the computing. bits data blocks metadatablocks upper_limit 10 16843020 66051 2147483647 11 134480396 263171 1073741823 12 1074791436 1050627 536870911 () 13 8594130956 4198403 268435455 () 14 68736258060 16785411 134217727 () 15 549822930956 67125251 67108863 () 16 4398314962956 268468227 33554431 () [] Need to calculate in depth. Fixes: `1c2d14212b` ("ext2: Fix underflow in ext2_max_size()") Link: https://lore.kernel.org/r/20220212050532.179055-1-yi.zhang@huawei.com Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:35 +02:00
Fengnan Chang	0f42a02e47	f2fs: fix compressed file start atomic write may cause data corruption [ Upstream commit `9b56adcf52` ] When compressed file has blocks, f2fs_ioc_start_atomic_write will succeed, but compressed flag will be remained in inode. If write partial compreseed cluster and commit atomic write will cause data corruption. This is the reproduction process: Step 1: create a compressed file ,write 64K data , call fsync(), then the blocks are write as compressed cluster. Step2: iotcl(F2FS_IOC_START_ATOMIC_WRITE) --- this should be fail, but not. write page 0 and page 3. iotcl(F2FS_IOC_COMMIT_ATOMIC_WRITE) -- page 0 and 3 write as normal file, Step3: drop cache. read page 0-4 -- Since page 0 has a valid block address, read as non-compressed cluster, page 1 and 2 will be filled with compressed data or zero. The root cause is, after commit `7eab7a6968` ("f2fs: compress: remove unneeded read when rewrite whole cluster"), in step 2, f2fs_write_begin() only set target page dirty, and in f2fs_commit_inmem_pages(), we will write partial raw pages into compressed cluster, result in corrupting compressed cluster layout. Fixes: `4c8ff7095b` ("f2fs: support data compression") Fixes: `7eab7a6968` ("f2fs: compress: remove unneeded read when rewrite whole cluster") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:11 +02:00
Filipe Manana	1a97987f76	btrfs: fix unexpected error path when reflinking an inline extent [ Upstream commit `1f4613cdbe` ] When reflinking an inline extent, we assert that its file offset is 0 and that its uncompressed length is not greater than the sector size. We then return an error if one of those conditions is not satisfied. However we use a return statement, which results in returning from btrfs_clone() without freeing the path and buffer that were allocated before, as well as not clearing the flag BTRFS_INODE_NO_DELALLOC_FLUSH for the destination inode. Fix that by jumping to the 'out' label instead, and also add a WARN_ON() for each condition so that in case assertions are disabled, we get to known which of the unexpected conditions triggered the error. Fixes: `a61e1e0df9` ("Btrfs: simplify inline extent handling when doing reflinks") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:11 +02:00
Chao Yu	2911ad0249	f2fs: fix to avoid potential deadlock [ Upstream commit `344150999b` ] Quoted from Jing Xia's report, there is a potential deadlock may happen between kworker and checkpoint as below: [T:writeback] [T:checkpoint] - wb_writeback - blk_start_plug bio contains NodeA was plugged in writeback threads - do_writepages -- sync write inodeB, inc wb_sync_req[DATA] - f2fs_write_data_pages - f2fs_write_single_data_page -- write last dirty page - f2fs_do_write_data_page - set_page_writeback -- clear page dirty flag and PAGECACHE_TAG_DIRTY tag in radix tree - f2fs_outplace_write_data - f2fs_update_data_blkaddr - f2fs_wait_on_page_writeback -- wait NodeA to writeback here - inode_dec_dirty_pages - writeback_sb_inodes - writeback_single_inode - do_writepages - f2fs_write_data_pages -- skip writepages due to wb_sync_req[DATA] - wbc->pages_skipped += get_dirty_pages() -- PAGECACHE_TAG_DIRTY is not set but get_dirty_pages() returns one - requeue_inode -- requeue inode to wb->b_dirty queue due to non-zero.pages_skipped - blk_finish_plug Let's try to avoid deadlock condition by forcing unplugging previous bio via blk_finish_plug(current->plug) once we'v skipped writeback in writepages() due to valid sbi->wb_sync_req[DATA/NODE]. Fixes: `687de7f101` ("f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE") Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Jing Xia <jing.xia@unisoc.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:11 +02:00
Amir Goldstein	cc91880f04	nfsd: more robust allocation failure handling in nfsd_file_cache_init [ Upstream commit `4d2eeafecd` ] The nfsd file cache table can be pretty large and its allocation may require as many as 80 contigious pages. Employ the same fix that was employed for similar issue that was reported for the reply cache hash table allocation several years ago by commit `8f97514b42` ("nfsd: more robust allocation failure handling in nfsd_reply_cache_init"). Fixes: `65294c1f2c` ("nfsd: add a new struct file caching facility to nfsd") Link: https://lore.kernel.org/linux-nfs/e3cdaeec85a6cfec980e87fc294327c0381c1778.camel@kernel.org/ Suggested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:11 +02:00
Jaegeuk Kim	d1eaaf6cad	f2fs: fix missing free nid in f2fs_handle_failed_inode [ Upstream commit `2fef99b837` ] This patch fixes xfstests/generic/475 failure. [ 293.680694] F2FS-fs (dm-1): May loss orphan inode, run fsck to fix. [ 293.685358] Buffer I/O error on dev dm-1, logical block 8388592, async page read [ 293.691527] Buffer I/O error on dev dm-1, logical block 8388592, async page read [ 293.691764] sh (7615): drop_caches: 3 [ 293.691819] sh (7616): drop_caches: 3 [ 293.694017] Buffer I/O error on dev dm-1, logical block 1, async page read [ 293.695659] sh (7618): drop_caches: 3 [ 293.696979] sh (7617): drop_caches: 3 [ 293.700290] sh (7623): drop_caches: 3 [ 293.708621] sh (7626): drop_caches: 3 [ 293.711386] sh (7628): drop_caches: 3 [ 293.711825] sh (7627): drop_caches: 3 [ 293.716738] sh (7630): drop_caches: 3 [ 293.719613] sh (7632): drop_caches: 3 [ 293.720971] sh (7633): drop_caches: 3 [ 293.727741] sh (7634): drop_caches: 3 [ 293.730783] sh (7636): drop_caches: 3 [ 293.732681] sh (7635): drop_caches: 3 [ 293.732988] sh (7637): drop_caches: 3 [ 293.738836] sh (7639): drop_caches: 3 [ 293.740568] sh (7641): drop_caches: 3 [ 293.743053] sh (7640): drop_caches: 3 [ 293.821889] ------------[ cut here ]------------ [ 293.824654] kernel BUG at fs/f2fs/node.c:3334! [ 293.826226] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 293.828713] CPU: 0 PID: 7653 Comm: umount Tainted: G OE 5.17.0-rc1-custom #1 [ 293.830946] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 293.832526] RIP: 0010:f2fs_destroy_node_manager+0x33f/0x350 [f2fs] [ 293.833905] Code: e8 d6 3d f9 f9 48 8b 45 d0 65 48 2b 04 25 28 00 00 00 75 1a 48 81 c4 28 03 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b [ 293.837783] RSP: 0018:ffffb04ec31e7a20 EFLAGS: 00010202 [ 293.839062] RAX: 0000000000000001 RBX: ffff9df947db2eb8 RCX: 0000000080aa0072 [ 293.840666] RDX: 0000000000000000 RSI: ffffe86c0432a140 RDI: ffffffffc0b72a21 [ 293.842261] RBP: ffffb04ec31e7d70 R08: ffff9df94ca85780 R09: 0000000080aa0072 [ 293.843909] R10: ffff9df94ca85700 R11: ffff9df94e1ccf58 R12: ffff9df947db2e00 [ 293.845594] R13: ffff9df947db2ed0 R14: ffff9df947db2eb8 R15: ffff9df947db2eb8 [ 293.847855] FS: 00007f5a97379800(0000) GS:ffff9dfa77c00000(0000) knlGS:0000000000000000 [ 293.850647] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 293.852940] CR2: 00007f5a97528730 CR3: 000000010bc76005 CR4: 0000000000370ef0 [ 293.854680] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 293.856423] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 293.858380] Call Trace: [ 293.859302] <TASK> [ 293.860311] ? ttwu_do_wakeup+0x1c/0x170 [ 293.861800] ? ttwu_do_activate+0x6d/0xb0 [ 293.863057] ? _raw_spin_unlock_irqrestore+0x29/0x40 [ 293.864411] ? try_to_wake_up+0x9d/0x5e0 [ 293.865618] ? debug_smp_processor_id+0x17/0x20 [ 293.866934] ? debug_smp_processor_id+0x17/0x20 [ 293.868223] ? free_unref_page+0xbf/0x120 [ 293.869470] ? __free_slab+0xcb/0x1c0 [ 293.870614] ? preempt_count_add+0x7a/0xc0 [ 293.871811] ? __slab_free+0xa0/0x2d0 [ 293.872918] ? __wake_up_common_lock+0x8a/0xc0 [ 293.874186] ? __slab_free+0xa0/0x2d0 [ 293.875305] ? free_inode_nonrcu+0x20/0x20 [ 293.876466] ? free_inode_nonrcu+0x20/0x20 [ 293.877650] ? debug_smp_processor_id+0x17/0x20 [ 293.878949] ? call_rcu+0x11a/0x240 [ 293.880060] ? f2fs_destroy_stats+0x59/0x60 [f2fs] [ 293.881437] ? kfree+0x1fe/0x230 [ 293.882674] f2fs_put_super+0x160/0x390 [f2fs] [ 293.883978] generic_shutdown_super+0x7a/0x120 [ 293.885274] kill_block_super+0x27/0x50 [ 293.886496] kill_f2fs_super+0x7f/0x100 [f2fs] [ 293.887806] deactivate_locked_super+0x35/0xa0 [ 293.889271] deactivate_super+0x40/0x50 [ 293.890513] cleanup_mnt+0x139/0x190 [ 293.891689] __cleanup_mnt+0x12/0x20 [ 293.892850] task_work_run+0x64/0xa0 [ 293.894035] exit_to_user_mode_prepare+0x1b7/0x1c0 [ 293.895409] syscall_exit_to_user_mode+0x27/0x50 [ 293.896872] do_syscall_64+0x48/0xc0 [ 293.898090] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 293.899517] RIP: 0033:0x7f5a975cd25b Fixes: `7735730d39` ("f2fs: fix to propagate error from __get_meta_page()") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:11 +02:00
Chao Yu	d8c8dd97bb	f2fs: fix to enable ATGC correctly via gc_idle sysfs interface [ Upstream commit `7d19e3dab0` ] It needs to assign sbi->gc_mode with GC_IDLE_AT rather than GC_AT when user tries to enable ATGC via gc_idle sysfs interface, fix it. Fixes: `093749e296` ("f2fs: support age threshold based garbage collection") Cc: Zhipeng Tan <tanzhipeng@hust.edu.cn> Signed-off-by: Jicheng Shao <shaojicheng@hust.edu.cn> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:10 +02:00
Jens Axboe	109dda4510	io_uring: terminate manual loop iterator loop correctly for non-vecs [ Upstream commit `5e92936746` ] The fix for not advancing the iterator if we're using fixed buffers is broken in that it can hit a condition where we don't terminate the loop. This results in io-wq looping forever, asking to read (or write) 0 bytes for every subsequent loop. Reported-by: Joel Jaeschke <joel.jaeschke@gmail.com> Link: https://github.com/axboe/liburing/issues/549 Fixes: `16c8d2df7e` ("io_uring: ensure symmetry in handling iter types in loop_rw_iter()") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:10 +02:00
Jens Axboe	1323976e94	io_uring: don't check unrelated req->open.how in accept request [ Upstream commit `adf3a9e9f5` ] Looks like a victim of too much copy/paste, we should not be looking at req->open.how in accept. The point is to check CLOEXEC and error out, which we don't invalid direct descriptors on exec. Hence any attempt to get a direct descriptor with CLOEXEC is invalid. No harm is done here, as req->open.how.flags overlaps with req->accept.flags, but it's very confusing and might change if either of those command structs are modified. Fixes: `aaa4db12ef` ("io_uring: accept directly into fixed file table") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-08 14:23:10 +02:00
Qu Wenruo	00c6bb4cea	btrfs: verify the tranisd of the to-be-written dirty extent buffer commit `3777369ff1` upstream. [BUG] There is a bug report that a bitflip in the transid part of an extent buffer makes btrfs to reject certain tree blocks: BTRFS error (device dm-0): parent transid verify failed on 1382301696 wanted 262166 found 22 [CAUSE] Note the failed transid check, hex(262166) = 0x40016, while hex(22) = 0x16. It's an obvious bitflip. Furthermore, the reporter also confirmed the bitflip is from the hardware, so it's a real hardware caused bitflip, and such problem can not be detected by the existing tree-checker framework. As tree-checker can only verify the content inside one tree block, while generation of a tree block can only be verified against its parent. So such problem remain undetected. [FIX] Although tree-checker can not verify it at write-time, we still have a quick (but not the most accurate) way to catch such obvious corruption. Function csum_one_extent_buffer() is called before we submit metadata write. Thus it means, all the extent buffer passed in should be dirty tree blocks, and should be newer than last committed transaction. Using that we can catch the above bitflip. Although it's not a perfect solution, as if the corrupted generation is higher than the correct value, we have no way to catch it at all. Reported-by: Christoph Anton Mitterer <calestyo@scientia.org> Link: https://lore.kernel.org/linux-btrfs/2dfcbc130c55cc6fd067b93752e90bd2b079baca.camel@scientia.org/ CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Qu Wenruo <wqu@sus,ree.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:23:02 +02:00
Niels Dossche	f85ee0c845	btrfs: extend locking to all space_info members accesses commit `06bae87663` upstream. bytes_pinned is always accessed under space_info->lock, except in btrfs_preempt_reclaim_metadata_space, however the other members are accessed under that lock. The reserved member of the rsv's are also partially accessed under a lock and partially not. Move all these accesses into the same lock to ensure consistency. This could potentially race and lead to a flush instead of a commit but it's not a big problem as it's only for preemptive flush. CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Niels Dossche <niels.dossche@ugent.be> Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:23:02 +02:00
Naohiro Aota	68a8120e16	btrfs: zoned: mark relocation as writing commit `ca5e4ea0be` upstream. There is a hung_task issue with running generic/068 on an SMR device. The hang occurs while a process is trying to thaw the filesystem. The process is trying to take sb->s_umount to thaw the FS. The lock is held by fsstress, which calls btrfs_sync_fs() and is waiting for an ordered extent to finish. However, as the FS is frozen, the ordered extents never finish. Having an ordered extent while the FS is frozen is the root cause of the hang. The ordered extent is initiated from btrfs_relocate_chunk() which is called from btrfs_reclaim_bgs_work(). This commit adds sb_*_write() around btrfs_relocate_chunk() call site. For the usual "btrfs balance" command, we already call it with mnt_want_file() in btrfs_ioctl_balance(). Fixes: `18bb8bbf13` ("btrfs: zoned: automatically reclaim zones") CC: stable@vger.kernel.org # 5.13+ Link: https://github.com/naota/linux/issues/56 Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:23:02 +02:00
Kees Cook	1290eb4412	exec: Force single empty string when argv is empty commit `dcd46d897a` upstream. Quoting[1] Ariadne Conill: "In several other operating systems, it is a hard requirement that the second argument to execve(2) be the name of a program, thus prohibiting a scenario where argc < 1. POSIX 2017 also recommends this behaviour, but it is not an explicit requirement[2]: The argument arg0 should point to a filename string that is associated with the process being started by one of the exec functions. ... Interestingly, Michael Kerrisk opened an issue about this in 2008[3], but there was no consensus to support fixing this issue then. Hopefully now that CVE-2021-4034 shows practical exploitative use[4] of this bug in a shellcode, we can reconsider. This issue is being tracked in the KSPP issue tracker[5]." While the initial code searches[6][7] turned up what appeared to be mostly corner case tests, trying to that just reject argv == NULL (or an immediately terminated pointer list) quickly started tripping[8] existing userspace programs. The next best approach is forcing a single empty string into argv and adjusting argc to match. The number of programs depending on argc == 0 seems a smaller set than those calling execve with a NULL argv. Account for the additional stack space in bprm_stack_limits(). Inject an empty string when argc == 0 (and set argc = 1). Warn about the case so userspace has some notice about the change: process './argc0' launched './argc0' with NULL argv: empty string added Additionally WARN() and reject NULL argv usage for kernel threads. [1] https://lore.kernel.org/lkml/20220127000724.15106-1-ariadne@dereferenced.org/ [2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html [3] https://bugzilla.kernel.org/show_bug.cgi?id=8408 [4] https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt [5] https://github.com/KSPP/linux/issues/176 [6] https://codesearch.debian.net/search?q=execve%5C+%5C%28%5B%5E%2C%5D%2B%2C+NULL&literal=0 [7] https://codesearch.debian.net/search?q=execlp%3F%5Cs%5C%28%5B%5E%2C%5D%2B%2C%5CsNULL&literal=0 [8] https://lore.kernel.org/lkml/20220131144352.GE16385@xsang-OptiPlex-9020/ Reported-by: Ariadne Conill <ariadne@dereferenced.org> Reported-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Ariadne Conill <ariadne@dereferenced.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20220201000947.2453721-1-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:23:01 +02:00
Jann Horn	bc5f440e1c	pstore: Don't use semaphores in always-atomic-context code commit `8126b1c731` upstream. pstore_dump() is always invoked in atomic context (nowadays in an RCU read-side critical section, before that under a spinlock). It doesn't make sense to try to use semaphores here. This is mostly a revert of commit `ea84b580b9` ("pstore: Convert buf_lock to semaphore"), except that two parts aren't restored back exactly as they were: - keep the lock initialization in pstore_register - in efi_pstore_write(), always set the "block" flag to false - omit "is_locked", that was unnecessary since commit `959217c84c` ("pstore: Actually give up during locking failure") - fix the bailout message The actual problem that the buggy commit was trying to address may have been that the use of preemptible() in efi_pstore_write() was wrong - it only looks at preempt_count() and the state of IRQs, but __rcu_read_lock() doesn't touch either of those under CONFIG_PREEMPT_RCU. (Sidenote: CONFIG_PREEMPT_RCU means that the scheduler can preempt tasks in RCU read-side critical sections, but you're not allowed to actively block/reschedule.) Lockdep probably never caught the problem because it's very rare that you actually hit the contended case, so lockdep always just sees the down_trylock(), not the down_interruptible(), and so it can't tell that there's a problem. Fixes: `ea84b580b9` ("pstore: Convert buf_lock to semaphore") Cc: stable@vger.kernel.org Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220314185953.2068993-1-jannh@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:23:01 +02:00
Ojaswin Mujoo	3c65b7309d	ext4: make mb_optimize_scan performance mount option work with extents commit `077d0c2c78` upstream. Currently mb_optimize_scan scan feature which improves filesystem performance heavily (when FS is fragmented), seems to be not working with files with extents (ext4 by default has files with extents). This patch fixes that and makes mb_optimize_scan feature work for files with extents. Below are some performance numbers obtained when allocating a 10M and 100M file with and w/o this patch on a filesytem with no 1M contiguous block. <perf numbers> =============== Workload: dd if=/dev/urandom of=test conv=fsync bs=1M count=10/100 Time taken ===================================================== no. Size without-patch with-patch Diff(%) 1 10M 0m8.401s 0m5.623s 33.06% 2 100M 1m40.465s 1m14.737s 25.6% <debug stats> ============= w/o patch: mballoc: reqs: 17056 success: 11407 groups_scanned: 13643 cr0_stats: hits: 37 groups_considered: 9472 useless_loops: 36 bad_suggestions: 0 cr1_stats: hits: 11418 groups_considered: 908560 useless_loops: 1894 bad_suggestions: 0 cr2_stats: hits: 1873 groups_considered: 6913 useless_loops: 21 cr3_stats: hits: 21 groups_considered: 5040 useless_loops: 21 extents_scanned: 417364 goal_hits: 3707 2^n_hits: 37 breaks: 1873 lost: 0 buddies_generated: 239/240 buddies_time_used: 651080 preallocated: 705 discarded: 478 with patch: mballoc: reqs: 12768 success: 11305 groups_scanned: 12768 cr0_stats: hits: 1 groups_considered: 18 useless_loops: 0 bad_suggestions: 0 cr1_stats: hits: 5829 groups_considered: 50626 useless_loops: 0 bad_suggestions: 0 cr2_stats: hits: 6938 groups_considered: 580363 useless_loops: 0 cr3_stats: hits: 0 groups_considered: 0 useless_loops: 0 extents_scanned: 309059 goal_hits: 0 2^n_hits: 1 breaks: 1463 lost: 0 buddies_generated: 239/240 buddies_time_used: 791392 preallocated: 673 discarded: 446 Fixes: `196e402` (ext4: improve cr 0 / cr 1 group scanning) Cc: stable@kernel.org Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com> Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Suggested-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/fc9a48f7f8dcfc83891a8b21f6dd8cdf056ed810.1646732698.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:22:59 +02:00
Ye Bin	597393cde8	ext4: fix fs corruption when tring to remove a non-empty directory with IO error commit `7aab5c84a0` upstream. We inject IO error when rmdir non empty direcory, then got issue as follows: step1: mkfs.ext4 -F /dev/sda step2: mount /dev/sda test step3: cd test step4: mkdir -p 1/2 step5: rmdir 1 [ 110.920551] ext4_empty_dir: inject fault [ 110.921926] EXT4-fs warning (device sda): ext4_rmdir:3113: inode #12: comm rmdir: empty directory '1' has too many links (3) step6: cd .. step7: umount test step8: fsck.ext4 -f /dev/sda e2fsck 1.42.9 (28-Dec-2013) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Entry '..' in .../??? (13) has deleted/unused inode 12. Clear<y>? yes Pass 3: Checking directory connectivity Unconnected directory inode 13 (...) Connect to /lost+found<y>? yes Pass 4: Checking reference counts Inode 13 ref count is 3, should be 2. Fix<y>? yes Pass 5: Checking group summary information /dev/sda: *** FILE SYSTEM WAS MODIFIED *** /dev/sda: 12/131072 files (0.0% non-contiguous), 26157/524288 blocks ext4_rmdir if (!ext4_empty_dir(inode)) goto end_rmdir; ext4_empty_dir bh = ext4_read_dirblock(inode, 0, DIRENT_HTREE); if (IS_ERR(bh)) return true; Now if read directory block failed, 'ext4_empty_dir' will return true, assume directory is empty. Obviously, it will lead to above issue. To solve this issue, if read directory block failed 'ext4_empty_dir' just return false. To avoid making things worse when file system is already corrupted, 'ext4_empty_dir' also return false. Signed-off-by: Ye Bin <yebin10@huawei.com> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20220228024815.3952506-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:22:59 +02:00
Jann Horn	6cdb84dd0c	coredump: Also dump first pages of non-executable ELF libraries commit `84158b7f6a` upstream. When I rewrote the VMA dumping logic for coredumps, I changed it to recognize ELF library mappings based on the file being executable instead of the mapping having an ELF header. But turns out, distros ship many ELF libraries as non-executable, so the heuristic goes wrong... Restore the old behavior where FILTER(ELF_HEADERS) dumps the first page of any offset-0 readable mapping that starts with the ELF magic. This fix is technically layer-breaking a bit, because it checks for something ELF-specific in fs/coredump.c; but since we probably want to share this between standard ELF and FDPIC ELF anyway, I guess it's fine? And this also keeps the change small for backporting. Cc: stable@vger.kernel.org Fixes: `429a22e776` ("coredump: rework elf/elf_fdpic vma_dump_size() into common helper") Reported-by: Bill Messmer <wmessmer@microsoft.com> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220126025739.2014888-1-jannh@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:22:58 +02:00
Joseph Qi	7c5312fdb1	ocfs2: fix crash when mount with quota enabled commit `de19433423` upstream. There is a reported crash when mounting ocfs2 with quota enabled. RIP: 0010:ocfs2_qinfo_lock_res_init+0x44/0x50 [ocfs2] Call Trace: ocfs2_local_read_info+0xb9/0x6f0 [ocfs2] dquot_load_quota_sb+0x216/0x470 dquot_load_quota_inode+0x85/0x100 ocfs2_enable_quotas+0xa0/0x1c0 [ocfs2] ocfs2_fill_super.cold+0xc8/0x1bf [ocfs2] mount_bdev+0x185/0x1b0 legacy_get_tree+0x27/0x40 vfs_get_tree+0x25/0xb0 path_mount+0x465/0xac0 __x64_sys_mount+0x103/0x140 It is caused by when initializing dqi_gqlock, the corresponding dqi_type and dqi_sb are not properly initialized. This issue is introduced by commit `6c85c2c728`, which wants to avoid accessing uninitialized variables in error cases. So make global quota info properly initialized. Link: https://lkml.kernel.org/r/20220323023644.40084-1-joseph.qi@linux.alibaba.com Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1007141 Fixes: `6c85c2c728` ("ocfs2: quota_local: fix possible uninitialized-variable access in ocfs2_local_read_info()") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: Dayvison <sathlerds@gmail.com> Tested-by: Valentin Vidic <vvidic@valentin-vidic.from.hr> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:22:56 +02:00
Paulo Alcantara	39a4bf7d1a	cifs: fix NULL ptr dereference in smb2_ioctl_query_info() commit `d6f5e35845` upstream. When calling smb2_ioctl_query_info() with invalid smb_query_info::flags, a NULL ptr dereference is triggered when trying to kfree() uninitialised rqst[n].rq_iov array. This also fixes leaked paths that are created in SMB2_open_init() which required SMB2_open_free() to properly free them. Here is a small C reproducer that triggers it #include <stdio.h> #include <stdlib.h> #include <stdint.h> #include <unistd.h> #include <fcntl.h> #include <sys/ioctl.h> #define die(s) perror(s), exit(1) #define QUERY_INFO 0xc018cf07 int main(int argc, char *argv[]) { int fd; if (argc < 2) exit(1); fd = open(argv[1], O_RDONLY); if (fd == -1) die("open"); if (ioctl(fd, QUERY_INFO, (uint32_t[]) { 0, 0, 0, 4, 0, 0}) == -1) die("ioctl"); close(fd); return 0; } mount.cifs //srv/share /mnt -o ... gcc repro.c && ./a.out /mnt/f0 [ 1832.124468] CIFS: VFS: \\w22-dc.zelda.test\test Invalid passthru query flags: 0x4 [ 1832.125043] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 1832.125764] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] [ 1832.126241] CPU: 3 PID: 1133 Comm: a.out Not tainted 5.17.0-rc8 #2 [ 1832.126630] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014 [ 1832.127322] RIP: 0010:smb2_ioctl_query_info+0x7a3/0xe30 [cifs] [ 1832.127749] Code: 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 6c 05 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 74 24 28 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 cb 04 00 00 49 8b 3e e8 bb fc fa ff 48 89 da 48 [ 1832.128911] RSP: 0018:ffffc90000957b08 EFLAGS: 00010256 [ 1832.129243] RAX: dffffc0000000000 RBX: ffff888117e9b850 RCX: ffffffffa020580d [ 1832.129691] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffa043a2c0 [ 1832.130137] RBP: ffff888117e9b878 R08: 0000000000000001 R09: 0000000000000003 [ 1832.130585] R10: fffffbfff4087458 R11: 0000000000000001 R12: ffff888117e9b800 [ 1832.131037] R13: 00000000ffffffea R14: 0000000000000000 R15: ffff888117e9b8a8 [ 1832.131485] FS: 00007fcee9900740(0000) GS:ffff888151a00000(0000) knlGS:0000000000000000 [ 1832.131993] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1832.132354] CR2: 00007fcee9a1ef5e CR3: 0000000114cd2000 CR4: 0000000000350ee0 [ 1832.132801] Call Trace: [ 1832.132962] <TASK> [ 1832.133104] ? smb2_query_reparse_tag+0x890/0x890 [cifs] [ 1832.133489] ? cifs_mapchar+0x460/0x460 [cifs] [ 1832.133822] ? rcu_read_lock_sched_held+0x3f/0x70 [ 1832.134125] ? cifs_strndup_to_utf16+0x15b/0x250 [cifs] [ 1832.134502] ? lock_downgrade+0x6f0/0x6f0 [ 1832.134760] ? cifs_convert_path_to_utf16+0x198/0x220 [cifs] [ 1832.135170] ? smb2_check_message+0x1080/0x1080 [cifs] [ 1832.135545] cifs_ioctl+0x1577/0x3320 [cifs] [ 1832.135864] ? lock_downgrade+0x6f0/0x6f0 [ 1832.136125] ? cifs_readdir+0x2e60/0x2e60 [cifs] [ 1832.136468] ? rcu_read_lock_sched_held+0x3f/0x70 [ 1832.136769] ? __rseq_handle_notify_resume+0x80b/0xbe0 [ 1832.137096] ? __up_read+0x192/0x710 [ 1832.137327] ? __ia32_sys_rseq+0xf0/0xf0 [ 1832.137578] ? __x64_sys_openat+0x11f/0x1d0 [ 1832.137850] __x64_sys_ioctl+0x127/0x190 [ 1832.138103] do_syscall_64+0x3b/0x90 [ 1832.138378] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 1832.138702] RIP: 0033:0x7fcee9a253df [ 1832.138937] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 [ 1832.140107] RSP: 002b:00007ffeba94a8a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 1832.140606] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcee9a253df [ 1832.141058] RDX: 00007ffeba94a910 RSI: 00000000c018cf07 RDI: 0000000000000003 [ 1832.141503] RBP: 00007ffeba94a930 R08: 00007fcee9b24db0 R09: 00007fcee9b45c4e [ 1832.141948] R10: 00007fcee9918d40 R11: 0000000000000246 R12: 00007ffeba94aa48 [ 1832.142396] R13: 0000000000401176 R14: 0000000000403df8 R15: 00007fcee9b78000 [ 1832.142851] </TASK> [ 1832.142994] Modules linked in: cifs cifs_arc4 cifs_md4 bpf_preload [last unloaded: cifs] Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:22:55 +02:00
Paulo Alcantara	f143f8334f	cifs: prevent bad output lengths in smb2_ioctl_query_info() commit `b92e358757` upstream. When calling smb2_ioctl_query_info() with smb_query_info::flags=PASSTHRU_FSCTL and smb_query_info::output_buffer_length=0, the following would return 0x10 buffer = memdup_user(arg + sizeof(struct smb_query_info), qi.output_buffer_length); if (IS_ERR(buffer)) { kfree(vars); return PTR_ERR(buffer); } rather than a valid pointer thus making IS_ERR() check fail. This would then cause a NULL ptr deference in @buffer when accessing it later in smb2_ioctl_query_ioctl(). While at it, prevent having a @buffer smaller than 8 bytes to correctly handle SMB2_SET_INFO FileEndOfFileInformation requests when smb_query_info::flags=PASSTHRU_SET_INFO. Here is a small C reproducer which triggers a NULL ptr in @buffer when passing an invalid smb_query_info::flags #include <stdio.h> #include <stdlib.h> #include <stdint.h> #include <unistd.h> #include <fcntl.h> #include <sys/ioctl.h> #define die(s) perror(s), exit(1) #define QUERY_INFO 0xc018cf07 int main(int argc, char *argv[]) { int fd; if (argc < 2) exit(1); fd = open(argv[1], O_RDONLY); if (fd == -1) die("open"); if (ioctl(fd, QUERY_INFO, (uint32_t[]) { 0, 0, 0, 4, 0, 0}) == -1) die("ioctl"); close(fd); return 0; } mount.cifs //srv/share /mnt -o ... gcc repro.c && ./a.out /mnt/f0 [ 114.138620] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 114.139310] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] [ 114.139775] CPU: 2 PID: 995 Comm: a.out Not tainted 5.17.0-rc8 #1 [ 114.140148] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014 [ 114.140818] RIP: 0010:smb2_ioctl_query_info+0x206/0x410 [cifs] [ 114.141221] Code: 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 c8 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 7b 28 4c 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 9c 01 00 00 49 8b 3f e8 58 02 fb ff 48 8b 14 24 [ 114.142348] RSP: 0018:ffffc90000b47b00 EFLAGS: 00010256 [ 114.142692] RAX: dffffc0000000000 RBX: ffff888115503200 RCX: ffffffffa020580d [ 114.143119] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffa043a380 [ 114.143544] RBP: ffff888115503278 R08: 0000000000000001 R09: 0000000000000003 [ 114.143983] R10: fffffbfff4087470 R11: 0000000000000001 R12: ffff888115503288 [ 114.144424] R13: 00000000ffffffea R14: ffff888115503228 R15: 0000000000000000 [ 114.144852] FS: 00007f7aeabdf740(0000) GS:ffff888151600000(0000) knlGS:0000000000000000 [ 114.145338] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 114.145692] CR2: 00007f7aeacfdf5e CR3: 000000012000e000 CR4: 0000000000350ee0 [ 114.146131] Call Trace: [ 114.146291] <TASK> [ 114.146432] ? smb2_query_reparse_tag+0x890/0x890 [cifs] [ 114.146800] ? cifs_mapchar+0x460/0x460 [cifs] [ 114.147121] ? rcu_read_lock_sched_held+0x3f/0x70 [ 114.147412] ? cifs_strndup_to_utf16+0x15b/0x250 [cifs] [ 114.147775] ? dentry_path_raw+0xa6/0xf0 [ 114.148024] ? cifs_convert_path_to_utf16+0x198/0x220 [cifs] [ 114.148413] ? smb2_check_message+0x1080/0x1080 [cifs] [ 114.148766] ? rcu_read_lock_sched_held+0x3f/0x70 [ 114.149065] cifs_ioctl+0x1577/0x3320 [cifs] [ 114.149371] ? lock_downgrade+0x6f0/0x6f0 [ 114.149631] ? cifs_readdir+0x2e60/0x2e60 [cifs] [ 114.149956] ? rcu_read_lock_sched_held+0x3f/0x70 [ 114.150250] ? __rseq_handle_notify_resume+0x80b/0xbe0 [ 114.150562] ? __up_read+0x192/0x710 [ 114.150791] ? __ia32_sys_rseq+0xf0/0xf0 [ 114.151025] ? __x64_sys_openat+0x11f/0x1d0 [ 114.151296] __x64_sys_ioctl+0x127/0x190 [ 114.151549] do_syscall_64+0x3b/0x90 [ 114.151768] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 114.152079] RIP: 0033:0x7f7aead043df [ 114.152306] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 [ 114.153431] RSP: 002b:00007ffc2e0c1f80 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 114.153890] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7aead043df [ 114.154315] RDX: 00007ffc2e0c1ff0 RSI: 00000000c018cf07 RDI: 0000000000000003 [ 114.154747] RBP: 00007ffc2e0c2010 R08: 00007f7aeae03db0 R09: 00007f7aeae24c4e [ 114.155192] R10: 00007f7aeabf7d40 R11: 0000000000000246 R12: 00007ffc2e0c2128 [ 114.155642] R13: 0000000000401176 R14: 0000000000403df8 R15: 00007f7aeae57000 [ 114.156071] </TASK> [ 114.156218] Modules linked in: cifs cifs_arc4 cifs_md4 bpf_preload [ 114.156608] ---[ end trace 0000000000000000 ]--- [ 114.156898] RIP: 0010:smb2_ioctl_query_info+0x206/0x410 [cifs] [ 114.157792] Code: 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 c8 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 7b 28 4c 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 9c 01 00 00 49 8b 3f e8 58 02 fb ff 48 8b 14 24 [ 114.159293] RSP: 0018:ffffc90000b47b00 EFLAGS: 00010256 [ 114.159641] RAX: dffffc0000000000 RBX: ffff888115503200 RCX: ffffffffa020580d [ 114.160093] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffa043a380 [ 114.160699] RBP: ffff888115503278 R08: 0000000000000001 R09: 0000000000000003 [ 114.161196] R10: fffffbfff4087470 R11: 0000000000000001 R12: ffff888115503288 [ 114.155642] R13: 0000000000401176 R14: 0000000000403df8 R15: 00007f7aeae57000 [ 114.156071] </TASK> [ 114.156218] Modules linked in: cifs cifs_arc4 cifs_md4 bpf_preload [ 114.156608] ---[ end trace 0000000000000000 ]--- [ 114.156898] RIP: 0010:smb2_ioctl_query_info+0x206/0x410 [cifs] [ 114.157792] Code: 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 c8 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 7b 28 4c 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 9c 01 00 00 49 8b 3f e8 58 02 fb ff 48 8b 14 24 [ 114.159293] RSP: 0018:ffffc90000b47b00 EFLAGS: 00010256 [ 114.159641] RAX: dffffc0000000000 RBX: ffff888115503200 RCX: ffffffffa020580d [ 114.160093] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffa043a380 [ 114.160699] RBP: ffff888115503278 R08: 0000000000000001 R09: 0000000000000003 [ 114.161196] R10: fffffbfff4087470 R11: 0000000000000001 R12: ffff888115503288 [ 114.161823] R13: 00000000ffffffea R14: ffff888115503228 R15: 0000000000000000 [ 114.162274] FS: 00007f7aeabdf740(0000) GS:ffff888151600000(0000) knlGS:0000000000000000 [ 114.162853] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 114.163218] CR2: 00007f7aeacfdf5e CR3: 000000012000e000 CR4: 0000000000350ee0 [ 114.163691] Kernel panic - not syncing: Fatal exception [ 114.164087] Kernel Offset: disabled [ 114.164316] ---[ end Kernel panic - not syncing: Fatal exception ]--- Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:22:55 +02:00
Minchan Kim	ab657a29c3	mm: fs: fix lru_cache_disabled race in bh_lru commit `c0226eb8bd` upstream. Check lru_cache_disabled under bh_lru_lock. Otherwise, it could introduce race below and it fails to migrate pages containing buffer_head. CPU 0 CPU 1 bh_lru_install lru_cache_disable lru_cache_disabled = false atomic_inc(&lru_disable_count); invalidate_bh_lrus_cpu of CPU 0 bh_lru_lock __invalidate_bh_lrus bh_lru_unlock bh_lru_lock install the bh bh_lru_unlock WHen this race happens a CMA allocation fails, which is critical for the workload which depends on CMA. Link: https://lkml.kernel.org/r/20220308180709.2017638-1-minchan@kernel.org Fixes: `8cc621d2f4` ("mm: fs: invalidate BH LRU during page migration") Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Chris Goldsworthy <cgoldswo@codeaurora.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: John Dias <joaodias@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:22:54 +02:00
Baokun Li	52ba0ab4f0	jffs2: fix memory leak in jffs2_scan_medium commit `9cdd312887` upstream. If an error is returned in jffs2_scan_eraseblock() and some memory has been added to the jffs2_summary *s, we can observe the following kmemleak report: -------------------------------------------- unreferenced object 0xffff88812b889c40 (size 64): comm "mount", pid 692, jiffies 4294838325 (age 34.288s) hex dump (first 32 bytes): 40 48 b5 14 81 88 ff ff 01 e0 31 00 00 00 50 00 @H........1...P. 00 00 01 00 00 00 01 00 00 00 02 00 00 00 09 08 ................ backtrace: [<ffffffffae93a3a3>] __kmalloc+0x613/0x910 [<ffffffffaf423b9c>] jffs2_sum_add_dirent_mem+0x5c/0xa0 [<ffffffffb0f3afa8>] jffs2_scan_medium.cold+0x36e5/0x4794 [<ffffffffb0f3dbe1>] jffs2_do_mount_fs.cold+0xa7/0x2267 [<ffffffffaf40acf3>] jffs2_do_fill_super+0x383/0xc30 [<ffffffffaf40c00a>] jffs2_fill_super+0x2ea/0x4c0 [<ffffffffb0315d64>] mtd_get_sb+0x254/0x400 [<ffffffffb0315f5f>] mtd_get_sb_by_nr+0x4f/0xd0 [<ffffffffb0316478>] get_tree_mtd+0x498/0x840 [<ffffffffaf40bd15>] jffs2_get_tree+0x25/0x30 [<ffffffffae9f358d>] vfs_get_tree+0x8d/0x2e0 [<ffffffffaea7a98f>] path_mount+0x50f/0x1e50 [<ffffffffaea7c3d7>] do_mount+0x107/0x130 [<ffffffffaea7c5c5>] __se_sys_mount+0x1c5/0x2f0 [<ffffffffaea7c917>] __x64_sys_mount+0xc7/0x160 [<ffffffffb10142f5>] do_syscall_64+0x45/0x70 unreferenced object 0xffff888114b54840 (size 32): comm "mount", pid 692, jiffies 4294838325 (age 34.288s) hex dump (first 32 bytes): c0 75 b5 14 81 88 ff ff 02 e0 02 00 00 00 02 00 .u.............. 00 00 84 00 00 00 44 00 00 00 6b 6b 6b 6b 6b a5 ......D...kkkkk. backtrace: [<ffffffffae93be24>] kmem_cache_alloc_trace+0x584/0x880 [<ffffffffaf423b04>] jffs2_sum_add_inode_mem+0x54/0x90 [<ffffffffb0f3bd44>] jffs2_scan_medium.cold+0x4481/0x4794 [...] unreferenced object 0xffff888114b57280 (size 32): comm "mount", pid 692, jiffies 4294838393 (age 34.357s) hex dump (first 32 bytes): 10 d5 6c 11 81 88 ff ff 08 e0 05 00 00 00 01 00 ..l............. 00 00 38 02 00 00 28 00 00 00 6b 6b 6b 6b 6b a5 ..8...(...kkkkk. backtrace: [<ffffffffae93be24>] kmem_cache_alloc_trace+0x584/0x880 [<ffffffffaf423c34>] jffs2_sum_add_xattr_mem+0x54/0x90 [<ffffffffb0f3a24f>] jffs2_scan_medium.cold+0x298c/0x4794 [...] unreferenced object 0xffff8881116cd510 (size 16): comm "mount", pid 692, jiffies 4294838395 (age 34.355s) hex dump (first 16 bytes): 00 00 00 00 00 00 00 00 09 e0 60 02 00 00 6b a5 ..........`...k. backtrace: [<ffffffffae93be24>] kmem_cache_alloc_trace+0x584/0x880 [<ffffffffaf423cc4>] jffs2_sum_add_xref_mem+0x54/0x90 [<ffffffffb0f3b2e3>] jffs2_scan_medium.cold+0x3a20/0x4794 [...] -------------------------------------------- Therefore, we should call jffs2_sum_reset_collected(s) on exit to release the memory added in s. In addition, a new tag "out_buf" is added to prevent the NULL pointer reference caused by s being NULL. (thanks to Zhang Yi for this analysis) Fixes: `e631ddba58` ("[JFFS2] Add erase block summary support (mount time improvement)") Cc: stable@vger.kernel.org Co-developed-with: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:22:53 +02:00
Baokun Li	4392e8aeeb	jffs2: fix memory leak in jffs2_do_mount_fs commit `d051cef784` upstream. If jffs2_build_filesystem() in jffs2_do_mount_fs() returns an error, we can observe the following kmemleak report: -------------------------------------------- unreferenced object 0xffff88811b25a640 (size 64): comm "mount", pid 691, jiffies 4294957728 (age 71.952s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffffa493be24>] kmem_cache_alloc_trace+0x584/0x880 [<ffffffffa5423a06>] jffs2_sum_init+0x86/0x130 [<ffffffffa5400e58>] jffs2_do_mount_fs+0x798/0xac0 [<ffffffffa540acf3>] jffs2_do_fill_super+0x383/0xc30 [<ffffffffa540c00a>] jffs2_fill_super+0x2ea/0x4c0 [...] unreferenced object 0xffff88812c760000 (size 65536): comm "mount", pid 691, jiffies 4294957728 (age 71.952s) hex dump (first 32 bytes): bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ backtrace: [<ffffffffa493a449>] __kmalloc+0x6b9/0x910 [<ffffffffa5423a57>] jffs2_sum_init+0xd7/0x130 [<ffffffffa5400e58>] jffs2_do_mount_fs+0x798/0xac0 [<ffffffffa540acf3>] jffs2_do_fill_super+0x383/0xc30 [<ffffffffa540c00a>] jffs2_fill_super+0x2ea/0x4c0 [...] -------------------------------------------- This is because the resources allocated in jffs2_sum_init() are not released. Call jffs2_sum_exit() to release these resources to solve the problem. Fixes: `e631ddba58` ("[JFFS2] Add erase block summary support (mount time improvement)") Cc: stable@vger.kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:22:53 +02:00
Baokun Li	7a75740206	jffs2: fix use-after-free in jffs2_clear_xattr_subsystem commit `4c7c44ee16` upstream. When we mount a jffs2 image, assume that the first few blocks of the image are normal and contain at least one xattr-related inode, but the next block is abnormal. As a result, an error is returned in jffs2_scan_eraseblock(). jffs2_clear_xattr_subsystem() is then called in jffs2_build_filesystem() and then again in jffs2_do_fill_super(). Finally we can observe the following report: ================================================================== BUG: KASAN: use-after-free in jffs2_clear_xattr_subsystem+0x95/0x6ac Read of size 8 at addr ffff8881243384e0 by task mount/719 Call Trace: dump_stack+0x115/0x16b jffs2_clear_xattr_subsystem+0x95/0x6ac jffs2_do_fill_super+0x84f/0xc30 jffs2_fill_super+0x2ea/0x4c0 mtd_get_sb+0x254/0x400 mtd_get_sb_by_nr+0x4f/0xd0 get_tree_mtd+0x498/0x840 jffs2_get_tree+0x25/0x30 vfs_get_tree+0x8d/0x2e0 path_mount+0x50f/0x1e50 do_mount+0x107/0x130 __se_sys_mount+0x1c5/0x2f0 __x64_sys_mount+0xc7/0x160 do_syscall_64+0x45/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Allocated by task 719: kasan_save_stack+0x23/0x60 __kasan_kmalloc.constprop.0+0x10b/0x120 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc+0x1c0/0x870 jffs2_alloc_xattr_ref+0x2f/0xa0 jffs2_scan_medium.cold+0x3713/0x4794 jffs2_do_mount_fs.cold+0xa7/0x2253 jffs2_do_fill_super+0x383/0xc30 jffs2_fill_super+0x2ea/0x4c0 [...] Freed by task 719: kmem_cache_free+0xcc/0x7b0 jffs2_free_xattr_ref+0x78/0x98 jffs2_clear_xattr_subsystem+0xa1/0x6ac jffs2_do_mount_fs.cold+0x5e6/0x2253 jffs2_do_fill_super+0x383/0xc30 jffs2_fill_super+0x2ea/0x4c0 [...] The buggy address belongs to the object at ffff8881243384b8 which belongs to the cache jffs2_xattr_ref of size 48 The buggy address is located 40 bytes inside of 48-byte region [ffff8881243384b8, ffff8881243384e8) [...] ================================================================== The triggering of the BUG is shown in the following stack: ----------------------------------------------------------- jffs2_fill_super jffs2_do_fill_super jffs2_do_mount_fs jffs2_build_filesystem jffs2_scan_medium jffs2_scan_eraseblock <--- ERROR jffs2_clear_xattr_subsystem <--- free jffs2_clear_xattr_subsystem <--- free again ----------------------------------------------------------- An error is returned in jffs2_do_mount_fs(). If the error is returned by jffs2_sum_init(), the jffs2_clear_xattr_subsystem() does not need to be executed. If the error is returned by jffs2_build_filesystem(), the jffs2_clear_xattr_subsystem() also does not need to be executed again. So move jffs2_clear_xattr_subsystem() from 'out_inohash' to 'out_root' to fix this UAF problem. Fixes: `aa98d7cf59` ("[JFFS2][XATTR] XATTR support on JFFS2 (version. 5)") Cc: stable@vger.kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:22:53 +02:00
Chao Yu	b065f398c8	f2fs: fix to do sanity check on .cp_pack_total_block_count commit `5b5b4f85b0` upstream. As bughunter reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215709 f2fs may hang when mounting a fuzzed image, the dmesg shows as below: __filemap_get_folio+0x3a9/0x590 pagecache_get_page+0x18/0x60 __get_meta_page+0x95/0x460 [f2fs] get_checkpoint_version+0x2a/0x1e0 [f2fs] validate_checkpoint+0x8e/0x2a0 [f2fs] f2fs_get_valid_checkpoint+0xd0/0x620 [f2fs] f2fs_fill_super+0xc01/0x1d40 [f2fs] mount_bdev+0x18a/0x1c0 f2fs_mount+0x15/0x20 [f2fs] legacy_get_tree+0x28/0x50 vfs_get_tree+0x27/0xc0 path_mount+0x480/0xaa0 do_mount+0x7c/0xa0 __x64_sys_mount+0x8b/0xe0 do_syscall_64+0x38/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae The root cause is cp_pack_total_block_count field in checkpoint was fuzzed to one, as calcuated, two cp pack block locates in the same block address, so then read latter cp pack block, it will block on the page lock due to the lock has already held when reading previous cp pack block, fix it by adding sanity check for cp_pack_total_block_count. Cc: stable@vger.kernel.org Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:22:52 +02:00
Juhyung Park	f9156db098	f2fs: quota: fix loop condition at f2fs_quota_sync() commit `680af5b824` upstream. cnt should be passed to sb_has_quota_active() instead of type to check active quota properly. Moreover, when the type is -1, the compiler with enough inline knowledge can discard sb_has_quota_active() check altogether, causing a NULL pointer dereference at the following inode_lock(dqopt->files[cnt]): [ 2.796010] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0 [ 2.796024] Mem abort info: [ 2.796025] ESR = 0x96000005 [ 2.796028] EC = 0x25: DABT (current EL), IL = 32 bits [ 2.796029] SET = 0, FnV = 0 [ 2.796031] EA = 0, S1PTW = 0 [ 2.796032] Data abort info: [ 2.796034] ISV = 0, ISS = 0x00000005 [ 2.796035] CM = 0, WnR = 0 [ 2.796046] user pgtable: 4k pages, 39-bit VAs, pgdp=00000003370d1000 [ 2.796048] [00000000000000a0] pgd=0000000000000000, pud=0000000000000000 [ 2.796051] Internal error: Oops: 96000005 [#1] PREEMPT SMP [ 2.796056] CPU: 7 PID: 640 Comm: f2fs_ckpt-259:7 Tainted: G S 5.4.179-arter97-r8-64666-g2f16e087f9d8 #1 [ 2.796057] Hardware name: Qualcomm Technologies, Inc. Lahaina MTP lemonadep (DT) [ 2.796059] pstate: 80c00005 (Nzcv daif +PAN +UAO) [ 2.796065] pc : down_write+0x28/0x70 [ 2.796070] lr : f2fs_quota_sync+0x100/0x294 [ 2.796071] sp : ffffffa3f48ffc30 [ 2.796073] x29: ffffffa3f48ffc30 x28: 0000000000000000 [ 2.796075] x27: ffffffa3f6d718b8 x26: ffffffa415fe9d80 [ 2.796077] x25: ffffffa3f7290048 x24: 0000000000000001 [ 2.796078] x23: 0000000000000000 x22: ffffffa3f7290000 [ 2.796080] x21: ffffffa3f72904a0 x20: ffffffa3f7290110 [ 2.796081] x19: ffffffa3f77a9800 x18: ffffffc020aae038 [ 2.796083] x17: ffffffa40e38e040 x16: ffffffa40e38e6d0 [ 2.796085] x15: ffffffa40e38e6cc x14: ffffffa40e38e6d0 [ 2.796086] x13: 00000000000004f6 x12: 00162c44ff493000 [ 2.796088] x11: 0000000000000400 x10: ffffffa40e38c948 [ 2.796090] x9 : 0000000000000000 x8 : 00000000000000a0 [ 2.796091] x7 : 0000000000000000 x6 : 0000d1060f00002a [ 2.796093] x5 : ffffffa3f48ff718 x4 : 000000000000000d [ 2.796094] x3 : 00000000060c0000 x2 : 0000000000000001 [ 2.796096] x1 : 0000000000000000 x0 : 00000000000000a0 [ 2.796098] Call trace: [ 2.796100] down_write+0x28/0x70 [ 2.796102] f2fs_quota_sync+0x100/0x294 [ 2.796104] block_operations+0x120/0x204 [ 2.796106] f2fs_write_checkpoint+0x11c/0x520 [ 2.796107] __checkpoint_and_complete_reqs+0x7c/0xd34 [ 2.796109] issue_checkpoint_thread+0x6c/0xb8 [ 2.796112] kthread+0x138/0x414 [ 2.796114] ret_from_fork+0x10/0x18 [ 2.796117] Code: aa0803e0 aa1f03e1 52800022 aa0103e9 (c8e97d02) [ 2.796120] ---[ end trace 96e942e8eb6a0b53 ]--- [ 2.800116] Kernel panic - not syncing: Fatal exception [ 2.800120] SMP: stopping secondary CPUs Fixes: `9de71ede81` ("f2fs: quota: fix potential deadlock") Cc: <stable@vger.kernel.org> # v5.15+ Signed-off-by: Juhyung Park <qkrwngud825@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:22:52 +02:00
Chao Yu	e98ae961b3	f2fs: fix to unlock page correctly in error path of is_alive() commit `6d18762ed5` upstream. As Pavel Machek reported in below link [1]: After commit `77900c45ee` ("f2fs: fix to do sanity check in is_alive()"), node page should be unlock via calling f2fs_put_page() in the error path of is_alive(), otherwise, f2fs may hang when it tries to lock the node page, fix it. [1] https://lore.kernel.org/stable/20220124203637.GA19321@duo.ucw.cz/ Fixes: `77900c45ee` ("f2fs: fix to do sanity check in is_alive()") Cc: <stable@vger.kernel.org> Reported-by: Pavel Machek <pavel@denx.de> Signed-off-by: Pavel Machek <pavel@denx.de> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:22:52 +02:00
Dan Carpenter	614a61e159	NFSD: prevent underflow in nfssvc_decode_writeargs() commit `184416d4b9` upstream. Smatch complains: fs/nfsd/nfsxdr.c:341 nfssvc_decode_writeargs() warn: no lower bound on 'args->len' Change the type to unsigned to prevent this issue. Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-08 14:22:52 +02:00

1 2 3 4 5 ...

73390 commits