linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-20 09:31:09 +00:00

Author	SHA1	Message	Date
Ye Bin	d11d2ded29	jbd2: fix potential use-after-free in jbd2_fc_wait_bufs commit `243d1a5d50` upstream. In 'jbd2_fc_wait_bufs' use 'bh' after put buffer head reference count which may lead to use-after-free. So judge buffer if uptodate before put buffer head reference count. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220914100812.1414768-3-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:28 +02:00
Ye Bin	e7385c868e	jbd2: fix potential buffer head reference count leak commit `e0d5fc7a6d` upstream. As in 'jbd2_fc_wait_bufs' if buffer isn't uptodate, will return -EIO without update 'journal->j_fc_off'. But 'jbd2_fc_release_bufs' will release buffer head from ‘j_fc_off - 1’ if 'bh' is NULL will terminal release which will lead to buffer head buffer head reference count leak. To solve above issue, update 'journal->j_fc_off' before return -EIO. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220914100812.1414768-2-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:28 +02:00
Andrew Perepechko	d87fe290a5	jbd2: wake up journal waiters in FIFO order, not LIFO commit `34fc8768ec` upstream. LIFO wakeup order is unfair and sometimes leads to a journal user not being able to get a journal handle for hundreds of transactions in a row. FIFO wakeup can make things more fair. Cc: stable@kernel.org Signed-off-by: Alexey Lyashkov <alexey.lyashkov@gmail.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20220907165959.1137482-1-alexey.lyashkov@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:28 +02:00
Kees Cook	7434626c5e	hardening: Remove Clang's enable flag for -ftrivial-auto-var-init=zero commit `607e57c6c6` upstream. Now that Clang's -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang option is no longer required, remove it from the command line. Clang 16 and later will warn when it is used, which will cause Kconfig to think it can't use -ftrivial-auto-var-init=zero at all. Check for whether it is required and only use it when so. Cc: Nathan Chancellor <nathan@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: linux-kbuild@vger.kernel.org Cc: llvm@lists.linux.dev Cc: stable@vger.kernel.org Fixes: `f02003c860` ("hardening: Avoid harmless Clang option under CONFIG_INIT_STACK_ALL_ZERO") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:28 +02:00
Kees Cook	095493833b	hardening: Avoid harmless Clang option under CONFIG_INIT_STACK_ALL_ZERO commit `f02003c860` upstream. Currently under Clang, CC_HAS_AUTO_VAR_INIT_ZERO requires an extra -enable flag compared to CC_HAS_AUTO_VAR_INIT_PATTERN. GCC 12[1] will not, and will happily ignore the Clang-specific flag. However, its presence on the command-line is both cumbersome and confusing. Due to GCC's tolerant behavior, though, we can continue to use a single Kconfig cc-option test for the feature on both compilers, but then drop the Clang-specific option in the Makefile. In other words, this patch does not change anything other than making the compiler command line shorter once GCC supports -ftrivial-auto-var-init=zero. [1] https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=a25e0b5e6ac8a77a71c229e0a7b744603365b0e9 Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: llvm@lists.linux.dev Fixes: `dcb7c0b946` ("hardening: Clarify Kconfig text for auto-var-init") Suggested-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/lkml/20210914102837.6172-1-will@kernel.org/ Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:28 +02:00
Chao Yu	73687c5391	f2fs: fix to do sanity check on summary info commit `c6ad7fd166` upstream. As Wenqing Liu reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216456 BUG: KASAN: use-after-free in recover_data+0x63ae/0x6ae0 [f2fs] Read of size 4 at addr ffff8881464dcd80 by task mount/1013 CPU: 3 PID: 1013 Comm: mount Tainted: G W 6.0.0-rc4 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 Call Trace: dump_stack_lvl+0x45/0x5e print_report.cold+0xf3/0x68d kasan_report+0xa8/0x130 recover_data+0x63ae/0x6ae0 [f2fs] f2fs_recover_fsync_data+0x120d/0x1fc0 [f2fs] f2fs_fill_super+0x4665/0x61e0 [f2fs] mount_bdev+0x2cf/0x3b0 legacy_get_tree+0xed/0x1d0 vfs_get_tree+0x81/0x2b0 path_mount+0x47e/0x19d0 do_mount+0xce/0xf0 __x64_sys_mount+0x12c/0x1a0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd The root cause is: in fuzzed image, SSA table is corrupted: ofs_in_node is larger than ADDRS_PER_PAGE(), result in out-of-range access on 4k-size page. - recover_data - do_recover_data - check_index_in_prev_nodes - f2fs_data_blkaddr This patch adds sanity check on summary info in recovery and GC flow in where the flows rely on them. After patch: [ 29.310883] F2FS-fs (loop0): Inconsistent ofs_in_node:65286 in summary, ino:0, nid:6, max:1018 Cc: stable@vger.kernel.org Reported-by: Wenqing Liu <wenqingliu0120@gmail.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:28 +02:00
Chao Yu	ed854f10e6	f2fs: fix to do sanity check on destination blkaddr during recovery commit `0ef4ca04a3` upstream. As Wenqing Liu reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216456 loop5: detected capacity change from 0 to 131072 F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1 F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0 F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1 F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0 F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1 F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0 F2FS-fs (loop5): Bitmap was wrongly set, blk:5634 ------------[ cut here ]------------ WARNING: CPU: 3 PID: 1013 at fs/f2fs/segment.c:2198 RIP: 0010:update_sit_entry+0xa55/0x10b0 [f2fs] Call Trace: <TASK> f2fs_do_replace_block+0xa98/0x1890 [f2fs] f2fs_replace_block+0xeb/0x180 [f2fs] recover_data+0x1a69/0x6ae0 [f2fs] f2fs_recover_fsync_data+0x120d/0x1fc0 [f2fs] f2fs_fill_super+0x4665/0x61e0 [f2fs] mount_bdev+0x2cf/0x3b0 legacy_get_tree+0xed/0x1d0 vfs_get_tree+0x81/0x2b0 path_mount+0x47e/0x19d0 do_mount+0xce/0xf0 __x64_sys_mount+0x12c/0x1a0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd If we enable CONFIG_F2FS_CHECK_FS config, it will trigger a kernel panic instead of warning. The root cause is: in fuzzed image, SIT table is inconsistent with inode mapping table, result in triggering such warning during SIT table update. This patch introduces a new flag DATA_GENERIC_ENHANCE_UPDATE, w/ this flag, data block recovery flow can check destination blkaddr's validation in SIT table, and skip f2fs_replace_block() to avoid inconsistent status. Cc: stable@vger.kernel.org Reported-by: Wenqing Liu <wenqingliu0120@gmail.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:27 +02:00
Jaegeuk Kim	7f10357c90	f2fs: increase the limit for reserve_root commit `da35fe96d1` upstream. This patch increases the threshold that limits the reserved root space from 0.2% to 12.5% by using simple shift operation. Typically Android sets 128MB, but if the storage capacity is 32GB, 0.2% which is around 64MB becomes too small. Let's relax it. Cc: stable@vger.kernel.org Reported-by: Aran Dalton <arda@allwinnertech.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:27 +02:00
Jaegeuk Kim	0035b84223	f2fs: flush pending checkpoints when freezing super commit `c7b5857637` upstream. This avoids -EINVAL when trying to freeze f2fs. Cc: stable@vger.kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:27 +02:00
Jaegeuk Kim	ab49589754	f2fs: complete checkpoints during remount commit `4f99484d27` upstream. Otherwise, pending checkpoints can contribute a race condition to give a quota warning. - Thread - checkpoint thread add checkpoints to the list do_remount() down_write(&sb->s_umount); f2fs_remount() block_operations() down_read_trylock(&sb->s_umount) = 0 up_write(&sb->s_umount); f2fs_quota_sync() dquot_writeback_dquots() WARN_ON_ONCE(!rwsem_is_locked(&sb->s_umount)); Or, do_remount() down_write(&sb->s_umount); f2fs_remount() create a ckpt thread f2fs_enable_checkpoint() adds checkpoints wait for f2fs_sync_fs() trigger another pending checkpoint block_operations() down_read_trylock(&sb->s_umount) = 0 up_write(&sb->s_umount); f2fs_quota_sync() dquot_writeback_dquots() WARN_ON_ONCE(!rwsem_is_locked(&sb->s_umount)); Cc: stable@vger.kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:27 +02:00
Tetsuo Handa	0a408c6212	btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer commit `cbddcc4fa3` upstream. syzbot is reporting uninit-value in btrfs_clean_tree_block() [1], for commit `bc877d285c` ("btrfs: Deduplicate extent_buffer init code") missed that btrfs_set_header_generation() in btrfs_init_new_buffer() must not be moved to after clean_tree_block() because clean_tree_block() is calling btrfs_header_generation() since commit `55c69072d6` ("Btrfs: Fix extent_buffer usage when nodesize != leafsize"). Since memzero_extent_buffer() will reset "struct btrfs_header" part, we can't move btrfs_set_header_generation() to before memzero_extent_buffer(). Just re-add btrfs_set_header_generation() before btrfs_clean_tree_block(). Link: https://syzkaller.appspot.com/bug?extid=fba8e2116a12609b6c59 [1] Reported-by: syzbot <syzbot+fba8e2116a12609b6c59@syzkaller.appspotmail.com> Fixes: `bc877d285c` ("btrfs: Deduplicate extent_buffer init code") CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:27 +02:00
Filipe Manana	4b996a3014	btrfs: fix race between quota enable and quota rescan ioctl commit `331cd94614` upstream. When enabling quotas, at btrfs_quota_enable(), after committing the transaction, we change fs_info->quota_root to point to the quota root we created and set BTRFS_FS_QUOTA_ENABLED at fs_info->flags. Then we try to start the qgroup rescan worker, first by initializing it with a call to qgroup_rescan_init() - however if that fails we end up freeing the quota root but we leave fs_info->quota_root still pointing to it, this can later result in a use-after-free somewhere else. We have previously set the flags BTRFS_FS_QUOTA_ENABLED and BTRFS_QGROUP_STATUS_FLAG_ON, so we can only fail with -EINPROGRESS at btrfs_quota_enable(), which is possible if someone already called the quota rescan ioctl, and therefore started the rescan worker. So fix this by ignoring an -EINPROGRESS and asserting we can't get any other error. Reported-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/linux-btrfs/20220823015931.421355-1-yebin10@huawei.com/ CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:27 +02:00
Lukas Czerner	0d94230343	fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE commit `cbfecb927f` upstream. Currently the I_DIRTY_TIME will never get set if the inode already has I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME. That's true, however ext4 will only update the on-disk inode in ->dirty_inode(), not on actual writeback. As a result if the inode already has I_DIRTY_INODE state by the time we get to __mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled into on-disk inode and will not get updated until the next I_DIRTY_INODE update, which might never come if we crash or get a power failure. The problem can be reproduced on ext4 by running xfstest generic/622 with -o iversion mount option. Fix it by allowing I_DIRTY_TIME to be set even if the inode already has I_DIRTY_INODE. Also make sure that the case is properly handled in writeback_single_inode() as well. Additionally changes in xfs_fs_dirty_inode() was made to accommodate for I_DIRTY_TIME in flag. Thanks Jan Kara for suggestions on how to make this work properly. Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: stable@kernel.org Signed-off-by: Lukas Czerner <lczerner@redhat.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220825100657.44217-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:27 +02:00
Mickaël Salaün	95a520b591	ksmbd: Fix user namespace mapping commit `7c88c1e0ab` upstream. A kernel daemon should not rely on the current thread, which is unknown and might be malicious. Before this security fix, ksmbd_override_fsids() didn't correctly override FS UID/GID which means that arbitrary user space threads could trick the kernel to impersonate arbitrary users or groups for file system access checks, leading to file system access bypass. This was found while investigating truncate support for Landlock: https://lore.kernel.org/r/CAKYAXd8fpMJ7guizOjHgxEyyjoUwPsx3jLOPZP=wPYcbhkVXqA@mail.gmail.com Fixes: `e2f34481b2` ("cifsd: add server-side procedures for SMB3") Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: Steve French <smfrench@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Mickaël Salaün <mic@digikod.net> Link: https://lore.kernel.org/r/20220929100447.108468-1-mic@digikod.net Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:26 +02:00
Zhang Xiaoxu	a19f316406	ksmbd: Fix wrong return value and message length check in smb2_ioctl() commit `b1763d265a` upstream. Commit `c7803b05f7` ("smb3: fix ksmbd bigendian bug in oplock break, and move its struct to smbfs_common") use the defination of 'struct validate_negotiate_info_req' in smbfs_common, the array length of 'Dialects' changed from 1 to 4, but the protocol does not require the client to send all 4. This lead the request which satisfied with protocol and server to fail. So just ensure the request payload has the 'DialectCount' in smb2_ioctl(), then fsctl_validate_negotiate_info() will use it to validate the payload length and each dialect. Also when the {in, out}_buf_len is less than the required, should goto out to initialize the status in the response header. Fixes: `f7db8fd03a` ("ksmbd: add validation in smb2_ioctl") Cc: stable@vger.kernel.org Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:26 +02:00
Namjae Jeon	39b6855628	ksmbd: fix endless loop when encryption for response fails commit `360c8ee6fe` upstream. If ->encrypt_resp return error, goto statement cause endless loop. It send an error response immediately after removing it. Fixes: `0626e6641f` ("cifsd: add server handler for central processing and tranport layers") Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:26 +02:00
Hyunwoo Kim	2b0897e336	fbdev: smscufx: Fix use-after-free in ufx_ops_open() commit `5610bcfe86` upstream. A race condition may occur if the user physically removes the USB device while calling open() for this device node. This is a race condition between the ufx_ops_open() function and the ufx_usb_disconnect() function, which may eventually result in UAF. So, add a mutex to the ufx_ops_open() and ufx_usb_disconnect() functions to avoid race contidion of krefs. Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:26 +02:00
Quentin Schulz	aa7b2c927e	pinctrl: rockchip: add pinmux_ops.gpio_set_direction callback commit `4635c0e2a7` upstream. Before the split of gpio and pinctrl sections in their own driver, rockchip_set_mux was called in pinmux_ops.gpio_set_direction for configuring a pin in its GPIO function. This is essential for cases where pinctrl is "bypassed" by gpio consumers otherwise the GPIO function is not configured for the pin and it does not work. Such was the case for the sysfs/libgpiod userspace GPIO handling. Let's re-implement the pinmux_ops.gpio_set_direction callback so that the gpio subsystem can request from the pinctrl driver to put the pin in its GPIO function. Fixes: `9ce9a02039` ("pinctrl/rockchip: drop the gpio related codes") Cc: stable@vger.kernel.org Reviewed-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Quentin Schulz <quentin.schulz@theobroma-systems.com> Link: https://lore.kernel.org/r/20220930132033.4003377-2-foss+kernel@0leil.net Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:26 +02:00
Quentin Schulz	5d97378b36	gpio: rockchip: request GPIO mux to pinctrl when setting direction commit `8ea8af6c84` upstream. Before the split of gpio and pinctrl sections in their own driver, rockchip_set_mux was called in pinmux_ops.gpio_set_direction for configuring a pin in its GPIO function. This is essential for cases where pinctrl is "bypassed" by gpio consumers otherwise the GPIO function is not configured for the pin and it does not work. Such was the case for the sysfs/libgpiod userspace GPIO handling. Let's call pinctrl_gpio_direction_input/output when setting the direction of a GPIO so that the pinctrl core requests from the rockchip pinctrl driver to put the pin in its GPIO function. Fixes: `9ce9a02039` ("pinctrl/rockchip: drop the gpio related codes") Fixes: `936ee2675e` ("gpio/rockchip: add driver for rockchip gpio") Cc: stable@vger.kernel.org Reviewed-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Quentin Schulz <quentin.schulz@theobroma-systems.com> Link: https://lore.kernel.org/r/20220930132033.4003377-3-foss+kernel@0leil.net Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:26 +02:00
Saurav Kashyap	e0b1c16fda	scsi: qedf: Populate sysfs attributes for vport commit `592642e6b1` upstream. Few vport parameters were displayed by systool as 'Unknown' or 'NULL'. Copy speed, supported_speed, frame_size and update port_type for NPIV port. Link: https://lore.kernel.org/r/20220919134434.3513-1-njavali@marvell.com Cc: stable@vger.kernel.org Tested-by: Guangwu Zhang <guazhang@redhat.com> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:26 +02:00
Krzysztof Kozlowski	1d567179f2	slimbus: qcom-ngd: cleanup in probe error path commit `16f14551d0` upstream. Add proper error path in probe() to cleanup resources previously acquired/allocated to fix warnings visible during probe deferral: notifier callback qcom_slim_ngd_ssr_notify already registered WARNING: CPU: 6 PID: 70 at kernel/notifier.c:28 notifier_chain_register+0x5c/0x90 Modules linked in: CPU: 6 PID: 70 Comm: kworker/u16:1 Not tainted 6.0.0-rc3-next-20220830 #380 Call trace: notifier_chain_register+0x5c/0x90 srcu_notifier_chain_register+0x44/0x90 qcom_register_ssr_notifier+0x38/0x4c qcom_slim_ngd_ctrl_probe+0xd8/0x400 platform_probe+0x6c/0xe0 really_probe+0xbc/0x2d4 __driver_probe_device+0x78/0xe0 driver_probe_device+0x3c/0x12c __device_attach_driver+0xb8/0x120 bus_for_each_drv+0x78/0xd0 __device_attach+0xa8/0x1c0 device_initial_probe+0x18/0x24 bus_probe_device+0xa0/0xac deferred_probe_work_func+0x88/0xc0 process_one_work+0x1d4/0x320 worker_thread+0x2cc/0x44c kthread+0x110/0x114 ret_from_fork+0x10/0x20 Fixes: `e1ae85e183` ("slimbus: qcom-ngd-ctrl: add Protection Domain Restart Support") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20220916122910.170730-3-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:26 +02:00
Krzysztof Kozlowski	fa0aab2e45	slimbus: qcom-ngd: use correct error in message of pdr_add_lookup() failure commit `5038d21dde` upstream. Use correct error code, instead of previous 'ret' value, when printing error from pdr_add_lookup() failure. Fixes: `e1ae85e183` ("slimbus: qcom-ngd-ctrl: add Protection Domain Restart Support") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20220916122910.170730-2-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:25 +02:00
Pali Rohár	ba2159df18	powerpc/boot: Explicitly disable usage of SPE instructions commit `110a58b9f9` upstream. uImage boot wrapper should not use SPE instructions, like kernel itself. Boot wrapper has already disabled Altivec and VSX instructions but not SPE. Options -mno-spe and -mspe=no already set when compilation of kernel, but not when compiling uImage wrapper yet. Fix it. Cc: stable@vger.kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220827134454.17365-1-pali@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:25 +02:00
Zhang Rui	9df2a9cdad	powercap: intel_rapl: Use standard Energy Unit for SPR Dram RAPL domain commit `4c081324df` upstream. Intel Xeon servers used to use a fixed energy resolution (15.3uj) for Dram RAPL domain. But on SPR, Dram RAPL domain follows the standard energy resolution as described in MSR_RAPL_POWER_UNIT. Remove the SPR dram_domain_energy_unit quirk. Fixes: `2d798d9f59` ("powercap: intel_rapl: add support for Sapphire Rapids") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Cc: 5.9+ <stable@vger.kernel.org> # 5.9+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:25 +02:00
Chuck Lever	75d9de25a6	NFSD: Protect against send buffer overflow in NFSv3 READ commit `fa6be9cc6e` upstream. Since before the git era, NFSD has conserved the number of pages held by each nfsd thread by combining the RPC receive and send buffers into a single array of pages. This works because there are no cases where an operation needs a large RPC Call message and a large RPC Reply at the same time. Once an RPC Call has been received, svc_process() updates svc_rqst::rq_res to describe the part of rq_pages that can be used for constructing the Reply. This means that the send buffer (rq_res) shrinks when the received RPC record containing the RPC Call is large. A client can force this shrinkage on TCP by sending a correctly- formed RPC Call header contained in an RPC record that is excessively large. The full maximum payload size cannot be constructed in that case. Cc: <stable@vger.kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:25 +02:00
Chuck Lever	2be9331ca6	NFSD: Protect against send buffer overflow in NFSv2 READ commit `401bc1f908` upstream. Since before the git era, NFSD has conserved the number of pages held by each nfsd thread by combining the RPC receive and send buffers into a single array of pages. This works because there are no cases where an operation needs a large RPC Call message and a large RPC Reply at the same time. Once an RPC Call has been received, svc_process() updates svc_rqst::rq_res to describe the part of rq_pages that can be used for constructing the Reply. This means that the send buffer (rq_res) shrinks when the received RPC record containing the RPC Call is large. A client can force this shrinkage on TCP by sending a correctly- formed RPC Call header contained in an RPC record that is excessively large. The full maximum payload size cannot be constructed in that case. Cc: <stable@vger.kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:25 +02:00
Chuck Lever	071a076fd1	NFSD: Protect against send buffer overflow in NFSv3 READDIR commit `640f87c190` upstream. Since before the git era, NFSD has conserved the number of pages held by each nfsd thread by combining the RPC receive and send buffers into a single array of pages. This works because there are no cases where an operation needs a large RPC Call message and a large RPC Reply message at the same time. Once an RPC Call has been received, svc_process() updates svc_rqst::rq_res to describe the part of rq_pages that can be used for constructing the Reply. This means that the send buffer (rq_res) shrinks when the received RPC record containing the RPC Call is large. A client can force this shrinkage on TCP by sending a correctly- formed RPC Call header contained in an RPC record that is excessively large. The full maximum payload size cannot be constructed in that case. Thanks to Aleksi Illikainen and Kari Hulkko for uncovering this issue. Reported-by: Ben Ronallo <Benjamin.Ronallo@synopsys.com> Cc: <stable@vger.kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:25 +02:00
Maciej W. Rozycki	209a94c519	serial: 8250: Request full 16550A feature probing for OxSemi PCIe devices commit `00b7a4d4ee` upstream. Oxford Semiconductor PCIe (Tornado) 950 serial port devices need to operate in the enhanced mode via the EFR register for the Divide-by-M N/8 baud rate generator prescaler to be used in their native UART mode. Otherwise the prescaler is fixed at 1 causing grossly incorrect baud rates to be programmed. Accessing the EFR register requires 16550A features to have been probed for, so request this to happen regardless of SERIAL_8250_16550A_VARIANTS by setting UPF_FULL_PROBE in port flags. Fixes: `366f6c955d` ("serial: 8250: Add proper clock handling for OxSemi PCIe devices") Cc: stable@vger.kernel.org # v5.19+ Reported-by: Anders Blomdell <anders.blomdell@control.lth.se> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Link: https://lore.kernel.org/r/alpine.DEB.2.21.2209210005040.41633@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:25 +02:00
Maciej W. Rozycki	63a3d75cf1	serial: 8250: Let drivers request full 16550A feature probing commit `9906890c89` upstream. A SERIAL_8250_16550A_VARIANTS configuration option has been recently defined that lets one request the 8250 driver not to probe for 16550A device features so as to reduce the driver's device startup time in virtual machines. Some actual hardware devices require these features to have been fully determined however for their driver to work correctly, so define a flag to let drivers request full 16550A feature probing on a device-by-device basis if required regardless of the SERIAL_8250_16550A_VARIANTS option setting chosen. Fixes: `dc56ecb81a` ("serial: 8250: Support disabling mdelay-filled probes of 16550A variants") Cc: stable@vger.kernel.org # v5.6+ Reported-by: Anders Blomdell <anders.blomdell@control.lth.se> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Link: https://lore.kernel.org/r/alpine.DEB.2.21.2209202357520.41633@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:24 +02:00
Maciej W. Rozycki	26e5c79e67	PCI: Sanitise firmware BAR assignments behind a PCI-PCI bridge commit `0e32818397` upstream. When pci_assign_resource() is unable to assign resources to a BAR, it uses pci_revert_fw_address() to fall back to a firmware assignment (if any). Previously pci_revert_fw_address() assumed all addresses could reach the device, but this is not true if the device is below a bridge that only forwards addresses within its windows. This problem was observed on a Tyan Tomcat IV S1564D system where the BIOS did not assign valid addresses to several bridges and USB devices: pci 0000:00:11.0: PCI-to-PCIe bridge to [bus 01-ff] pci 0000:00:11.0: bridge window [io 0xe000-0xefff] pci 0000:01:00.0: PCIe Upstream Port to [bus 02-ff] pci 0000:01:00.0: bridge window [io 0x0000-0x0fff] # unreachable pci 0000:02:02.0: PCIe Downstream Port to [bus 05-ff] pci 0000:02:02.0: bridge window [io 0x0000-0x0fff] # unreachable pci 0000:05:00.0: PCIe-to-PCI bridge to [bus 06-ff] pci 0000:05:00.0: bridge window [io 0x0000-0x0fff] # unreachable pci 0000:06:08.0: USB UHCI 1.1 pci 0000:06:08.0: BAR 4: [io 0xfce0-0xfcff] # unreachable pci 0000:06:08.1: USB UHCI 1.1 pci 0000:06:08.1: BAR 4: [io 0xfce0-0xfcff] # unreachable pci 0000:06:08.0: can't claim BAR 4 [io 0xfce0-0xfcff]: no compatible bridge window pci 0000:06:08.1: can't claim BAR 4 [io 0xfce0-0xfcff]: no compatible bridge window During the first pass of assigning unassigned resources, there was not enough I/O space available, so we couldn't assign the 06:08.0 BAR and reverted to the firmware assignment (still unreachable). Reverting the 06:08.1 assignment failed because it conflicted with 06:08.0: pci 0000:00:11.0: bridge window [io 0xe000-0xefff] pci 0000:01:00.0: no space for bridge window [io size 0x2000] pci 0000:02:02.0: no space for bridge window [io size 0x1000] pci 0000:05:00.0: no space for bridge window [io size 0x1000] pci 0000:06:08.0: BAR 4: no space for [io size 0x0020] pci 0000:06:08.0: BAR 4: trying firmware assignment [io 0xfce0-0xfcff] pci 0000:06:08.1: BAR 4: no space for [io size 0x0020] pci 0000:06:08.1: BAR 4: trying firmware assignment [io 0xfce0-0xfcff] pci 0000:06:08.1: BAR 4: [io 0xfce0-0xfcff] conflicts with 0000:06:08.0 [io 0xfce0-0xfcff] A subsequent pass assigned valid bridge windows and a valid 06:08.1 BAR, but left the 06:08.0 BAR alone, so the UHCI device was still unusable: pci 0000:00:11.0: bridge window [io 0xe000-0xefff] released pci 0000:00:11.0: bridge window [io 0x1000-0x2fff] # reassigned pci 0000:01:00.0: bridge window [io 0x1000-0x2fff] # reassigned pci 0000:02:02.0: bridge window [io 0x2000-0x2fff] # reassigned pci 0000:05:00.0: bridge window [io 0x2000-0x2fff] # reassigned pci 0000:06:08.0: BAR 4: assigned [io 0xfce0-0xfcff] # left alone pci 0000:06:08.1: BAR 4: assigned [io 0x2000-0x201f] ... uhci_hcd 0000:06:08.0: host system error, PCI problems? uhci_hcd 0000:06:08.0: host controller process error, something bad happened! uhci_hcd 0000:06:08.0: host controller halted, very bad! uhci_hcd 0000:06:08.0: HCRESET not completed yet! uhci_hcd 0000:06:08.0: HC died; cleaning up If the address assigned by firmware is not reachable because it's not within upstream bridge windows, fail instead of assigning the unusable address from firmware. [bhelgaas: commit log, use pci_upstream_bridge()] Link: https://bugzilla.kernel.org/show_bug.cgi?id=16263 Link: https://lore.kernel.org/r/alpine.DEB.2.21.2203012338460.46819@angie.orcam.me.uk Link: https://lore.kernel.org/r/alpine.DEB.2.21.2209211921250.29493@angie.orcam.me.uk Fixes: `58c84eda07` ("PCI: fall back to original BIOS BAR addresses") Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v2.6.35+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:24 +02:00
M. Vefa Bicakci	7c16d0a4e6	xen/gntdev: Accommodate VMA splitting commit `5c13a4a029` upstream. Prior to this commit, the gntdev driver code did not handle the following scenario correctly with paravirtualized (PV) Xen domains: * User process sets up a gntdev mapping composed of two grant mappings (i.e., two pages shared by another Xen domain). * User process munmap()s one of the pages. * User process munmap()s the remaining page. * User process exits. In the scenario above, the user process would cause the kernel to log the following messages in dmesg for the first munmap(), and the second munmap() call would result in similar log messages: BUG: Bad page map in process doublemap.test pte:... pmd:... page:0000000057c97bff refcount:1 mapcount:-1 \ mapping:0000000000000000 index:0x0 pfn:... ... page dumped because: bad pte ... file:gntdev fault:0x0 mmap:gntdev_mmap [xen_gntdev] readpage:0x0 ... Call Trace: <TASK> dump_stack_lvl+0x46/0x5e print_bad_pte.cold+0x66/0xb6 unmap_page_range+0x7e5/0xdc0 unmap_vmas+0x78/0xf0 unmap_region+0xa8/0x110 __do_munmap+0x1ea/0x4e0 __vm_munmap+0x75/0x120 __x64_sys_munmap+0x28/0x40 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x61/0xcb ... For each munmap() call, the Xen hypervisor (if built with CONFIG_DEBUG) would print out the following and trigger a general protection fault in the affected Xen PV domain: (XEN) d0v... Attempt to implicitly unmap d0's grant PTE ... (XEN) d0v... Attempt to implicitly unmap d0's grant PTE ... As of this writing, gntdev_grant_map structure's vma field (referred to as map->vma below) is mainly used for checking the start and end addresses of mappings. However, with split VMAs, these may change, and there could be more than one VMA associated with a gntdev mapping. Hence, remove the use of map->vma and rely on map->pages_vm_start for the original start address and on (map->count << PAGE_SHIFT) for the original mapping size. Let the invalidate() and find_special_page() hooks use these. Also, given that there can be multiple VMAs associated with a gntdev mapping, move the "mmu_interval_notifier_remove(&map->notifier)" call to the end of gntdev_put_map, so that the MMU notifier is only removed after the closing of the last remaining VMA. Finally, use an atomic to prevent inadvertent gntdev mapping re-use, instead of using the map->live_grants atomic counter and/or the map->vma pointer (the latter of which is now removed). This prevents the userspace from mmap()'ing (with MAP_FIXED) a gntdev mapping over the same address range as a previously set up gntdev mapping. This scenario can be summarized with the following call-trace, which was valid prior to this commit: mmap gntdev_mmap mmap (repeat mmap with MAP_FIXED over the same address range) gntdev_invalidate unmap_grant_pages (sets 'being_removed' entries to true) gnttab_unmap_refs_async unmap_single_vma gntdev_mmap (maps the shared pages again) munmap gntdev_invalidate unmap_grant_pages (no-op because 'being_removed' entries are true) unmap_single_vma (For PV domains, Xen reports that a granted page is being unmapped and triggers a general protection fault in the affected domain, if Xen was built with CONFIG_DEBUG) The fix for this last scenario could be worth its own commit, but we opted for a single commit, because removing the gntdev_grant_map structure's vma field requires guarding the entry to gntdev_mmap(), and the live_grants atomic counter is not sufficient on its own to prevent the mmap() over a pre-existing mapping. Link: https://github.com/QubesOS/qubes-issues/issues/7631 Fixes: `ab31523c2f` ("xen/gntdev: allow usermode to map granted pages") Cc: stable@vger.kernel.org Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20221002222006.2077-3-m.v.b@runbox.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:24 +02:00
M. Vefa Bicakci	1cb73704cb	xen/gntdev: Prevent leaking grants commit `0991028cd4` upstream. Prior to this commit, if a grant mapping operation failed partially, some of the entries in the map_ops array would be invalid, whereas all of the entries in the kmap_ops array would be valid. This in turn would cause the following logic in gntdev_map_grant_pages to become invalid: for (i = 0; i < map->count; i++) { if (map->map_ops[i].status == GNTST_okay) { map->unmap_ops[i].handle = map->map_ops[i].handle; if (!use_ptemod) alloced++; } if (use_ptemod) { if (map->kmap_ops[i].status == GNTST_okay) { if (map->map_ops[i].status == GNTST_okay) alloced++; map->kunmap_ops[i].handle = map->kmap_ops[i].handle; } } } ... atomic_add(alloced, &map->live_grants); Assume that use_ptemod is true (i.e., the domain mapping the granted pages is a paravirtualized domain). In the code excerpt above, note that the "alloced" variable is only incremented when both kmap_ops[i].status and map_ops[i].status are set to GNTST_okay (i.e., both mapping operations are successful). However, as also noted above, there are cases where a grant mapping operation fails partially, breaking the assumption of the code excerpt above. The aforementioned causes map->live_grants to be incorrectly set. In some cases, all of the map_ops mappings fail, but all of the kmap_ops mappings succeed, meaning that live_grants may remain zero. This in turn makes it impossible to unmap the successfully grant-mapped pages pointed to by kmap_ops, because unmap_grant_pages has the following snippet of code at its beginning: if (atomic_read(&map->live_grants) == 0) return; /* Nothing to do */ In other cases where only some of the map_ops mappings fail but all kmap_ops mappings succeed, live_grants is made positive, but when the user requests unmapping the grant-mapped pages, __unmap_grant_pages_done will then make map->live_grants negative, because the latter function does not check if all of the pages that were requested to be unmapped were actually unmapped, and the same function unconditionally subtracts "data->count" (i.e., a value that can be greater than map->live_grants) from map->live_grants. The side effects of a negative live_grants value have not been studied. The net effect of all of this is that grant references are leaked in one of the above conditions. In Qubes OS v4.1 (which uses Xen's grant mechanism extensively for X11 GUI isolation), this issue manifests itself with warning messages like the following to be printed out by the Linux kernel in the VM that had granted pages (that contain X11 GUI window data) to dom0: "g.e. 0x1234 still pending", especially after the user rapidly resizes GUI VM windows (causing some grant-mapping operations to partially or completely fail, due to the fact that the VM unshares some of the pages as part of the window resizing, making the pages impossible to grant-map from dom0). The fix for this issue involves counting all successful map_ops and kmap_ops mappings separately, and then adding the sum to live_grants. During unmapping, only the number of successfully unmapped grants is subtracted from live_grants. The code is also modified to check for negative live_grants values after the subtraction and warn the user. Link: https://github.com/QubesOS/qubes-issues/issues/7631 Fixes: `dbe97cff7d` ("xen/gntdev: Avoid blocking in unmap_grant_pages()") Cc: stable@vger.kernel.org Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com> Acked-by: Demi Marie Obenour <demi@invisiblethingslab.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20221002222006.2077-2-m.v.b@runbox.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:24 +02:00
Carlos Llamas	43bed0a13a	mm/mmap: undo ->mmap() when arch_validate_flags() fails commit `deb0f65628` upstream. Commit `c462ac288f` ("mm: Introduce arch_validate_flags()") added a late check in mmap_region() to let architectures validate vm_flags. The check needs to happen after calling ->mmap() as the flags can potentially be modified during this callback. If arch_validate_flags() check fails we unmap and free the vma. However, the error path fails to undo the ->mmap() call that previously succeeded and depending on the specific ->mmap() implementation this translates to reference increments, memory allocations and other operations what will not be cleaned up. There are several places (mainly device drivers) where this is an issue. However, one specific example is bpf_map_mmap() which keeps count of the mappings in map->writecnt. The count is incremented on ->mmap() and then decremented on vm_ops->close(). When arch_validate_flags() fails this count is off since bpf_map_mmap_close() is never called. One can reproduce this issue in arm64 devices with MTE support. Here the vm_flags are checked to only allow VM_MTE if VM_MTE_ALLOWED has been set previously. From userspace then is enough to pass the PROT_MTE flag to mmap() syscall to trigger the arch_validate_flags() failure. The following program reproduces this issue: #include <stdio.h> #include <unistd.h> #include <linux/unistd.h> #include <linux/bpf.h> #include <sys/mman.h> int main(void) { union bpf_attr attr = { .map_type = BPF_MAP_TYPE_ARRAY, .key_size = sizeof(int), .value_size = sizeof(long long), .max_entries = 256, .map_flags = BPF_F_MMAPABLE, }; int fd; fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr)); mmap(NULL, 4096, PROT_WRITE \| PROT_MTE, MAP_SHARED, fd, 0); return 0; } By manually adding some log statements to the vm_ops callbacks we can confirm that when passing PROT_MTE to mmap() the map->writecnt is off upon ->release(): With PROT_MTE flag: root@debian:~# ./bpf-test [ 111.263874] bpf_map_write_active_inc: map=9 writecnt=1 [ 111.288763] bpf_map_release: map=9 writecnt=1 Without PROT_MTE flag: root@debian:~# ./bpf-test [ 157.816912] bpf_map_write_active_inc: map=10 writecnt=1 [ 157.830442] bpf_map_write_active_dec: map=10 writecnt=0 [ 157.832396] bpf_map_release: map=10 writecnt=0 This patch fixes the above issue by calling vm_ops->close() when the arch_validate_flags() check fails, after this we can proceed to unmap and free the vma on the error path. Link: https://lkml.kernel.org/r/20220930003844.1210987-1-cmllamas@google.com Fixes: `c462ac288f` ("mm: Introduce arch_validate_flags()") Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Liam Howlett <liam.howlett@oracle.com> Cc: Christian Brauner (Microsoft) <brauner@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> [5.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:24 +02:00
Baolin Wang	2b0072d33e	mm/damon: validate if the pmd entry is present before accessing commit `c8b9aff419` upstream. pmd_huge() is used to validate if the pmd entry is mapped by a huge page, also including the case of non-present (migration or hwpoisoned) pmd entry on arm64 or x86 architectures. This means that pmd_pfn() can not get the correct pfn number for a non-present pmd entry, which will cause damon_get_page() to get an incorrect page struct (also may be NULL by pfn_to_online_page()), making the access statistics incorrect. This means that the DAMON may make incorrect decision according to the incorrect statistics, for example, DAMON may can not reclaim cold page in time due to this cold page was regarded as accessed mistakenly if DAMOS_PAGEOUT operation is specified. Moreover it does not make sense that we still waste time to get the page of the non-present entry. Just treat it as not-accessed and skip it, which maintains consistency with non-present pte level entries. So add pmd entry present validation to fix the above issues. Link: https://lkml.kernel.org/r/58b1d1f5fbda7db49ca886d9ef6783e3dcbbbc98.1660805030.git.baolin.wang@linux.alibaba.com Fixes: `3f49584b26` ("mm/damon: implement primitives for the virtual memory address spaces") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: SeongJae Park <sj@kernel.org> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:24 +02:00
James Morse	91c4eb16e8	arm64: errata: Add Cortex-A55 to the repeat tlbi list commit `171df58028` upstream. Cortex-A55 is affected by an erratum where in rare circumstances the CPUs may not handle a race between a break-before-make sequence on one CPU, and another CPU accessing the same page. This could allow a store to a page that has been unmapped. Work around this by adding the affected CPUs to the list that needs TLB sequences to be done twice. Signed-off-by: James Morse <james.morse@arm.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220930131959.3082594-1-james.morse@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:24 +02:00
Takashi Iwai	fc0f921b7e	drm/udl: Restore display mode on resume commit `6d6e732835` upstream. Restore the display mode whne resuming from suspend. Currently, the display remains dark. On resume, the CRTC's mode does not change, but the 'active' flag changes to 'true'. Taking this into account when considering a mode switch restores the display mode. The bug is reproducable by using Gnome with udl and observing the adapter's suspend/resume behavior. Actually, the whole check added in udl_simple_display_pipe_enable() about the crtc_state->mode_changed was bogus. We should drop the whole check and always apply the mode change in this function. [ tiwai -- Drop the mode_changed check entirely instead, per Daniel's suggestion ] Fixes: `997d33c356` ("drm/udl: Inline DPMS code into CRTC enable and disable functions") Cc: <stable@vger.kernel.org> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20220908095115.23396-2-tiwai@suse.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:24 +02:00
Dmitry Osipenko	0640934725	drm/virtio: Use appropriate atomic state in virtio_gpu_plane_cleanup_fb() commit `4656b3a26a` upstream. Make virtio_gpu_plane_cleanup_fb() to clean the state which DRM core wants to clean up and not the current plane's state. Normally the older atomic state is cleaned up, but the newer state could also be cleaned up in case of aborted commits. Cc: stable@vger.kernel.org Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: http://patchwork.freedesktop.org/patch/msgid/20220630200726.1884320-6-dmitry.osipenko@collabora.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:24 +02:00
Dmitry Osipenko	fb3910436b	drm/virtio: Unlock reservations on virtio_gpu_object_shmem_init() error commit `fdf0ff4d12` upstream. Unlock reservations in the error code path of virtio_gpu_object_create() to silence debug warning splat produced by ww_mutex_destroy(&obj->lock) when GEM is released with the held lock. Cc: stable@vger.kernel.org Fixes: `30172efbfb` ("drm/virtio: blob prep: refactor getting pages and attaching backing") Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: http://patchwork.freedesktop.org/patch/msgid/20220630200726.1884320-4-dmitry.osipenko@collabora.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:23 +02:00
Dmitry Osipenko	f122bcb34f	drm/virtio: Check whether transferred 2D BO is shmem commit `e473216b42` upstream. Transferred 2D BO always must be a shmem BO. Add check for that to prevent NULL dereference if userspace passes a VRAM BO. Cc: stable@vger.kernel.org Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: http://patchwork.freedesktop.org/patch/msgid/20220630200726.1884320-3-dmitry.osipenko@collabora.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:23 +02:00
Dario Binacchi	a95fb5d55a	dmaengine: mxs: use platform_driver_register commit `26696d4657` upstream. Driver registration fails on SOC imx8mn as its supplier, the clock control module, is probed later than subsys initcall level. This driver uses platform_driver_probe which is not compatible with deferred probing and won't be probed again later if probe function fails due to clock not being available at that time. This patch replaces the use of platform_driver_probe with platform_driver_register which will allow probing the driver later again when the clock control module will be available. The __init annotation has been dropped because it is not compatible with deferred probing. The code is not executed once and its memory cannot be freed. Fixes: `a580b8c542` ("dmaengine: mxs-dma: add dma support for i.MX23/28") Co-developed-by: Michael Trimarchi <michael@amarulasolutions.com> Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com> Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20220921170556.1055962-1-dario.binacchi@amarulasolutions.com Signed-off-by: Vinod Koul <vkoul@kernel.org>	2022-10-26 12:34:23 +02:00
Hamza Mahfooz	e7a3334e83	Revert "drm/amdgpu: use dirty framebuffer helper" commit `17d819e282` upstream. This reverts commit `66f99628eb`. Unfortunately, that commit causes performance regressions on non-PSR setups. So, just revert it until FB_DAMAGE_CLIPS support can be added. Cc: stable@vger.kernel.org Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2189 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216554 Fixes: `66f99628eb` ("drm/amdgpu: use dirty framebuffer helper") Fixes: `abbc7a3daf` ("drm/amdgpu: don't register a dirty callback for non-atomic") Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:23 +02:00
Rishabh Bhatnagar	4bdedc3b53	nvme-pci: set min_align_mask before calculating max_hw_sectors commit `61ce339f19` upstream. If swiotlb is force enabled dma_max_mapping_size ends up calling swiotlb_max_mapping_size which takes into account the min align mask for the device. Set the min align mask for nvme driver before calling dma_max_mapping_size while calculating max hw sectors. Signed-off-by: Rishabh Bhatnagar <risbhat@amazon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:23 +02:00
Sagi Grimberg	32aa0b3f0c	nvme-multipath: fix possible hang in live ns resize with ANA access commit `72e3b8883a` upstream. When we revalidate paths as part of ns size change (as of commit `e7d65803e2`), it is possible that during the path revalidation, the only paths that is IO capable (i.e. optimized/non-optimized) are the ones that ns resize was not yet informed to the host, which will cause inflight requests to be requeued (as we have available paths but none are IO capable). These requests on the requeue list are waiting for someone to resubmit them at some point. The IO capable paths will eventually notify the ns resize change to the host, but there is nothing that will kick the requeue list to resubmit the queued requests. Fix this by always kicking the requeue list, and if no IO capable path exists, these requests will be queued again. A typical log that indicates that IOs are requeued: -- nvme nvme1: creating 4 I/O queues. nvme nvme1: new ctrl: "testnqn1" nvme nvme2: creating 4 I/O queues. nvme nvme2: mapped 4/0/0 default/read/poll queues. nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009 nvme nvme1: rescanning namespaces. nvme1n1: detected capacity change from 2097152 to 4194304 block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O block nvme1n1: no usable path - requeuing I/O nvme nvme2: rescanning namespaces. -- Reported-by: Yogev Cohen <yogev@lightbitslabs.com> Fixes: `e7d65803e2` ("nvme-multipath: revalidate paths during rescan") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Cc: <stable@vger.kernel.org> # v5.15+ Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:23 +02:00
Gaosheng Cui	9391cc3a78	nvmem: core: Fix memleak in nvmem_register() commit `bd1244561f` upstream. dev_set_name will alloc memory for nvmem->dev.kobj.name in nvmem_register, when nvmem_validate_keepouts failed, nvmem's memory will be freed and return, but nobody will free memory for nvmem->dev.kobj.name, there will be memleak, so moving nvmem_validate_keepouts() after device_register() and let the device core deal with cleaning name in error cases. Fixes: `de0534df93` ("nvmem: core: fix error handling while validating keepout regions") Cc: stable@vger.kernel.org Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20220916120402.38753-1-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:23 +02:00
Huacai Chen	7efe61dc6a	UM: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK commit `16c546e148` upstream. When CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS is selected, cpu_max_bits_warn() generates a runtime warning similar as below while we show /proc/cpuinfo. Fix this by using nr_cpu_ids (the runtime limit) instead of NR_CPUS to iterate CPUs. [ 3.052463] ------------[ cut here ]------------ [ 3.059679] WARNING: CPU: 3 PID: 1 at include/linux/cpumask.h:108 show_cpuinfo+0x5e8/0x5f0 [ 3.070072] Modules linked in: efivarfs autofs4 [ 3.076257] CPU: 0 PID: 1 Comm: systemd Not tainted 5.19-rc5+ #1052 [ 3.099465] Stack : 9000000100157b08 9000000000f18530 9000000000cf846c 9000000100154000 [ 3.109127] 9000000100157a50 0000000000000000 9000000100157a58 9000000000ef7430 [ 3.118774] 90000001001578e8 0000000000000040 0000000000000020 ffffffffffffffff [ 3.128412] 0000000000aaaaaa 1ab25f00eec96a37 900000010021de80 900000000101c890 [ 3.138056] 0000000000000000 0000000000000000 0000000000000000 0000000000aaaaaa [ 3.147711] ffff8000339dc220 0000000000000001 0000000006ab4000 0000000000000000 [ 3.157364] 900000000101c998 0000000000000004 9000000000ef7430 0000000000000000 [ 3.167012] 0000000000000009 000000000000006c 0000000000000000 0000000000000000 [ 3.176641] 9000000000d3de08 9000000001639390 90000000002086d8 00007ffff0080286 [ 3.186260] 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1c [ 3.195868] ... [ 3.199917] Call Trace: [ 3.203941] [<90000000002086d8>] show_stack+0x38/0x14c [ 3.210666] [<9000000000cf846c>] dump_stack_lvl+0x60/0x88 [ 3.217625] [<900000000023d268>] __warn+0xd0/0x100 [ 3.223958] [<9000000000cf3c90>] warn_slowpath_fmt+0x7c/0xcc [ 3.231150] [<9000000000210220>] show_cpuinfo+0x5e8/0x5f0 [ 3.238080] [<90000000004f578c>] seq_read_iter+0x354/0x4b4 [ 3.245098] [<90000000004c2e90>] new_sync_read+0x17c/0x1c4 [ 3.252114] [<90000000004c5174>] vfs_read+0x138/0x1d0 [ 3.258694] [<90000000004c55f8>] ksys_read+0x70/0x100 [ 3.265265] [<9000000000cfde9c>] do_syscall+0x7c/0x94 [ 3.271820] [<9000000000202fe4>] handle_syscall+0xc4/0x160 [ 3.281824] ---[ end trace 8b484262b4b8c24c ]--- Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:23 +02:00
Fangrui Song	81ab826a28	riscv: Pass -mno-relax only on lld < 15.0.0 commit `3cebf80e9a` upstream. lld since llvm:6611d58f5bbc ("[ELF] Relax R_RISCV_ALIGN"), which will be included in the 15.0.0 release, has implemented some RISC-V linker relaxation. -mno-relax is no longer needed in KBUILD_CFLAGS/KBUILD_AFLAGS to suppress R_RISCV_ALIGN which older lld can not handle: ld.lld: error: capability.c:(.fixup+0x0): relocation R_RISCV_ALIGN requires unimplemented linker relaxation; recompile with -mno-relax but the .o is already compiled with -mno-relax Signed-off-by: Fangrui Song <maskray@google.com> Link: https://lore.kernel.org/r/20220710071117.446112-1-maskray@google.com/ Link: https://lore.kernel.org/r/20220918092933.19943-1-palmer@rivosinc.com Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Conor Dooley <conor.dooley@microchip.com> Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:22 +02:00
Wenting Zhang	7780bb02a0	riscv: always honor the CONFIG_CMDLINE_FORCE when parsing dtb commit `10f6913c54` upstream. When CONFIG_CMDLINE_FORCE is enabled, cmdline provided by CONFIG_CMDLINE are always used. This allows CONFIG_CMDLINE to be used regardless of the result of device tree scanning. This especially fixes the case where a device tree without the chosen node is supplied to the kernel. In such cases, early_init_dt_scan would return true. But inside early_init_dt_scan_chosen, the cmdline won't be updated as there is no chosen node in the device tree. As a result, CONFIG_CMDLINE is not copied into boot_command_line even if CONFIG_CMDLINE_FORCE is enabled. This commit allows properly update boot_command_line in this situation. Fixes: `8fd6e05c74` ("arch: riscv: support kernel command line forcing when no DTB passed") Signed-off-by: Wenting Zhang <zephray@outlook.com> Reviewed-by: Björn Töpel <bjorn@kernel.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/PSBPR04MB399135DFC54928AB958D0638B1829@PSBPR04MB3991.apcprd04.prod.outlook.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:22 +02:00
Andrew Bresticker	c657b70e80	riscv: Make VM_WRITE imply VM_READ commit `7ab72c5973` upstream. RISC-V does not presently have write-only mappings as that PTE bit pattern is considered reserved in the privileged spec, so allow handling of read faults in VMAs that have VM_WRITE without VM_READ in order to be consistent with other architectures that have similar limitations. Fixes: `2139619bca` ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid") Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220915193702.2201018-2-abrestic@rivosinc.com/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:22 +02:00
Andrew Bresticker	3c3c4fa118	riscv: Allow PROT_WRITE-only mmap() commit `9e2e6042a7` upstream. Commit `2139619bca` ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid") made mmap() return EINVAL if PROT_WRITE was set wihtout PROT_READ with the justification that a write-only PTE is considered a reserved PTE permission bit pattern in the privileged spec. This check is unnecessary since we let VM_WRITE imply VM_READ on RISC-V, and it is inconsistent with other architectures that don't support write-only PTEs, creating a potential software portability issue. Just remove the check altogether and let PROT_WRITE imply PROT_READ as is the case on other architectures. Note that this also allows PROT_WRITE\|PROT_EXEC mappings which were disallowed prior to the aforementioned commit; PROT_READ is implied in such mappings as well. Fixes: `2139619bca` ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid") Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220915193702.2201018-3-abrestic@rivosinc.com/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:22 +02:00
Helge Deller	af3aaee08d	parisc: fbdev/stifb: Align graphics memory size to 4MB commit `aca7c13d3b` upstream. Independend of the current graphics resolution, adjust the reported graphics card memory size to the next 4MB boundary. This fixes the fbtest program which expects a naturally aligned size. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:22 +02:00

1 2 3 4 5 ...

1055869 commits