linux-stable

Commit Graph

Author	SHA1	Message	Date
Naoya Horiguchi	e37e7b0b3b	mm, hwpoison: fix condition in free hugetlb page path When a memory error hits a tail page of a free hugepage, __page_handle_poison() is expected to be called to isolate the error in 4kB unit, but it's not called due to the outdated if-condition in memory_failure_hugetlb(). This loses the chance to isolate the error in the finer unit, so it's not optimal. Drop the condition. This "(p != head && TestSetPageHWPoison(head)" condition is based on the old semantics of PageHWPoison on hugepage (where PG_hwpoison flag was set on the subpage), so it's not necessray any more. By getting to set PG_hwpoison on head page for hugepages, concurrent error events on different subpages in a single hugepage can be prevented by TestSetPageHWPoison(head) at the beginning of memory_failure_hugetlb(). So dropping the condition should not reopen the race window originally mentioned in commit `b985194c8c` ("hwpoison, hugetlb: lock_page/unlock_page does not match for handling a free hugepage") [naoya.horiguchi@linux.dev: fix "HardwareCorrupted" counter] Link: https://lkml.kernel.org/r/20211220084851.GA1460264@u2004 Link: https://lkml.kernel.org/r/20211210110208.879740-1-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reported-by: Fei Luo <luofei@unicloud.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> [5.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-12-25 12:20:55 -08:00
Randy Dunlap	7e5b901e46	MAINTAINERS: mark more list instances as moderated Some lists that are moderated are not marked as moderated consistently, so mark them all as moderated. Link: https://lkml.kernel.org/r/20211209001330.18558-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Conor Culhane <conor.culhane@silvaco.com> Cc: Ryder Lee <ryder.lee@mediatek.com> Cc: Jianjun Wang <jianjun.wang@mediatek.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-12-25 12:20:55 -08:00
Philipp Rudo	71d2bcec2d	kernel/crash_core: suppress unknown crashkernel parameter warning When booting with crashkernel= on the kernel command line a warning similar to Kernel command line: ro console=ttyS0 crashkernel=256M Unknown kernel command line parameters "crashkernel=256M", will be passed to user space. is printed. This comes from crashkernel= being parsed independent from the kernel parameter handling mechanism. So the code in init/main.c doesn't know that crashkernel= is a valid kernel parameter and prints this incorrect warning. Suppress the warning by adding a dummy early_param handler for crashkernel=. Link: https://lkml.kernel.org/r/20211208133443.6867-1-prudo@redhat.com Fixes: `86d1919a4f` ("init: print out unknown kernel parameters") Signed-off-by: Philipp Rudo <prudo@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-12-25 12:20:55 -08:00
Andrey Ryabinin	3386353406	mm: mempolicy: fix THP allocations escaping mempolicy restrictions alloc_pages_vma() may try to allocate THP page on the local NUMA node first: page = __alloc_pages_node(hpage_node, gfp \| __GFP_THISNODE \| __GFP_NORETRY, order); And if the allocation fails it retries allowing remote memory: if (!page && (gfp & __GFP_DIRECT_RECLAIM)) page = __alloc_pages_node(hpage_node, gfp, order); However, this retry allocation completely ignores memory policy nodemask allowing allocation to escape restrictions. The first appearance of this bug seems to be the commit `ac5b2c1891` ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings"). The bug disappeared later in the commit `89c83fb539` ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") and reappeared again in slightly different form in the commit `76e654cc91` ("mm, page_alloc: allow hugepage fallback to remote nodes when madvised") Fix this by passing correct nodemask to the __alloc_pages() call. The demonstration/reproducer of the problem: $ mount -oremount,size=4G,huge=always /dev/shm/ $ echo always > /sys/kernel/mm/transparent_hugepage/defrag $ cat mbind_thp.c #include <unistd.h> #include <sys/mman.h> #include <sys/stat.h> #include <fcntl.h> #include <assert.h> #include <stdlib.h> #include <stdio.h> #include <numaif.h> #define SIZE 2ULL << 30 int main(int argc, char *argv) { int fd; unsigned long long i; char addr; pid_t pid; char buf[100]; unsigned long nodemask = 1; fd = open("/dev/shm/test", O_RDWR\|O_CREAT); assert(fd > 0); assert(ftruncate(fd, SIZE) == 0); addr = mmap(NULL, SIZE, PROT_READ\|PROT_WRITE, MAP_SHARED, fd, 0); assert(mbind(addr, SIZE, MPOL_BIND, &nodemask, 2, MPOL_MF_STRICT\|MPOL_MF_MOVE)==0); for (i = 0; i < SIZE; i+=4096) { addr[i] = 1; } pid = getpid(); snprintf(buf, sizeof(buf), "grep shm /proc/%d/numa_maps", pid); system(buf); sleep(10000); return 0; } $ gcc mbind_thp.c -o mbind_thp -lnuma $ numactl -H available: 2 nodes (0-1) node 0 cpus: 0 2 node 0 size: 1918 MB node 0 free: 1595 MB node 1 cpus: 1 3 node 1 size: 2014 MB node 1 free: 1731 MB node distances: node 0 1 0: 10 20 1: 20 10 $ rm -f /dev/shm/test; taskset -c 0 ./mbind_thp 7fd970a00000 bind:0 file=/dev/shm/test dirty=524288 active=0 N0=396800 N1=127488 kernelpagesize_kB=4 Link: https://lkml.kernel.org/r/20211208165343.22349-1-arbn@yandex-team.com Fixes: `ac5b2c1891` ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings") Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-12-25 12:20:55 -08:00
Baokun Li	0129ab1f26	kfence: fix memory leak when cat kfence objects Hulk robot reported a kmemleak problem: unreferenced object 0xffff93d1d8cc02e8 (size 248): comm "cat", pid 23327, jiffies 4624670141 (age 495992.217s) hex dump (first 32 bytes): 00 40 85 19 d4 93 ff ff 00 10 00 00 00 00 00 00 .@.............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: seq_open+0x2a/0x80 full_proxy_open+0x167/0x1e0 do_dentry_open+0x1e1/0x3a0 path_openat+0x961/0xa20 do_filp_open+0xae/0x120 do_sys_openat2+0x216/0x2f0 do_sys_open+0x57/0x80 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 unreferenced object 0xffff93d419854000 (size 4096): comm "cat", pid 23327, jiffies 4624670141 (age 495992.217s) hex dump (first 32 bytes): 6b 66 65 6e 63 65 2d 23 32 35 30 3a 20 30 78 30 kfence-#250: 0x0 30 30 30 30 30 30 30 37 35 34 62 64 61 31 32 2d 0000000754bda12- backtrace: seq_read_iter+0x313/0x440 seq_read+0x14b/0x1a0 full_proxy_read+0x56/0x80 vfs_read+0xa5/0x1b0 ksys_read+0xa0/0xf0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 I find that we can easily reproduce this problem with the following commands: cat /sys/kernel/debug/kfence/objects echo scan > /sys/kernel/debug/kmemleak cat /sys/kernel/debug/kmemleak The leaked memory is allocated in the stack below: do_syscall_64 do_sys_open do_dentry_open full_proxy_open seq_open ---> alloc seq_file vfs_read full_proxy_read seq_read seq_read_iter traverse ---> alloc seq_buf And it should have been released in the following process: do_syscall_64 syscall_exit_to_user_mode exit_to_user_mode_prepare task_work_run ____fput __fput full_proxy_release ---> free here However, the release function corresponding to file_operations is not implemented in kfence. As a result, a memory leak occurs. Therefore, the solution to this problem is to implement the corresponding release function. Link: https://lkml.kernel.org/r/20211206133628.2822545-1-libaokun1@huawei.com Fixes: `0ce20dd840` ("mm: add Kernel Electric-Fence infrastructure") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Acked-by: Marco Elver <elver@google.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-12-25 12:20:55 -08:00
Linus Torvalds	bc491fb125	memblock: fix memblock_phys_alloc() section mismatch error There are section mismatch errors when compiler refuses to inline one-line wrapper memblock_phys_alloc(). Make memblock_phys_alloc() __always_inline to avoid these mismatch issues. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmHDZCMTHHJwcHRAbGlu dXguaWJtLmNvbQAKCRA5A4Ymyw79kSOpCACxKqKuiKf/AY1ckeSV7Y+zBWQrbPVj UoQ459RqhbkEAQYdFLW10T2PiNC7zVxgsNUBafy+8OAh/TSIXDH+pvfk1TbWZBFb G+Of0FASwiEmjir41HEie41MT6emtmjiWFlGBLFCBrA2nbZj4doC3UCV0F0TpN6I pq1sczkEeYT9Bu51t+qStW4zj6ICLvxAFlsB26jVFrUfv6Tg4r4JmCLrDzrgwjh7 pRUzghOq4poSIUO1HH/LGQvNkzyEtx47yi80Jp3nHRU2XjdPf7uRMTOhdplx56cm I57h35u1BmuxS0Zm7fBxMlc0ima44eG4eWJcQlBLAAdnVRvQr3l/vW+R =7qHX -----END PGP SIGNATURE----- Merge tag 'fixes-2021-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock fix from Mike Rapoport: "Fix memblock_phys_alloc() section mismatch error There are section mismatch errors when compiler refuses to inline one-line wrapper memblock_phys_alloc(). Make memblock_phys_alloc() __always_inline to avoid these mismatch issues" * tag 'fixes-2021-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: memblock: fix memblock_phys_alloc() section mismatch error	2021-12-22 11:39:53 -08:00
Linus Torvalds	3f066e882b	parisc architecture late bug fixes for kernel v5.16 - Fix a bug in the C code which calculates the relevant futex spinlock based on the futex virtual address. In some cases a wrong spinlock (compared to what is calculated in the assembly code path) was choosen which then can lead to deadlocks. - The 64-bit kernel missed to clip the LWS number in the Light-weight-syscall path for 32-bit processes. - Prevent CPU register dump to show stale value in IIR register on access rights traps. - Remove unused ARCH_DEFCONFIG entries. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCYcNELQAKCRD3ErUQojoP X+pLAP9Ufu4MYYdEzm7uMTPgaeg0HryHS5zVR9w9iLboK2HzkgD/S1I15E61smx0 Az/mhwr/uFtPd5tzAU1qUP7HMFk0wgg= =YhZ8 -----END PGP SIGNATURE----- Merge tag 'for-5.16/parisc-7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc architecture fixes from Helge Deller: - Fix a bug in the C code which calculates the relevant futex spinlock based on the futex virtual address. In some cases a wrong spinlock (compared to what is calculated in the assembly code path) was choosen which then can lead to deadlocks. - The 64-bit kernel missed to clip the LWS number in the Light-weight-syscall path for 32-bit processes. - Prevent CPU register dump to show stale value in IIR register on access rights traps. - Remove unused ARCH_DEFCONFIG entries. * tag 'for-5.16/parisc-7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: remove ARCH_DEFCONFIG parisc: Fix mask used to select futex spinlock parisc: Correct completer in lws start parisc: Clear stale IIR value on instruction access rights trap	2021-12-22 10:17:16 -08:00
Linus Torvalds	0740040580	Fix some IPMI crashes Some crash fixes have come in dealing with various error handling issues. They have sat in next for 5 days or more without issue, and they are fairly critical. -corey -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE/Q1c5nzg9ZpmiCaGYfOMkJGb/4EFAmHDMeUACgkQYfOMkJGb /4GZcw//dwGQ4Ky4QaDxjE513EskfwM+Tdd688nBzN72xBY9KXnxsnbTVJmPJbUd QYgzCxBdQ9PW/1vei5DAtbOTvLbknQ9IY3UYvgOVmdmC5xHkGlws835yqnwv8rf7 CxBMOKIEsRr4rl3V4gDqchGeJr1gGMXRgJrPCSrDLh7Ontl07jEE89aOz3SoGT4/ 5pzAOPUtLzkwrFjSbqs9LK8+YT0DvQ4DwZo3dbqoECId7k1t8v4KuG5sTc8mDg14 FsHhSRPVqLaSRnhkg7tQTocKpMsnPDmvnW7GMejrCS8mfe1yp5dUPZDAk4d/eY2B ++zxvWV4jokmgxAEMkFXZ86VLxxRhL5nYKF0g8j7RBAIJYDPci9zTw+e9su73OSs moWarqDABqfBz5yMFUAJgjwHXJwlsWITLAFOJki8Jjf5asD/Q+zmOqW+T7Yczlca NmjH3w/WsXuJZSaMqOWXAzU7pMLtI68I2gWHuAfxjnRIXZ78ywrqLdZoDzWJ7ZFJ AbWqlfQ9AobtcOJ34W02Ktl0FH2y/IPAJVQNa2/CPPiZqKGJDpCVXkTneP91zage p0iWXLNhDMJlfEdIDPGTHxVdX1zerxhEJPlyWOHCU8V7S3879I5RaRS3aqqejjp9 d7VpyQa4dwYogYatE0lYLoRutMDwKftOL2P2RTBbL64sTzXd+Qk= =6lgC -----END PGP SIGNATURE----- Merge tag 'for-linus-5.16-3' of git://github.com/cminyard/linux-ipmi Pull IPMI fixes from Corey Minyard: "Fix some IPMI crashes Some crash fixes have come in dealing with various error handling issues. They have sat in next for 5 days or more without issue, and they are fairly critical" * tag 'for-linus-5.16-3' of git://github.com/cminyard/linux-ipmi: ipmi: Fix UAF when uninstall ipmi_si and ipmi_msghandler module ipmi: fix initialization when workqueue allocation fails ipmi: bail out if init_srcu_struct fails ipmi: ssif: initialize ssif_info->client early	2021-12-22 10:11:17 -08:00
Linus Torvalds	c9ea870c6e	Two overhead reduction patches for testing/fuzzing environment. tomoyo: use hwight16() in tomoyo_domain_quota_is_ok() tomoyo: Check exceeded quota early in tomoyo_domain_quota_is_ok(). security/tomoyo/util.c \| 31 ++++++++++++++----------------- 1 file changed, 14 insertions(+), 17 deletions(-) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJhwyxIAAoJEEJfEo0MZPUqo8wP/1JRtcV9r5IoZ4ZSiC059j5i gfg6IXgRlfMcTxOlERinRQOEf6yR6nAnfvvh4Hfs92XexzKDTVXjilXXevDMlnoo FMqPI+4N0YGdw87YGj8rFu/Juwrkebt0psz516V36wR9PBCvw3/s/qF3cGLNfb3K 5cksHifJ6vGb8zEzyyXMRJjYbTrKhix7f3T7WNTAzUPn9Hr3OzIqjmFrsDfFuSXc xNIEezoODiGwjtx7JUCLLEcIJ5NtE84oDKiF4FTSp9J/RSrnOPHWHfunXH8XGYCQ Fs1DaiEkzcgRI+zhOnc7cOYJBXi3XHA8ncCUb59z3C5xsSZyUmnapV4ZRv6eUmbx zwKYCPSFjIsHheadEjyKnLRFQ8n1uhMLB68VMusKAyhEbY2hZ9GvJ10Bdf3hltoz /GRyjvkuUB9wdZM06H5F9NgnEg4Acj1285ynWRSTQ6P91Z7Wsfu4db00mXf2Xhxf m1SbWjKsi0fayFTZQ0ttCX3jrHPAJZcWXXxNIOmxrmqBgCcpGAukcdrFWI4TO433 xy3Fj0Z4PrztuR9MzJiGrmd5KlHS0uymHvB8HcxzMmlfsHQK41bYBu6lKjMnhGVH 1L6+T5PcHzAm1/PPNfHgIazmM+/igdgCd+l3DLS58sFbyz9pdaYFxCYerHU9SEvW rrqAi8WWzWLPh7AZIyYV =MN+/ -----END PGP SIGNATURE----- Merge tag 'tomoyo-pr-20211222' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1 Pull tomoyo fixes from Tetsuo Handa: "Two overhead reduction patches for testing/fuzzing environment" * tag 'tomoyo-pr-20211222' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1: tomoyo: use hweight16() in tomoyo_domain_quota_is_ok() tomoyo: Check exceeded quota early in tomoyo_domain_quota_is_ok().	2021-12-22 10:06:32 -08:00
Linus Torvalds	e19e226345	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "Fix a regression in the qat driver" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: qat - do not handle PFVF sources for qat_4xxx	2021-12-22 10:02:08 -08:00
Jackie Liu	d7f55471db	memblock: fix memblock_phys_alloc() section mismatch error Fix modpost Section mismatch error in memblock_phys_alloc() [...] WARNING: modpost: vmlinux.o(.text.unlikely+0x1dcc): Section mismatch in reference from the function memblock_phys_alloc() to the function .init.text:memblock_phys_alloc_range() The function memblock_phys_alloc() references the function __init memblock_phys_alloc_range(). This is often because memblock_phys_alloc lacks a __init annotation or the annotation of memblock_phys_alloc_range is wrong. ERROR: modpost: Section mismatches detected. Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them. [...] memblock_phys_alloc() is a one-line wrapper, make it __always_inline to avoid these section mismatches. Reported-by: k2ci <kernel-bot@kylinos.cn> Suggested-by: Mike Rapoport <rppt@kernel.org> Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> [rppt: slightly massaged changelog ] Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/20211217020754.2874872-1-liu.yun@linux.dev	2021-12-22 19:35:29 +02:00
Masahiro Yamada	aacb201606	parisc: remove ARCH_DEFCONFIG Commit `2a86f66121` ("kbuild: use KBUILD_DEFCONFIG as the fallback for DEFCONFIG_LIST") removed ARCH_DEFCONFIG because it does not make much sense. In the same development cycle, Commit `ededa081ed` ("parisc: Fix defconfig selection") added ARCH_DEFCONFIG for parisc. Please use KBUILD_DEFCONFIG in arch/*/Makefile for defconfig selection. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Helge Deller <deller@gmx.de>	2021-12-22 09:02:10 +01:00
Linus Torvalds	2f47a9a4df	Power management fix for 5.16-rc7 Fix a recent regression causing the loop in dpm_prepare() to become infinite if one of the device ->prepare() callbacks returns an error. -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmHCKZ4SHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxrIwP/3gQbaUQjdMZ2rSX9hV8rZxEdzqOY9Js +JBMNSH9jc/8t0QlKGl+LmPlu79+Y5QFljCN0c4xCoSGisQfOnOrGtIAxWT2ldP0 Uk02NXqAfJRnZ+g4FbwSN/CfdENdsKnBttpaKRQbXcxgzqERp+ahjVtPV1aSKE5i aAlsAgGqBAZCiSVkp0jG3ubo8Z/x6dWfjd4Y/CcSyhMpX8F4TjRR2Ka7QqRbdJr/ 5oEw4u8O4fkufiVCjq2Owd352VGJ1XFBasx1+OtyLkSy7wIH2Ofq9HIClM8+eJnH a8b9T0HEmNnKgSKQyIFzAOcpt4YCe0oQsfjGctFAvAAl6dBcGUQFGxzHAuNjSvrv yDdWkuFScXCTgM6vHoawwR4fCU5t5iHMNgp8o7qf3mWMA1EwCuC4CduK/W8QSQT8 s/jsC4tuNOiW14PpUPpCEjButv0ylsZwyDu/VvztYEHKO1VsG3j065zoTio0WsaM Nu4IDRuR+1hDEDo0U1x+0BJPQh28o24Hu0iyU67ku0u2vcNoyn+/kVTPWMO4zLKZ fghe19hpTPwiSp3KEEtqDiRmW9bBRulJHuRxAHSgXDLPm7zeruW3pnQlGXBBzE2z TOYANBBPoaMpbZmLbg14ZsZvcw8IERXLJXqK5SoKYUVYp2IAXL4sIItU4ZRRKIBm NLBez5NM78bS =5TSj -----END PGP SIGNATURE----- Merge tag 'pm-5.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Fix a recent regression causing the loop in dpm_prepare() to become infinite if one of the device ->prepare() callbacks returns an error" * tag 'pm-5.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: sleep: Fix error handling in dpm_prepare()	2021-12-21 12:31:55 -08:00
Linus Torvalds	ca0ea8a60b	* Fix for compilation of selftests on non-x86 architectures * Fix for kvm_run->if_flag on SEV-ES * Fix for page table use-after-free if yielding during exit_mm() * Improve behavior when userspace starts a nested guest with invalid state * Fix missed wakeup with assigned devices but no VT-d posted interrupts * Do not tell userspace to save/restore an unsupported PMU MSR -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmHCGbIUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNMSwf+PpOl69xqQCHzLpDtA9MTBZZi5uKc eZqMtQ7D+T7XYbtMiHBI9zofcbtHlQVC9Na74j8J7bCuvqe8UGS/z93RoZZQ+/dg seEbQNEgcNlPD1G3JzaipcGhRqQl6j5G9LjuixyFpkS2O27YsRTpb1HJag31uSFV Xd7Ih0R2YnKCNJqJov324KHpDM3c4FZL8HdCS/gG94qyLHPA2dJ8fOTwCcAK4LP6 tl9XW5InXMVcck9ssDKFpAfipMNNPNPeAHMONbIm6hyG5HkVjcS3JT1bZmkauxky VIegPlAmorRCig5Hj2QfWEIOfvTUPdmm7WWh+SSZamY6cyvolWs1hV86Bg== =333O -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: - Fix for compilation of selftests on non-x86 architectures - Fix for kvm_run->if_flag on SEV-ES - Fix for page table use-after-free if yielding during exit_mm() - Improve behavior when userspace starts a nested guest with invalid state - Fix missed wakeup with assigned devices but no VT-d posted interrupts - Do not tell userspace to save/restore an unsupported PMU MSR * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU KVM: selftests: Add test to verify TRIPLE_FAULT on invalid L2 guest state KVM: VMX: Fix stale docs for kvm-intel.emulate_invalid_guest_state KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required KVM: VMX: Always clear vmx->fail on emulation_required selftests: KVM: Fix non-x86 compiling KVM: x86: Always set kvm_run->if_flag KVM: x86/mmu: Don't advance iterator after restart due to yielding KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_all	2021-12-21 12:25:57 -08:00
John David Anglin	d3a5a68cff	parisc: Fix mask used to select futex spinlock The address bits used to select the futex spinlock need to match those used in the LWS code in syscall.S. The mask 0x3f8 only selects 7 bits. It should select 8 bits. This change fixes the glibc nptl/tst-cond24 and nptl/tst-cond25 tests. Signed-off-by: John David Anglin <dave.anglin@bell.net> Fixes: `53a42b6324` ("parisc: Switch to more fine grained lws locks") Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Helge Deller <deller@gmx.de>	2021-12-21 21:15:59 +01:00
John David Anglin	8f66fce0f4	parisc: Correct completer in lws start The completer in the "or,ev %r1,%r30,%r30" instruction is reversed, so we are not clipping the LWS number when we are called from a 32-bit process (W=0). We need to nulify the following depdi instruction when the least-significant bit of %r30 is 1. If the %r20 register is not clipped, a user process could perform a LWS call that would branch to an undefined location in the kernel and potentially crash the machine. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Helge Deller <deller@gmx.de>	2021-12-21 21:07:39 +01:00
Linus Torvalds	5dbdc4c565	Fixes: - Address a buffer overrun reported by Anatoly Trosinenko -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmHArD0ACgkQM2qzM29m f5ddQRAAnPQnaLHGzvXkLKP76EMkFWlCPZk4pkNy1LuHRyU3F+D2Pj1qZBJpqFCa gzRh2N1U+zPguwsoKlPuPjHUqqU4D2Wf0DHo9xsU0gvY5B86m4bebJgpK3zXpD3m 37gyM/UMe744D5A2Kh/yyKCEWR+8STh5/a956dCi22Z7qyEhPQDkrbk9yLBsDo9t NIb2rV/tvdvQvjzBd5om4Sm8QAXrdqoChK619b/T3v46shj/OX2Rqneid1deJi0U czrz19g/hzzvaTlrdXmFT0w9qZQ3Md/T2wDtiKtc/XoTF7ZsBHZFOwLhcDRuTXJO UAfXzXkf1WmhrQZOqJqHCuWFf01vdc3++8L01PXyxVupwGYS/uvdy0VNdO/u5hDr ibYjkrMpTKT14Q7iyPU7CCPolWipVpKt3pMwXuusCz8ky8oMQQSmpjdkofXutVnP NNuzx8iqW8N4Vo86Itoau7qEFM0FcxWR0Ut2F0EKiOjiS63Ccg9wxNbu3rJe/Wpb gpb3ICpPgyOaSUI1D0NEEibbRyiOAE+ldiDdpGHyOGgcK672jaD/5NAvD3s18FRp 2V7Xf/vY3Qiggakopk+hEKYNh2aFXkhZmUf+rNNbhEHpgJGQ6HU1rlSw4+SfrdPK vKx/uDuUr5aahnjrZDaLv1sGp4Nq+m3KEjhlQGKSVi25n4xXgtc= =FIJ5 -----END PGP SIGNATURE----- Merge tag 'nfsd-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: "Address a buffer overrun reported by Anatoly Trosinenko" * tag 'nfsd-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Fix READDIR buffer overflow	2021-12-21 12:02:36 -08:00
Sean Christopherson	fdba608f15	KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU Drop a check that guards triggering a posted interrupt on the currently running vCPU, and more importantly guards waking the target vCPU if triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE. If a vIRQ is delivered from asynchronous context, the target vCPU can be the currently running vCPU and can also be blocking, in which case skipping kvm_vcpu_wake_up() is effectively dropping what is supposed to be a wake event for the vCPU. The "do nothing" logic when "vcpu == running_vcpu" mostly works only because the majority of calls to ->deliver_posted_interrupt(), especially when using posted interrupts, come from synchronous KVM context. But if a device is exposed to the guest using vfio-pci passthrough, the VFIO IRQ and vCPU are bound to the same pCPU, and the IRQ is _not_ configured to use posted interrupts, wake events from the device will be delivered to KVM from IRQ context, e.g. vfio_msihandler() \| \|-> eventfd_signal() \| \|-> ... \| \|-> irqfd_wakeup() \| \|->kvm_arch_set_irq_inatomic() \| \|-> kvm_irq_delivery_to_apic_fast() \| \|-> kvm_apic_set_irq() This also aligns the non-nested and nested usage of triggering posted interrupts, and will allow for additional cleanups. Fixes: `379a3c8ee4` ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath") Cc: stable@vger.kernel.org Reported-by: Longpeng (Mike) <longpeng2@huawei.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-12-21 12:39:03 -05:00
Linus Torvalds	1c3e979bf3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - NULL pointer dereference fix in Vivaldi driver (Jiasheng Jiang) - regression fix for device probing in Holtek driver (Benjamin Tissoires) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: potential dereference of null pointer HID: holtek: fix mouse probing	2021-12-21 09:30:32 -08:00
Wu Bo	ffb76a86f8	ipmi: Fix UAF when uninstall ipmi_si and ipmi_msghandler module Hi, When testing install and uninstall of ipmi_si.ko and ipmi_msghandler.ko, the system crashed. The log as follows: [ 141.087026] BUG: unable to handle kernel paging request at ffffffffc09b3a5a [ 141.087241] PGD 8fe4c0d067 P4D 8fe4c0d067 PUD 8fe4c0f067 PMD 103ad89067 PTE 0 [ 141.087464] Oops: 0010 [#1] SMP NOPTI [ 141.087580] CPU: 67 PID: 668 Comm: kworker/67:1 Kdump: loaded Not tainted 4.18.0.x86_64 #47 [ 141.088009] Workqueue: events 0xffffffffc09b3a40 [ 141.088009] RIP: 0010:0xffffffffc09b3a5a [ 141.088009] Code: Bad RIP value. [ 141.088009] RSP: 0018:ffffb9094e2c3e88 EFLAGS: 00010246 [ 141.088009] RAX: 0000000000000000 RBX: ffff9abfdb1f04a0 RCX: 0000000000000000 [ 141.088009] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000246 [ 141.088009] RBP: 0000000000000000 R08: ffff9abfffee3cb8 R09: 00000000000002e1 [ 141.088009] R10: ffffb9094cb73d90 R11: 00000000000f4240 R12: ffff9abfffee8700 [ 141.088009] R13: 0000000000000000 R14: ffff9abfdb1f04a0 R15: ffff9abfdb1f04a8 [ 141.088009] FS: 0000000000000000(0000) GS:ffff9abfffec0000(0000) knlGS:0000000000000000 [ 141.088009] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 141.088009] CR2: ffffffffc09b3a30 CR3: 0000008fe4c0a001 CR4: 00000000007606e0 [ 141.088009] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 141.088009] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 141.088009] PKRU: 55555554 [ 141.088009] Call Trace: [ 141.088009] ? process_one_work+0x195/0x390 [ 141.088009] ? worker_thread+0x30/0x390 [ 141.088009] ? process_one_work+0x390/0x390 [ 141.088009] ? kthread+0x10d/0x130 [ 141.088009] ? kthread_flush_work_fn+0x10/0x10 [ 141.088009] ? ret_from_fork+0x35/0x40] BUG: unable to handle kernel paging request at ffffffffc0b28a5a [ 200.223240] PGD 97fe00d067 P4D 97fe00d067 PUD 97fe00f067 PMD a580cbf067 PTE 0 [ 200.223464] Oops: 0010 [#1] SMP NOPTI [ 200.223579] CPU: 63 PID: 664 Comm: kworker/63:1 Kdump: loaded Not tainted 4.18.0.x86_64 #46 [ 200.224008] Workqueue: events 0xffffffffc0b28a40 [ 200.224008] RIP: 0010:0xffffffffc0b28a5a [ 200.224008] Code: Bad RIP value. [ 200.224008] RSP: 0018:ffffbf3c8e2a3e88 EFLAGS: 00010246 [ 200.224008] RAX: 0000000000000000 RBX: ffffa0799ad6bca0 RCX: 0000000000000000 [ 200.224008] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000246 [ 200.224008] RBP: 0000000000000000 R08: ffff9fe43fde3cb8 R09: 00000000000000d5 [ 200.224008] R10: ffffbf3c8cb53d90 R11: 00000000000f4240 R12: ffff9fe43fde8700 [ 200.224008] R13: 0000000000000000 R14: ffffa0799ad6bca0 R15: ffffa0799ad6bca8 [ 200.224008] FS: 0000000000000000(0000) GS:ffff9fe43fdc0000(0000) knlGS:0000000000000000 [ 200.224008] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 200.224008] CR2: ffffffffc0b28a30 CR3: 00000097fe00a002 CR4: 00000000007606e0 [ 200.224008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 200.224008] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 200.224008] PKRU: 55555554 [ 200.224008] Call Trace: [ 200.224008] ? process_one_work+0x195/0x390 [ 200.224008] ? worker_thread+0x30/0x390 [ 200.224008] ? process_one_work+0x390/0x390 [ 200.224008] ? kthread+0x10d/0x130 [ 200.224008] ? kthread_flush_work_fn+0x10/0x10 [ 200.224008] ? ret_from_fork+0x35/0x40 [ 200.224008] kernel fault(0x1) notification starting on CPU 63 [ 200.224008] kernel fault(0x1) notification finished on CPU 63 [ 200.224008] CR2: ffffffffc0b28a5a [ 200.224008] ---[ end trace c82a412d93f57412 ]--- The reason is as follows: T1: rmmod ipmi_si. ->ipmi_unregister_smi() -> ipmi_bmc_unregister() -> __ipmi_bmc_unregister() -> kref_put(&bmc->usecount, cleanup_bmc_device); -> schedule_work(&bmc->remove_work); T2: rmmod ipmi_msghandler. ipmi_msghander module uninstalled, and the module space will be freed. T3: bmc->remove_work doing cleanup the bmc resource. -> cleanup_bmc_work() -> platform_device_unregister(&bmc->pdev); -> platform_device_del(pdev); -> device_del(&pdev->dev); -> kobject_uevent(&dev->kobj, KOBJ_REMOVE); -> kobject_uevent_env() -> dev_uevent() -> if (dev->type && dev->type->name) 'dev->type'(bmc_device_type) pointer space has freed when uninstall ipmi_msghander module, 'dev->type->name' cause the system crash. drivers/char/ipmi/ipmi_msghandler.c: 2820 static const struct device_type bmc_device_type = { 2821 .groups = bmc_dev_attr_groups, 2822 }; Steps to reproduce: Add a time delay in cleanup_bmc_work() function, and uninstall ipmi_si and ipmi_msghandler module. 2910 static void cleanup_bmc_work(struct work_struct work) 2911 { 2912 struct bmc_device bmc = container_of(work, struct bmc_device, 2913 remove_work); 2914 int id = bmc->pdev.id; /* Unregister overwrites id */ 2915 2916 msleep(3000); <--- 2917 platform_device_unregister(&bmc->pdev); 2918 ida_simple_remove(&ipmi_bmc_ida, id); 2919 } Use 'remove_work_wq' instead of 'system_wq' to solve this issues. Fixes: `b2cfd8ab4a` ("ipmi: Rework device id and guid handling to catch changing BMCs") Signed-off-by: Wu Bo <wubo40@huawei.com> Message-Id: <1640070034-56671-1-git-send-email-wubo40@huawei.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>	2021-12-21 08:04:42 -06:00
Linus Torvalds	6e0567b730	RDMA v5.16 third rc pull request Last fixes before holidays: - Work around a HW bug in HNS HIP08 - Recent memory leak regression in qib - Incorrect use of kfree() for vmalloc memory in hns -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmHBK3MACgkQOG33FX4g mxobyRAAmeN+27yYxE7NHjNPdFQ8wDGwJW5+L6gaydzsG19Ql69ZdA2VRQqO+kyI 8RH69IPKkRbpgruZhCguNSCyYan+mre9qL7kcRJXuIbt1gKZVoD/6J+70t4TOw3W +R154GUGN8dzilmYp1Rr8rGNKZDLRc+yc6SSSDbGu3RmCJm7PZgLjrj0pfqHO37g juA3lOvGHgVEJtmFM3SILfkwqKFkNRhB7pVb/3HE18Vpwn0dXFh42DyoA/9T8DwX Ufx1rm7/hxjYIxv0TXmNU6MqhobP3SwF8wCBTlS0IBPHWSBVoYAKon2hCdFw2wD2 wadGCEMuD+MXZu+/ZvyDz0EN9AWamCktKr74ZD5fp8Un7XY4r2vpjBb0uxuMrfyJ H2TZSFxDp31W1yU2UgDmpPOLrQhC5iduDGGZ7olw41CAqC95zUidAv2tJrRra1Hv UlnAIKM7VZ+tT8VcDPcdmUkGjvgJlmFhWD0VM6RJ+RU2smTW1WcCoPrNQV9ngSMy AO8wmVg8KwKQtkd8roOkdNe1QxvnUuAt7WWTWW3sc2IqELoYGfm5uJXoQQICP3Q/ nCVXjjDmUY9zfgjRqcqLGwZrQJTCHQp7JSELmQCaEODJVYsafW0FxZlHhyLr48t6 Q30YdioNbO9+DG8Ggza3os5jeCqxEbh+kugg43KMQyjGtQ05oJg= =rRP8 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Jason Gunthorpe: "Last fixes before holidays. Nothing very exciting: - Work around a HW bug in HNS HIP08 - Recent memory leak regression in qib - Incorrect use of kfree() for vmalloc memory in hns" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/hns: Replace kfree() with kvfree() IB/qib: Fix memory leak in qib_user_sdma_queue_pkts() RDMA/hns: Fix RNR retransmission issue for HIP08	2021-12-20 17:26:42 -08:00
Linus Torvalds	86085fe79e	spi: Fix for v5.16 One small fix for a long standing issue with error handling on probe in the Armada driver. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmHAupoACgkQJNaLcl1U h9Ca7Af+MSrnkT9q1GiDRT57ndUpG7CWa7tKDfzU/2QnPFFQED+Mne/ZvaopMNuz bbov7YvvObJKDjfdwYYK5KrnNyaqHcbjX5cow+BG6zEMAMTeQLqOjMEN4/QcgO2j nr5gYIqj9bOosoTrPZlJ2HUdTg48FgFovc0w/b9GUrzKAJJSKNZzeMffVx9ljtjs UgdFj1h9mgyZq/cD4TS5+DgnP6qwi195FidOJQWXoAiwohMBZFjGglaTNhGYprGq EDAbDZOaw4hjRBdZk2GAvo60PyhWU9BpGlsucFf341oLxdo9qqHuzP7x8MRIGobx TnKfHSWO0PoYVJReiIKy2ShgYvlrYA== =vwk0 -----END PGP SIGNATURE----- Merge tag 'spi-fix-v5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fix from Mark Brown: "One small fix for a long standing issue with error handling on probe in the Armada driver" * tag 'spi-fix-v5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: change clk_disable_unprepare to clk_unprepare	2021-12-20 10:23:19 -08:00
Linus Torvalds	3856c1b398	regulator: Binding fix for v5.16 This fixes problems validating DT bindings using op_mode which wasn't described as it should have been when converting to DT schema. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmHAuVgACgkQJNaLcl1U h9BD7gf+MMfzUBhMBgFM+iO0KRJt+yTZOS7d2FVSfKxF74px0gT43K0ABgFUPyb1 LhzWxHFAh8oxGFQtLu9KThFO0b7IaT07TGx0iqDdapVd7jj7aUg38/7fUqYY1p21 uAYRHqPJIYbLM5WyzXRrbF7LuhqrJ+ZhJhyeHy4HOYvbV8QqhmDjsnAuU/mXlicr pmkR+ph00V+JLW2w2EGJ+d8S9x2hOQ+PncQ2i++KFBQiUAyPSzlRLkvIXSgEqINp IxwISXXzh8CwKBlqV8Zo11vDW9q7Crt5OYxE1epfPBFjQPvu9HPCoIe7BTK55AuX /gkULxXtpTo13z5N20k5tMAitU0W+g== =JDrH -----END PGP SIGNATURE----- Merge tag 'regulator-fix-v5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "Binding fix for v5.16 This fixes problems validating DT bindings using op_mode which wasn't described as it should have been when converting to DT schema" * tag 'regulator-fix-v5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: dt-bindings: samsung,s5m8767: add missing op_mode to bucks	2021-12-20 10:13:30 -08:00
Linus Torvalds	59b3f94488	Merge branch 'xsa' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Merge xen fixes from Juergen Gross: "Fixes for two issues related to Xen and malicious guests: - Guest can force the netback driver to hog large amounts of memory - Denial of Service in other guests due to event storms" * 'xsa' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/netback: don't queue unlimited number of packages xen/netback: fix rx queue stall detection xen/console: harden hvc_xen against event channel storms xen/netfront: harden netfront against event channel storms xen/blkfront: harden blkfront against event channel storms	2021-12-20 07:42:21 -08:00
Helge Deller	484730e586	parisc: Clear stale IIR value on instruction access rights trap When a trap 7 (Instruction access rights) occurs, this means the CPU couldn't execute an instruction due to missing execute permissions on the memory region. In this case it seems the CPU didn't even fetched the instruction from memory and thus did not store it in the cr19 (IIR) register before calling the trap handler. So, the trap handler will find some random old stale value in cr19. This patch simply overwrites the stale IIR value with a constant magic "bad food" value (0xbaadf00d), in the hope people don't start to try to understand the various random IIR values in trap 7 dumps. Noticed-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>	2021-12-20 14:38:40 +01:00
Sean Christopherson	ab1ef34416	KVM: selftests: Add test to verify TRIPLE_FAULT on invalid L2 guest state Add a selftest to attempt to enter L2 with invalid guests state by exiting to userspace via I/O from L2, and then using KVM_SET_SREGS to set invalid guest state (marking TR unusable is arbitrary chosen for its relative simplicity). This is a regression test for a bug introduced by commit `c8607e4a08` ("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry"), which incorrectly set vmx->fail=true when L2 had invalid guest state and ultimately triggered a WARN due to nested_vmx_vmexit() seeing vmx->fail==true while attempting to synthesize a nested VM-Exit. The is also a functional test to verify that KVM sythesizes TRIPLE_FAULT for L2, which is somewhat arbitrary behavior, instead of emulating L2. KVM should never emulate L2 due to invalid guest state, as it's architecturally impossible for L1 to run an L2 guest with invalid state as nested VM-Enter should always fail, i.e. L1 needs to do the emulation. Stuffing state via KVM ioctl() is a non-architctural, out-of-band case, hence the TRIPLE_FAULT being rather arbitrary. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211207193006.120997-5-seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-12-20 08:06:55 -05:00
Sean Christopherson	0ff29701ff	KVM: VMX: Fix stale docs for kvm-intel.emulate_invalid_guest_state Update the documentation for kvm-intel's emulate_invalid_guest_state to rectify the description of KVM's default behavior, and to document that the behavior and thus parameter only applies to L1. Fixes: `a27685c33a` ("KVM: VMX: Emulate invalid guest state by default") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211207193006.120997-4-seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-12-20 08:06:55 -05:00
Sean Christopherson	cd0e615c49	KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required Synthesize a triple fault if L2 guest state is invalid at the time of VM-Enter, which can happen if L1 modifies SMRAM or if userspace stuffs guest state via ioctls(), e.g. KVM_SET_SREGS. KVM should never emulate invalid guest state, since from L1's perspective, it's architecturally impossible for L2 to have invalid state while L2 is running in hardware. E.g. attempts to set CR0 or CR4 to unsupported values will either VM-Exit or #GP. Modifying vCPU state via RSM+SMRAM and ioctl() are the only paths that can trigger this scenario, as nested VM-Enter correctly rejects any attempt to enter L2 with invalid state. RSM is a straightforward case as (a) KVM follows AMD's SMRAM layout and behavior, and (b) Intel's SDM states that loading reserved CR0/CR4 bits via RSM results in shutdown, i.e. there is precedent for KVM's behavior. Following AMD's SMRAM layout is important as AMD's layout saves/restores the descriptor cache information, including CS.RPL and SS.RPL, and also defines all the fields relevant to invalid guest state as read-only, i.e. so long as the vCPU had valid state before the SMI, which is guaranteed for L2, RSM will generate valid state unless SMRAM was modified. Intel's layout saves/restores only the selector, which means that scenarios where the selector and cached RPL don't match, e.g. conforming code segments, would yield invalid guest state. Intel CPUs fudge around this issued by stuffing SS.RPL and CS.RPL on RSM. Per Intel's SDM on the "Default Treatment of RSM", paraphrasing for brevity: IF internal storage indicates that the [CPU was post-VMXON] THEN enter VMX operation (root or non-root); restore VMX-critical state as defined in Section 34.14.1; set to their fixed values any bits in CR0 and CR4 whose values must be fixed in VMX operation [unless coming from an unrestricted guest]; IF RFLAGS.VM = 0 AND (in VMX root operation OR the “unrestricted guest” VM-execution control is 0) THEN CS.RPL := SS.DPL; SS.RPL := SS.DPL; FI; restore current VMCS pointer; FI; Note that Intel CPUs also overwrite the fixed CR0/CR4 bits, whereas KVM will sythesize TRIPLE_FAULT in this scenario. KVM's behavior is allowed as both Intel and AMD define CR0/CR4 SMRAM fields as read-only, i.e. the only way for CR0 and/or CR4 to have illegal values is if they were modified by the L1 SMM handler, and Intel's SDM "SMRAM State Save Map" section states "modifying these registers will result in unpredictable behavior". KVM's ioctl() behavior is less straightforward. Because KVM allows ioctls() to be executed in any order, rejecting an ioctl() if it would result in invalid L2 guest state is not an option as KVM cannot know if a future ioctl() would resolve the invalid state, e.g. KVM_SET_SREGS, or drop the vCPU out of L2, e.g. KVM_SET_NESTED_STATE. Ideally, KVM would reject KVM_RUN if L2 contained invalid guest state, but that carries the risk of a false positive, e.g. if RSM loaded invalid guest state and KVM exited to userspace. Setting a flag/request to detect such a scenario is undesirable because (a) it's extremely unlikely to add value to KVM as a whole, and (b) KVM would need to consider ioctl() interactions with such a flag, e.g. if userspace migrated the vCPU while the flag were set. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211207193006.120997-3-seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-12-20 08:06:54 -05:00
Sean Christopherson	a80dfc0259	KVM: VMX: Always clear vmx->fail on emulation_required Revert a relatively recent change that set vmx->fail if the vCPU is in L2 and emulation_required is true, as that behavior is completely bogus. Setting vmx->fail and synthesizing a VM-Exit is contradictory and wrong: (a) it's impossible to have both a VM-Fail and VM-Exit (b) vmcs.EXIT_REASON is not modified on VM-Fail (c) emulation_required refers to guest state and guest state checks are always VM-Exits, not VM-Fails. For KVM specifically, emulation_required is handled before nested exits in __vmx_handle_exit(), thus setting vmx->fail has no immediate effect, i.e. KVM calls into handle_invalid_guest_state() and vmx->fail is ignored. Setting vmx->fail can ultimately result in a WARN in nested_vmx_vmexit() firing when tearing down the VM as KVM never expects vmx->fail to be set when L2 is active, KVM always reflects those errors into L1. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 21158 at arch/x86/kvm/vmx/nested.c:4548 nested_vmx_vmexit+0x16bd/0x17e0 arch/x86/kvm/vmx/nested.c:4547 Modules linked in: CPU: 0 PID: 21158 Comm: syz-executor.1 Not tainted 5.16.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:nested_vmx_vmexit+0x16bd/0x17e0 arch/x86/kvm/vmx/nested.c:4547 Code: <0f> 0b e9 2e f8 ff ff e8 57 b3 5d 00 0f 0b e9 00 f1 ff ff 89 e9 80 Call Trace: vmx_leave_nested arch/x86/kvm/vmx/nested.c:6220 [inline] nested_vmx_free_vcpu+0x83/0xc0 arch/x86/kvm/vmx/nested.c:330 vmx_free_vcpu+0x11f/0x2a0 arch/x86/kvm/vmx/vmx.c:6799 kvm_arch_vcpu_destroy+0x6b/0x240 arch/x86/kvm/x86.c:10989 kvm_vcpu_destroy+0x29/0x90 arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 kvm_free_vcpus arch/x86/kvm/x86.c:11426 [inline] kvm_arch_destroy_vm+0x3ef/0x6b0 arch/x86/kvm/x86.c:11545 kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1189 [inline] kvm_put_kvm+0x751/0xe40 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1220 kvm_vcpu_release+0x53/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3489 __fput+0x3fc/0x870 fs/file_table.c:280 task_work_run+0x146/0x1c0 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0x705/0x24f0 kernel/exit.c:832 do_group_exit+0x168/0x2d0 kernel/exit.c:929 get_signal+0x1740/0x2120 kernel/signal.c:2852 arch_do_signal_or_restart+0x9c/0x730 arch/x86/kernel/signal.c:868 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x191/0x220 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x2e/0x70 kernel/entry/common.c:300 do_syscall_64+0x53/0xd0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: `c8607e4a08` ("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry") Reported-by: syzbot+f1d2136db9c80d4733e8@syzkaller.appspotmail.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211207193006.120997-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-12-20 08:06:54 -05:00
Andrew Jones	577e022b7b	selftests: KVM: Fix non-x86 compiling Attempting to compile on a non-x86 architecture fails with include/kvm_util.h: In function â€˜vm_compute_max_gfnâ€™: include/kvm_util.h:79:21: error: dereferencing pointer to incomplete type â€˜struct kvm_vmâ€™ return ((1ULL << vm->pa_bits) >> vm->page_shift) - 1; ^~ This is because the declaration of struct kvm_vm is in lib/kvm_util_internal.h as an effort to make it private to the test lib code. We can still provide arch specific functions, though, by making the generic function symbols weak. Do that to fix the compile error. Fixes: `c8cc43c1ea` ("selftests: KVM: avoid failures due to reserved HyperTransport region") Cc: stable@vger.kernel.org Signed-off-by: Andrew Jones <drjones@redhat.com> Message-Id: <20211214151842.848314-1-drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-12-20 08:06:54 -05:00
Marc Orr	c5063551bf	KVM: x86: Always set kvm_run->if_flag The kvm_run struct's if_flag is a part of the userspace/kernel API. The SEV-ES patches failed to set this flag because it's no longer needed by QEMU (according to the comment in the source code). However, other hypervisors may make use of this flag. Therefore, set the flag for guests with encrypted registers (i.e., with guest_state_protected set). Fixes: `f1c6366e30` ("KVM: SVM: Add required changes to support intercepts under SEV-ES") Signed-off-by: Marc Orr <marcorr@google.com> Message-Id: <20211209155257.128747-1-marcorr@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>	2021-12-20 08:06:53 -05:00
Sean Christopherson	3a0f64de47	KVM: x86/mmu: Don't advance iterator after restart due to yielding After dropping mmu_lock in the TDP MMU, restart the iterator during tdp_iter_next() and do not advance the iterator. Advancing the iterator results in skipping the top-level SPTE and all its children, which is fatal if any of the skipped SPTEs were not visited before yielding. When zapping all SPTEs, i.e. when min_level == root_level, restarting the iter and then invoking tdp_iter_next() is always fatal if the current gfn has as a valid SPTE, as advancing the iterator results in try_step_side() skipping the current gfn, which wasn't visited before yielding. Sprinkle WARNs on iter->yielded being true in various helpers that are often used in conjunction with yielding, and tag the helper with __must_check to reduce the probabily of improper usage. Failing to zap a top-level SPTE manifests in one of two ways. If a valid SPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(), the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae If kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped by kvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form of marking a struct page as dirty/accessed after it has been put back on the free list. This directly triggers a WARN due to encountering a page with page_count() == 0, but it can also lead to data corruption and additional errors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Note, the underlying bug existed even before commit `1af4a96025` ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed") moved calls to tdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could still incorrectly advance past a top-level entry when yielding on a lower-level entry. But with respect to leaking shadow pages, the bug was introduced by yielding before processing the current gfn. Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, or callers could jump to their "retry" label. The downside of that approach is that tdp_mmu_iter_cond_resched() _must_ be called before anything else in the loop, and there's no easy way to enfornce that requirement. Ideally, KVM would handling the cond_resched() fully within the iterator macro (the code is actually quite clean) and avoid this entire class of bugs, but that is extremely difficult do while also supporting yielding after tdp_mmu_set_spte_atomic() fails. Yielding after failing to set a SPTE is very desirable as the "owner" of the REMOVED_SPTE isn't strictly bounded, e.g. if it's zapping a high-level shadow page, the REMOVED_SPTE may block operations on the SPTE for a significant amount of time. Fixes: `faaf05b00a` ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Fixes: `1af4a96025` ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed") Reported-by: Ignat Korchagin <ignat@cloudflare.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211214033528.123268-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-12-20 08:06:53 -05:00
Jiasheng Jiang	13251ce1dd	HID: potential dereference of null pointer The return value of devm_kzalloc() needs to be checked. To avoid hdev->dev->driver_data to be null in case of the failure of alloc. Fixes: `14c9c014ba` ("HID: add vivaldi HID driver") Cc: stable@vger.kernel.org Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20211215083605.117638-1-jiasheng@iscas.ac.cn	2021-12-20 11:26:14 +01:00
Benjamin Tissoires	93a2207c25	HID: holtek: fix mouse probing An overlook from the previous commit: we don't even parse or start the device, meaning that the device is not presented to user space. Fixes: `93020953d0` ("HID: check for valid USB device for many HID drivers") Cc: stable@vger.kernel.org Link: https://bugs.archlinux.org/task/73048 Link: https://bugzilla.kernel.org/show_bug.cgi?id=215341 Link: https://lore.kernel.org/r/e4efbf13-bd8d-0370-629b-6c80c0044b15@leemhuis.info/ Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>	2021-12-20 11:25:42 +01:00
Wei Wang	9fb12fe5b9	KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_all The fixed counter 3 is used for the Topdown metrics, which hasn't been enabled for KVM guests. Userspace accessing to it will fail as it's not included in get_fixed_pmc(). This breaks KVM selftests on ICX+ machines, which have this counter. To reproduce it on ICX+ machines, ./state_test reports: ==== Test Assertion Failure ==== lib/x86_64/processor.c:1078: r == nmsrs pid=4564 tid=4564 - Argument list too long 1 0x000000000040b1b9: vcpu_save_state at processor.c:1077 2 0x0000000000402478: main at state_test.c:209 (discriminator 6) 3 0x00007fbe21ed5f92: ?? ??:0 4 0x000000000040264d: _start at ??:? Unexpected result from KVM_GET_MSRS, r: 17 (failed MSR was 0x30c) With this patch, it works well. Signed-off-by: Wei Wang <wei.w.wang@intel.com> Message-Id: <20211217124934.32893-1-wei.w.wang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-12-20 10:51:19 +01:00
Linus Torvalds	a7904a5389	Linux 5.16-rc6	2021-12-19 14:14:33 -08:00
Linus Torvalds	f291e2d899	Two small fixes, one of which was being worked around in selftests. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmG/fW8UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroOVBAf/dZDUEnrsu/o4xVPjPinMihtMstGk RP7PqmJ+6nHeDsfoivIiigaFchpc0DYq2kU/8iOPkbsEnLyFuhKg1jUHMQiK5ZGY shWvJq9HEwqAQ4v3H3PC5SomwlO4UmRkbrR4NT1SwFBSIk9s+soVbDZrdL8bJW7x 6XS01LsXeUPEcJj+h9bDGKmnibkiE4hsf9jGMgfiXgF33DB+LDNU9TH9MFn4K6J2 ColgIqZ+o5dXn7wMcsDsw+DTj81nki+gkghmW9k5rC5FA3qmKd3e5T2sgKey3Ztd J+acNmGyG/hhYNE4RHXaN+ckNEcqywcCA31DfWq87aw6yP4pF2dKWu+BYw== =yEYt -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "Two small fixes, one of which was being worked around in selftests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Retry page fault if MMU reload is pending and root has no sp KVM: selftests: vmx_pmu_msrs_test: Drop tests mangling guest visible CPUIDs KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA32_PERF_CAPABILITIES	2021-12-19 12:44:03 -08:00
Linus Torvalds	2da09da4ae	block-5.16-2021-12-19 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmG/Z/MQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpjo7EADdal+Z26fAabpI/wHJJQQKOE9WIy3hP5kA G466lPP4FvW+2JIclxRGWmBfkvnzzKLjEFKg2OJZQgOvSycut217qBNxG4xpxy6O a0B0jKV3Dh8uLGaaOwvwcYmeBzuEnzmr+qh5hQJQDr+9Cpx491Uh7+GABOt+sceY +j+KNNNShcK1qUYtYHsKqSvxEAyTmKuN0uz6rOTbGr/Z2TjWXVVBjRw5g3J52U7y DjueqM0Y7qebCN+DexdOXZDC4h3A5/j1kCABPe3Vx3yRRWR1RJDLGDfQes/QZDpF sf5kkr/1iPki7Dh2laLin4686sNa4HyIJgS8wlrVXGBnRl38SGTShcbuP94m5okJ JsdBD38pmmBsSaokKGpxfsG1N5kf3MpBD9zc2WWeiYpwZH+Mr3cRwSvrz3jUpVqL MfrHoCHtgqOgIdC7Jdv3ilet3ujfp9yX9QcgIljgkxlBHZHQxX2mXcobROdWbMgz v+sn0Ot9+bEf4FrwSeq43fWgBGTK3IbiLajtxWozTKvy3Re+hnDnfoHFqUZE9yfe nFzKJPXeplKdM9Y5J75OaJQ+Ohp0F+0jcPNGHeseEWqvJhK0A5mgssM8TqrTQkAo GJj8DZQU8Szc0q7yO7gqVJ398sjyKEtL8Rg3qTGLmKJrNdFis+bmjtaDzyDd4HeW Uij+cetUuw== =dbu4 -----END PGP SIGNATURE----- Merge tag 'block-5.16-2021-12-19' of git://git.kernel.dk/linux-block Pull block revert from Jens Axboe: "It turns out that the fix for not hammering on the delayed work timer too much caused a performance regression for BFQ, so let's revert the change for now. I've got some ideas on how to fix it appropriately, but they should wait for 5.17" * tag 'block-5.16-2021-12-19' of git://git.kernel.dk/linux-block: Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption"	2021-12-19 12:38:53 -08:00
Linus Torvalds	a76c3d0358	- Clear the PCI_MSIX_FLAGS_MASKALL bit too on the error path so that it is restored to its reset state - Mask MSI-X vectors late on the init path in order to handle out-of-spec Marvell NVME devices which apparently look at the MSI-X mask even when MSI-X is disabled -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmG/PCUACgkQEsHwGGHe VUpAkg//WZNlzHyovL4EslM/E2UPq23ALYGfK36td4b9G2+9PvnOZMeQ6hM8rOeN pMcwpSBvOs3hlWTWjhDLKWZwkxFBNi53Fj2YvxDFUFj771jFNIpvoIY47z2AB03E FnNUQHJNUe5clxARNc3RTlLtMURGghyfY7xwd/3UQcr6xfma7bZhiraUUpy/vUTn 29QhR4/iSNIqZdA1gzapDwnlRjvr3qcFe/Xa7tfqrs3xOKFGIRcHTpVMdhB4iBON 7JCZr/lKiw/HiJzXHgNG1SmLR6QyGVLshga5tMoswzKqoIDHbW7szBUFMe816/rP M5jfwkiUznvmrcSzqs12K5whLSRPXT2IFRegtcuA8bjNowwZqPWVEaY+z1s2iHeO u2WZnsaAHrqQqgZIR2bP6x1cex2bRERDEeUq2tj3A9EiTzEjI5OT66/cjeThiVDT b8J/vJEYiNBnqEI5REM1oajYLDO71H4SHxjZqfSGRxgHBwEcVry8b39QbglQPZ0V 87+2IhDcLfKcCQehCLb5Qd5PqFZMS4NM1KlcVjrt8O9Y01A0qIor9ZVPcA38R5cA nPo5EAALVX5z22KbnMmKgaL3APARsh9NM2wW0F5699irL2EgiOEWW+ENKOzycd8f dzFLT0UrZ6S07IiANAc4mhMpI2cIRuVJGDKJuWVOwn47V1L37ys= =Kdqk -----END PGP SIGNATURE----- Merge tag 'irq_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Clear the PCI_MSIX_FLAGS_MASKALL bit too on the error path so that it is restored to its reset state - Mask MSI-X vectors late on the init path in order to handle out-of-spec Marvell NVME devices which apparently look at the MSI-X mask even when MSI-X is disabled * tag 'irq_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error PCI/MSI: Mask MSI-X vectors only on success	2021-12-19 12:28:46 -08:00
Linus Torvalds	e1fe1b10e6	- Make sure the CLOCK_REALTIME to CLOCK_MONOTONIC offset is never positive -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmG/OTsACgkQEsHwGGHe VUrBWxAAhCQ5rFc5WkVxN3Lr2JLtY2bNUAOrdWNVXXmuKIZhbCgnXZ6a7NH9Ins/ zkLS7YL1gaZtcK+sYnPbO7Z6oTVEqV5UZnxuUH8DF8Q2U7cVdGvQSeHx5ghx4O35 13P0RSrj0++Q03dc5mf7+OA7RTuH00JpFCvRavpHNJDYFIN+gl1pPDjM/0g+j90W PwFa/Hr8vOH7vpPRwygZ+yWfMunb7nTpY7Pa7toSQtE4NR6L2+A49+0/scjD5i9n wQCFI4Md49DRV8qvC04YmN4XC72PBKo59z0ptw1LP1yYuD3n0IjjxhRmkaEGLS/x abSs3DfwDDD3Bkl/CprJ6ZfoNez5jOsgdPgPH+c5QdHYk837JAgiLZL0M5YK+Gqf azuYSv0XfSA6Jg4ioaqsw5gq2QhJS0/ej3VN9qLIspDLncx0BHHr99inrmuvONbl cgtm24xQx8ezG8iEK4Ij05bg/sflwP8czTx4La8tnK2p1VK+xHeezKRLjEFqmXCr NV8nZEPO7QVbNinViHnEcvz4fur1lYHpCJnG2UbNPipYT2XHsAkaVEZ8uvmg+Ovy alcAasSVq9YdQbgWyYFmwWXVoPeG87z53MDA7kPk2TihJaOz2jaY0me5J6fOgSqh QFETA8Hcd3Do9hf0MRY9HTX18/uKinW8HclVw2yZxdztGfOfAxA= =8tah -----END PGP SIGNATURE----- Merge tag 'timers_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Borislav Petkov: - Make sure the CLOCK_REALTIME to CLOCK_MONOTONIC offset is never positive * tag 'timers_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Really make sure wall_to_monotonic isn't positive	2021-12-19 12:23:18 -08:00
Linus Torvalds	909e1d166c	- Fix the condition checking when the optimistic spinning of a waiter needs to be terminated -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmG/NxMACgkQEsHwGGHe VUpzQw/+OQ6cDj41E+482w3iQDdnQWTyWV29ukyBbR+QRDmi7IyPIR6YQ3mEz0Wu qiG76aO3R7+y0mc84ISaZPhbZ1pTCvOPaBiE91rachc1w9bLH1J/HIy2veKvPw29 8Vhn6sB2lUoh8y8Cy8AHgD0D6u/imBuBrVyO+qT22r1ZUlnZj02fT1U/XD2e3WNO Id9JXhzu6S2leRqg5hSS6WodXbtGBsM4k5jDscu3s4Akv0JS7dxaeVaEGLw5oqyJ +sIL6V6BwbfLEe4UOgvVzVgwzXnyhqtVF8ldaqj3PpdjhqUtzqGEmirUq4WVjZ+R A1mHZ3bgPQNqmdhhWNtz1IFSJcuVXGEgXSS98LStyLyxVPiAByo5wHWJxF3jx/UW ag2boT/MyoKP3iRclUKOgRqeDFsDH4HCNF9YEyqu5uSrvJhMNwhhCttCDFKu3cAl vSEXmgNr1gcL1IAUlm3w4ZQIU8x/eznfhZiVpoWqtGhSxQPmTShV/YT4S7SY7mtf 0kxhK/Y1nS4nQqDTyuyVzJDFVX1ZoS0SJXe1L9TnMiD7VLO9wEblgdaDfp8DxCrY YPCpnpmnV9tOyGVmbAJU+Xz9Pbuoahr0h7JoslPDKMJQTO30vc0reF2Z5gV05FCM SgFUExL9a3TGLplMPmz96MhtTnN/a884txQOCCpvDygubXVLnTs= =85I3 -----END PGP SIGNATURE----- Merge tag 'locking_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Borislav Petkov: - Fix the rtmutex condition checking when the optimistic spinning of a waiter needs to be terminated * tag 'locking_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()	2021-12-19 12:17:26 -08:00
Linus Torvalds	c36d891d78	- Prevent lock contention on the new sigaltstack lock on the common-case path, when no changes have been made to the alternative signal stack. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmG/NZ0ACgkQEsHwGGHe VUqwEQ/8CCQ2KKBbDZOYIr1wnl+FDIycgq7tnz+q9SomzxQODdDWLREBoTPsOtoE NZgXZEQxX4Wh/+4rvvdSMCVT3nz2GvSSasVKGrPZyLpDDyL3coRO0Ngx9iRUd1kF j67e9oMuNboPC5jJfP9cC4T+GgDQDnXAjjT3jX7aiIXnNjnOCTZ5Z7W8GKw7d2qH 4L2SJwAPOkuRicdQiRMJhVLsowsDIZtC8q8OZHhwu0dqM3/JVJCIxKKGKV69j5uk TUP6M0ZdyR30VrDfKYlm3m5fY0YFsBY/algphP41Hz5sUe9Xsw6F5+8sL3nCqLz1 BBUFr/00qVruM3jWmIag/OQ8/4cAFZjrx+8ewdF61OEOWya9Mq7VxINjT8R77B0i AuA6Bkv1LArJyfvywbbD6JzAj7TQFPuhFPc0BUFwZfn+B1rvxm88JK2mjR9aO/wZ ZHgDJ5hOSIKKNJ2W9g2fhW0MTMUELxKqxHZqOmQU/8ydVxYHZtD2GLHLDAU3XBoe 9PTntBvv7+qxqNQyY70k4jzIRfOFB8XuYxeWCbg10LqkbFFm2otYN2orsjVVBY7u 9wPQhFvJo6pHBx+dNIV6be56SnIeTCdIWBqlUcAto5mCVbmIxQoIMoNLo6rGBrhA 7UdhVCFJJki/Bs92aEQxl09volI9Ec7yXvmpU74LfKD+Gc8TxQo= =if9T -----END PGP SIGNATURE----- Merge tag 'core_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull signal handlign fix from Borislav Petkov: - Prevent lock contention on the new sigaltstack lock on the common-case path, when no changes have been made to the alternative signal stack. * tag 'core_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: signal: Skip the altstack update when not needed	2021-12-19 11:46:54 -08:00
Linus Torvalds	a4cc5ea443	- only enable pci_remap_iospace() for Ralink devices -----BEGIN PGP SIGNATURE----- iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmG/DtkaHHRzYm9nZW5k QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHAmSA/9E+u2lGZKybIKiKSU9AT/ nDpgEQqbT/Z29OS9VsV4CjLBpQLv44EbER9IxhemfsytOEfZPveY/X/RzLSNDYKx DqB2a+vPynpTtKyMvkIhh1lcze1ZmR9s87S4CB38mQ+5/0asSpofIr0y2nAwTmFz CLa2cyhy3se35SjmoTxdFAKonOG0MZ7OJoI0qiPUhAg+XlYI9Z8D2yllhrwl5GiN AUDbR88RZBmfrEJS6p4TlHZgE4eV893uzQLxQtrChu9HJPAeouMvCXsQBQLZJRu7 r9n3koM6XY77Xxm1qZB3xHKdfxBojSBj3kJvgwbgZDYILROLdOVempMdCnOONJyZ WThmjx8/MUMC0X/U4ECjGT7D6XALksWA2qHTKScl3tqOZu/wQGrMVYqjIz2B1gtJ DEyhmm+fKHkhpnMY19Bby6tpd1OkjO0Wg5dsTpWAub94XZi2Xq0T2n9qKQytwy5t y6Sdsb6f9e4MH0/kAzHbKakNT5v5eDx8k2h3zNFtBzvQZLmlbYeDO5aLKsfuyvNQ YvDFVf1fRrTaHv5AffVhgNYP6LjNKFMpte1aPiU5iBEdEnKs17uMKaSM9g5hEm23 yHf2TcsEotteg1iXxgZr5akY3tPE9wFYmDU5EKp3xv19s6ODj9pbBIZE/ujgHcOt SgaBNwy/qK0N5rHXSOrSZSA= =IyrT -----END PGP SIGNATURE----- Merge tag 'mips-fixes_5.16_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fix from Thomas Bogendoerfer: - only enable pci_remap_iospace() for Ralink devices * tag 'mips-fixes_5.16_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: Only define pci_remap_iospace() for Ralink	2021-12-19 11:40:11 -08:00
Linus Torvalds	713ab911f2	powerpc fixes for 5.16 #4 Fix a recently introduced oops at boot on 85xx in some configurations. Fix crashes when loading some livepatch modules with STRICT_MODULE_RWX. Thanks to: Joe Lawrence, Russell Currey, Xiaoming Ni. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmG+q+UTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgG5sEACVsb39+Q8w0GKJxUGR0Wri1InJTkLC UWpd9PVzG6n8+Epl0f+H5cMlv5V6xaDzTsNXEf27xIACMlPSVHVLDW8u/RFPizdp fUDUweGUvK/M0GQW89nRV29DjdNjMmUq1KQFM6eEgYN/0nS3HqF27jamiaFhiZ+e pJl1g2dy/DTWrFz3V/8b0nv6kyJE8GlpnCmTchDsPNDbzWQVJXVfM7IB/9N+EVna MQAZCyKSOafTgNRPOW+Sm4oF64z+Yn4cxURVKfzbM+0Cd8pNIcGZ7McXrnp0DR4t bLGeYEzyR1bPc3VTt7b+jafDP41AakRllKAFz65+pdYOcV1hTRM4inbLVPT5Bmz6 7ANlX/w4BwYWFk3mWW14BoxU02FlRKRddXhusq7x2wIqvtn1KI8Qm1xpneiGpcW+ GEe+k9ONGowXp/bc47/Co6dPSHKRRHUC3MsgbDv2Y7M5G9vsetckdQzA2bIqYSx8 r1ZXNiMwQarPeB6FrGvNbekyCsIhJkk6chhtL0k27BVOFqYiHZtaqBs+RCQYPdC8 Aih8WOEqdldbItFI3MihyKULucgkVpdnm80Ja67qRthuJICF9OML3mdrt83bxEIp fu4Bsw8dygVKJWScZj8AUC/ZBxOrrTGyZR1RvsHpaAiEfinMkxxTnXjqNDiSZWXt DKKLx6pfSFrokw== =Nmp5 -----END PGP SIGNATURE----- Merge tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fix a recently introduced oops at boot on 85xx in some configurations. Fix crashes when loading some livepatch modules with STRICT_MODULE_RWX. Thanks to Joe Lawrence, Russell Currey, and Xiaoming Ni" * tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/module_64: Fix livepatching for RO modules powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n	2021-12-19 11:31:14 -08:00
Linus Torvalds	9273d6cb99	two cifs/smb3 fixes, one fscache related, and one mount parsing related for stable -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmG9V/0ACgkQiiy9cAdy T1GbQAv+N/2p7mMzhxcB91xlU6DSo2kZhC1FcmiXld3BEiMzkef6Rtb/lzW1AtBP kNgqjVU7MMmdFQNKLhrHUV2BNCtLWKOC2I0xTZqr+nBlGMJkiNUBzUWW8DcnLd1e zGky7Wck8HHu7LdnZA3TVlPhNV9KjdlHMEYBBJFlOc779a/Q0KyHLdwhP9mpSa2d QrVRHkBMylNUqF4WiLKQ0I8eJFhKtd7pJNAW1qGeKe2YDV6kZQiXwHOLIZb7SHET 9Qtj83IwsQ13cLTmb7tGibj0Fn+GEEqgBv40PjSMuTCoiCpfhcLrx0lWT5hxlxao 9w3Yw1pPPaJXbxCmL9b3Bqm3vnNqMaN7HqxsuGIFwnulGFB4GJY2l+ZQPcTHzWG0 oCHlimlixH7RKVEfjIDhidC1C0ReZI8/1XuHPGE17Ysxt6L7dH/a7J2y7SlIdYTZ ZVvVT0Kib74I7ZUJH1ymc4MdAVnBIkiTGOrQk/FU+cHOyzMUpNcwXiCqkvF77QUa hDW4hkzE =MW5I -----END PGP SIGNATURE----- Merge tag '5.16-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Two cifs/smb3 fixes, one fscache related, and one mount parsing related for stable" * tag '5.16-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: sanitize multiple delimiters in prepath cifs: ignore resource_id while getting fscache super cookie	2021-12-19 11:23:02 -08:00
Sean Christopherson	18c841e1f4	KVM: x86: Retry page fault if MMU reload is pending and root has no sp Play nice with a NULL shadow page when checking for an obsolete root in the page fault handler by flagging the page fault as stale if there's no shadow page associated with the root and KVM_REQ_MMU_RELOAD is pending. Invalidating memslots, which is the only case where _all_ roots need to be reloaded, requests all vCPUs to reload their MMUs while holding mmu_lock for lock. The "special" roots, e.g. pae_root when KVM uses PAE paging, are not backed by a shadow page. Running with TDP disabled or with nested NPT explodes spectaculary due to dereferencing a NULL shadow page pointer. Skip the KVM_REQ_MMU_RELOAD check if there is a valid shadow page for the root. Zapping shadow pages in response to guest activity, e.g. when the guest frees a PGD, can trigger KVM_REQ_MMU_RELOAD even if the current vCPU isn't using the affected root. I.e. KVM_REQ_MMU_RELOAD can be seen with a completely valid root shadow page. This is a bit of a moot point as KVM currently unloads all roots on KVM_REQ_MMU_RELOAD, but that will be cleaned up in the future. Fixes: `a955cad84c` ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update") Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211209060552.2956723-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-12-19 19:38:58 +01:00
Vitaly Kuznetsov	0b091a43d7	KVM: selftests: vmx_pmu_msrs_test: Drop tests mangling guest visible CPUIDs Host initiated writes to MSR_IA32_PERF_CAPABILITIES should not depend on guest visible CPUIDs and (incorrect) KVM logic implementing it is about to change. Also, KVM_SET_CPUID{,2} after KVM_RUN is now forbidden and causes test to fail. Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: `feb627e8d6` ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211216165213.338923-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-12-19 19:35:29 +01:00
Vitaly Kuznetsov	1aa2abb33a	KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA32_PERF_CAPABILITIES The ability to write to MSR_IA32_PERF_CAPABILITIES from the host should not depend on guest visible CPUID entries, even if just to allow creating/restoring guest MSRs and CPUIDs in any sequence. Fixes: `27461da310` ("KVM: x86/pmu: Support full width counting") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211216165213.338923-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-12-19 19:35:18 +01:00
Jens Axboe	87959fa16c	Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption" This reverts commit `cb2ac2912a`. Alex and the kernel test robot report that this causes a significant performance regression with BFQ. I can reproduce that result, so let's revert this one as we're close to -rc6 and we there's no point in trying to rush a fix. Link: https://lore.kernel.org/linux-block/1639853092.524jxfaem2.none@localhost/ Link: https://lore.kernel.org/lkml/20211219141852.GH14057@xsang-OptiPlex-9020/ Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-19 07:58:44 -07:00
Chuck Lever	53b1119a6e	NFSD: Fix READDIR buffer overflow If a client sends a READDIR count argument that is too small (say, zero), then the buffer size calculation in the new init_dirlist helper functions results in an underflow, allowing the XDR stream functions to write beyond the actual buffer. This calculation has always been suspect. NFSD has never sanity- checked the READDIR count argument, but the old entry encoders managed the problem correctly. With the commits below, entry encoding changed, exposing the underflow to the pointer arithmetic in xdr_reserve_space(). Modern NFS clients attempt to retrieve as much data as possible for each READDIR request. Also, we have no unit tests that exercise the behavior of READDIR at the lower bound of @count values. Thus this case was missed during testing. Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Fixes: `f5dcccd647` ("NFSD: Update the NFSv2 READDIR entry encoder to use struct xdr_stream") Fixes: `7f87fc2d34` ("NFSD: Update NFSv3 READDIR entry encoders to use struct xdr_stream") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2021-12-18 17:11:06 -05:00

1 2 3 4 5 ...

1059925 Commits All Branches Search

1059925 Commits

All Branches