linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-01 22:54:01 +00:00

Author	SHA1	Message	Date
Xianting Tian	ab551157a3	RISC-V: Add modules to virtual kernel memory layout dump commit `f9293ad46d` upstream. Modules always live before the kernel, MODULES_END is fixed but MODULES_VADDR isn't fixed, it depends on the kernel size. Let's add it to virtual kernel memory layout dump. As MODULES is only defined for CONFIG_64BIT, so we dump it when CONFIG_64BIT=y. eg, MODULES_VADDR - MODULES_END 0xffffffff01133000 - 0xffffffff80000000 Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Link: https://lore.kernel.org/r/20220811074150.3020189-5-xianting.tian@linux.alibaba.com Cc: stable@vger.kernel.org Fixes: `2bfc6cd81b` ("riscv: Move kernel mapping outside of linear mapping") Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:11 +02:00
Atish Patra	08f46c025a	RISC-V: Update user page mapping only once during start commit `133a6d1fe7` upstream. Currently, riscv_pmu_event_set_period updates the userpage mapping. However, the caller of riscv_pmu_event_set_period should update the userpage mapping because the counter can not be updated/started from set_period function in counter overflow path. Invoke the perf_event_update_userpage at the caller so that it doesn't get invoked twice during counter start path. Fixes: `f5bfa23f57` ("RISC-V: Add a perf core library for pmu drivers") Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Guo Ren <guoren@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220711174632.4186047-3-atishp@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:11 +02:00
Atish Patra	a92c834c0b	RISC-V: Fix SBI PMU calls for RV32 commit `0209b5830b` upstream. Some of the SBI PMU calls does not pass 64bit arguments correctly and not under RV32 compile time flags. Currently, this doesn't create any incorrect results as RV64 ignores any value in the additional register and qemu doesn't support raw events. Fix those SBI calls in order to set correct values for RV32. Fixes: `e999143459` ("RISC-V: Add perf platform driver based on SBI PMU extension") Signed-off-by: Atish Patra <atishp@rivosinc.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220711174632.4186047-4-atishp@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:11 +02:00
Atish Patra	764a9024a7	RISC-V: Fix counter restart during overflow for RV32 commit `acc1b919f4` upstream. Pass the upper half of the initial value of the counter correctly for RV32. Fixes: `4905ec2fb7` ("RISC-V: Add sscofpmf extension support") Signed-off-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Guo Ren <guoren@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220711174632.4186047-2-atishp@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:11 +02:00
Xianting Tian	2bd2089313	RISC-V: Fixup schedule out issue in machine_crash_shutdown() commit `ad943893d5` upstream. Current task of executing crash kexec will be schedule out when panic is triggered by RCU Stall, as it needs to wait rcu completion. It lead to inability to enter the crash system. The implementation of machine_crash_shutdown() is non-standard for RISC-V according to other Arch's implementation(eg, x86, arm64), we need to send IPI to stop secondary harts. [224521.877268] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [224521.883471] rcu: 0-...0: (3 GPs behind) idle=cfa/0/0x1 softirq=3968793/3968793 fqs=2495 [224521.891742] (detected by 2, t=5255 jiffies, g=60855593, q=328) [224521.897754] Task dump for CPU 0: [224521.901074] task:swapper/0 state:R running task stack: 0 pid: 0 ppid: 0 flags:0x00000008 [224521.911090] Call Trace: [224521.913638] [<ffffffe000c432de>] __schedule+0x208/0x5ea [224521.918957] Kernel panic - not syncing: RCU Stall [224521.923773] bad: scheduling from the idle thread! [224521.928571] CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Tainted: G O 5.10.113-yocto-standard #1 [224521.938658] Call Trace: [224521.941200] [<ffffffe00020395c>] walk_stackframe+0x0/0xaa [224521.946689] [<ffffffe000c34f8e>] show_stack+0x32/0x3e [224521.951830] [<ffffffe000c39020>] dump_stack_lvl+0x7e/0xa2 [224521.957317] [<ffffffe000c39058>] dump_stack+0x14/0x1c [224521.962459] [<ffffffe000243884>] dequeue_task_idle+0x2c/0x40 [224521.968207] [<ffffffe000c434f4>] __schedule+0x41e/0x5ea [224521.973520] [<ffffffe000c43826>] schedule+0x34/0xe4 [224521.978487] [<ffffffe000c46cae>] schedule_timeout+0xc6/0x170 [224521.984234] [<ffffffe000c4491e>] wait_for_completion+0x98/0xf2 [224521.990157] [<ffffffe00026d9e2>] __wait_rcu_gp+0x148/0x14a [224521.995733] [<ffffffe0002761c4>] synchronize_rcu+0x5c/0x66 [224522.001307] [<ffffffe00026f1a6>] rcu_sync_enter+0x54/0xe6 [224522.006795] [<ffffffe00025a436>] percpu_down_write+0x32/0x11c [224522.012629] [<ffffffe000c4266a>] _cpu_down+0x92/0x21a [224522.017771] [<ffffffe000219a0a>] smp_shutdown_nonboot_cpus+0x90/0x118 [224522.024299] [<ffffffe00020701e>] machine_crash_shutdown+0x30/0x4a [224522.030483] [<ffffffe00029a3f8>] __crash_kexec+0x62/0xa6 [224522.035884] [<ffffffe000c3515e>] panic+0xfa/0x2b6 [224522.040678] [<ffffffe0002772be>] rcu_sched_clock_irq+0xc26/0xcb8 [224522.046774] [<ffffffe00027fc7a>] update_process_times+0x62/0x8a [224522.052785] [<ffffffe00028d522>] tick_sched_timer+0x9e/0x102 [224522.058533] [<ffffffe000280c3a>] __hrtimer_run_queues+0x16a/0x318 [224522.064716] [<ffffffe0002812ec>] hrtimer_interrupt+0xd4/0x228 [224522.070551] [<ffffffe0009a69b6>] riscv_timer_interrupt+0x3c/0x48 [224522.076646] [<ffffffe000268f8c>] handle_percpu_devid_irq+0xb0/0x24c [224522.083004] [<ffffffe00026428e>] __handle_domain_irq+0xa8/0x122 [224522.089014] [<ffffffe00062f954>] riscv_intc_irq+0x38/0x60 [224522.094501] [<ffffffe000201bd4>] ret_from_exception+0x0/0xc [224522.100161] [<ffffffe000c42146>] rcu_eqs_enter.constprop.0+0x8c/0xb8 With the patch, it can enter crash system when RCU Stall occur. Fixes: `e53d28180d` ("RISC-V: Add kdump support") Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Link: https://lore.kernel.org/r/20220811074150.3020189-4-xianting.tian@linux.alibaba.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:10 +02:00
Xianting Tian	221f8cb76e	RISC-V: Fixup get incorrect user mode PC for kernel mode regs commit `59c026c359` upstream. When use 'echo c > /proc/sysrq-trigger' to trigger kdump, riscv_crash_save_regs() will be called to save regs for vmcore, we found "epc" value 00ffffffa5537400 is not a valid kernel virtual address, but is a user virtual address. Other regs(eg, ra, sp, gp...) are correct kernel virtual address. Actually 0x00ffffffb0dd9400 is the user mode PC of 'PID: 113 Comm: sh', which is saved in the task's stack. [ 21.201701] CPU: 0 PID: 113 Comm: sh Kdump: loaded Not tainted 5.18.9 #45 [ 21.201979] Hardware name: riscv-virtio,qemu (DT) [ 21.202160] epc : 00ffffffa5537400 ra : ffffffff80088640 sp : ff20000010333b90 [ 21.202435] gp : ffffffff810dde38 tp : ff6000000226c200 t0 : ffffffff8032be7c [ 21.202707] t1 : 0720072007200720 t2 : 30203a7375746174 s0 : ff20000010333cf0 [ 21.202973] s1 : 0000000000000000 a0 : ff20000010333b98 a1 : 0000000000000001 [ 21.203243] a2 : 0000000000000010 a3 : 0000000000000000 a4 : 28c8f0aeffea4e00 [ 21.203519] a5 : 28c8f0aeffea4e00 a6 : 0000000000000009 a7 : ffffffff8035c9b8 [ 21.203794] s2 : ffffffff810df0a8 s3 : ffffffff810df718 s4 : ff20000010333b98 [ 21.204062] s5 : 0000000000000000 s6 : 0000000000000007 s7 : ffffffff80c4a468 [ 21.204331] s8 : 00ffffffef451410 s9 : 0000000000000007 s10: 00aaaaaac0510700 [ 21.204606] s11: 0000000000000001 t3 : ff60000001218f00 t4 : ff60000001218f00 [ 21.204876] t5 : ff60000001218000 t6 : ff200000103338b8 [ 21.205079] status: 0000000200000020 badaddr: 0000000000000000 cause: 0000000000000008 With the incorrect PC, the backtrace showed by crash tool as below, the first stack frame is abnormal, crash> bt PID: 113 TASK: ff60000002269600 CPU: 0 COMMAND: "sh" #0 [ff2000001039bb90] __efistub_.Ldebug_info0 at 00ffffffa5537400 <-- Abnormal #1 [ff2000001039bcf0] panic at ffffffff806578ba #2 [ff2000001039bd50] sysrq_reset_seq_param_set at ffffffff8038c030 #3 [ff2000001039bda0] __handle_sysrq at ffffffff8038c5f8 #4 [ff2000001039be00] write_sysrq_trigger at ffffffff8038cad8 #5 [ff2000001039be20] proc_reg_write at ffffffff801b7edc #6 [ff2000001039be40] vfs_write at ffffffff80152ba6 #7 [ff2000001039be80] ksys_write at ffffffff80152ece #8 [ff2000001039bed0] sys_write at ffffffff80152f46 With the patch, we can get current kernel mode PC, the output as below, [ 17.607658] CPU: 0 PID: 113 Comm: sh Kdump: loaded Not tainted 5.18.9 #42 [ 17.607937] Hardware name: riscv-virtio,qemu (DT) [ 17.608150] epc : ffffffff800078f8 ra : ffffffff8008862c sp : ff20000010333b90 [ 17.608441] gp : ffffffff810dde38 tp : ff6000000226c200 t0 : ffffffff8032be68 [ 17.608741] t1 : 0720072007200720 t2 : 666666666666663c s0 : ff20000010333cf0 [ 17.609025] s1 : 0000000000000000 a0 : ff20000010333b98 a1 : 0000000000000001 [ 17.609320] a2 : 0000000000000010 a3 : 0000000000000000 a4 : 0000000000000000 [ 17.609601] a5 : ff60000001c78000 a6 : 000000000000003c a7 : ffffffff8035c9a4 [ 17.609894] s2 : ffffffff810df0a8 s3 : ffffffff810df718 s4 : ff20000010333b98 [ 17.610186] s5 : 0000000000000000 s6 : 0000000000000007 s7 : ffffffff80c4a468 [ 17.610469] s8 : 00ffffffca281410 s9 : 0000000000000007 s10: 00aaaaaab5bb6700 [ 17.610755] s11: 0000000000000001 t3 : ff60000001218f00 t4 : ff60000001218f00 [ 17.611041] t5 : ff60000001218000 t6 : ff20000010333988 [ 17.611255] status: 0000000200000020 badaddr: 0000000000000000 cause: 0000000000000008 With the correct PC, the backtrace showed by crash tool as below, crash> bt PID: 113 TASK: ff6000000226c200 CPU: 0 COMMAND: "sh" #0 [ff20000010333b90] riscv_crash_save_regs at ffffffff800078f8 <--- Normal #1 [ff20000010333cf0] panic at ffffffff806578c6 #2 [ff20000010333d50] sysrq_reset_seq_param_set at ffffffff8038c03c #3 [ff20000010333da0] __handle_sysrq at ffffffff8038c604 #4 [ff20000010333e00] write_sysrq_trigger at ffffffff8038cae4 #5 [ff20000010333e20] proc_reg_write at ffffffff801b7ee8 #6 [ff20000010333e40] vfs_write at ffffffff80152bb2 #7 [ff20000010333e80] ksys_write at ffffffff80152eda #8 [ff20000010333ed0] sys_write at ffffffff80152f52 Fixes: `e53d28180d` ("RISC-V: Add kdump support") Co-developed-by: Guo Ren <guoren@kernel.org> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Link: https://lore.kernel.org/r/20220811074150.3020189-3-xianting.tian@linux.alibaba.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:10 +02:00
Xianting Tian	ce0dbe9fc5	RISC-V: kexec: Fixup use of smp_processor_id() in preemptible context commit `357628e68f` upstream. Use __smp_processor_id() to avoid check the preemption context when CONFIG_DEBUG_PREEMPT enabled, as we will enter crash kernel and no return. Without the patch, [ 103.781044] sysrq: Trigger a crash [ 103.784625] Kernel panic - not syncing: sysrq triggered crash [ 103.837634] CPU1: off [ 103.889668] CPU2: off [ 103.933479] CPU3: off [ 103.939424] Starting crashdump kernel... [ 103.943442] BUG: using smp_processor_id() in preemptible [00000000] code: sh/346 [ 103.950884] caller is debug_smp_processor_id+0x1c/0x26 [ 103.956051] CPU: 0 PID: 346 Comm: sh Kdump: loaded Not tainted 5.10.113-00002-gce03f03bf4ec-dirty #149 [ 103.965355] Call Trace: [ 103.967805] [<ffffffe00020372a>] walk_stackframe+0x0/0xa2 [ 103.973206] [<ffffffe000bcf1f4>] show_stack+0x32/0x3e [ 103.978258] [<ffffffe000bd382a>] dump_stack_lvl+0x72/0x8e [ 103.983655] [<ffffffe000bd385a>] dump_stack+0x14/0x1c [ 103.988705] [<ffffffe000bdc8fe>] check_preemption_disabled+0x9e/0xaa [ 103.995057] [<ffffffe000bdc926>] debug_smp_processor_id+0x1c/0x26 [ 104.001150] [<ffffffe000206c64>] machine_kexec+0x22/0xd0 [ 104.006463] [<ffffffe000291a7e>] __crash_kexec+0x6a/0xa4 [ 104.011774] [<ffffffe000bcf3fa>] panic+0xfc/0x2b0 [ 104.016480] [<ffffffe000656ca4>] sysrq_reset_seq_param_set+0x0/0x70 [ 104.022745] [<ffffffe000657310>] __handle_sysrq+0x8c/0x154 [ 104.028229] [<ffffffe0006577e8>] write_sysrq_trigger+0x5a/0x6a [ 104.034061] [<ffffffe0003d90e0>] proc_reg_write+0x58/0xd4 [ 104.039459] [<ffffffe00036cff4>] vfs_write+0x7e/0x254 [ 104.044509] [<ffffffe00036d2f6>] ksys_write+0x58/0xbe [ 104.049558] [<ffffffe00036d36a>] sys_write+0xe/0x16 [ 104.054434] [<ffffffe000201b9a>] ret_from_syscall+0x0/0x2 [ 104.067863] Will call new kernel at ecc00000 from hart id 0 [ 104.074939] FDT image at fc5ee000 [ 104.079523] Bye... With the patch we can got clear output, [ 67.740553] sysrq: Trigger a crash [ 67.744166] Kernel panic - not syncing: sysrq triggered crash [ 67.809123] CPU1: off [ 67.865210] CPU2: off [ 67.909075] CPU3: off [ 67.919123] Starting crashdump kernel... [ 67.924900] Will call new kernel at ecc00000 from hart id 0 [ 67.932045] FDT image at fc5ee000 [ 67.935560] Bye... Fixes: `0e105f1d00` ("riscv: use hart id instead of cpu id on machine_kexec") Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Link: https://lore.kernel.org/r/20220811074150.3020189-2-xianting.tian@linux.alibaba.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:10 +02:00
Ben Dooks	5b03e52f74	RISC-V: Declare cpu_ops_spinwait in <asm/cpu_ops.h> commit `da6d2128e5` upstream. The cpu_ops_spinwait is used in a couple of places in arch/riscv and is causing a sparse warning due to no declaration. Add this to <asm/cpu_ops.h> with the others to fix the following: arch/riscv/kernel/cpu_ops_spinwait.c:16:29: warning: symbol 'cpu_ops_spinwait' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@sifive.com> Link: https://lore.kernel.org/r/20220714071811.187491-1-ben.dooks@sifive.com [Palmer: Drop the extern from cpu_ops.c] Fixes: `2ffc48fc70` ("RISC-V: Move spinwait booting method to its own config") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:10 +02:00
Ben Dooks	027a537d14	RISC-V: cpu_ops_spinwait.c should include head.h commit `e4aa991c05` upstream. Running sparse shows cpu_ops_spinwait.c is missing two definitions found in head.h, so include it to stop the following warnings: arch/riscv/kernel/cpu_ops_spinwait.c:15:6: warning: symbol '__cpu_spinwait_stack_pointer' was not declared. Should it be static? arch/riscv/kernel/cpu_ops_spinwait.c:16:6: warning: symbol '__cpu_spinwait_task_pointer' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@sifive.com> Link: https://lore.kernel.org/r/20220713215306.94675-1-ben.dooks@sifive.com Fixes: `c78f94f35c` ("RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:10 +02:00
Mark Kettenis	a0fce741f8	riscv: dts: starfive: correct number of external interrupts commit `a208acf0ea` upstream. The PLIC integrated on the Vic_U7_Core integrated on the StarFive JH7100 SoC actually supports 133 external interrupts. 127 of these are exposed to the outside world; the remainder are used by other devices that are part of the core-complex such as the L2 cache controller. But all 133 interrupts are external interrupts as far as the PLIC is concerned. Fix the property so that the driver can manage these additional interrupts, which is important since the interrupts for the L2 cache controller are enabled by default. Fixes: `ec85362fb1` ("RISC-V: Add initial StarFive JH7100 device tree") Signed-off-by: Mark Kettenis <kettenis@openbsd.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220707185529.19509-1-kettenis@openbsd.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:10 +02:00
Conor Dooley	ac474f4964	dt-bindings: riscv: fix SiFive l2-cache's cache-sets commit `b60cf8e59e` upstream. Fix device tree schema validation error messages for the SiFive Unmatched: ' cache-sets:0:0: 1024 was expected'. The existing bindings allow for just 1024 cache-sets but the fu740 on Unmatched the has 2048 cache-sets. The ISA itself permits any arbitrary power of two, however this is not supported by dt-schema. The RTL for the IP, to which the number of cache-sets is a tunable parameter, has been released publicly so speculatively adding a small number of "reasonable" values seems unwise also. Instead, as the binding only supports two distinct controllers: add 2048 and explicitly lock it to the fu740's l2 cache while limiting 1024 to the l2 cache on the fu540. Fixes: `af951c3a11` ("dt-bindings: riscv: Update l2 cache DT documentation to add support for SiFive FU740") Reported-by: Atul Khare <atulkhare@rivosinc.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220803185359.942928-1-mail@conchuod.ie Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:10 +02:00
Chen Lifu	7b6930bfd2	riscv: lib: uaccess: fix CSR_STATUS SR_SUM bit commit `c08b4848f5` upstream. Since commit `5d8544e2d0` ("RISC-V: Generic library routines and assembly") and commit `ebcbd75e39` ("riscv: Fix the bug in memory access fixup code"), if __clear_user and __copy_user return from an fixup branch, CSR_STATUS SR_SUM bit will be set, it is a vulnerability, so that S-mode memory accesses to pages that are accessible by U-mode will success. Disable S-mode access to U-mode memory should clear SR_SUM bit. Fixes: `5d8544e2d0` ("RISC-V: Generic library routines and assembly") Fixes: `ebcbd75e39` ("riscv: Fix the bug in memory access fixup code") Signed-off-by: Chen Lifu <chenlifu@huawei.com> Reviewed-by: Ben Dooks <ben.dooks@codethink.co.uk> Link: https://lore.kernel.org/r/20220615014714.1650349-1-chenlifu@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:09 +02:00
Yipeng Zou	3811d51778	riscv:uprobe fix SR_SPIE set/clear handling commit `3dbe582940` upstream. In riscv the process of uprobe going to clear spie before exec the origin insn,and set spie after that.But When access the page which origin insn has been placed a page fault may happen and irq was disabled in arch_uprobe_pre_xol function,It cause a WARN as follows. There is no need to clear/set spie in arch_uprobe_pre/post/abort_xol. We can just remove it. [ 31.684157] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1488 [ 31.684677] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 76, name: work [ 31.684929] preempt_count: 0, expected: 0 [ 31.685969] CPU: 2 PID: 76 Comm: work Tainted: G [ 31.686542] Hardware name: riscv-virtio,qemu (DT) [ 31.686797] Call Trace: [ 31.687053] [<ffffffff80006442>] dump_backtrace+0x30/0x38 [ 31.687699] [<ffffffff80812118>] show_stack+0x40/0x4c [ 31.688141] [<ffffffff8081817a>] dump_stack_lvl+0x44/0x5c [ 31.688396] [<ffffffff808181aa>] dump_stack+0x18/0x20 [ 31.688653] [<ffffffff8003e454>] __might_resched+0x114/0x122 [ 31.688948] [<ffffffff8003e4b2>] __might_sleep+0x50/0x7a [ 31.689435] [<ffffffff80822676>] down_read+0x30/0x130 [ 31.689728] [<ffffffff8000b650>] do_page_fault+0x166/x446 [ 31.689997] [<ffffffff80003c0c>] ret_from_exception+0x0/0xc Fixes: `74784081aa` ("riscv: Add uprobes supported") Signed-off-by: Yipeng Zou <zouyipeng@huawei.com> Reviewed-by: Guo Ren <guoren@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220721065820.245755-1-zouyipeng@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:09 +02:00
Helge Deller	7bff6c34fa	parisc: io_pgetevents_time64() needs compat syscall in 32-bit compat mode commit `6431e92fc8` upstream. For all syscalls in 32-bit compat mode on 64-bit kernels the upper 32-bits of the 64-bit registers are zeroed out, so a negative 32-bit signed value will show up as positive 64-bit signed value. This behaviour breaks the io_pgetevents_time64() syscall which expects signed 64-bit values for the "min_nr" and "nr" parameters. Fix this by switching to the compat_sys_io_pgetevents_time64() syscall, which uses "compat_long_t" types for those parameters. Cc: <stable@vger.kernel.org> # v5.1+ Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:09 +02:00
William Dean	99930403b0	parisc: Check the return value of ioremap() in lba_driver_probe() commit `cf59f34d7f` upstream. The function ioremap() in lba_driver_probe() can fail, so its return value should be checked. Fixes: `4bdc0d676a` ("remove ioremap_nocache and devm_ioremap_nocache") Reported-by: Hacash Robot <hacashRobot@santino.com> Signed-off-by: William Dean <williamsukatube@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v5.6+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:09 +02:00
Helge Deller	d83dc72ff4	parisc: Drop pa_swapper_pg_lock spinlock commit `3fbc9a7de0` upstream. This spinlock was dropped with commit `b7795074a0` ("parisc: Optimize per-pagetable spinlocks") in kernel v5.12. Remove it to silence a sparse warning. Signed-off-by: Helge Deller <deller@gmx.de> Reported-by: kernel test robot <lkp@intel.com> Cc: <stable@vger.kernel.org> # v5.12+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:09 +02:00
Helge Deller	ae13346307	parisc: Fix device names in /proc/iomem commit `cab56b51ec` upstream. Fix the output of /proc/iomem to show the real hardware device name including the pa_pathname, e.g. "Merlin 160 Core Centronics [8:16:0]". Up to now only the pa_pathname ("[8:16.0]") was shown. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v4.9+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:09 +02:00
Jiachen Zhang	eb06c01c33	ovl: drop WARN_ON() dentry is NULL in ovl_encode_fh() commit `dd524b7f31` upstream. Some code paths cannot guarantee the inode have any dentry alias. So WARN_ON() all !dentry may flood the kernel logs. For example, when an overlayfs inode is watched by inotifywait (1), and someone is trying to read the /proc/$(pidof inotifywait)/fdinfo/INOTIFY_FD, at that time if the dentry has been reclaimed by kernel (such as echo 2 > /proc/sys/vm/drop_caches), there will be a WARN_ON(). The printed call stack would be like: ? show_mark_fhandle+0xf0/0xf0 show_mark_fhandle+0x4a/0xf0 ? show_mark_fhandle+0xf0/0xf0 ? seq_vprintf+0x30/0x50 ? seq_printf+0x53/0x70 ? show_mark_fhandle+0xf0/0xf0 inotify_fdinfo+0x70/0x90 show_fdinfo.isra.4+0x53/0x70 seq_show+0x130/0x170 seq_read+0x153/0x440 vfs_read+0x94/0x150 ksys_read+0x5f/0xe0 do_syscall_64+0x59/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 So let's drop WARN_ON() to avoid kernel log flooding. Reported-by: Hongbo Yin <yinhongbo@bytedance.com> Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com> Signed-off-by: Tianci Zhang <zhangtianci.1997@bytedance.com> Fixes: `8ed5eec9d6` ("ovl: encode pure upper file handles") Cc: <stable@vger.kernel.org> # v4.16 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:09 +02:00
John Allen	caa395aa16	crypto: ccp - Use kzalloc for sev ioctl interfaces to prevent kernel memory leak commit `13dc15a3f5` upstream. For some sev ioctl interfaces, input may be passed that is less than or equal to SEV_FW_BLOB_MAX_SIZE, but larger than the data that PSP firmware returns. In this case, kmalloc will allocate memory that is the size of the input rather than the size of the data. Since PSP firmware doesn't fully overwrite the buffer, the sev ioctl interfaces with the issue may return uninitialized slab memory. Currently, all of the ioctl interfaces in the ccp driver are safe, but to prevent future problems, change all ioctl interfaces that allocate memory with kmalloc to use kzalloc and memset the data buffer to zero in sev_ioctl_do_platform_status. Fixes: `38103671aa` ("crypto: ccp: Use the stack and common buffer for status commands") Fixes: `e799035609` ("crypto: ccp: Implement SEV_PEK_CSR ioctl command") Fixes: `76a2b524a4` ("crypto: ccp: Implement SEV_PDH_CERT_EXPORT ioctl command") Fixes: `d6112ea0cb` ("crypto: ccp - introduce SEV_GET_ID2 command") Cc: stable@vger.kernel.org Reported-by: Andy Nguyen <theflow@google.com> Suggested-by: David Rientjes <rientjes@google.com> Suggested-by: Peter Gonda <pgonda@google.com> Signed-off-by: John Allen <john.allen@amd.com> Reviewed-by: Peter Gonda <pgonda@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:09 +02:00
Al Viro	526198569d	fix short copy handling in copy_mc_pipe_to_iter() commit `c3497fd009` upstream. Unlike other copying operations on ITER_PIPE, copy_mc_to_iter() can result in a short copy. In that case we need to trim the unused buffers, as well as the length of partially filled one - it's not enough to set ->head, ->iov_offset and ->count to reflect how much had we copied. Not hard to fix, fortunately... I'd put a helper (pipe_discard_from(pipe, head)) into pipe_fs_i.h, rather than iov_iter.c - it has nothing to do with iov_iter and having it will allow us to avoid an ugly kludge in fs/splice.c. We could put it into lib/iov_iter.c for now and move it later, but I don't see the point going that way... Cc: stable@kernel.org # 4.19+ Fixes: `ca146f6f09` "lib/iov_iter: Fix pipe handling in _copy_to_iter_mcsafe()" Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:08 +02:00
Lukas Wunner	d49bb8cf9b	usbnet: Fix linkwatch use-after-free on disconnect commit `a69e617e53` upstream. usbnet uses the work usbnet_deferred_kevent() to perform tasks which may sleep. On disconnect, completion of the work was originally awaited in ->ndo_stop(). But in 2003, that was moved to ->disconnect() by historic commit "[PATCH] USB: usbnet, prevent exotic rtnl deadlock": https://git.kernel.org/tglx/history/c/0f138bbfd83c The change was made because back then, the kernel's workqueue implementation did not allow waiting for a single work. One had to wait for completion of all work by calling flush_scheduled_work(), and that could deadlock when waiting for usbnet_deferred_kevent() with rtnl_mutex held in ->ndo_stop(). The commit solved one problem but created another: It causes a use-after-free in USB Ethernet drivers aqc111.c, asix_devices.c, ax88179_178a.c, ch9200.c and smsc75xx.c: * If the drivers receive a link change interrupt immediately before disconnect, they raise EVENT_LINK_RESET in their (non-sleepable) ->status() callback and schedule usbnet_deferred_kevent(). * usbnet_deferred_kevent() invokes the driver's ->link_reset() callback, which calls netif_carrier_{on,off}(). * That in turn schedules the work linkwatch_event(). Because usbnet_deferred_kevent() is awaited after unregister_netdev(), netif_carrier_{on,off}() may operate on an unregistered netdev and linkwatch_event() may run after free_netdev(), causing a use-after-free. In 2010, usbnet was changed to only wait for a single instance of usbnet_deferred_kevent() instead of all work by commit `23f333a2bf` ("drivers/net: don't use flush_scheduled_work()"). Unfortunately the commit neglected to move the wait back to ->ndo_stop(). Rectify that omission at long last. Reported-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/netdev/CAG48ez0MHBbENX5gCdHAUXZ7h7s20LnepBF-pa5M=7Bi-jZrEA@mail.gmail.com/ Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/netdev/20220315113841.GA22337@pengutronix.de/ Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org Acked-by: Oliver Neukum <oneukum@suse.com> Link: https://lore.kernel.org/r/d1c87ebe9fc502bffcd1576e238d685ad08321e4.1655987888.git.lukas@wunner.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:08 +02:00
Helge Deller	b1b2d5f295	fbcon: Fix accelerated fbdev scrolling while logo is still shown commit `3866cba87d` upstream. There is no need to directly skip over to the SCROLL_REDRAW case while the logo is still shown. When using DRM, this change has no effect because the code will reach the SCROLL_REDRAW case immediately anyway. But if you run an accelerated fbdev driver and have FRAMEBUFFER_CONSOLE_LEGACY_ACCELERATION enabled, console scrolling is slowed down by factors so that it feels as if you use a 9600 baud terminal. So, drop those unnecessary checks and speed up fbdev console acceleration during bootup. Cc: stable@vger.kernel.org # v5.10+ Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Helge Deller <deller@gmx.de> Link: https://patchwork.freedesktop.org/patch/msgid/YpkYxk7wsBPx3po+@p100 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:08 +02:00
Helge Deller	665df39ab2	fbcon: Fix boundary checks for fbcon=vc:n1-n2 parameters commit `cad564ca55` upstream. The user may use the fbcon=vc:<n1>-<n2> option to tell fbcon to take over the given range (n1...n2) of consoles. The value for n1 and n2 needs to be a positive number and up to (MAX_NR_CONSOLES - 1). The given values were not fully checked against those boundaries yet. To fix the issue, convert first_fb_vc and last_fb_vc to unsigned integers and check them against the upper boundary, and make sure that first_fb_vc is smaller than last_fb_vc. Cc: stable@vger.kernel.org # v4.19+ Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Helge Deller <deller@gmx.de> Link: https://patchwork.freedesktop.org/patch/msgid/YpkYRMojilrtZIgM@p100 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:08 +02:00
Rafael J. Wysocki	4fff6401ba	thermal: sysfs: Fix cooling_device_stats_setup() error code path commit `d5a8aa5d7d` upstream. If cooling_device_stats_setup() fails to create the stats object, it must clear the last slot in cooling_device_attr_groups that was initially empty (so as to make it possible to add stats attributes to the cooling device attribute groups). Failing to do so may cause the stats attributes to be created by mistake for a device that doesn't have a stats object, because the slot in question might be populated previously during the registration of another cooling device. Fixes: `8ea229511e` ("thermal: Add cooling device's statistics in sysfs") Reported-by: Di Shen <di.shen@unisoc.com> Tested-by: Di Shen <di.shen@unisoc.com> Cc: 4.17+ <stable@vger.kernel.org> # 4.17+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:08 +02:00
Yang Xu	22359c722f	fs: Add missing umask strip in vfs_tmpfile commit `ac6800e279` upstream. All creation paths except for O_TMPFILE handle umask in the vfs directly if the filesystem doesn't support or enable POSIX ACLs. If the filesystem does then umask handling is deferred until posix_acl_create(). Because, O_TMPFILE misses umask handling in the vfs it will not honor umask settings. Fix this by adding the missing umask handling. Link: https://lore.kernel.org/r/1657779088-2242-2-git-send-email-xuyang2018.jy@fujitsu.com Fixes: `60545d0d46` ("[O_TMPFILE] it's still short a few helpers, but infrastructure should be OK now...") Cc: <stable@vger.kernel.org> # 4.19+ Reported-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:08 +02:00
David Howells	e08fca013d	vfs: Check the truncate maximum size in inode_newsize_ok() commit `e2ebff9c57` upstream. If something manages to set the maximum file size to MAX_OFFSET+1, this can cause the xfs and ext4 filesystems at least to become corrupt. Ordinarily, the kernel protects against userspace trying this by checking the value early in the truncate() and ftruncate() system calls calls - but there are at least two places that this check is bypassed: (1) Cachefiles will round up the EOF of the backing file to DIO block size so as to allow DIO on the final block - but this might push the offset negative. It then calls notify_change(), but this inadvertently bypasses the checking. This can be triggered if someone puts an 8EiB-1 file on a server for someone else to try and access by, say, nfs. (2) ksmbd doesn't check the value it is given in set_end_of_file_info() and then calls vfs_truncate() directly - which also bypasses the check. In both cases, it is potentially possible for a network filesystem to cause a disk filesystem to be corrupted: cachefiles in the client's cache filesystem; ksmbd in the server's filesystem. nfsd is okay as it checks the value, but we can then remove this check too. Fix this by adding a check to inode_newsize_ok(), as called from setattr_prepare(), thereby catching the issue as filesystems set up to perform the truncate with minimal opportunity for bypassing the new check. Fixes: `1f08c925e7` ("cachefiles: Implement backing file wrangling") Fixes: `f441584858` ("cifsd: add file operations") Signed-off-by: David Howells <dhowells@redhat.com> Reported-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Cc: stable@kernel.org Acked-by: Alexander Viro <viro@zeniv.linux.org.uk> cc: Steve French <sfrench@samba.org> cc: Hyunchul Lee <hyc.lee@gmail.com> cc: Chuck Lever <chuck.lever@oracle.com> cc: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:08 +02:00
Tetsuo Handa	777a462e1a	tty: vt: initialize unicode screen buffer commit `af77c56aa3` upstream. syzbot reports kernel infoleak at vcs_read() [1], for buffer can be read immediately after resize operation. Initialize buffer using kzalloc(). ---------- #include <fcntl.h> #include <unistd.h> #include <sys/ioctl.h> #include <linux/fb.h> int main(int argc, char *argv[]) { struct fb_var_screeninfo var = { }; const int fb_fd = open("/dev/fb0", 3); ioctl(fb_fd, FBIOGET_VSCREENINFO, &var); var.yres = 0x21; ioctl(fb_fd, FBIOPUT_VSCREENINFO, &var); return read(open("/dev/vcsu", O_RDONLY), &var, sizeof(var)) == -1; } ---------- Link: https://syzkaller.appspot.com/bug?extid=31a641689d43387f05d3 [1] Cc: stable <stable@vger.kernel.org> Reported-by: syzbot <syzbot+31a641689d43387f05d3@syzkaller.appspotmail.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/4ef053cf-e796-fb5e-58b7-3ae58242a4ad@I-love.SAKURA.ne.jp Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:08 +02:00
Bedant Patnaik	10ee32dee0	ALSA: hda/realtek: Add a quirk for HP OMEN 15 (8786) mute LED commit `30267718fe` upstream. Board ID 8786 seems to be another variant of the Omen 15 that needs ALC285_FIXUP_HP_MUTE_LED for working mute LED. Signed-off-by: Bedant Patnaik <bedant.patnaik@gmail.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220809142455.6473-1-bedant.patnaik@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:07 +02:00
Meng Tang	f2098aeee8	ALSA: hda/realtek: Add quirk for another Asus K42JZ model commit `f882c4bef9` upstream. There is another Asus K42JZ model with the PCI SSID 1043:1313 that requires the quirk ALC269VB_FIXUP_ASUS_MIC_NO_PRESENCE. Add the corresponding entry to the quirk table. Signed-off-by: Meng Tang <tangmeng@uniontech.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220805074534.20003-1-tangmeng@uniontech.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:07 +02:00
Allen Ballway	4855a85093	ALSA: hda/cirrus - support for iMac 12,1 model commit `74bba640d6` upstream. The 12,1 model requires the same configuration as the 12,2 model to enable headphones but has a different codec SSID. Adds 12,1 SSID for matching quirk. [ re-sorted in SSID order by tiwai ] Signed-off-by: Allen Ballway <ballway@chromium.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220810152701.1.I902c2e591bbf8de9acb649d1322fa1f291849266@changeid Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:07 +02:00
Meng Tang	a871cc6234	ALSA: hda/conexant: Add quirk for LENOVO 20149 Notebook model commit `f83bb25924` upstream. There is another LENOVO 20149 (Type1Sku0) Notebook model with CX20590, the device PCI SSID is 17aa:3977, which headphones are not responding, that requires the quirk CXT_PINCFG_LENOVO_NOTEBOOK. Add the corresponding entry to the quirk table. Signed-off-by: Meng Tang <tangmeng@uniontech.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220808073406.19460-1-tangmeng@uniontech.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:07 +02:00
Paolo Bonzini	0679a58860	KVM: x86: revalidate steal time cache if MSR value changes commit `901d3765fa` upstream. Commit `7e2175ebd6` ("KVM: x86: Fix recording of guest steal time / preempted status", 2021-11-11) open coded the previous call to kvm_map_gfn, but in doing so it dropped the comparison between the cached guest physical address and the one in the MSR. This cause an incorrect cache hit if the guest modifies the steal time address while the memslots remain the same. This can happen with kexec, in which case the steal time data is written at the address used by the old kernel instead of the old one. While at it, rename the variable from gfn to gpa since it is a plain physical address and not a right-shifted one. Reported-by: Dave Young <ruyang@redhat.com> Reported-by: Xiaoying Yan <yiyan@redhat.com> Analyzed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: stable@vger.kernel.org Fixes: `7e2175ebd6` ("KVM: x86: Fix recording of guest steal time / preempted status") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:07 +02:00
Paolo Bonzini	61274b9bf2	KVM: x86: do not report preemption if the steal time cache is stale commit `c3c28d24d9` upstream. Commit `7e2175ebd6` ("KVM: x86: Fix recording of guest steal time / preempted status", 2021-11-11) open coded the previous call to kvm_map_gfn, but in doing so it dropped the comparison between the cached guest physical address and the one in the MSR. This cause an incorrect cache hit if the guest modifies the steal time address while the memslots remain the same. This can happen with kexec, in which case the preempted bit is written at the address used by the old kernel instead of the old one. Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: stable@vger.kernel.org Fixes: `7e2175ebd6` ("KVM: x86: Fix recording of guest steal time / preempted status") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:07 +02:00
Sean Christopherson	58336cce02	KVM: x86: Tag kvm_mmu_x86_module_init() with __init commit `982bae43f1` upstream. Mark kvm_mmu_x86_module_init() with __init, the entire reason it exists is to initialize variables when kvm.ko is loaded, i.e. it must never be called after module initialization. Fixes: `1d0e848060` ("KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded") Cc: stable@vger.kernel.org Reviewed-by: Kai Huang <kai.huang@intel.com> Tested-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220803224957.1285926-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:07 +02:00
Vitaly Kuznetsov	5149d3bcd0	KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1 commit `156b9d76e8` upstream. Windows 10/11 guests with Hyper-V role (WSL2) enabled are observed to hang upon boot or shortly after when a non-default TSC frequency was set for L1. The issue is observed on a host where TSC scaling is supported. The problem appears to be that Windows doesn't use TSC scaling for its guests, even when the feature is advertised, and KVM filters SECONDARY_EXEC_TSC_SCALING out when creating L2 controls from L1's VMCS. This leads to L2 running with the default frequency (matching host's) while L1 is running with an altered one. Keep SECONDARY_EXEC_TSC_SCALING in secondary exec controls for L2 when it was set for L1. TSC_MULTIPLIER is already correctly computed and written by prepare_vmcs02(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Fixes: `d041b5ea93` ("KVM: nVMX: Enable nested TSC scaling") Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220712135009.952805-1-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:06 +02:00
Sean Christopherson	358da81ee2	KVM: x86: Set error code to segment selector on LLDT/LTR non-canonical #GP commit `2626206963` upstream. When injecting a #GP on LLDT/LTR due to a non-canonical LDT/TSS base, set the error code to the selector. Intel SDM's says nothing about the #GP, but AMD's APM explicitly states that both LLDT and LTR set the error code to the selector, not zero. Note, a non-canonical memory operand on LLDT/LTR does generate a #GP(0), but the KVM code in question is specific to the base from the descriptor. Fixes: `e37a75a13c` ("KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220711232750.1092012-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:06 +02:00
Sean Christopherson	a9ba2acb13	KVM: x86: Mark TSS busy during LTR emulation _after_ all fault checks commit `ec6e4d8632` upstream. Wait to mark the TSS as busy during LTR emulation until after all fault checks for the LTR have passed. Specifically, don't mark the TSS busy if the new TSS base is non-canonical. Opportunistically drop the one-off !seg_desc.PRESENT check for TR as the only reason for the early check was to avoid marking a !PRESENT TSS as busy, i.e. the common !PRESENT is now done before setting the busy bit. Fixes: `e37a75a13c` ("KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR") Reported-by: syzbot+760a73552f47a8cd0fd9@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Hou Wenlong <houwenlong.hwl@antgroup.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220711232750.1092012-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:06 +02:00
Sean Christopherson	2165b6415d	KVM: nVMX: Inject #UD if VMXON is attempted with incompatible CR0/CR4 commit `c7d855c2af` upstream. Inject a #UD if L1 attempts VMXON with a CR0 or CR4 that is disallowed per the associated nested VMX MSRs' fixed0/1 settings. KVM cannot rely on hardware to perform the checks, even for the few checks that have higher priority than VM-Exit, as (a) KVM may have forced CR0/CR4 bits in hardware while running the guest, (b) there may incompatible CR0/CR4 bits that have lower priority than VM-Exit, e.g. CR0.NE, and (c) userspace may have further restricted the allowed CR0/CR4 values by manipulating the guest's nested VMX MSRs. Note, despite a very strong desire to throw shade at Jim, commit `70f3aac964` ("kvm: nVMX: Remove superfluous VMX instruction fault checks") is not to blame for the buggy behavior (though the comment...). That commit only removed the CR0.PE, EFLAGS.VM, and COMPATIBILITY mode checks (though it did erroneously drop the CPL check, but that has already been remedied). KVM may force CR0.PE=1, but will do so only when also forcing EFLAGS.VM=1 to emulate Real Mode, i.e. hardware will still #UD. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216033 Fixes: `ec378aeef9` ("KVM: nVMX: Implement VMXON and VMXOFF") Reported-by: Eric Li <ercli@ucdavis.edu> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:06 +02:00
Sean Christopherson	32298c99a5	KVM: nVMX: Account for KVM reserved CR4 bits in consistency checks commit `ca58f3aa53` upstream. Check that the guest (L2) and host (L1) CR4 values that would be loaded by nested VM-Enter and VM-Exit respectively are valid with respect to KVM's (L0 host) allowed CR4 bits. Failure to check KVM reserved bits would allow L1 to load an illegal CR4 (or trigger hardware VM-Fail or failed VM-Entry) by massaging guest CPUID to allow features that are not supported by KVM. Amusingly, KVM itself is an accomplice in its doom, as KVM adjusts L1's MSR_IA32_VMX_CR4_FIXED1 to allow L1 to enable bits for L2 based on L1's CPUID model. Note, although nested_{guest,host}_cr4_valid() are _currently_ used if and only if the vCPU is post-VMXON (nested.vmxon == true), that may not be true in the future, e.g. emulating VMXON has a bug where it doesn't check the allowed/required CR0/CR4 bits. Cc: stable@vger.kernel.org Fixes: `3899152ccb` ("KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:06 +02:00
Sean Christopherson	6137d1b6b7	KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value commit `f8ae08f978` upstream. Restrict the nVMX MSRs based on KVM's config, not based on the guest's current config. Using the guest's config to audit the new config prevents userspace from restoring the original config (KVM's config) if at any point in the past the guest's config was restricted in any way. Fixes: `62cc6b9dc6` ("KVM: nVMX: support restore of VMX capability MSRs") Cc: stable@vger.kernel.org Cc: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:06 +02:00
Sean Christopherson	4ef0d3a74d	KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bits commit `c33f6f2228` upstream. Split the common x86 parts of kvm_is_valid_cr4(), i.e. the reserved bits checks, into a separate helper, __kvm_is_valid_cr4(), and export only the inner helper to vendor code in order to prevent nested VMX from calling back into vmx_is_valid_cr4() via kvm_is_valid_cr4(). On SVM, this is a nop as SVM doesn't place any additional restrictions on CR4. On VMX, this is also currently a nop, but only because nested VMX is missing checks on reserved CR4 bits for nested VM-Enter. That bug will be fixed in a future patch, and could simply use kvm_is_valid_cr4() as-is, but nVMX has _another_ bug where VMXON emulation doesn't enforce VMX's restrictions on CR0/CR4. The cleanest and most intuitive way to fix the VMXON bug is to use nested_host_cr{0,4}_valid(). If the CR4 variant routes through kvm_is_valid_cr4(), using nested_host_cr4_valid() won't do the right thing for the VMXON case as vmx_is_valid_cr4() enforces VMX's restrictions if and only if the vCPU is post-VMXON. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:06 +02:00
Sean Christopherson	8a42dc4fa5	KVM: Do not incorporate page offset into gfn=>pfn cache user address commit `3ba2c95ea1` upstream. Don't adjust the userspace address in the gfn=>pfn cache by the page offset from the gpa. KVM should never use the user address directly, and all KVM operations that translate a user address to something else require the user address to be page aligned. Ignoring the offset will allow the cache to reuse a gfn=>hva translation in the unlikely event that the page offset of the gpa changes, but the gfn does not. And more importantly, not having to (un)adjust the user address will simplify a future bug fix. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220429210025.3293691-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:06 +02:00
Sean Christopherson	3540accf65	KVM: Fix multiple races in gfn=>pfn cache refresh commit `58cd407ca4` upstream. Rework the gfn=>pfn cache (gpc) refresh logic to address multiple races between the cache itself, and between the cache and mmu_notifier events. The existing refresh code attempts to guard against races with the mmu_notifier by speculatively marking the cache valid, and then marking it invalid if a mmu_notifier invalidation occurs. That handles the case where an invalidation occurs between dropping and re-acquiring gpc->lock, but it doesn't handle the scenario where the cache is refreshed after the cache was invalidated by the notifier, but before the notifier elevates mmu_notifier_count. The gpc refresh can't use the "retry" helper as its invalidation occurs _before_ mmu_notifier_count is elevated and before mmu_notifier_range_start is set/updated. CPU0 CPU1 ---- ---- gfn_to_pfn_cache_invalidate_start() \| -> gpc->valid = false; kvm_gfn_to_pfn_cache_refresh() \| \|-> gpc->valid = true; hva_to_pfn_retry() \| -> acquire kvm->mmu_lock kvm->mmu_notifier_count == 0 mmu_seq == kvm->mmu_notifier_seq drop kvm->mmu_lock return pfn 'X' acquire kvm->mmu_lock kvm_inc_notifier_count() drop kvm->mmu_lock() kernel frees pfn 'X' kvm_gfn_to_pfn_cache_check() \| \|-> gpc->valid == true caller accesses freed pfn 'X' Key off of mn_active_invalidate_count to detect that a pfncache refresh needs to wait for an in-progress mmu_notifier invalidation. While mn_active_invalidate_count is not guaranteed to be stable, it is guaranteed to be elevated prior to an invalidation acquiring gpc->lock, so either the refresh will see an active invalidation and wait, or the invalidation will run after the refresh completes. Speculatively marking the cache valid is itself flawed, as a concurrent kvm_gfn_to_pfn_cache_check() would see a valid cache with stale pfn/khva values. The KVM Xen use case explicitly allows/wants multiple users; even though the caches are allocated per vCPU, __kvm_xen_has_interrupt() can read a different vCPU (or vCPUs). Address this race by invalidating the cache prior to dropping gpc->lock (this is made possible by fixing the above mmu_notifier race). Complicating all of this is the fact that both the hva=>pfn resolution and mapping of the kernel address can sleep, i.e. must be done outside of gpc->lock. Fix the above races in one fell swoop, trying to fix each individual race is largely pointless and essentially impossible to test, e.g. closing one hole just shifts the focus to the other hole. Fixes: `982ed0de47` ("KVM: Reinstate gfn_to_pfn_cache with invalidation support") Cc: stable@vger.kernel.org Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Mingwei Zhang <mizhang@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220429210025.3293691-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:05 +02:00
Sean Christopherson	f6cea8d247	KVM: Fully serialize gfn=>pfn cache refresh via mutex commit `93984f19e7` upstream. Protect gfn=>pfn cache refresh with a mutex to fully serialize refreshes. The refresh logic doesn't protect against - concurrent unmaps, or refreshes with different GPAs (which may or may not happen in practice, for example if a cache is only used under vcpu->mutex; but it's allowed in the code) - a false negative on the memslot generation. If the first refresh sees a stale memslot generation, it will refresh the hva and generation before moving on to the hva=>pfn translation. If it then drops gpc->lock, a different user of the cache can come along, acquire gpc->lock, see that the memslot generation is fresh, and skip the hva=>pfn update due to the userspace address also matching (because it too was updated). The refresh path can already sleep during hva=>pfn resolution, so wrap the refresh with a mutex to ensure that any given refresh runs to completion before other callers can start their refresh. Cc: stable@vger.kernel.org Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220429210025.3293691-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:05 +02:00
Sean Christopherson	69d7292c89	KVM: Put the extra pfn reference when reusing a pfn in the gpc cache commit `3dddf65b4f` upstream. Put the struct page reference to pfn acquired by hva_to_pfn() when the old and new pfns for a gfn=>pfn cache match. The cache already has a reference via the old/current pfn, and will only put one reference when the cache is done with the pfn. Fixes: `982ed0de47` ("KVM: Reinstate gfn_to_pfn_cache with invalidation support") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220429210025.3293691-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:05 +02:00
Sean Christopherson	99280706c3	KVM: Drop unused @gpa param from gfn=>pfn cache's __release_gpc() helper commit `345b0fd6fe` upstream. Drop the @pga param from __release_gpc() and rename the helper to make it more obvious that the cache itself is not being released. The helper will be reused by a future commit to release a pfn+khva combination that is _never_ associated with the cache, at which point the current name would go from slightly misleading to blatantly wrong. No functional change intended. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220429210025.3293691-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:05 +02:00
Nico Boehr	ad9fa7c8aa	KVM: s390: pv: don't present the ecall interrupt twice commit `c3f0e5fd2d` upstream. When the SIGP interpretation facility is present and a VCPU sends an ecall to another VCPU in enabled wait, the sending VCPU receives a 56 intercept (partial execution), so KVM can wake up the receiving CPU. Note that the SIGP interpretation facility will take care of the interrupt delivery and KVM's only job is to wake the receiving VCPU. For PV, the sending VCPU will receive a 108 intercept (pv notify) and should continue like in the non-PV case, i.e. wake the receiving VCPU. For PV and non-PV guests the interrupt delivery will occur through the SIGP interpretation facility on SIE entry when SIE finds the X bit in the status field set. However, in handle_pv_notification(), there was no special handling for SIGP, which leads to interrupt injection being requested by KVM for the next SIE entry. This results in the interrupt being delivered twice: once by the SIGP interpretation facility and once by KVM through the IICTL. Add the necessary special handling in handle_pv_notification(), similar to handle_partial_execution(), which simply wakes the receiving VCPU and leave interrupt delivery to the SIGP interpretation facility. In contrast to external calls, emergency calls are not interpreted but also cause a 108 intercept, which is why we still need to call handle_instruction() for SIGP orders other than ecall. Since kvm_s390_handle_sigp_pei() is now called for all SIGP orders which cause a 108 intercept - even if they are actually handled by handle_instruction() - move the tracepoint in kvm_s390_handle_sigp_pei() to avoid possibly confusing trace messages. Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Cc: <stable@vger.kernel.org> # 5.7 Fixes: `da24a0cc58` ("KVM: s390: protvirt: Instruction emulation") Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20220718130434.73302-1-nrb@linux.ibm.com Message-Id: <20220718130434.73302-1-nrb@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:05 +02:00
Maciej S. Szmigiero	3d4e2d884d	KVM: SVM: Don't BUG if userspace injects an interrupt with GIF=0 commit `f17c31c48e` upstream. Don't BUG/WARN on interrupt injection due to GIF being cleared, since it's trivial for userspace to force the situation via KVM_SET_VCPU_EVENTS (even if having at least a WARN there would be correct for KVM internally generated injections). kernel BUG at arch/x86/kvm/svm/svm.c:3386! invalid opcode: 0000 [#1] SMP CPU: 15 PID: 926 Comm: smm_test Not tainted 5.17.0-rc3+ #264 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:svm_inject_irq+0xab/0xb0 [kvm_amd] Code: <0f> 0b 0f 1f 00 0f 1f 44 00 00 80 3d ac b3 01 00 00 55 48 89 f5 53 RSP: 0018:ffffc90000b37d88 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88810a234ac0 RCX: 0000000000000006 RDX: 0000000000000000 RSI: ffffc90000b37df7 RDI: ffff88810a234ac0 RBP: ffffc90000b37df7 R08: ffff88810a1fa410 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff888109571000 R14: ffff88810a234ac0 R15: 0000000000000000 FS: 0000000001821380(0000) GS:ffff88846fdc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f74fc550008 CR3: 000000010a6fe000 CR4: 0000000000350ea0 Call Trace: <TASK> inject_pending_event+0x2f7/0x4c0 [kvm] kvm_arch_vcpu_ioctl_run+0x791/0x17a0 [kvm] kvm_vcpu_ioctl+0x26d/0x650 [kvm] __x64_sys_ioctl+0x82/0xb0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> Fixes: `219b65dcf6` ("KVM: SVM: Improve nested interrupt injection") Cc: stable@vger.kernel.org Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <35426af6e123cbe91ec7ce5132ce72521f02b1b5.1651440202.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:05 +02:00
Sean Christopherson	b810a13ea0	KVM: nVMX: Snapshot pre-VM-Enter DEBUGCTL for !nested_run_pending case commit `764643a6be` upstream. If a nested run isn't pending, snapshot vmcs01.GUEST_IA32_DEBUGCTL irrespective of whether or not VM_ENTRY_LOAD_DEBUG_CONTROLS is set in vmcs12. When restoring nested state, e.g. after migration, without a nested run pending, prepare_vmcs02() will propagate nested.vmcs01_debugctl to vmcs02, i.e. will load garbage/zeros into vmcs02.GUEST_IA32_DEBUGCTL. If userspace restores nested state before MSRs, then loading garbage is a non-issue as loading DEBUGCTL will also update vmcs02. But if usersepace restores MSRs first, then KVM is responsible for propagating L2's value, which is actually thrown into vmcs01, into vmcs02. Restoring L2 MSRs into vmcs01, i.e. loading all MSRs before nested state is all kinds of bizarre and ideally would not be supported. Sadly, some VMMs do exactly that and rely on KVM to make things work. Note, there's still a lurking SMM bug, as propagating vmcs01's DEBUGCTL to vmcs02 across RSM may corrupt L2's DEBUGCTL. But KVM's entire VMX+SMM emulation is flawed as SMI+RSM should not toouch _any_ VMCS when use the "default treatment of SMIs", i.e. when not using an SMI Transfer Monitor. Link: https://lore.kernel.org/all/Yobt1XwOfb5M6Dfa@google.com Fixes: `8fcc4b5923` ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220614215831.3762138-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:05 +02:00
Sean Christopherson	ba29fef9b7	KVM: nVMX: Snapshot pre-VM-Enter BNDCFGS for !nested_run_pending case commit `fa578398a0` upstream. If a nested run isn't pending, snapshot vmcs01.GUEST_BNDCFGS irrespective of whether or not VM_ENTRY_LOAD_BNDCFGS is set in vmcs12. When restoring nested state, e.g. after migration, without a nested run pending, prepare_vmcs02() will propagate nested.vmcs01_guest_bndcfgs to vmcs02, i.e. will load garbage/zeros into vmcs02.GUEST_BNDCFGS. If userspace restores nested state before MSRs, then loading garbage is a non-issue as loading BNDCFGS will also update vmcs02. But if usersepace restores MSRs first, then KVM is responsible for propagating L2's value, which is actually thrown into vmcs01, into vmcs02. Restoring L2 MSRs into vmcs01, i.e. loading all MSRs before nested state is all kinds of bizarre and ideally would not be supported. Sadly, some VMMs do exactly that and rely on KVM to make things work. Note, there's still a lurking SMM bug, as propagating vmcs01.GUEST_BNDFGS to vmcs02 across RSM may corrupt L2's BNDCFGS. But KVM's entire VMX+SMM emulation is flawed as SMI+RSM should not toouch _any_ VMCS when use the "default treatment of SMIs", i.e. when not using an SMI Transfer Monitor. Link: https://lore.kernel.org/all/Yobt1XwOfb5M6Dfa@google.com Fixes: `62cf9bd811` ("KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS") Cc: stable@vger.kernel.org Cc: Lei Wang <lei4.wang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220614215831.3762138-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:40:04 +02:00

1 2 3 4 5 ...

1093370 commits