linux-stable/arch/powerpc
Greg Kurz 31a88c82b4 KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one
The EQ page is allocated by the guest and then passed to the hypervisor
with the H_INT_SET_QUEUE_CONFIG hcall. A reference is taken on the page
before handing it over to the HW. This reference is dropped either when
the guest issues the H_INT_RESET hcall or when the KVM device is released.
But, the guest can legitimately call H_INT_SET_QUEUE_CONFIG several times,
either to reset the EQ (vCPU hot unplug) or to set a new EQ (guest reboot).
In both cases the existing EQ page reference is leaked because we simply
overwrite it in the XIVE queue structure without calling put_page().

This is especially visible when the guest memory is backed with huge pages:
start a VM up to the guest userspace, either reboot it or unplug a vCPU,
quit QEMU. The leak is observed by comparing the value of HugePages_Free in
/proc/meminfo before and after the VM is run.

Ideally we'd want the XIVE code to handle the EQ page de-allocation at the
platform level. This isn't the case right now because the various XIVE
drivers have different allocation needs. It could maybe worth introducing
hooks for this purpose instead of exposing XIVE internals to the drivers,
but this is certainly a huge work to be done later.

In the meantime, for easier backport, fix both vCPU unplug and guest reboot
leaks by introducing a wrapper around xive_native_configure_queue() that
does the necessary cleanup.

Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v5.2
Fixes: 13ce3297c5 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Tested-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-11-21 16:24:41 +11:00
..
boot kbuild: remove ar-option and KBUILD_ARFLAGS 2019-10-01 09:20:33 +09:00
configs powerpc/configs: Enable secure guest support in pseries and ppc64 defconfigs 2019-08-30 09:56:30 +10:00
crypto treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
include KVM: PPC: Book3S: Replace reset_msr mmu op with inject_interrupt arch op 2019-10-22 16:29:02 +11:00
kernel powerpc/eeh: Fix eeh eeh_debugfs_break_device() with SRIOV devices 2019-09-27 09:04:17 +10:00
kvm KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one 2019-11-21 16:24:41 +11:00
lib powerpc/memcpy: Fix stack corruption for smaller sizes 2019-09-12 09:27:00 +10:00
math-emu treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
mm libnvdimm fixes v5.4-rc1 2019-09-29 10:33:41 -07:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-17 20:20:36 -07:00
oprofile treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333 2019-06-05 17:37:06 +02:00
perf powerpc/perf: fix imc allocation failure handling 2019-08-20 21:22:20 +10:00
platforms spufs: fix a crash in spufs_create_root() 2019-10-11 16:57:41 +11:00
purgatory treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
sysdev KVM: PPC: Book3S HV: use smp_mb() when setting/clearing host_ipi flag 2019-09-24 12:46:26 +10:00
tools treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
xmon powerpc/xmon: Improve output of XIVE interrupts 2019-09-14 00:58:47 +10:00
Kbuild treewide: Add SPDX license identifier - Kbuild 2019-05-30 11:32:33 -07:00
Kconfig powerpc updates for 5.4 2019-09-20 11:48:06 -07:00
Kconfig.debug powerpc/xmon: add read-only mode 2019-05-03 02:54:57 +10:00
Makefile powerpc updates for 5.4 2019-09-20 11:48:06 -07:00
Makefile.postlink kbuild: add $(BASH) to run scripts with bash-extension 2019-09-04 22:54:13 +09:00