linux-stable/drivers
David Hildenbrand 3fcebf9020 mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy
Currently, the "auto-movable" online policy does not allow for hotplugged
KERNEL (ZONE_NORMAL) memory to increase the amount of MOVABLE memory we
can have, primarily because there is no coordination across memory
devices and we don't want to create zone imbalances accidentally when
unplugging memory.

However, within a single memory device it's different.  Let's allow for
KERNEL memory within a dynamic memory group to allow for more MOVABLE
within the same memory group.  The only thing we have to take care of is
that the managing driver avoids zone imbalances by unplugging MOVABLE
memory first, otherwise there can be corner cases where unplug of memory
could result in (accidental) zone imbalances.
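
To illustrate why the unplug order matters, here is a minimal sketch in
Python. It is a toy model, not kernel code: the helper names
(is_imbalanced, unplug_blocks), the block counts, and the assumption that
only the group's own KERNEL blocks back its MOVABLE blocks (no early
kernel memory) are all hypothetical.

```python
def is_imbalanced(movable_blocks, kernel_blocks, ratio=3):
    """True if there is more MOVABLE memory than the MOVABLE:KERNEL ratio allows."""
    return movable_blocks > ratio * kernel_blocks


def unplug_blocks(movable, kernel, count, prefer):
    """Unplug `count` blocks, preferring zone `prefer` ("M" or "N").

    Returns the (movable, kernel) state after each unplugged block.
    """
    if prefer not in ("M", "N"):
        raise ValueError("prefer must be 'M' or 'N'")
    if count > movable + kernel:
        raise ValueError("cannot unplug more blocks than are plugged")
    states = []
    for _ in range(count):
        # Take from the preferred zone; fall back to the other one when empty.
        take_movable = movable > 0 if prefer == "M" else kernel == 0
        if take_movable:
            movable -= 1
        else:
            kernel -= 1
        states.append((movable, kernel))
    return states


# Hypothetical group: 6 MOVABLE and 2 KERNEL blocks at a 3:1 ratio.
movable_first = unplug_blocks(6, 2, 2, prefer="M")
kernel_first = unplug_blocks(6, 2, 2, prefer="N")
print(any(is_imbalanced(m, k) for m, k in movable_first))  # False
print(any(is_imbalanced(m, k) for m, k in kernel_first))   # True
```

In this toy model, removing a KERNEL block first leaves 6 MOVABLE blocks
backed by a single KERNEL block, which exceeds the 3:1 ratio; removing
MOVABLE blocks first never does.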

virtio-mem is the only user of dynamic memory groups and recently added
support for prioritizing unplug of ZONE_MOVABLE over ZONE_NORMAL, so we
don't need a new toggle to enable it for dynamic memory groups.

We limit this handling to dynamic memory groups, because:

* We want to keep the runtime overhead for collecting stats when
  onlining a single memory block small.  We tend to have only a handful of
  dynamic memory groups, but we can have quite some static memory groups
  (e.g., 256 DIMMs).

* It doesn't make too much sense for static memory groups, as we try
  onlining all applicable memory blocks either completely to ZONE_MOVABLE
  or not.  In ordinary operation, we won't have a mixture of zones within
  a static memory group.

When adding memory to a dynamic memory group, we'll first online memory to
ZONE_MOVABLE as long as early KERNEL memory allows for it.  Then, we'll
online the next unit(s) to ZONE_NORMAL, until we can online the next
unit(s) to ZONE_MOVABLE.

For a simple virtio-mem device with a MOVABLE:KERNEL ratio of 3:1, it will
result in a layout like:

  [M][M][M][M][M][M][M][M][N][M][M][M][N][M][M][M]...
  ^ movable memory due to early kernel memory
                           ^ allows for more movable memory ...
                              ^-----^ ... here
                                       ^ allows for more movable memory ...
                                          ^-----^ ... here
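
The layout above can be reproduced with a small Python sketch. This is a
simplified model under stated assumptions, not the kernel's
implementation: simulate_layout and its parameters are hypothetical names,
and the budget of 8 MOVABLE blocks allowed by early KERNEL memory is taken
from the illustration rather than computed from real zone statistics.

```python
def simulate_layout(num_blocks, ratio=3, early_movable_budget=8):
    """Return the zone ("M" or "N") chosen for each onlined block.

    ratio: MOVABLE blocks allowed per KERNEL block (3 for a 3:1 ratio).
    early_movable_budget: MOVABLE blocks already allowed by early KERNEL
        memory (assumed to be 8, as in the illustration).
    """
    layout = []
    budget = early_movable_budget
    for _ in range(num_blocks):
        if budget > 0:
            # Existing KERNEL memory still allows for more MOVABLE memory.
            layout.append("M")
            budget -= 1
        else:
            # Online to ZONE_NORMAL; this allows for `ratio` more MOVABLE blocks.
            layout.append("N")
            budget += ratio
    return layout


print("".join(f"[{zone}]" for zone in simulate_layout(16)))
# [M][M][M][M][M][M][M][M][N][M][M][M][N][M][M][M]
```

Once the early budget is used up, the model settles into a repeating
pattern of one N block followed by three M blocks.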

While the created layout is sub-optimal when it comes to contiguous zones,
it gives us the maximum flexibility when dynamically growing/shrinking a
device; we can grow small VMs really big in small steps, and still shrink
reliably to, e.g., 1/4 of the maximum VM size in this example, removing
full memory blocks along with their metadata.

Mark dynamic memory groups in the xarray such that we can efficiently
iterate over them when collecting stats.  In usual setups, we have one
virtio-mem device per NUMA node, and usually only a small number of NUMA
nodes.

Note: for now, there seems to be no compelling reason to make this
behavior configurable.

Link: https://lkml.kernel.org/r/20210806124715.17090-10-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hui Zhu <teawater@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 11:50:23 -07:00
accessibility TTY / Serial patches for 5.14-rc1 2021-07-05 14:08:24 -07:00
acpi ACPI: memhotplug: use a single static memory group for a single memory device 2021-09-08 11:50:23 -07:00
amba
android
ata libata-5.14-2021-07-30 2021-07-30 10:56:47 -07:00
atm Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
auxdisplay
base mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy 2021-09-08 11:50:23 -07:00
bcma
block block-5.14-2021-08-27 2021-08-27 16:08:29 -07:00
bluetooth TTY / Serial patches for 5.14-rc1 2021-07-05 14:08:24 -07:00
bus Networking fixes for 5.14(-rc8?), including fixes from can and bpf. 2021-08-26 13:20:22 -07:00
cdrom block: remove REQ_OP_SCSI_{IN,OUT} 2021-06-30 15:34:19 -06:00
char tpm_ftpm_tee: Free and unregister TEE shared memory during kexec 2021-07-21 07:55:50 +02:00
clk One hot fix for a NULL pointer deref in the Renesas usb clk driver 2021-08-29 12:52:17 -07:00
clocksource This round has a diffstat dominated by Qualcomm clk drivers. Honestly though 2021-07-01 13:26:16 -07:00
comedi Staging / IIO driver patches for 5.14-rc1 2021-07-05 14:01:53 -07:00
connector
counter
cpufreq Merge branch 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm 2021-08-17 20:52:07 +02:00
cpuidle cpuidle: teo: Rename two local variables in teo_select() 2021-08-03 15:18:57 +02:00
crypto ARM: SoC changes for 5.14 2021-07-10 09:22:44 -07:00
cxl
dax dax/kmem: use a single static memory group for a single probed unit 2021-09-08 11:50:23 -07:00
dca
devfreq PM / devfreq: passive: Fix get_target_freq when not using required-opp 2021-06-24 10:37:35 +09:00
dio
dma dmaengine fixes for v5.14 2021-08-06 11:08:24 -07:00
dma-buf Short summary of fixes pull: 2021-07-13 15:15:17 +02:00
edac EDAC/igen6: fix core dependency AGAIN 2021-07-15 11:59:59 -07:00
eisa
extcon Char / Misc driver updates for 5.14-rc1 2021-07-05 13:42:16 -07:00
firewire Char / Misc driver updates for 5.14-rc1 2021-07-05 13:42:16 -07:00
firmware Ard says: 2021-08-15 06:38:26 -10:00
fpga fpga: dfl: fme: Fix cpu hotplug issue in performance reporting 2021-07-27 11:05:16 -07:00
fsi
gnss
gpio gpio: tqmx86: really make IRQ optional 2021-08-02 17:17:27 +02:00
gpu drm/imx: imx-drm alignment and plane offset fixes 2021-08-27 10:49:53 +10:00
greybus
hid HID: ft260: fix device removal due to USB disconnect 2021-07-29 12:38:32 +02:00
hsi
hv Drivers: hv: vmbus: Fix duplicate CPU assignments within a device 2021-07-19 09:26:31 +00:00
hwmon Char / Misc driver updates for 5.14-rc1 2021-07-05 13:42:16 -07:00
hwspinlock
hwtracing Char / Misc driver updates for 5.14-rc1 2021-07-05 13:42:16 -07:00
i2c i2c: dev: zero out array used for i2c reads from userspace 2021-08-10 22:54:10 +02:00
i3c I3C for 5.14 2021-07-10 11:53:06 -07:00
idle
iio iio: adc: Fix incorrect exit of for-loop 2021-07-31 14:46:05 +01:00
infiniband RDMA/rxe: Zero out index member of struct rxe_queue 2021-08-20 15:48:58 -03:00
input This pull request contains the following changes for UML: 2021-07-09 10:19:13 -07:00
interconnect Revert "interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate" 2021-08-12 09:24:39 +03:00
iommu iommu/vt-d: Fix incomplete cache flush in intel_pasid_tear_down_entry() 2021-08-18 13:15:58 +02:00
ipack ipack: tpci200: fix memory leak in the tpci200_register 2021-08-13 10:24:37 +02:00
irqchip irqchip fixes for 5.14, take #1 2021-07-09 15:35:13 +02:00
isdn TTY / Serial patches for 5.14-rc1 2021-07-05 14:08:24 -07:00
leds This contains quite a lot of fixes, with more fixes in my inbox that 2021-07-03 11:57:42 -07:00
lightnvm
macintosh
mailbox mbox: add polarfire soc system controller mailbox 2021-06-26 12:06:48 -05:00
mcb mcb: Use DEFINE_RES_MEM() helper macro and fix the end address 2021-06-24 15:56:25 +02:00
md block-5.14-2021-08-07 2021-08-07 10:26:21 -07:00
media media: ipu3-cio2: Drop reference on error path in cio2_bridge_connect_sensor() 2021-08-26 18:52:30 +02:00
memory
memstick for-5.14/block-2021-06-29 2021-06-30 12:12:56 -07:00
message scsi: message: mptfc: Switch from pci_ to dma_ API 2021-06-22 23:00:01 -04:00
mfd Driver core changes for 5.14-rc1 2021-07-05 13:51:41 -07:00
misc Merge tag 'at24-fixes-for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into i2c/for-current 2021-07-20 22:28:56 +02:00
mmc Revert "mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711" 2021-08-27 16:30:36 +02:00
most
mtd MTD core fixes: 2021-08-16 06:36:01 -10:00
mux
net Revert "net: really fix the build..." 2021-08-26 11:08:32 -07:00
nfc nfc: nfcsim: fix use after free during module unload 2021-07-28 10:20:16 +01:00
ntb
nubus
nvdimm libnvdimm/region: Fix label activation vs errors 2021-08-11 11:54:43 -07:00
nvme block-5.14-2021-07-24 2021-07-24 12:57:06 -07:00
nvmem Char / Misc driver updates for 5.14-rc1 2021-07-05 13:42:16 -07:00
of Devicetree updates for v5.14: 2021-07-03 10:54:08 -07:00
opp opp: core: Check for pending links before reading required_opp pointers 2021-08-23 12:44:55 +05:30
parisc kernel.h: split out panic and oops helpers 2021-07-01 11:06:04 -07:00
parport
pci PCI/MSI: Skip masking MSI-X on Xen PV 2021-08-27 00:27:15 +02:00
pcmcia pcmcia: i82092: fix a null pointer dereference bug 2021-07-23 08:08:54 +02:00
perf
phy USB / Thunderbolt patches for 5.14-rc1 2021-07-05 14:16:22 -07:00
pinctrl pinctrl: amd: Fix an issue with shutdown when system set to s0ix 2021-08-12 11:16:40 +02:00
platform platform/x86: gigabyte-wmi: add support for B450M S2H V2 2021-08-18 19:39:31 +02:00
pnp Char / Misc driver updates for 5.14-rc1 2021-07-05 13:42:16 -07:00
power power: supply: Fix fall-through warnings for Clang 2021-07-13 14:50:47 -05:00
powercap
pps
ps3
ptp ptp_pch: Restore dependency on PCI 2021-08-16 11:11:06 +01:00
pwm pwm: ep93xx: Ensure configuring period and duty_cycle isn't wrongly skipped 2021-07-08 16:09:30 +02:00
rapidio
ras
regulator regulator: Fixes for v5.14 2021-07-21 12:37:49 -07:00
remoteproc remoteproc updates for v5.14 2021-07-07 10:50:03 -07:00
reset reset: reset-zynqmp: Fixed the argument data type 2021-08-23 12:55:18 +02:00
rpmsg
rtc RTC for 5.14 2021-07-10 16:19:10 -07:00
s390 Networking fixes for 5.14-rc6, including fixes from netfilter, bpf, 2021-08-12 16:24:03 -10:00
sbus
scsi SCSI fixes on 20210828 2021-08-28 11:39:16 -07:00
sh
siox siox: Simplify error handling via dev_err_probe() 2021-06-24 15:46:34 +02:00
slimbus slimbus: ngd: reset dma setup during runtime pm 2021-08-13 10:22:30 +02:00
soc NXP/FSL SoC driver fixes for v5.14 2021-08-16 22:42:02 +02:00
soundwire Char / Misc driver updates for 5.14-rc1 2021-07-05 13:42:16 -07:00
spi spi: Fixes for v5.14 2021-08-06 11:15:02 -07:00
spmi spmi: hisi-spmi-controller: move driver from staging 2021-06-25 10:02:05 +02:00
ssb
staging Revert "media: dvb header files: move some headers to staging" 2021-08-23 09:49:09 -07:00
target scsi: target: Fix NULL dereference on XCOPY completion 2021-07-20 23:18:22 -04:00
tc
tee tee: Correct inappropriate usage of TEE_SHM_DMA_BUF flag 2021-07-21 07:55:50 +02:00
thermal - Add rk3568 sensor support (Finley Xiao) 2021-07-10 11:43:25 -07:00
thunderbolt Revert "thunderbolt: Hide authorized attribute if router does not support PCIe tunnels" 2021-07-27 18:14:25 +02:00
tty serial: 8250_pci: Avoid irq sharing for MSI(-X) interrupts. 2021-07-30 13:06:19 +02:00
uio
usb usb: gadget: u_audio: fix race condition on endpoint stop 2021-08-27 16:07:23 +02:00
vdpa virtio,vhost,vdpa: bugfixes 2021-08-16 06:16:25 -10:00
vfio VFIO update for v5.14-rc1 2021-07-03 11:49:33 -07:00
vhost vringh: Use wiov->used to check for read/write desc order 2021-08-11 06:44:24 -04:00
video drm fixes for 5.14-rc2 2021-07-16 11:14:54 -07:00
virt virt: acrn: Do hcall_destroy_vm() before resource release 2021-07-27 16:48:45 +02:00
virtio virtio-mem: use a single dynamic memory group for a single virtio-mem device 2021-09-08 11:50:23 -07:00
visorbus
vlynq
vme
w1
watchdog linux-watchdog 5.14-rc1 tag 2021-07-07 12:57:46 -07:00
xen xen: branch for v5.14-rc6 2021-08-14 06:31:22 -10:00
zorro
Kconfig
Makefile hyperv-next for 5.14 2021-06-29 11:21:35 -07:00