linux-stable/Documentation
Muchun Song 78f39084b4 mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl
Currently, enabling or disabling the optimization of vmemmap pages
associated with HugeTLB pages requires adding hugetlb_free_vmemmap=on (or
"off") to the boot cmdline and rebooting the server.  However, rebooting
usually takes a long time.  So add a sysctl to enable or disable the
feature at runtime without rebooting.  Why do we need this?  There are 3
use cases.

1) The feature of minimizing the overhead of the struct pages associated
   with each HugeTLB page is disabled by default unless
   "hugetlb_free_vmemmap=on" is passed on the boot cmdline.  When we
   (ByteDance) deliver servers to users who want to enable this feature,
   they have to configure grub (change the boot cmdline) and reboot the
   servers, and rebooting usually takes a long time (we have thousands
   of servers).  It's a very bad experience for the users.  So we need an
   approach to enable this feature without rebooting.  This is a use case
   in our practical environment.

2) Some workloads allocate HugeTLB pages 'on the fly' instead of pulling
   them from the HugeTLB pool; those workloads are affected when this
   feature is enabled.  They can be identified by the fact that they
   never explicitly allocate huge pages with 'nr_hugepages' but only set
   'nr_overcommit_hugepages' and then let the pages be allocated from the
   buddy allocator at fault time.  Commit 099730d674 confirms that this
   is a real use case.  For those workloads, the page fault time could be
   ~2x slower than before.  We suspect those users want to disable this
   feature if the system has enabled it and they don't think the memory
   savings are enough to make up for the performance drop.

3) A workload that wants vmemmap pages to be optimized and a workload
   that sets 'nr_overcommit_hugepages' and does not want the extra
   overhead at fault time, when the overcommitted pages are allocated
   from the buddy allocator, may be deployed on the same server.  The
   user could enable this feature, set 'nr_hugepages' and
   'nr_overcommit_hugepages', then disable the feature.  In this case,
   the overcommitted HugeTLB pages will not encounter the extra overhead
   at fault time.
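
The sequence in use case 3 can be sketched as the shell helpers below.
This is a minimal illustration, not part of the patch: it assumes the new
sysctl is exposed as /proc/sys/vm/hugetlb_optimize_vmemmap and accepts 1
(on) or 0 (off).  The names VM_SYSCTL_DIR, set_vm_knob and
mixed_hugetlb_setup are hypothetical, and the directory override exists
only so the steps can be dry-run against a scratch directory.

```shell
#!/bin/sh
# Sketch only: helper names and VM_SYSCTL_DIR are illustrative assumptions.
# By default the knobs are written under the real vm sysctl directory.
VM_SYSCTL_DIR="${VM_SYSCTL_DIR:-/proc/sys/vm}"

# set_vm_knob NAME VALUE: write VALUE to the vm sysctl file NAME.
set_vm_knob() {
    printf '%s\n' "$2" > "$VM_SYSCTL_DIR/$1"
}

# mixed_hugetlb_setup POOL_PAGES OVERCOMMIT_PAGES
# 1. enable the vmemmap optimization,
# 2. populate the HugeTLB pool while the optimization is on,
# 3. set the overcommit limit,
# 4. disable the optimization so that pages allocated later from the
#    buddy allocator avoid the extra overhead at fault time.
mixed_hugetlb_setup() {
    set_vm_knob hugetlb_optimize_vmemmap 1 &&
    set_vm_knob nr_hugepages "$1" &&
    set_vm_knob nr_overcommit_hugepages "$2" &&
    set_vm_knob hugetlb_optimize_vmemmap 0
}
```

Run as root, a call such as "mixed_hugetlb_setup 1024 512" (the page
counts are made-up values) would perform the four writes in order and
stop at the first one that fails.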

Link: https://lkml.kernel.org/r/20220512041142.39501-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 16:48:56 -07:00
ABI Docs/{ABI,admin-guide}/damon: Update for 'state' sysfs file input keyword, 'commit' 2022-05-13 07:20:09 -07:00
accounting - A bunch of fixes: forced idle time accounting, utilization values 2022-01-23 17:35:27 +02:00
admin-guide mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl 2022-05-13 16:48:56 -07:00
arc
arm Documentation: arm: marvell: Extend Avanta list 2022-01-27 11:22:34 -07:00
arm64 Merge branch 'for-next/mte' into for-next/core 2022-03-14 19:01:23 +00:00
block block: remove biodoc.rst 2022-02-15 07:47:52 -07:00
bpf docs: netdev: move the netdev-FAQ to the process pages 2022-03-31 10:49:39 +02:00
cdrom Documentation: Fix links for udftools project and pktcdvd tool 2022-02-15 16:15:33 -07:00
core-api XArray update for 5.18: 2022-04-01 13:40:44 -07:00
cpu-freq cpufreq: Reintroduce ready() callback 2022-02-09 13:18:49 +05:30
crypto
dev-tools kasan: move boot parameters section in documentation 2022-05-13 07:20:19 -07:00
devicetree Networking fixes for 5.18-rc5, including fixes from bluetooth, bpf 2022-04-28 12:34:50 -07:00
doc-guide docs: discourage use of list tables 2022-01-07 09:33:13 -07:00
driver-api Merge drm-misc/drm-misc-next-fixes into drm-misc-fixes 2022-04-05 11:39:22 +02:00
fault-injection
fb
features nds32: Remove the architecture 2022-03-07 13:54:59 +01:00
filesystems doc: update documentation for swap_activate and swap_rw 2022-05-09 18:20:48 -07:00
firmware-guide Merge branch 'i2c/for-mergewindow' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2022-03-26 12:46:08 -07:00
firmware_class
fpga
gpu pci-v5.18-changes 2022-03-25 13:02:05 -07:00
hid
hwmon Char/Misc and other driver updates for 5.18-rc1 2022-03-28 12:27:35 -07:00
i2c i2c: i801: Add support for Intel Raptor Lake PCH-S 2022-02-15 10:03:40 +01:00
ia64
ide
iio
infiniband
input Input: docs: add more details on the use of BTN_TOOL 2022-03-01 15:46:03 +01:00
isdn
kbuild kbuild: Make $(LLVM) more flexible 2022-03-31 12:03:46 +09:00
kernel-hacking docs: fix typo in Documentation/kernel-hacking/locking.rst 2022-01-27 11:22:33 -07:00
leds
litmus-tests
livepatch
locking Documentation: Fix duplicate statement about raw_spinlock_t type 2022-03-25 13:30:08 -06:00
m68k
maintainer Some late-arriving documentation improvements. This is mostly build-system 2022-03-31 12:10:42 -07:00
mhi
mips
misc-devices
netlabel
networking doc/ip-sysctl: add bc_forwarding 2022-04-20 10:31:43 +01:00
nios2
nvdimm
openrisc
parisc
PCI PCI/doc: cleanup references to the legacy PCI DMA API 2022-03-30 16:54:24 +02:00
pcmcia
peci docs: Add PECI documentation 2022-02-09 08:04:44 +01:00
power Documentation: EM: Describe new registration method using DT 2022-03-03 09:35:04 +05:30
powerpc
process docs: netdev: move the netdev-FAQ to the process pages 2022-03-31 10:49:39 +02:00
RCU
riscv Documentation: riscv: remove non-existent directory from table of contents 2022-03-31 16:18:56 -07:00
s390
scheduler Changes in this cycle were: 2022-03-22 14:39:12 -07:00
scsi scsi: ufs: docs: UFS documentation corrections 2022-03-08 22:49:49 -05:00
security selinux/stable-5.18 PR 20220321 2022-03-21 20:47:54 -07:00
sh
sound ALSA: hda/realtek: Add alc256-samsung-headphone fixup 2022-03-22 21:51:02 +01:00
sparc
sphinx docs: sphinx/requirements: Limit jinja2<3.1 2022-03-30 13:44:54 -06:00
sphinx-static
spi spi: pxa2xx_spi: Convert to use GPIO descriptors 2022-01-31 15:17:27 +00:00
staging remoteproc: Change rproc_shutdown() to return a status 2022-03-11 14:31:55 -06:00
target
timers
tools Real Time Analysis Tool updates for 5.18 2022-03-23 11:08:10 -07:00
trace Updates to Tracing: 2022-04-03 12:26:01 -07:00
translations Kbuild -std=gnu11 updates for v5.18 2022-03-25 11:48:01 -07:00
tty
usb usb: gadget: f_uac2: Optionally determine bInterval for HS and SS 2022-01-31 14:26:18 +01:00
userspace-api platform-drivers-x86 for v5.18-1 2022-03-25 12:14:39 -07:00
virt Documentation: KVM: Add SPDX-License-Identifier tag 2022-04-11 13:28:56 -04:00
vm Documentation/vm: rework "Temporary Virtual Mappings" section 2022-05-13 16:48:55 -07:00
w1
watchdog
x86 - More noinstr fixes 2022-03-25 12:34:53 -07:00
xtensa
.gitignore
arch.rst
asm-annotations.rst linkage: remove SYM_FUNC_{START,END}_ALIAS() 2022-02-22 16:21:34 +00:00
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py docs: pdfdocs: Pull LaTeX preamble part out of conf.py 2022-02-24 12:26:13 -07:00
COPYING-logo
docutils.conf
dontdiff
index.rst docs: Add PECI documentation 2022-02-09 08:04:44 +01:00
Kconfig
logo.gif
Makefile docs: Makefile: Add -no-shell-escape option to LATEXOPTS 2022-02-14 12:50:17 -07:00
memory-barriers.txt
SubmittingPatches
watch_queue.rst