linux-stable/fs
Ryan Roberts 3485b88390 mm: thp: introduce multi-size THP sysfs interface
In preparation for adding support for anonymous multi-size THP, introduce
new sysfs structure that will be used to control the new behaviours.  A
new directory is added under transparent_hugepage for each supported THP
size, and contains an `enabled` file, which can be set to "inherit" (to
inherit the global setting), "always", "madvise" or "never".  For now, the
kernel still only supports PMD-sized anonymous THP, so only 1 directory is
populated.

The first half of the change converts transhuge_vma_suitable() and
hugepage_vma_check() so that they take a bitfield of orders for which the
user wants to determine support, and the functions filter out all the
orders that can't be supported, given the current sysfs configuration and
the VMA dimensions.  The resulting functions are renamed to
thp_vma_suitable_orders() and thp_vma_allowable_orders() respectively. 
Convenience functions that take a single, unencoded order and return a
boolean are also defined as thp_vma_suitable_order() and
thp_vma_allowable_order().

The second half of the change implements the new sysfs interface.  It has
been done so that each supported THP size has a `struct thpsize`, which
describes the relevant metadata and is itself a kobject.  This is pretty
minimal for now, but should make it easy to add new per-thpsize files to
the interface if needed in future (e.g.  per-size defrag).  Rather than
keep the `enabled` state directly in the struct thpsize, I've elected to
directly encode it into huge_anon_orders_[always|madvise|inherit]
bitfields since this reduces the amount of work required in
thp_vma_allowable_orders() which is called for every page fault.

See Documentation/admin-guide/mm/transhuge.rst, as modified by this
commit, for details of how the new sysfs interface works.

[ryan.roberts@arm.com: fix build warning when CONFIG_SYSFS is disabled]
  Link: https://lkml.kernel.org/r/20231211125320.3997543-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20231207161211.2374093-4-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Barry Song <v-songbaohua@oppo.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Itaru Kitayama <itaru.kitayama@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-20 14:48:12 -08:00
..
9p Bunch of small fixes: 2023-11-04 09:20:04 -10:00
adfs adfs: convert to new timestamp accessors 2023-10-18 13:26:18 +02:00
affs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
afs fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
autofs autofs: add: new_inode check in autofs_fill_super() 2023-11-20 14:56:36 +01:00
bcachefs fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
befs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
bfs bfs: convert to new timestamp accessors 2023-10-18 13:26:19 +02:00
btrfs fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
cachefiles - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
ceph fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
coda coda: convert to new timestamp accessors 2023-10-18 13:26:19 +02:00
configfs configfs: convert to new timestamp accessors 2023-10-18 13:26:19 +02:00
cramfs vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
crypto This update includes the following changes: 2023-11-02 16:15:30 -10:00
debugfs wireless fixes: 2023-11-29 19:43:34 -08:00
devpts devpts: convert to new timestamp accessors 2023-10-18 13:26:20 +02:00
dlm dlm: slow down filling up processing queue 2023-10-12 15:21:00 -05:00
ecryptfs fs: Pass AT_GETATTR_NOSEC flag to getattr interface function 2023-11-18 14:54:07 +01:00
efivarfs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
efs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
erofs MAINTAINERS: erofs: add EROFS webpage 2023-11-17 19:55:46 +08:00
exfat exfat: fix ctime is not updated 2023-11-03 22:24:11 +09:00
exportfs fs: fix build error with CONFIG_EXPORTFS=m or not defined 2023-10-28 16:16:19 +02:00
ext2 fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
ext4 fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
f2fs fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
fat vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
freevxfs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
fscache
fuse vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
gfs2 list_lru: allow explicit memcg and NUMA node selection 2023-12-12 10:57:01 -08:00
hfs vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
hfsplus vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
hostfs hostfs: convert to new timestamp accessors 2023-10-18 14:08:22 +02:00
hpfs hpfs: convert to new timestamp accessors 2023-10-18 14:08:22 +02:00
hugetlbfs fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
iomap mm: add folio_fill_tail() and use it in iomap 2023-12-10 16:51:36 -08:00
isofs isofs: convert to new timestamp accessors 2023-10-18 14:08:22 +02:00
jbd2 Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
jffs2 vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
jfs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
kernfs Driver core changes for 6.7-rc1 2023-11-03 15:15:47 -10:00
lockd SUNRPC: change how svc threads are asked to exit. 2023-10-16 12:44:04 -04:00
minix minix: convert to new timestamp accessors 2023-10-18 14:08:23 +02:00
netfs netfs: Only call folio_start_fscache() one time for each folio 2023-09-18 12:03:46 -07:00
nfs list_lru: allow explicit memcg and NUMA node selection 2023-12-12 10:57:01 -08:00
nfs_common
nfsd list_lru: allow explicit memcg and NUMA node selection 2023-12-12 10:57:01 -08:00
nilfs2 nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage() 2023-12-06 16:12:50 -08:00
nls nls: Hide new NLS_UCS2_UTILS 2023-08-31 12:07:34 -05:00
notify vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
ntfs fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
ntfs3 vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
ocfs2 fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
omfs omfs: convert to new timestamp accessors 2023-10-18 14:08:25 +02:00
openpromfs openpromfs: convert to new timestamp accessors 2023-10-18 14:08:25 +02:00
orangefs vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
overlayfs vfs-6.7-rc3.fixes 2023-11-24 09:45:40 -08:00
proc mm: thp: introduce multi-size THP sysfs interface 2023-12-20 14:48:12 -08:00
pstore pstore updates for v6.7-rc1 2023-10-30 19:26:39 -10:00
qnx4 qnx4: convert to new timestamp accessors 2023-10-18 14:08:26 +02:00
qnx6 qnx6: convert to new timestamp accessors 2023-10-18 14:08:26 +02:00
quota Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
ramfs ramfs: convert to new timestamp accessors 2023-10-18 14:08:26 +02:00
reiserfs Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
romfs vfs-6.7.ctime 2023-10-30 09:47:13 -10:00
smb smb: do not test the return value of folio_start_writeback() 2023-12-10 16:51:37 -08:00
squashfs squashfs: squashfs_read_data need to check if the length is 0 2023-12-06 16:12:45 -08:00
sysfs kernfs: sysfs: support custom llseek method for sysfs entries 2023-10-05 13:42:11 +02:00
sysv sysv: convert to new timestamp accessors 2023-10-18 14:08:28 +02:00
tracefs eventfs: Make sure that parent->d_inode is locked in creating files/dirs 2023-11-22 18:37:33 -05:00
ubifs This pull request contains updates for UBI and UBIFS 2023-11-05 08:28:32 -10:00
udf \n 2023-11-02 08:19:51 -10:00
ufs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
unicode
vboxsf vboxsf: convert to new timestamp accessors 2023-10-18 14:08:29 +02:00
verity fsverity: skip PKCS#7 parser when keyring is empty 2023-08-20 10:33:43 -07:00
xfs list_lru: allow explicit memcg and NUMA node selection 2023-12-12 10:57:01 -08:00
zonefs fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
aio.c aio: Annotate struct kioctx_table with __counted_by 2023-09-20 14:22:01 +02:00
anon_inodes.c treewide: mark stuff as __ro_after_init 2023-10-18 14:43:23 -07:00
attr.c fs: convert core infrastructure to new timestamp accessors 2023-10-18 13:26:15 +02:00
bad_inode.c fs: convert core infrastructure to new timestamp accessors 2023-10-18 13:26:15 +02:00
binfmt_elf.c binfmt_elf: Only report padzero() errors when PROT_WRITE 2023-10-03 19:48:44 -07:00
binfmt_elf_fdpic.c execve updates for v6.7-rc1 2023-10-30 19:28:19 -10:00
binfmt_elf_test.c
binfmt_flat.c
binfmt_misc.c execve updates for v6.7-rc1 2023-10-30 19:28:19 -10:00
binfmt_script.c
buffer.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
char_dev.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
compat_binfmt_elf.c
coredump.c
d_path.c
dax.c mm: convert DAX lock/unlock page to lock/unlock folio 2023-10-04 10:32:20 -07:00
dcache.c list_lru: allow explicit memcg and NUMA node selection 2023-12-12 10:57:01 -08:00
direct-io.c treewide: mark stuff as __ro_after_init 2023-10-18 14:43:23 -07:00
drop_caches.c fs: drop_caches: draining pages before dropping caches 2023-08-18 10:12:11 -07:00
eventfd.c eventfd: prevent underflow for eventfd semaphores 2023-07-11 11:41:34 +02:00
eventpoll.c treewide: mark stuff as __ro_after_init 2023-10-18 14:43:23 -07:00
exec.c mm/mremap: allow moves within the same VMA for stack moves 2023-10-04 10:32:20 -07:00
fcntl.c treewide: mark stuff as __ro_after_init 2023-10-18 14:43:23 -07:00
fhandle.c exportfs: add helpers to check if filesystem can encode/decode file handles 2023-10-24 17:57:45 +02:00
file.c file, i915: fix file reference for mmap_singleton() 2023-10-25 22:17:04 +02:00
file_table.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
filesystems.c
fs-writeback.c vfs-6.7.misc 2023-10-30 09:14:19 -10:00
fs_context.c fs: factor out vfs_parse_monolithic_sep() helper 2023-10-12 18:53:36 +03:00
fs_parser.c
fs_pin.c
fs_struct.c kill do_each_thread() 2023-08-21 13:46:25 -07:00
fs_types.c
fsopen.c fsconfig: ensure that dirfd is set to aux 2023-09-22 14:09:06 +02:00
init.c fs: add a new SB_I_NOUMASK flag 2023-10-19 11:02:47 +02:00
inode.c list_lru: allow explicit memcg and NUMA node selection 2023-12-12 10:57:01 -08:00
internal.h fs: store real path instead of fake path in backing file f_path 2023-10-19 11:03:15 +02:00
ioctl.c v6.6-vfs.super 2023-08-28 11:04:18 -07:00
Kconfig fs/Kconfig: make hugetlbfs a menuconfig 2023-12-10 16:51:51 -08:00
Kconfig.binfmt riscv: support the elf-fdpic binfmt loader 2023-08-23 14:17:43 -07:00
kernel_read_file.c fs: Fix kernel-doc warnings 2023-08-19 12:12:12 +02:00
libfs.c libfs: getdents() should return 0 after reaching EOD 2023-11-20 15:34:22 +01:00
locks.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
Makefile bcachefs: Initial commit 2023-10-22 17:08:07 -04:00
mbcache.c mbcache: dynamically allocate the mbcache shrinker 2023-10-04 10:32:25 -07:00
mnt_idmapping.c fs: export mnt_idmap_get/mnt_idmap_put 2023-11-03 23:28:33 +01:00
mount.h
mpage.c buffer: remove folio_create_empty_buffers() 2023-10-25 16:47:10 -07:00
namei.c vfs-6.7.misc 2023-10-30 09:14:19 -10:00
namespace.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
nsfs.c fs: convert core infrastructure to new timestamp accessors 2023-10-18 13:26:15 +02:00
open.c fs: store real path instead of fake path in backing file f_path 2023-10-19 11:03:15 +02:00
pipe.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
pnode.c
pnode.h
posix_acl.c fs: convert to ctime accessor functions 2023-07-13 10:28:04 +02:00
proc_namespace.c
read_write.c fs: Fix one kernel-doc comment 2023-08-15 08:32:45 +02:00
readdir.c vfs: get rid of old '->iterate' directory operation 2023-08-06 15:08:35 +02:00
remap_range.c
select.c
seq_file.c
signalfd.c
splice.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
stack.c fs: convert core infrastructure to new timestamp accessors 2023-10-18 13:26:15 +02:00
stat.c fs: Pass AT_GETATTR_NOSEC flag to getattr interface function 2023-11-18 14:54:07 +01:00
statfs.c
super.c overlayfs update for 6.7-rc1 2023-11-07 11:46:31 -08:00
sync.c
sysctls.c
timerfd.c
userfaultfd.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
utimes.c
xattr.c xattr: make the xattr array itself const 2023-10-09 16:24:16 +02:00