tools/headers: Synchronize kernel ABI headers
After the SPDX license tags were added a number of tooling headers got out of
sync with their kernel variants, generating lots of build warnings.
Sync them:
- tools/arch/x86/include/asm/disabled-features.h,
tools/arch/x86/include/asm/required-features.h,
tools/include/linux/hash.h:
Remove the SPDX tag where the kernel version does not have it.
- tools/include/asm-generic/bitops/__fls.h,
tools/include/asm-generic/bitops/arch_hweight.h,
tools/include/asm-generic/bitops/const_hweight.h,
tools/include/asm-generic/bitops/fls.h,
tools/include/asm-generic/bitops/fls64.h,
tools/include/uapi/asm-generic/ioctls.h,
tools/include/uapi/asm-generic/mman-common.h,
tools/include/uapi/sound/asound.h,
tools/include/uapi/linux/kvm.h,
tools/include/uapi/linux/perf_event.h,
tools/include/uapi/linux/sched.h,
tools/include/uapi/linux/vhost.h,
tools/include/uapi/sound/asound.h:
Add the SPDX tag of the respective kernel header.
- tools/include/uapi/linux/bpf_common.h,
tools/include/uapi/linux/fcntl.h,
tools/include/uapi/linux/hw_breakpoint.h,
tools/include/uapi/linux/mman.h,
tools/include/uapi/linux/stat.h,
Change the tag to the kernel header version:
-/* SPDX-License-Identifier: GPL-2.0 */
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
Also sync other header details:
- include/uapi/sound/asound.h:
Fix pointless end of line whitespace noise the header grew in this cycle.
- tools/arch/x86/lib/memcpy_64.S:
Sync the code and add tools/include/asm/export.h with dummy wrappers
to support building the kernel side code in a tooling header environment.
- tools/include/uapi/asm-generic/mman.h,
tools/include/uapi/linux/bpf.h:
Sync other details that don't impact tooling's use of the ABIs.
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-03 11:18:37 +00:00
|
|
|
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
|
2016-09-12 12:54:29 +00:00
|
|
|
#ifndef __ASM_GENERIC_MMAN_COMMON_H
|
|
|
|
#define __ASM_GENERIC_MMAN_COMMON_H
|
|
|
|
|
|
|
|
/*
|
|
|
|
Author: Michael S. Tsirkin <mst@mellanox.co.il>, Mellanox Technologies Ltd.
|
|
|
|
Based on: asm-xxx/mman.h
|
|
|
|
*/
|
|
|
|
|
|
|
|
#define PROT_READ 0x1 /* page can be read */
|
|
|
|
#define PROT_WRITE 0x2 /* page can be written */
|
|
|
|
#define PROT_EXEC 0x4 /* page can be executed */
|
|
|
|
#define PROT_SEM 0x8 /* page may be used for atomic ops */
|
2020-02-12 13:53:06 +00:00
|
|
|
/* 0x10 reserved for arch-specific use */
|
|
|
|
/* 0x20 reserved for arch-specific use */
|
2016-09-12 12:54:29 +00:00
|
|
|
#define PROT_NONE 0x0 /* page can not be accessed */
|
|
|
|
#define PROT_GROWSDOWN 0x01000000 /* mprotect flag: extend change to start of growsdown vma */
|
|
|
|
#define PROT_GROWSUP 0x02000000 /* mprotect flag: extend change to end of growsup vma */
|
|
|
|
|
tools headers uapi: Sync asm-generic/mman-common.h and linux/mman.h
To deal with the move of some defines from asm-generic/mmap-common.h to
linux/mman.h done in:
746c9398f5ac ("arch: move common mmap flags to linux/mman.h")
The generated mmap_flags array stays the same:
$ tools/perf/trace/beauty/mmap_flags.sh
static const char *mmap_flags[] = {
[ilog2(0x40) + 1] = "32BIT",
[ilog2(0x01) + 1] = "SHARED",
[ilog2(0x02) + 1] = "PRIVATE",
[ilog2(0x10) + 1] = "FIXED",
[ilog2(0x20) + 1] = "ANONYMOUS",
[ilog2(0x100000) + 1] = "FIXED_NOREPLACE",
[ilog2(0x0100) + 1] = "GROWSDOWN",
[ilog2(0x0800) + 1] = "DENYWRITE",
[ilog2(0x1000) + 1] = "EXECUTABLE",
[ilog2(0x2000) + 1] = "LOCKED",
[ilog2(0x4000) + 1] = "NORESERVE",
[ilog2(0x8000) + 1] = "POPULATE",
[ilog2(0x10000) + 1] = "NONBLOCK",
[ilog2(0x20000) + 1] = "STACK",
[ilog2(0x40000) + 1] = "HUGETLB",
[ilog2(0x80000) + 1] = "SYNC",
};
$
And to have the system's sys/mman.h find the definition of MAP_SHARED
and MAP_PRIVATE, make sure they are defined in the tools/ mman-common.h
in a way that keeps it the same as the kernel's, need for keeping the
Android's NDK cross build working.
This silences these perf build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
Warning: Kernel ABI header at 'tools/include/uapi/linux/mman.h' differs from latest version at 'include/uapi/linux/mman.h'
diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-h80ycpc6pedg9s5z2rwpy6ws@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-25 17:06:07 +00:00
|
|
|
/* 0x01 - 0x03 are defined in linux/mman.h */
|
2016-09-12 12:54:29 +00:00
|
|
|
#define MAP_TYPE 0x0f /* Mask for type of mapping */
|
|
|
|
#define MAP_FIXED 0x10 /* Interpret addr exactly */
|
|
|
|
#define MAP_ANONYMOUS 0x20 /* don't use a file */
|
|
|
|
|
tools headers UAPI: Update tools's copy of mman.h headers
To pick up the changes from:
8aa3c927ec10 ("mm/mmap: move common defines to mman-common.h")
22fcea6f85f2 ("mm: move MAP_SYNC to asm-generic/mman-common.h")
0bf5f9492389 ("mm: fix the MAP_UNINITIALIZED flag")
To address the following perf build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman.h' differs from latest version at 'include/uapi/asm-generic/mman.h'
diff -u tools/include/uapi/asm-generic/mman.h include/uapi/asm-generic/mman.h
That ends up just moving a bit the auto-generated code->string tables:
$ tools/perf/trace/beauty/mmap_flags.sh > before
$ cp include/uapi/asm-generic/mman.h tools/include/uapi/asm-generic/mman.h
$ cp include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman-common.h
$ tools/perf/trace/beauty/mmap_flags.sh > after
$ diff -u before after
--- before 2019-07-26 12:45:02.948335904 -0300
+++ after 2019-07-26 12:48:05.342893539 -0300
@@ -4,15 +4,15 @@
[ilog2(0x02) + 1] = "PRIVATE",
[ilog2(0x10) + 1] = "FIXED",
[ilog2(0x20) + 1] = "ANONYMOUS",
+ [ilog2(0x008000) + 1] = "POPULATE",
+ [ilog2(0x010000) + 1] = "NONBLOCK",
+ [ilog2(0x020000) + 1] = "STACK",
+ [ilog2(0x040000) + 1] = "HUGETLB",
+ [ilog2(0x080000) + 1] = "SYNC",
[ilog2(0x100000) + 1] = "FIXED_NOREPLACE",
[ilog2(0x0100) + 1] = "GROWSDOWN",
[ilog2(0x0800) + 1] = "DENYWRITE",
[ilog2(0x1000) + 1] = "EXECUTABLE",
[ilog2(0x2000) + 1] = "LOCKED",
[ilog2(0x4000) + 1] = "NORESERVE",
- [ilog2(0x8000) + 1] = "POPULATE",
- [ilog2(0x10000) + 1] = "NONBLOCK",
- [ilog2(0x20000) + 1] = "STACK",
- [ilog2(0x40000) + 1] = "HUGETLB",
- [ilog2(0x80000) + 1] = "SYNC",
};
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-fzqvzni9megaurmsp0k4vy27@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-26 15:49:00 +00:00
|
|
|
/* 0x0100 - 0x4000 flags are defined in asm-generic/mman.h */
|
|
|
|
#define MAP_POPULATE 0x008000 /* populate (prefault) pagetables */
|
|
|
|
#define MAP_NONBLOCK 0x010000 /* do not block on IO */
|
|
|
|
#define MAP_STACK 0x020000 /* give out an address that is best suited for process/thread stacks */
|
|
|
|
#define MAP_HUGETLB 0x040000 /* create a huge page mapping */
|
|
|
|
#define MAP_SYNC 0x080000 /* perform synchronous page faults for the mapping */
|
2018-04-16 06:18:22 +00:00
|
|
|
#define MAP_FIXED_NOREPLACE 0x100000 /* MAP_FIXED which doesn't unmap underlying mapping */
|
|
|
|
|
tools headers UAPI: Update tools's copy of mman.h headers
To pick up the changes from:
8aa3c927ec10 ("mm/mmap: move common defines to mman-common.h")
22fcea6f85f2 ("mm: move MAP_SYNC to asm-generic/mman-common.h")
0bf5f9492389 ("mm: fix the MAP_UNINITIALIZED flag")
To address the following perf build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman.h' differs from latest version at 'include/uapi/asm-generic/mman.h'
diff -u tools/include/uapi/asm-generic/mman.h include/uapi/asm-generic/mman.h
That ends up just moving a bit the auto-generated code->string tables:
$ tools/perf/trace/beauty/mmap_flags.sh > before
$ cp include/uapi/asm-generic/mman.h tools/include/uapi/asm-generic/mman.h
$ cp include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman-common.h
$ tools/perf/trace/beauty/mmap_flags.sh > after
$ diff -u before after
--- before 2019-07-26 12:45:02.948335904 -0300
+++ after 2019-07-26 12:48:05.342893539 -0300
@@ -4,15 +4,15 @@
[ilog2(0x02) + 1] = "PRIVATE",
[ilog2(0x10) + 1] = "FIXED",
[ilog2(0x20) + 1] = "ANONYMOUS",
+ [ilog2(0x008000) + 1] = "POPULATE",
+ [ilog2(0x010000) + 1] = "NONBLOCK",
+ [ilog2(0x020000) + 1] = "STACK",
+ [ilog2(0x040000) + 1] = "HUGETLB",
+ [ilog2(0x080000) + 1] = "SYNC",
[ilog2(0x100000) + 1] = "FIXED_NOREPLACE",
[ilog2(0x0100) + 1] = "GROWSDOWN",
[ilog2(0x0800) + 1] = "DENYWRITE",
[ilog2(0x1000) + 1] = "EXECUTABLE",
[ilog2(0x2000) + 1] = "LOCKED",
[ilog2(0x4000) + 1] = "NORESERVE",
- [ilog2(0x8000) + 1] = "POPULATE",
- [ilog2(0x10000) + 1] = "NONBLOCK",
- [ilog2(0x20000) + 1] = "STACK",
- [ilog2(0x40000) + 1] = "HUGETLB",
- [ilog2(0x80000) + 1] = "SYNC",
};
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-fzqvzni9megaurmsp0k4vy27@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-26 15:49:00 +00:00
|
|
|
#define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be
|
|
|
|
* uninitialized */
|
|
|
|
|
2016-09-12 12:54:29 +00:00
|
|
|
/*
|
|
|
|
* Flags for mlock
|
|
|
|
*/
|
|
|
|
#define MLOCK_ONFAULT 0x01 /* Lock pages in range after they are faulted in, do not prefault */
|
|
|
|
|
|
|
|
#define MS_ASYNC 1 /* sync memory asynchronously */
|
|
|
|
#define MS_INVALIDATE 2 /* invalidate the caches */
|
|
|
|
#define MS_SYNC 4 /* synchronous memory sync */
|
|
|
|
|
|
|
|
#define MADV_NORMAL 0 /* no further special treatment */
|
|
|
|
#define MADV_RANDOM 1 /* expect random page references */
|
|
|
|
#define MADV_SEQUENTIAL 2 /* expect sequential page references */
|
|
|
|
#define MADV_WILLNEED 3 /* will need these pages */
|
|
|
|
#define MADV_DONTNEED 4 /* don't need these pages */
|
|
|
|
|
|
|
|
/* common parameters: try to keep these consistent across architectures */
|
|
|
|
#define MADV_FREE 8 /* free pages only if memory pressure */
|
|
|
|
#define MADV_REMOVE 9 /* remove these pages & resources */
|
|
|
|
#define MADV_DONTFORK 10 /* don't inherit across fork */
|
|
|
|
#define MADV_DOFORK 11 /* do inherit across fork */
|
|
|
|
#define MADV_HWPOISON 100 /* poison a page for testing */
|
|
|
|
#define MADV_SOFT_OFFLINE 101 /* soft offline page for testing */
|
|
|
|
|
|
|
|
#define MADV_MERGEABLE 12 /* KSM may merge identical pages */
|
|
|
|
#define MADV_UNMERGEABLE 13 /* KSM may not merge identical pages */
|
|
|
|
|
|
|
|
#define MADV_HUGEPAGE 14 /* Worth backing with hugepages */
|
|
|
|
#define MADV_NOHUGEPAGE 15 /* Not worth backing with hugepages */
|
|
|
|
|
|
|
|
#define MADV_DONTDUMP 16 /* Explicity exclude from the core dump,
|
|
|
|
overrides the coredump filter bits */
|
|
|
|
#define MADV_DODUMP 17 /* Clear the MADV_DONTDUMP flag */
|
|
|
|
|
2017-09-13 07:38:23 +00:00
|
|
|
#define MADV_WIPEONFORK 18 /* Zero memory on fork, child only */
|
|
|
|
#define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */
|
|
|
|
|
tools headers uapi: Sync asm-generic/mman-common.h with the kernel
To pick the changes from:
1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
9c276cc65a58 ("mm: introduce MADV_COLD")
That result in these changes in the tools:
$ tools/perf/trace/beauty/madvise_behavior.sh > before
$ cp include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman-common.h
$ git diff
diff --git a/tools/include/uapi/asm-generic/mman-common.h b/tools/include/uapi/asm-generic/mman-common.h
index 63b1f506ea67..c160a5354eb6 100644
--- a/tools/include/uapi/asm-generic/mman-common.h
+++ b/tools/include/uapi/asm-generic/mman-common.h
@@ -67,6 +67,9 @@
#define MADV_WIPEONFORK 18 /* Zero memory on fork, child only */
#define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */
+#define MADV_COLD 20 /* deactivate these pages */
+#define MADV_PAGEOUT 21 /* reclaim these pages */
+
/* compatibility flags */
#define MAP_FILE 0
$ tools/perf/trace/beauty/madvise_behavior.sh > after
$ diff -u before after
--- before 2019-09-27 11:29:43.346320100 -0300
+++ after 2019-09-27 11:30:03.838570439 -0300
@@ -16,6 +16,8 @@
[17] = "DODUMP",
[18] = "WIPEONFORK",
[19] = "KEEPONFORK",
+ [20] = "COLD",
+ [21] = "PAGEOUT",
[100] = "HWPOISON",
[101] = "SOFT_OFFLINE",
};
$
I.e. now when madvise gets those behaviours as args, it will be able to
translate from the number to a human readable string.
This addresses the following perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-n40y6c4sa49p29q6sl8w3ufx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-27 14:30:30 +00:00
|
|
|
#define MADV_COLD 20 /* deactivate these pages */
|
|
|
|
#define MADV_PAGEOUT 21 /* reclaim these pages */
|
|
|
|
|
tools headers UAPI: Sync asm-generic/mman-common.h with the kernel
To pick the changes from:
Fixes: 4ca9b3859dac14bb ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables")
That result in these changes in the tools:
$ tools/perf/trace/beauty/madvise_behavior.sh > before
$ cp include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman-common.h
$ tools/perf/trace/beauty/madvise_behavior.sh > after
$ diff -u before after
--- before 2021-07-05 14:30:15.167212621 -0300
+++ after 2021-07-05 14:30:26.638462594 -0300
@@ -18,6 +18,8 @@
[19] = "KEEPONFORK",
[20] = "COLD",
[21] = "PAGEOUT",
+ [22] = "POPULATE_READ",
+ [23] = "POPULATE_WRITE",
[100] = "HWPOISON",
[101] = "SOFT_OFFLINE",
};
$
I.e. now when madvise gets those behaviours as args, it will be able to
translate from the number to a human readable string.
This addresses the following perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
Cc: David Hildenbrand <david@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-27 14:30:30 +00:00
|
|
|
#define MADV_POPULATE_READ 22 /* populate (prefault) page tables readable */
|
|
|
|
#define MADV_POPULATE_WRITE 23 /* populate (prefault) page tables writable */
|
|
|
|
|
tools headers UAPI: Sync asm-generic/mman-common.h with the kernel
To pick the changes from:
9457056ac426e5ed ("mm: madvise: MADV_DONTNEED_LOCKED")
That result in these changes in the tools:
$ diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
--- tools/include/uapi/asm-generic/mman-common.h 2022-03-29 16:17:50.461694991 -0300
+++ include/uapi/asm-generic/mman-common.h 2022-03-27 19:12:48.923250468 -0300
@@ -75,6 +75,8 @@
#define MADV_POPULATE_READ 22 /* populate (prefault) page tables readable */
#define MADV_POPULATE_WRITE 23 /* populate (prefault) page tables writable */
+#define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */
+
/* compatibility flags */
#define MAP_FILE 0
$ tools/perf/trace/beauty/madvise_behavior.sh > before
$ cp include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman-common.h
$ tools/perf/trace/beauty/madvise_behavior.sh > after
$ diff -u before after
--- before 2022-03-29 16:18:04.091044244 -0300
+++ after 2022-03-29 16:18:11.692238906 -0300
@@ -20,6 +20,7 @@
[21] = "PAGEOUT",
[22] = "POPULATE_READ",
[23] = "POPULATE_WRITE",
+ [24] = "DONTNEED_LOCKED",
[100] = "HWPOISON",
[101] = "SOFT_OFFLINE",
};
$
I.e. now when madvise gets those behaviours as args, 'perf trace' will
be able to translate from the number to a human readable string and to
use the strings in tracepoint filter expressions.
This addresses the following perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/YkNcUfeh795yqGMV@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-27 14:30:30 +00:00
|
|
|
#define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */
|
|
|
|
|
mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse
This idea was introduced by David Rientjes[1].
Introduce a new madvise mode, MADV_COLLAPSE, that allows users to request
a synchronous collapse of memory at their own expense.
The benefits of this approach are:
* CPU is charged to the process that wants to spend the cycles for the
THP
* Avoid unpredictable timing of khugepaged collapse
Semantics
This call is independent of the system-wide THP sysfs settings, but will
fail for memory marked VM_NOHUGEPAGE. If the ranges provided span
multiple VMAs, the semantics of the collapse over each VMA is independent
from the others. This implies a hugepage cannot cross a VMA boundary. If
collapse of a given hugepage-aligned/sized region fails, the operation may
continue to attempt collapsing the remainder of memory specified.
The memory ranges provided must be page-aligned, but are not required to
be hugepage-aligned. If the memory ranges are not hugepage-aligned, the
start/end of the range will be clamped to the first/last hugepage-aligned
address covered by said range. The memory ranges must span at least one
hugepage-sized region.
All non-resident pages covered by the range will first be
swapped/faulted-in, before being internally copied onto a freshly
allocated hugepage. Unmapped pages will have their data directly
initialized to 0 in the new hugepage. However, for every eligible
hugepage aligned/sized region to-be collapsed, at least one page must
currently be backed by memory (a PMD covering the address range must
already exist).
Allocation for the new hugepage may enter direct reclaim and/or
compaction, regardless of VMA flags. When the system has multiple NUMA
nodes, the hugepage will be allocated from the node providing the most
native pages. This operation operates on the current state of the
specified process and makes no persistent changes or guarantees on how
pages will be mapped, constructed, or faulted in the future
Return Value
If all hugepage-sized/aligned regions covered by the provided range were
either successfully collapsed, or were already PMD-mapped THPs, this
operation will be deemed successful. On success, process_madvise(2)
returns the number of bytes advised, and madvise(2) returns 0. Else, -1
is returned and errno is set to indicate the error for the most-recently
attempted hugepage collapse. Note that many failures might have occurred,
since the operation may continue to collapse in the event a single
hugepage-sized/aligned region fails.
ENOMEM Memory allocation failed or VMA not found
EBUSY Memcg charging failed
EAGAIN Required resource temporarily unavailable. Try again
might succeed.
EINVAL Other error: No PMD found, subpage doesn't have Present
bit set, "Special" page no backed by struct page, VMA
incorrectly sized, address not page-aligned, ...
Most notable here is ENOMEM and EBUSY (new to madvise) which are intended
to provide the caller with actionable feedback so they may take an
appropriate fallback measure.
Use Cases
An immediate user of this new functionality are malloc() implementations
that manage memory in hugepage-sized chunks, but sometimes subrelease
memory back to the system in native-sized chunks via MADV_DONTNEED;
zapping the pmd. Later, when the memory is hot, the implementation could
madvise(MADV_COLLAPSE) to re-back the memory by THPs to regain hugepage
coverage and dTLB performance. TCMalloc is such an implementation that
could benefit from this[2].
Only privately-mapped anon memory is supported for now, but additional
support for file, shmem, and HugeTLB high-granularity mappings[2] is
expected. File and tmpfs/shmem support would permit:
* Backing executable text by THPs. Current support provided by
CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large system which
might impair services from serving at their full rated load after
(re)starting. Tricks like mremap(2)'ing text onto anonymous memory to
immediately realize iTLB performance prevents page sharing and demand
paging, both of which increase steady state memory footprint. With
MADV_COLLAPSE, we get the best of both worlds: Peak upfront performance
and lower RAM footprints.
* Backing guest memory by hugapages after the memory contents have been
migrated in native-page-sized chunks to a new host, in a
userfaultfd-based live-migration stack.
[1] https://lore.kernel.org/linux-mm/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/
[2] https://github.com/google/tcmalloc/tree/master/tcmalloc
[jrdr.linux@gmail.com: avoid possible memory leak in failure path]
Link: https://lkml.kernel.org/r/20220713024109.62810-1-jrdr.linux@gmail.com
[zokeefe@google.com add missing kfree() to madvise_collapse()]
Link: https://lore.kernel.org/linux-mm/20220713024109.62810-1-jrdr.linux@gmail.com/
Link: https://lkml.kernel.org/r/20220713161851.1879439-1-zokeefe@google.com
[zokeefe@google.com: delay computation of hpage boundaries until use]]
Link: https://lkml.kernel.org/r/20220720140603.1958773-4-zokeefe@google.com
Link: https://lkml.kernel.org/r/20220706235936.2197195-10-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>
Suggested-by: David Rientjes <rientjes@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-06 23:59:27 +00:00
|
|
|
#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */
|
|
|
|
|
2016-09-12 12:54:29 +00:00
|
|
|
/* compatibility flags */
|
|
|
|
#define MAP_FILE 0
|
|
|
|
|
2016-10-25 20:02:11 +00:00
|
|
|
#define PKEY_DISABLE_ACCESS 0x1
|
|
|
|
#define PKEY_DISABLE_WRITE 0x2
|
|
|
|
#define PKEY_ACCESS_MASK (PKEY_DISABLE_ACCESS |\
|
|
|
|
PKEY_DISABLE_WRITE)
|
|
|
|
|
2016-09-12 12:54:29 +00:00
|
|
|
#endif /* __ASM_GENERIC_MMAN_COMMON_H */
|