Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Most of the MM queue. A few things are still pending.

  Liam's maple tree rework didn't make it. This has resulted in a few
  other minor patch series being held over for next time.

  Multi-gen LRU still isn't merged as we were waiting for mapletree to
  stabilize. The current plan is to merge MGLRU into -mm soon and to
  later reintroduce mapletree, with a view to hopefully getting both
  into 6.1-rc1.

  Summary:

   - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
     Lin, Yang Shi, Anshuman Khandual and Mike Rapoport

   - Some kmemleak fixes from Patrick Wang and Waiman Long

   - DAMON updates from SeongJae Park

   - memcg debug/visibility work from Roman Gushchin

   - vmalloc speedup from Uladzislau Rezki

   - more folio conversion work from Matthew Wilcox

   - enhancements for coherent device memory mapping from Alex Sierra

   - addition of shared pages tracking and CoW support for fsdax, from
     Shiyang Ruan

   - hugetlb optimizations from Mike Kravetz

   - Mel Gorman has contributed some pagealloc changes to improve
     latency and realtime behaviour.

   - mprotect soft-dirty checking has been improved by Peter Xu

   - Many other singleton patches all over the place"

 [ XFS merge from hell as per Darrick Wong in

   https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]

* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
  tools/testing/selftests/vm/hmm-tests.c: fix build
  mm: Kconfig: fix typo
  mm: memory-failure: convert to pr_fmt()
  mm: use is_zone_movable_page() helper
  hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
  hugetlbfs: cleanup some comments in inode.c
  hugetlbfs: remove unneeded header file
  hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
  hugetlbfs: use helper macro SZ_1{K,M}
  mm: cleanup is_highmem()
  mm/hmm: add a test for cross device private faults
  selftests: add soft-dirty into run_vmtests.sh
  selftests: soft-dirty: add test for mprotect
  mm/mprotect: fix soft-dirty check in can_change_pte_writable()
  mm: memcontrol: fix potential oom_lock recursion deadlock
  mm/gup.c: fix formatting in check_and_migrate_movable_page()
  xfs: fail dax mount if reflink is enabled on a partition
  mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
  userfaultfd: don't fail on unrecognized features
  hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
  ...
Commit 6614a3c316 (Linus Torvalds, 2022-08-05 16:32:45 -07:00)
380 changed files with 7173 additions and 3224 deletions

@@ -22,6 +22,7 @@ Description:
MMUPageSize: 4 kB
Rss: 884 kB
Pss: 385 kB
Pss_Dirty: 68 kB
Pss_Anon: 301 kB
Pss_File: 80 kB
Pss_Shmem: 4 kB

@@ -41,7 +41,7 @@ Description: Kernel Samepage Merging daemon sysfs interface
sleep_millisecs: how many milliseconds ksm should sleep between
scans.
See Documentation/vm/ksm.rst for more information.
See Documentation/mm/ksm.rst for more information.
What: /sys/kernel/mm/ksm/merge_across_nodes
Date: January 2013

@@ -37,7 +37,7 @@ Description:
The alloc_calls file is read-only and lists the kernel code
locations from which allocations for this cache were performed.
The alloc_calls file only contains information if debugging is
enabled for that cache (see Documentation/vm/slub.rst).
enabled for that cache (see Documentation/mm/slub.rst).
What: /sys/kernel/slab/<cache>/alloc_fastpath
Date: February 2008
@@ -219,7 +219,7 @@ Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
Description:
The free_calls file is read-only and lists the locations of
object frees if slab debugging is enabled (see
Documentation/vm/slub.rst).
Documentation/mm/slub.rst).
What: /sys/kernel/slab/<cache>/free_fastpath
Date: February 2008

@@ -1237,6 +1237,13 @@ PAGE_SIZE multiple when read back.
the target cgroup. If fewer bytes are reclaimed than the
specified amount, -EAGAIN is returned.
Please note that the proactive reclaim (triggered by this
interface) is not meant to indicate memory pressure on the
memory cgroup. Therefore socket memory balancing triggered by
the memory reclaim normally is not exercised in this case.
This means that the networking layer will not adapt based on
reclaim induced by memory.reclaim.
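For example, a minimal usage sketch of this interface (the cgroup path
"<cgroup>" is a placeholder)::

    # echo "1G" > /sys/fs/cgroup/<cgroup>/memory.reclaim

If less than 1G can be reclaimed, the write returns -EAGAIN as described
above.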
memory.peak
A read-only single value file which exists on non-root
cgroups.
@@ -1441,6 +1448,24 @@ PAGE_SIZE multiple when read back.
workingset_nodereclaim
Number of times a shadow node has been reclaimed
pgscan (npn)
Amount of scanned pages (in an inactive LRU list)
pgsteal (npn)
Amount of reclaimed pages
pgscan_kswapd (npn)
Amount of scanned pages by kswapd (in an inactive LRU list)
pgscan_direct (npn)
Amount of scanned pages directly (in an inactive LRU list)
pgsteal_kswapd (npn)
Amount of reclaimed pages by kswapd
pgsteal_direct (npn)
Amount of reclaimed pages directly
pgfault (npn)
Total number of page faults incurred
@@ -1450,12 +1475,6 @@ PAGE_SIZE multiple when read back.
pgrefill (npn)
Amount of scanned pages (in an active LRU list)
pgscan (npn)
Amount of scanned pages (in an inactive LRU list)
pgsteal (npn)
Amount of reclaimed pages
pgactivate (npn)
Amount of pages moved to the active LRU list

@@ -1728,9 +1728,11 @@
Built with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=y,
the default is on.
This is not compatible with memory_hotplug.memmap_on_memory.
If both parameters are enabled, hugetlb_free_vmemmap takes
precedence over memory_hotplug.memmap_on_memory.
Note that the vmemmap pages may be allocated from the added
memory block itself when memory_hotplug.memmap_on_memory is
enabled; those vmemmap pages cannot be optimized even if this
feature is enabled. Other vmemmap pages not allocated from
the added memory block itself are not affected.
hung_task_panic=
[KNL] Should the hung task detector generate panics.
@@ -3073,10 +3075,12 @@
[KNL,X86,ARM] Boolean flag to enable this feature.
Format: {on | off (default)}
When enabled, runtime hotplugged memory will
allocate its internal metadata (struct pages)
from the hotadded memory which will allow to
hotadd a lot of memory without requiring
additional memory to do so.
allocate its internal metadata (struct pages;
those vmemmap pages cannot be optimized even
if hugetlb_free_vmemmap is enabled) from the
hotadded memory, which allows hotadding a lot
of memory without requiring additional memory
to do so.
This feature is disabled by default because it
has some implication on large (e.g. GB)
allocations in some configurations (e.g. small
@@ -3086,10 +3090,6 @@
Note that even when enabled, there are a few cases where
the feature is not effective.
This is not compatible with hugetlb_free_vmemmap. If
both parameters are enabled, hugetlb_free_vmemmap takes
precedence over memory_hotplug.memmap_on_memory.
memtest= [KNL,X86,ARM,M68K,PPC,RISCV] Enable memtest
Format: <integer>
default : 0 <disable>
@@ -5502,7 +5502,7 @@
cache (risks via metadata attacks are mostly
unchanged). Debug options disable merging on their
own.
For more information see Documentation/vm/slub.rst.
For more information see Documentation/mm/slub.rst.
slab_max_order= [MM, SLAB]
Determines the maximum allowed order for slabs.
@@ -5516,13 +5516,13 @@
slub_debug can create guard zones around objects and
may poison objects when not in use. Also tracks the
last alloc / free. For more information see
Documentation/vm/slub.rst.
Documentation/mm/slub.rst.
slub_max_order= [MM, SLUB]
Determines the maximum allowed order for slabs.
A high setting may cause OOMs due to memory
fragmentation. For more information see
Documentation/vm/slub.rst.
Documentation/mm/slub.rst.
slub_min_objects= [MM, SLUB]
The minimum number of objects per slab. SLUB will
@@ -5531,12 +5531,12 @@
the number of objects indicated. The higher the number
of objects the smaller the overhead of tracking slabs
and the less frequently locks need to be acquired.
For more information see Documentation/vm/slub.rst.
For more information see Documentation/mm/slub.rst.
slub_min_order= [MM, SLUB]
Determines the minimum page order for slabs. Must be
lower than slub_max_order.
For more information see Documentation/vm/slub.rst.
For more information see Documentation/mm/slub.rst.
slub_merge [MM, SLUB]
Same with slab_merge.

@@ -125,7 +125,7 @@ processor. Each bank is referred to as a `node` and for each node Linux
constructs an independent memory management subsystem. A node has its
own set of zones, lists of free and used pages and various statistics
counters. You can find more details about NUMA in
:ref:`Documentation/vm/numa.rst <numa>` and in
:ref:`Documentation/mm/numa.rst <numa>` and in
:ref:`Documentation/admin-guide/mm/numa_memory_policy.rst <numa_memory_policy>`.
Page cache

@@ -4,7 +4,7 @@
Monitoring Data Accesses
========================
:doc:`DAMON </vm/damon/index>` allows light-weight data access monitoring.
:doc:`DAMON </mm/damon/index>` allows light-weight data access monitoring.
Using DAMON, users can analyze the memory access patterns of their systems and
optimize those.
@@ -14,3 +14,4 @@ optimize those.
start
usage
reclaim
lru_sort

@@ -0,0 +1,294 @@
.. SPDX-License-Identifier: GPL-2.0
=============================
DAMON-based LRU-lists Sorting
=============================
DAMON-based LRU-lists Sorting (DAMON_LRU_SORT) is a static kernel module aimed
at proactive and lightweight data access pattern based (de)prioritization of
pages on their LRU-lists, for making LRU-lists a more trustworthy data access
pattern source.
Where Is Proactive LRU-lists Sorting Required?
==============================================
As page-granularity access checking overhead could be significant on huge
systems, LRU lists are normally not proactively sorted but partially and
reactively sorted for special events including specific user requests, system
calls and memory pressure. As a result, LRU lists are sometimes not so
perfectly prepared to be used as a trustworthy access pattern source for some
situations, including selection of reclamation target pages under sudden
memory pressure.
Because DAMON can identify access patterns with best-effort accuracy while
inducing only a user-specified range of overhead, proactively running
DAMON_LRU_SORT could be helpful for making LRU lists a more trustworthy access
pattern source with low and controlled overhead.
How Does It Work?
=================
DAMON_LRU_SORT finds hot pages (pages of memory regions that show access
rates higher than a user-specified threshold) and cold pages (pages of
memory regions that show no access for longer than a user-specified
threshold) using DAMON, and prioritizes hot pages while deprioritizing cold
pages on their LRU-lists. To avoid consuming too much CPU for the
prioritizations, a CPU time usage limit can be configured. Under the limit,
it prioritizes and deprioritizes more hot and cold pages first, respectively.
System administrators can also configure under what situation this scheme
should be automatically activated and deactivated, using three memory
pressure watermarks.
Its default parameters for the hotness/coldness thresholds and the CPU quota
limit are conservatively chosen. That is, the module under its default
parameters can be widely used without harm in common situations, while
providing a level of benefit for systems having clear hot/cold access
patterns under memory pressure, consuming only a small, limited portion of
CPU time.
Interface: Module Parameters
============================
To use this feature, you should first ensure your system is running on a kernel
that is built with ``CONFIG_DAMON_LRU_SORT=y``.
To let sysadmins enable or disable it and tune it for the given system,
DAMON_LRU_SORT utilizes module parameters. That is, you can put
``damon_lru_sort.<parameter>=<value>`` on the kernel boot command line or write
proper values to ``/sys/module/damon_lru_sort/parameters/<parameter>`` files.
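For example, the two styles could look as below. This is only a sketch;
``quota_ms`` is one of the parameters described in the sections below. On
the kernel boot command line::

    damon_lru_sort.enabled=Y damon_lru_sort.quota_ms=10

Or at runtime::

    # echo 10 > /sys/module/damon_lru_sort/parameters/quota_ms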
Below are the descriptions of each parameter.
enabled
-------
Enable or disable DAMON_LRU_SORT.
You can enable DAMON_LRU_SORT by setting the value of this parameter as ``Y``.
Setting it as ``N`` disables DAMON_LRU_SORT. Note that, due to the
watermarks-based activation condition, DAMON_LRU_SORT could do no real
monitoring and LRU-lists sorting even when enabled. Refer to the descriptions
of the watermarks parameters below for this.
commit_inputs
-------------
Make DAMON_LRU_SORT read the input parameters again, except ``enabled``.
Input parameters that are updated while DAMON_LRU_SORT is running are not
applied by default. Once this parameter is set as ``Y``, DAMON_LRU_SORT
reads the values of the parameters except ``enabled`` again. Once the
re-reading is done, this parameter is set back to ``N``. If invalid
parameters are found during the re-reading, DAMON_LRU_SORT will be disabled.
hot_thres_access_freq
---------------------
Access frequency threshold for hot memory regions identification, in permil.
If a memory region is accessed at this frequency or higher, DAMON_LRU_SORT
identifies the region as hot and marks it as accessed on the LRU list, so that
it will not be reclaimed under memory pressure. 50% (500 permil) by default.
cold_min_age
------------
Time threshold for cold memory regions identification, in microseconds.
If a memory region is not accessed for this or a longer time, DAMON_LRU_SORT
identifies the region as cold and marks it as unaccessed on the LRU list, so
that it will be reclaimed first under memory pressure. 120 seconds by
default.
quota_ms
--------
Limit of time for trying the LRU lists sorting in milliseconds.
DAMON_LRU_SORT tries to use only up to this amount of time within each time
window (quota_reset_interval_ms) for the LRU-lists sorting. This can be used
to limit the CPU consumption of DAMON_LRU_SORT. If the value is zero, the
limit is disabled.
10 ms by default.
quota_reset_interval_ms
-----------------------
The time quota charge reset interval in milliseconds.
The charge reset interval for the quota of time (quota_ms). That is,
DAMON_LRU_SORT does not try LRU-lists sorting for more than quota_ms
milliseconds within each quota_reset_interval_ms milliseconds.
1 second by default.
wmarks_interval
---------------
The watermarks check time interval in microseconds.
Minimal time to wait before checking the watermarks, when DAMON_LRU_SORT is
enabled but inactive due to its watermarks rule. 5 seconds by default.
wmarks_high
-----------
Free memory rate (per thousand) for the high watermark.
If the system's free memory rate (per thousand bytes) is higher than this,
DAMON_LRU_SORT becomes inactive, so it does nothing but periodically check
the watermarks. 200 (20%) by default.
wmarks_mid
----------
Free memory rate (per thousand) for the middle watermark.
If the system's free memory rate (per thousand bytes) is between this and
the low watermark, DAMON_LRU_SORT becomes active, so it starts the monitoring
and the LRU-lists sorting. 150 (15%) by default.
wmarks_low
----------
Free memory rate (per thousand) for the low watermark.
If the system's free memory rate (per thousand bytes) is lower than this,
DAMON_LRU_SORT becomes inactive, so it does nothing but periodically check
the watermarks. 50 (5%) by default.
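Putting the three watermarks together with their defaults: DAMON_LRU_SORT
stays inactive while the free memory rate is above the high watermark (200,
i.e. 20%), becomes active once the rate falls between the middle (150) and
low (50) watermarks, and goes inactive again when the rate rises above the
high watermark or drops below the low one.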
sample_interval
---------------
Sampling interval for the monitoring in microseconds.
The sampling interval of DAMON for the hot and cold memory monitoring. Please
refer to the DAMON documentation (:doc:`usage`) for more detail. 5 ms by
default.
aggr_interval
-------------
Aggregation interval for the monitoring in microseconds.
The aggregation interval of DAMON for the hot and cold memory monitoring.
Please refer to the DAMON documentation (:doc:`usage`) for more detail.
100 ms by default.
min_nr_regions
--------------
Minimum number of monitoring regions.
The minimal number of monitoring regions of DAMON for the hot and cold memory
monitoring. This can be used to set a lower bound on the monitoring quality.
But, setting this too high could result in increased monitoring overhead.
Please refer to the DAMON documentation (:doc:`usage`) for more detail. 10 by
default.
max_nr_regions
--------------
Maximum number of monitoring regions.
The maximum number of monitoring regions of DAMON for the hot and cold memory
monitoring. This can be used to set an upper bound on the monitoring
overhead. However, setting this too low could result in bad monitoring
quality. Please refer to the DAMON documentation (:doc:`usage`) for more
detail. 1000 by default.
monitor_region_start
--------------------
Start of the target memory region, as a physical address.
The start physical address of the memory region that DAMON_LRU_SORT will work
on. By default, the biggest System RAM region is used.
monitor_region_end
------------------
End of the target memory region, as a physical address.
The end physical address of the memory region that DAMON_LRU_SORT will work
on. By default, the biggest System RAM region is used.
kdamond_pid
-----------
PID of the DAMON thread.
If DAMON_LRU_SORT is enabled, this becomes the PID of the worker thread. Else,
-1.
nr_lru_sort_tried_hot_regions
-----------------------------
Number of hot memory regions for which LRU-sorting was tried.
bytes_lru_sort_tried_hot_regions
--------------------------------
Total bytes of hot memory regions for which LRU-sorting was tried.
nr_lru_sorted_hot_regions
-------------------------
Number of hot memory regions that were successfully LRU-sorted.
bytes_lru_sorted_hot_regions
----------------------------
Total bytes of hot memory regions that were successfully LRU-sorted.
nr_hot_quota_exceeds
--------------------
Number of times that the time quota limit for hot regions has been exceeded.
nr_lru_sort_tried_cold_regions
------------------------------
Number of cold memory regions for which LRU-sorting was tried.
bytes_lru_sort_tried_cold_regions
---------------------------------
Total bytes of cold memory regions for which LRU-sorting was tried.
nr_lru_sorted_cold_regions
--------------------------
Number of cold memory regions that were successfully LRU-sorted.
bytes_lru_sorted_cold_regions
-----------------------------
Total bytes of cold memory regions that were successfully LRU-sorted.
nr_cold_quota_exceeds
---------------------
Number of times that the time quota limit for cold regions has been exceeded.
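Since the statistics above are exposed through the module parameters
interface, a sketch of reading one of them at runtime could look as below::

    # cat /sys/module/damon_lru_sort/parameters/nr_lru_sorted_hot_regions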
Example
=======
The example commands below make DAMON_LRU_SORT find memory regions having
>=50% access frequency and LRU-prioritize them, while LRU-deprioritizing
memory regions that have not been accessed for 120 seconds. The
prioritization and deprioritization are limited to use only up to 1% of CPU
time, to avoid DAMON_LRU_SORT consuming too much CPU time for the
(de)prioritization. It also asks DAMON_LRU_SORT to do nothing if the
system's free memory rate is more than 50%, but to start the real work if it
becomes lower than 40%. If DAMON_LRU_SORT doesn't make progress and
therefore the free memory rate becomes lower than 20%, it asks DAMON_LRU_SORT
to do nothing again, so that we can fall back to the LRU-list based page
granularity reclamation. ::
# cd /sys/module/damon_lru_sort/parameters
# echo 500 > hot_thres_access_freq
# echo 120000000 > cold_min_age
# echo 10 > quota_ms
# echo 1000 > quota_reset_interval_ms
# echo 500 > wmarks_high
# echo 400 > wmarks_mid
# echo 200 > wmarks_low
# echo Y > enabled

@@ -48,12 +48,6 @@ DAMON_RECLAIM utilizes module parameters. That is, you can put
``damon_reclaim.<parameter>=<value>`` on the kernel boot command line or write
proper values to ``/sys/module/damon_reclaim/parameters/<parameter>`` files.
Note that the parameter values except ``enabled`` are applied only when
DAMON_RECLAIM starts. Therefore, if you want to apply new parameter values in
runtime and DAMON_RECLAIM is already enabled, you should disable and re-enable
it via ``enabled`` parameter file. Writing of the new values to proper
parameter values should be done before the re-enablement.
Below are the descriptions of each parameter.
enabled
@@ -268,4 +262,4 @@ granularity reclamation. ::
.. [1] https://research.google/pubs/pub48551/
.. [2] https://lwn.net/Articles/787611/
.. [3] https://www.kernel.org/doc/html/latest/vm/free_page_reporting.html
.. [3] https://www.kernel.org/doc/html/latest/mm/free_page_reporting.html

@@ -30,11 +30,11 @@ DAMON provides below interfaces for different users.
<sysfs_interface>`. This will be removed after the next LTS kernel is released,
so users should move to the :ref:`sysfs interface <sysfs_interface>`.
- *Kernel Space Programming Interface.*
:doc:`This </vm/damon/api>` is for kernel space programmers. Using this,
:doc:`This </mm/damon/api>` is for kernel space programmers. Using this,
users can utilize every feature of DAMON most flexibly and efficiently by
writing kernel space DAMON application programs. You can even extend
DAMON for various address spaces. For detail, please refer to the interface
:doc:`document </vm/damon/api>`.
:doc:`document </mm/damon/api>`.
.. _sysfs_interface:
@@ -185,7 +185,7 @@ controls the monitoring overhead, exist. You can set and get the values by
writing to and reading from the files.
For more details about the intervals and monitoring regions range, please refer
to the Design document (:doc:`/vm/damon/design`).
to the Design document (:doc:`/mm/damon/design`).
contexts/<N>/targets/
---------------------
@@ -264,6 +264,8 @@ that can be written to and read from the file and their meaning are as below.
- ``pageout``: Call ``madvise()`` for the region with ``MADV_PAGEOUT``
- ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``
- ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``
- ``lru_prio``: Prioritize the region on its LRU lists (see the sketch after this list).
- ``lru_deprio``: Deprioritize the region on its LRU lists.
- ``stat``: Do nothing but count the statistics
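For instance, a sketch of selecting the new ``lru_prio`` action for an
existing scheme through this file could look as below; the kdamond, context,
and scheme indexes used here are hypothetical::

    # echo lru_prio > /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0/action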
schemes/<N>/access_pattern/
@@ -402,7 +404,7 @@ Attributes
Users can get and set the ``sampling interval``, ``aggregation interval``,
``update interval``, and min/max number of monitoring target regions by
reading from and writing to the ``attrs`` file. To know about the monitoring
attributes in detail, please refer to the :doc:`/vm/damon/design`. For
attributes in detail, please refer to the :doc:`/mm/damon/design`. For
example, below commands set those values to 5 ms, 100 ms, 1,000 ms, 10 and
1000, and then check it again::

@@ -36,6 +36,7 @@ the Linux memory management.
numa_memory_policy
numaperf
pagemap
shrinker_debugfs
soft-dirty
swap_numa
transhuge

@@ -0,0 +1,135 @@
.. _shrinker_debugfs:
==========================
Shrinker Debugfs Interface
==========================
The shrinker debugfs interface provides visibility into the kernel memory
shrinkers subsystem and allows getting information about individual shrinkers
and interacting with them.
For each shrinker registered in the system, a directory in **<debugfs>/shrinker/**
is created. The directory's name is composed of the shrinker's name and a
unique id: e.g. *kfree_rcu-0* or *sb-xfs:vda1-36*.
Each shrinker directory contains **count** and **scan** files, which allow
triggering the *count_objects()* and *scan_objects()* callbacks for each memcg
and numa node (if applicable).
Usage:
------
1. *List registered shrinkers*
::
$ cd /sys/kernel/debug/shrinker/
$ ls
dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42
mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43
mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44
rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49
sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13
sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36
sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19
sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10
sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9
sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37
sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38
sb-dax-11 sb-proc-45 sb-tmpfs-35
sb-debugfs-7 sb-proc-46 sb-tmpfs-40
2. *Get information about a specific shrinker*
::
$ cd sb-btrfs\:vda2-24/
$ ls
count scan
3. *Count objects*
Each line in the output has the following format::
<cgroup inode id> <nr of objects on node 0> <nr of objects on node 1> ...
<cgroup inode id> <nr of objects on node 0> <nr of objects on node 1> ...
...
If a cgroup has no objects on any numa node, its line is omitted. If there
are no objects at all, the output might be empty.
If the shrinker is not memcg-aware or CONFIG_MEMCG is off, 0 is printed
as the cgroup inode id. If the shrinker is not numa-aware, 0's are printed
for all nodes except the first one.
::
$ cat count
1 224 2
21 98 0
55 818 10
2367 2 0
2401 30 0
225 13 0
599 35 0
939 124 0
1041 3 0
1075 1 0
1109 1 0
1279 60 0
1313 7 0
1347 39 0
1381 3 0
1449 14 0
1483 63 0
1517 53 0
1551 6 0
1585 1 0
1619 6 0
1653 40 0
1687 11 0
1721 8 0
1755 4 0
1789 52 0
1823 888 0
1857 1 0
1925 2 0
1959 32 0
2027 22 0
2061 9 0
2469 799 0
2537 861 0
2639 1 0
2707 70 0
2775 4 0
2877 84 0
293 1 0
735 8 0
4. *Scan objects*
The expected input format::
<cgroup inode id> <numa id> <number of objects to scan>
For a non-memcg-aware shrinker or on a system with no memory
cgroups, **0** should be passed as the cgroup id.
::
$ cd /sys/kernel/debug/shrinker/
$ cd sb-btrfs\:vda2-24/
$ cat count | head -n 5
1 212 0
21 97 0
55 802 5
2367 2 0
225 13 0
$ echo "55 0 200" > scan
$ cat count | head -n 5
1 212 0
21 96 0
55 752 5
2367 2 0
225 13 0

@@ -565,9 +565,8 @@ See Documentation/admin-guide/mm/hugetlbpage.rst
hugetlb_optimize_vmemmap
========================
This knob is not available when memory_hotplug.memmap_on_memory (kernel parameter)
is configured or the size of 'struct page' (a structure defined in
include/linux/mm_types.h) is not power of two (an unusual system config could
This knob is not available when the size of 'struct page' (a structure defined
in include/linux/mm_types.h) is not a power of two (an unusual system config
could result in this).
Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap pages
@@ -760,7 +759,7 @@ and don't use much of it.
The default value is 0.
See Documentation/vm/overcommit-accounting.rst and
See Documentation/mm/overcommit-accounting.rst and
mm/util.c::__vm_enough_memory() for more information.

@@ -86,7 +86,7 @@ Memory management
=================
How to allocate and use memory in the kernel. Note that there is a lot
more memory-management documentation in Documentation/vm/index.rst.
more memory-management documentation in Documentation/mm/index.rst.
.. toctree::
:maxdepth: 1

@@ -174,7 +174,6 @@ mapping:
- ``kmemleak_alloc_phys``
- ``kmemleak_free_part_phys``
- ``kmemleak_not_leak_phys``
- ``kmemleak_ignore_phys``
Dealing with false positives/negatives

@@ -448,6 +448,7 @@ Memory Area, or VMA) there is a series of lines such as the following::
MMUPageSize: 4 kB
Rss: 892 kB
Pss: 374 kB
Pss_Dirty: 0 kB
Shared_Clean: 892 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
@@ -479,7 +480,9 @@ dirty shared and private pages in the mapping.
The "proportional set size" (PSS) of a process is the count of pages it has
in memory, where each page is divided by the number of processes sharing it.
So if a process has 1000 pages all to itself, and 1000 shared with one other
process, its PSS will be 1500.
process, its PSS will be 1500. "Pss_Dirty" is the portion of PSS which
consists of dirty pages. ("Pss_Clean" is not included, but it can be
calculated by subtracting "Pss_Dirty" from "Pss".)
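For instance, with the values in the ABI example earlier in this diff
(Pss: 385 kB, Pss_Dirty: 68 kB), Pss_Clean works out to 385 - 68 = 317 kB.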
Note that even a page which is part of a MAP_SHARED mapping, but has only
a single pte mapped, i.e. is currently used by only one process, is accounted
@@ -514,8 +517,10 @@ replaced by copy-on-write) part of the underlying shmem object out on swap.
"SwapPss" shows proportional swap share of this mapping. Unlike "Swap", this
does not take into account swapped out page of underlying shmem objects.
"Locked" indicates whether the mapping is locked in memory or not.
"THPeligible" indicates whether the mapping is eligible for allocating THP
pages - 1 if true, 0 otherwise. It just shows the current status.
pages and whether the THP is PMD mappable - 1 if true, 0 otherwise.
It just shows the current status.
"VmFlags" field deserves a separate description. This member represents the
kernel flags associated with the particular virtual memory area in two letter
@@ -1109,7 +1114,7 @@ CommitLimit
yield a CommitLimit of 7.3G.
For more details, see the memory overcommit documentation
in vm/overcommit-accounting.
in mm/overcommit-accounting.
Committed_AS
The amount of memory presently allocated on the system.
The committed memory is a sum of all of the memory which

@@ -128,7 +128,7 @@ needed).
sound/index
crypto/index
filesystems/index
vm/index
mm/index
bpf/index
usb/index
PCI/index

@@ -170,7 +170,7 @@ The users of `ZONE_DEVICE` are:
* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()`
event callbacks to allow a device-driver to coordinate memory management
events related to device-memory, typically GPU memory. See
Documentation/vm/hmm.rst.
Documentation/mm/hmm.rst.
* p2pdma: Create `struct page` objects to allow peer devices in a
PCI/-E topology to coordinate direct-DMA operations between themselves,

@@ -13,7 +13,7 @@
Monitoring Data Accesses
========================
:doc:`DAMON </vm/damon/index>` allows lightweight data access monitoring. Using DAMON,
:doc:`DAMON </mm/damon/index>` allows lightweight data access monitoring. Using DAMON,
users can analyze the memory access patterns of their systems and optimize them.
.. toctree::

@@ -229,4 +229,4 @@ asks DAMON_RECLAIM to do nothing again, so that we can fall back to the LRU-list based
.. [1] https://research.google/pubs/pub48551/
.. [2] https://lwn.net/Articles/787611/
.. [3] https://www.kernel.org/doc/html/latest/vm/free_page_reporting.html
.. [3] https://www.kernel.org/doc/html/latest/mm/free_page_reporting.html

@@ -33,9 +33,9 @@ DAMON provides the below interfaces for different users.
interface is the same. This will be removed after the next LTS kernel is released, so users should move to the
:ref:`sysfs interface <sysfs_interface>`.
- *Kernel Space Programming Interface.*
:doc:`This </vm/damon/api>` is for kernel space programmers. Using this, users can utilize
:doc:`This </mm/damon/api>` is for kernel space programmers. Using this, users can utilize
every feature of DAMON most flexibly and efficiently by writing kernel space
DAMON application programs. You can even extend DAMON for various address spaces.
For details, please refer to the interface :doc:`document </vm/damon/api>`.
For details, please refer to the interface :doc:`document </mm/damon/api>`.
sysfs Interface
===============
@@ -148,7 +148,7 @@ contexts/<N>/monitoring_attrs/
Under the ``nr_regions`` directory there are two files for the lower and upper
bounds of DAMON's monitoring regions (``min`` and ``max``), which control the
monitoring overhead. You can set and get the values by writing to and reading
from these files.
For more details about the intervals and monitoring region ranges, please refer to the design document (:doc:`/vm/damon/design`).
For more details about the intervals and monitoring region ranges, please refer to the design document (:doc:`/mm/damon/design`).
contexts/<N>/targets/
---------------------
@@ -320,7 +320,7 @@ DAMON exports eight files, ``attrs``, ``target_ids``, ``init_regions``,
----
Users can get and set the ``sampling interval``, ``aggregation interval``, ``update interval``,
and the min/max number of monitoring target regions by reading from and writing to the ``attrs`` file. For details of the monitoring attributes, please refer to `:doc:/vm/damon/design`. For example,
and the min/max number of monitoring target regions by reading from and writing to the ``attrs`` file. For details of the monitoring attributes, please refer to `:doc:/mm/damon/design`. For example,
the below commands set those values to 5ms, 100ms, 1000ms, 10 and 1000, and then check them again::
# cd <debugfs>/damon

@@ -101,7 +101,7 @@ Todolist:
========
How to allocate and use memory in the kernel. Note that there is
more memory-management documentation in :doc:`/vm/index`.
more memory-management documentation in :doc:`/mm/index`.
.. toctree::
:maxdepth: 1

@@ -118,7 +118,7 @@ TODOList:
sound/index
filesystems/index
scheduler/index
vm/index
mm/index
peci/index
TODOList:

@@ -1,6 +1,6 @@
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/vm/active_mm.rst
:Original: Documentation/mm/active_mm.rst
:Translator:

@@ -1,6 +1,6 @@
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/vm/balance.rst
:Original: Documentation/mm/balance.rst
:Translator:

@@ -1,6 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0
:Original: Documentation/vm/damon/api.rst
:Original: Documentation/mm/damon/api.rst
:Translator:

@@ -1,6 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0
:Original: Documentation/vm/damon/design.rst
:Original: Documentation/mm/damon/design.rst
:Translator:

@@ -1,6 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0
:Original: Documentation/vm/damon/faq.rst
:Original: Documentation/mm/damon/faq.rst
:Translator:

@@ -1,6 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0
:Original: Documentation/vm/damon/index.rst
:Original: Documentation/mm/damon/index.rst
:Translator:
@@ -14,7 +14,7 @@ DAMON: Data Access MONitor
==========================
DAMON is a data access monitoring framework subsystem of the Linux kernel. Its core mechanisms make it
(for details of the core mechanisms, see (Documentation/translations/zh_CN/vm/damon/design.rst))
(for details of the core mechanisms, see (Documentation/translations/zh_CN/mm/damon/design.rst))
- *accurate* (the monitoring output is useful enough for DRAM-level memory management; it might not be appropriate for CPU Cache-level),
- *light-weight* (the monitoring overhead is low enough to be applied online), and
@@ -30,4 +30,3 @@ DAMON is a data access monitoring framework subsystem of the Linux kernel. Its core
faq
design
api

@@ -1,6 +1,6 @@
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/vm/free_page_reporting.rst
:Original: Documentation/mm/free_page_reporting.rst
:Translator:

@@ -1,4 +1,4 @@
:Original: Documentation/vm/free_page_reporting.rst
:Original: Documentation/mm/frontswap.rst
:Translator:

@@ -1,6 +1,6 @@
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/vm/highmem.rst
:Original: Documentation/mm/highmem.rst
:Translator:

@@ -1,6 +1,6 @@
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/vm/hmm.rst
:Original: Documentation/mm/hmm.rst
:Translator:

@@ -1,6 +1,6 @@
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/vm/hugetlbfs_reserv.rst
:Original: Documentation/mm/hugetlbfs_reserv.rst
:Translator:

@@ -1,5 +1,5 @@
:Original: Documentation/vm/hwpoison.rst
:Original: Documentation/mm/hwpoison.rst
:Translator:

@@ -1,6 +1,6 @@
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/vm/index.rst
:Original: Documentation/mm/index.rst
:Translator:

@@ -1,6 +1,6 @@
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/vm/ksm.rst
:Original: Documentation/mm/ksm.rst
:Translator:

@@ -1,6 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0
:Original: Documentation/vm/memory-model.rst
:Original: Documentation/mm/memory-model.rst
:Translator:
@@ -129,7 +129,7 @@ ZONE_DEVICE
* pmem: Use platform persistent memory as a direct-I/O target via DAX mappings.
* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()` event callbacks
to allow device drivers to coordinate memory management events related to device memory, typically GPU memory. See /vm/hmm.rst.
to allow device drivers to coordinate memory management events related to device memory, typically GPU memory. See Documentation/mm/hmm.rst.
* p2pdma: Create `struct page` objects to allow peer devices in a PCI/-E topology to coordinate
direct-DMA operations between themselves, i.e. bypassing host memory.

@@ -1,4 +1,4 @@
:Original: Documentation/vm/mmu_notifier.rst
:Original: Documentation/mm/mmu_notifier.rst
:Translator:

@@ -1,4 +1,4 @@
:Original: Documentation/vm/numa.rst
:Original: Documentation/mm/numa.rst
:Translator:

@@ -1,4 +1,4 @@
:Original: Documentation/vm/overcommit-accounting.rst
:Original: Documentation/mm/overcommit-accounting.rst
:Translator:

@@ -1,4 +1,4 @@
:Original: Documentation/vm/page_frags.rst
:Original: Documentation/mm/page_frags.rst
:Translator:

@@ -1,6 +1,6 @@
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/vm/index.rst
:Original: Documentation/mm/page_migration.rst
:Translator:

@@ -1,4 +1,4 @@
:Original: Documentation/vm/page_owner.rst
:Original: Documentation/mm/page_owner.rst
:Translator:

@@ -1,6 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0
:Original: Documentation/vm/page_table_check.rst
:Original: Documentation/mm/page_table_check.rst
:Translator:

@@ -1,4 +1,4 @@
:Original: Documentation/vm/remap_file_pages.rst
:Original: Documentation/mm/remap_file_pages.rst
:Translator:

@@ -1,4 +1,4 @@
:Original: Documentation/vm/split_page_table_lock.rst
:Original: Documentation/mm/split_page_table_lock.rst
:Translator:

@@ -1,7 +1,7 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst
:Original: Documentation/vm/vmalloced-kernel-stacks.rst
:Original: Documentation/mm/vmalloced-kernel-stacks.rst
:Translator:

@@ -1,4 +1,4 @@
:Original: Documentation/vm/z3fold.rst
:Original: Documentation/mm/z3fold.rst
:Translator:

@@ -1,4 +1,4 @@
:Original: Documentation/vm/zsmalloc.rst
:Original: Documentation/mm/zsmalloc.rst
:Translator:

@@ -128,7 +128,7 @@ TODOList:
* security/index
* sound/index
* crypto/index
* vm/index
* mm/index
* bpf/index
* usb/index
* PCI/index

@@ -1,3 +0,0 @@
# SPDX-License-Identifier: GPL-2.0-only
page-types
slabinfo

@@ -5668,7 +5668,7 @@ L: linux-mm@kvack.org
S: Maintained
F: Documentation/ABI/testing/sysfs-kernel-mm-damon
F: Documentation/admin-guide/mm/damon/
F: Documentation/vm/damon/
F: Documentation/mm/damon/
F: include/linux/damon.h
F: include/trace/events/damon.h
F: mm/damon/
@@ -9252,7 +9252,7 @@ HMM - Heterogeneous Memory Management
M: Jérôme Glisse <jglisse@redhat.com>
L: linux-mm@kvack.org
S: Maintained
F: Documentation/vm/hmm.rst
F: Documentation/mm/hmm.rst
F: include/linux/hmm*
F: lib/test_hmm*
F: mm/hmm*
@@ -9350,8 +9350,8 @@ L: linux-mm@kvack.org
S: Maintained
F: Documentation/ABI/testing/sysfs-kernel-mm-hugepages
F: Documentation/admin-guide/mm/hugetlbpage.rst
F: Documentation/vm/hugetlbfs_reserv.rst
F: Documentation/vm/vmemmap_dedup.rst
F: Documentation/mm/hugetlbfs_reserv.rst
F: Documentation/mm/vmemmap_dedup.rst
F: fs/hugetlbfs/
F: include/linux/hugetlb.h
F: mm/hugetlb.c
@@ -15338,7 +15338,7 @@ M: Pasha Tatashin <pasha.tatashin@soleen.com>
M: Andrew Morton <akpm@linux-foundation.org>
L: linux-mm@kvack.org
S: Maintained
F: Documentation/vm/page_table_check.rst
F: Documentation/mm/page_table_check.rst
F: include/linux/page_table_check.h
F: mm/page_table_check.c
@@ -22480,7 +22480,7 @@ M: Nitin Gupta <ngupta@vflare.org>
R: Sergey Senozhatsky <senozhatsky@chromium.org>
L: linux-mm@kvack.org
S: Maintained
F: Documentation/vm/zsmalloc.rst
F: Documentation/mm/zsmalloc.rst
F: include/linux/zsmalloc.h
F: mm/zsmalloc.c

@@ -116,23 +116,6 @@ struct vm_area_struct;
* arch/alpha/mm/fault.c)
*/
/* xwr */
#define __P000 _PAGE_P(_PAGE_FOE | _PAGE_FOW | _PAGE_FOR)
#define __P001 _PAGE_P(_PAGE_FOE | _PAGE_FOW)
#define __P010 _PAGE_P(_PAGE_FOE)
#define __P011 _PAGE_P(_PAGE_FOE)
#define __P100 _PAGE_P(_PAGE_FOW | _PAGE_FOR)
#define __P101 _PAGE_P(_PAGE_FOW)
#define __P110 _PAGE_P(0)
#define __P111 _PAGE_P(0)
#define __S000 _PAGE_S(_PAGE_FOE | _PAGE_FOW | _PAGE_FOR)
#define __S001 _PAGE_S(_PAGE_FOE | _PAGE_FOW)
#define __S010 _PAGE_S(_PAGE_FOE)
#define __S011 _PAGE_S(_PAGE_FOE)
#define __S100 _PAGE_S(_PAGE_FOW | _PAGE_FOR)
#define __S101 _PAGE_S(_PAGE_FOW)
#define __S110 _PAGE_S(0)
#define __S111 _PAGE_S(0)
/*
* pgprot_noncached() is only for infiniband pci support, and a real

@@ -155,6 +155,10 @@ retry:
if (fault_signal_pending(fault, regs))
return;
/* The fault is fully completed (including releasing mmap lock) */
if (fault & VM_FAULT_COMPLETED)
return;
if (unlikely(fault & VM_FAULT_ERROR)) {
if (fault & VM_FAULT_OOM)
goto out_of_memory;

@@ -280,3 +280,25 @@ mem_init(void)
high_memory = (void *) __va(max_low_pfn * PAGE_SIZE);
memblock_free_all();
}
static const pgprot_t protection_map[16] = {
[VM_NONE] = _PAGE_P(_PAGE_FOE | _PAGE_FOW |
_PAGE_FOR),
[VM_READ] = _PAGE_P(_PAGE_FOE | _PAGE_FOW),
[VM_WRITE] = _PAGE_P(_PAGE_FOE),
[VM_WRITE | VM_READ] = _PAGE_P(_PAGE_FOE),
[VM_EXEC] = _PAGE_P(_PAGE_FOW | _PAGE_FOR),
[VM_EXEC | VM_READ] = _PAGE_P(_PAGE_FOW),
[VM_EXEC | VM_WRITE] = _PAGE_P(0),
[VM_EXEC | VM_WRITE | VM_READ] = _PAGE_P(0),
[VM_SHARED] = _PAGE_S(_PAGE_FOE | _PAGE_FOW |
_PAGE_FOR),
[VM_SHARED | VM_READ] = _PAGE_S(_PAGE_FOE | _PAGE_FOW),
[VM_SHARED | VM_WRITE] = _PAGE_S(_PAGE_FOE),
[VM_SHARED | VM_WRITE | VM_READ] = _PAGE_S(_PAGE_FOE),
[VM_SHARED | VM_EXEC] = _PAGE_S(_PAGE_FOW | _PAGE_FOR),
[VM_SHARED | VM_EXEC | VM_READ] = _PAGE_S(_PAGE_FOW),
[VM_SHARED | VM_EXEC | VM_WRITE] = _PAGE_S(0),
[VM_SHARED | VM_EXEC | VM_WRITE | VM_READ] = _PAGE_S(0)
};
DECLARE_VM_GET_PAGE_PROT
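For context, the table above is consumed by the generic vm_get_page_prot()
that DECLARE_VM_GET_PAGE_PROT emits. A sketch of what the macro expands to
(an illustration of the mechanism, not the verbatim kernel macro):

/*
 * vm_get_page_prot() simply indexes the arch's protection_map[] with the
 * VM_READ/VM_WRITE/VM_EXEC/VM_SHARED bits of the given vm_flags.
 */
pgprot_t vm_get_page_prot(unsigned long vm_flags)
{
	return protection_map[vm_flags &
			(VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)];
}
EXPORT_SYMBOL(vm_get_page_prot);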

@@ -72,24 +72,6 @@
* This is to enable COW mechanism
*/
/* xwr */
#define __P000 PAGE_U_NONE
#define __P001 PAGE_U_R
#define __P010 PAGE_U_R /* Pvt-W => !W */
#define __P011 PAGE_U_R /* Pvt-W => !W */
#define __P100 PAGE_U_X_R /* X => R */
#define __P101 PAGE_U_X_R
#define __P110 PAGE_U_X_R /* Pvt-W => !W and X => R */
#define __P111 PAGE_U_X_R /* Pvt-W => !W */
#define __S000 PAGE_U_NONE
#define __S001 PAGE_U_R
#define __S010 PAGE_U_W_R /* W => R */
#define __S011 PAGE_U_W_R
#define __S100 PAGE_U_X_R /* X => R */
#define __S101 PAGE_U_X_R
#define __S110 PAGE_U_X_W_R /* X => R */
#define __S111 PAGE_U_X_W_R
#ifndef __ASSEMBLY__
#define pte_write(pte) (pte_val(pte) & _PAGE_WRITE)

Some files were not shown because too many files have changed in this diff.