Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
   from hotplugged memory rather than only from main memory. Series
   "implement "memmap on memory" feature on s390".

 - More folio conversions from Matthew Wilcox in the series

	"Convert memcontrol charge moving to use folios"
	"mm: convert mm counter to take a folio"

 - Chengming Zhou has optimized zswap's rbtree locking, providing
   significant reductions in system time and modest but measurable
   reductions in overall runtimes. The series is "mm/zswap: optimize the
   scalability of zswap rb-tree".

 - Chengming Zhou has also provided the series "mm/zswap: optimize zswap
   lru list" which provides measurable runtime benefits in some
   swap-intensive situations.

 - And Chengming Zhou further optimizes zswap in the series "mm/zswap:
   optimize for dynamic zswap_pools". Measured improvements are modest.

 - zswap cleanups and simplifications from Yosry Ahmed in the series
   "mm: zswap: simplify zswap_swapoff()".

 - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
   contributed several DAX cleanups as well as adding a sysfs tunable to
   control the memmap_on_memory setting when the dax device is
   hotplugged as system memory.
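
   As a rough illustration (the device name "dax0.0" is a placeholder;
   per the new ABI document the attribute must be set before the device
   is handed over to the kmem driver):

	# cat /sys/bus/dax/devices/dax0.0/memmap_on_memory
	0
	# echo 1 > /sys/bus/dax/devices/dax0.0/memmap_on_memory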

 - Johannes Weiner has added the large series "mm: zswap: cleanups",
   which does that.

 - More DAMON work from SeongJae Park in the series

	"mm/damon: make DAMON debugfs interface deprecation unignorable"
	"selftests/damon: add more tests for core functionalities and corner cases"
	"Docs/mm/damon: misc readability improvements"
	"mm/damon: let DAMOS feeds and tame/auto-tune itself"

 - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
   extension" Rakie Kim has developed a new mempolicy interleaving
   policy wherein we allocate memory across nodes in a weighted fashion
   rather than uniformly. This is beneficial in heterogeneous memory
   environments appearing with CXL.
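
   For example (node numbers illustrative), writing per-node weights of
   5 and 2 makes MPOL_WEIGHTED_INTERLEAVE tasks allocate 5 pages on
   node0 for every 2 pages allocated on node1:

	# echo 5 > /sys/kernel/mm/mempolicy/weighted_interleave/node0
	# echo 2 > /sys/kernel/mm/mempolicy/weighted_interleave/node1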

 - Christophe Leroy has contributed some cleanup and consolidation work
   against the ARM pagetable dumping code in the series "mm: ptdump:
   Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".

 - Luis Chamberlain has added some additional xarray selftesting in the
   series "test_xarray: advanced API multi-index tests".

 - Muhammad Usama Anjum has reworked the selftest code to make its
   human-readable output conform to the TAP ("Test Anything Protocol")
   format. Amongst other things, this opens up the use of third-party
   tools to parse and process our selftesting results.
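
   A TAP stream is simply a plan line plus one "ok"/"not ok" line per
   test, roughly (test names illustrative):

	TAP version 13
	1..2
	ok 1 compaction_test
	not ok 2 split_huge_page_test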

 - Ryan Roberts has added fork()-time PTE batching of THP ptes in the
   series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
   targeted at arm64, this significantly speeds up fork() when the
   process has a large number of pte-mapped folios.

 - David Hildenbrand also gets in on the THP pte batching game in his
   series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
   implements batching during munmap() and other pte teardown
   situations. The microbenchmark improvements are nice.

 - And in the series "Transparent Contiguous PTEs for User Mappings"
   Ryan Roberts further utilizes the contiguous bit in arm64 PTEs
   ("contpte mappings"). Kernel build times on arm64 improved nicely.
   Ryan's series "Address some contpte nits" provides some followup
   work.

 - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
   fixed an obscure hugetlb race which was causing unnecessary page
   faults. He has also added a reproducer under the selftest code.

 - In the series "selftests/mm: Output cleanups for the compaction
   test", Mark Brown did what the title claims.

 - Kinsey Ho has added the series "mm/mglru: code cleanup and
   refactoring".

 - Even more zswap material from Nhat Pham. The series "fix and extend
   zswap kselftests" does as claimed.

 - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
   regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
   in our handling of DAX on architectures which have virtually aliasing
   data caches. The arm architecture is the main beneficiary.

 - Lokesh Gidra's series "per-vma locks in userfaultfd" provides
   dramatic improvements in worst-case mmap_lock hold times during
   certain userfaultfd operations.

 - Some page_owner enhancements and maintenance work from Oscar Salvador
   in his series

	"page_owner: print stacks and their outstanding allocations"
	"page_owner: Fixup and cleanup"

 - Uladzislau Rezki has contributed some vmalloc scalability
   improvements in his series "Mitigate a vmap lock contention". It
   realizes a 12x improvement for a certain microbenchmark.

 - Some kexec/crash cleanup work from Baoquan He in the series "Split
   crash out from kexec and clean up related config items".

 - Some zsmalloc maintenance work from Chengming Zhou in the series

	"mm/zsmalloc: fix and optimize objects/page migration"
	"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"

 - Zi Yan has taught the MM to perform compaction on folios larger than
   order=0. This is a step along the path to implementation of the merging
   of large anonymous folios. The series is named "Enable >0 order folio
   memory compaction".

 - Christoph Hellwig has done quite a lot of cleanup work in the
   pagecache writeback code in his series "convert write_cache_pages()
   to an iterator".

 - Some modest hugetlb cleanups and speedups in Vishal Moola's series
   "Handle hugetlb faults under the VMA lock".

 - Zi Yan has changed the page splitting code so we can split huge pages
   into sizes other than order-0 to better utilize large folios. The
   series is named "Split a folio to any lower order folios".

 - David Hildenbrand has contributed the series "mm: remove
   total_mapcount()", a cleanup.

 - Matthew Wilcox has sought to improve the performance of bulk memory
   freeing in his series "Rearrange batched folio freeing".

 - Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
   provides large improvements in bootup times on large machines which
   are configured to use large numbers of hugetlb pages.
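
   Such machines typically request the pages on the kernel command
   line; an illustrative (made-up) example:

	hugepagesz=1G hugepages=500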

 - Matthew Wilcox's series "PageFlags cleanups" does that.

 - Qi Zheng's series "minor fixes and supplement for ptdesc" does that
   also. S390 is affected.

 - Cleanups to our pagemap utility functions from Peter Xu in his series
   "mm/treewide: Replace pXd_large() with pXd_leaf()".

 - Nico Pache has fixed a few things with our hugepage selftests in his
   series "selftests/mm: Improve Hugepage Test Handling in MM
   Selftests".

 - Also, of course, many singleton patches to many things. Please see
   the individual changelogs for details.

* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
  mm/zswap: remove the memcpy if acomp is not sleepable
  crypto: introduce: acomp_is_async to expose if comp drivers might sleep
  memtest: use {READ,WRITE}_ONCE in memory scanning
  mm: prohibit the last subpage from reusing the entire large folio
  mm: recover pud_leaf() definitions in nopmd case
  selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
  selftests/mm: skip uffd hugetlb tests with insufficient hugepages
  selftests/mm: dont fail testsuite due to a lack of hugepages
  mm/huge_memory: skip invalid debugfs new_order input for folio split
  mm/huge_memory: check new folio order when split a folio
  mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
  mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
  mm: fix list corruption in put_pages_list
  mm: remove folio from deferred split list before uncharging it
  filemap: avoid unnecessary major faults in filemap_fault()
  mm,page_owner: drop unnecessary check
  mm,page_owner: check for null stack_record before bumping its refcount
  mm: swap: fix race between free_swap_and_cache() and swapoff()
  mm/treewide: align up pXd_leaf() retval across archs
  mm/treewide: drop pXd_large()
  ...
Linus Torvalds 2024-03-14 17:43:30 -07:00
commit 902861e34c
364 changed files with 12369 additions and 6055 deletions


@ -0,0 +1,153 @@
What: /sys/bus/dax/devices/daxX.Y/align
Date: October, 2020
KernelVersion: v5.10
Contact: nvdimm@lists.linux.dev
Description:
(RW) Provides a way to specify an alignment for a dax device.
Values allowed are constrained by the physical address ranges
that back the dax device, and also by arch requirements.
What: /sys/bus/dax/devices/daxX.Y/mapping
Date: October, 2020
KernelVersion: v5.10
Contact: nvdimm@lists.linux.dev
Description:
(WO) Provides a way to allocate a mapping range under a dax
device. Specified in the format <start>-<end>.
What: /sys/bus/dax/devices/daxX.Y/mapping[0..N]/start
What: /sys/bus/dax/devices/daxX.Y/mapping[0..N]/end
What: /sys/bus/dax/devices/daxX.Y/mapping[0..N]/page_offset
Date: October, 2020
KernelVersion: v5.10
Contact: nvdimm@lists.linux.dev
Description:
(RO) A dax device may have multiple constituent discontiguous
address ranges. These are represented by the different
'mappingX' subdirectories. The 'start' attribute indicates the
start physical address for the given range. The 'end' attribute
indicates the end physical address for the given range. The
'page_offset' attribute indicates the offset of the current
range in the dax device.
What: /sys/bus/dax/devices/daxX.Y/resource
Date: June, 2019
KernelVersion: v5.3
Contact: nvdimm@lists.linux.dev
Description:
(RO) The resource attribute indicates the starting physical
address of a dax device. In case of a device with multiple
constituent ranges, it indicates the starting address of the
first range.
What: /sys/bus/dax/devices/daxX.Y/size
Date: October, 2020
KernelVersion: v5.10
Contact: nvdimm@lists.linux.dev
Description:
(RW) The size attribute indicates the total size of a dax
device. For creating subdivided dax devices, or for resizing
an existing device, the new size can be written to this as
part of the reconfiguration process.
What: /sys/bus/dax/devices/daxX.Y/numa_node
Date: November, 2019
KernelVersion: v5.5
Contact: nvdimm@lists.linux.dev
Description:
(RO) If NUMA is enabled and the platform has affinitized the
backing device for this dax device, emit the CPU node
affinity for this device.
What: /sys/bus/dax/devices/daxX.Y/target_node
Date: February, 2019
KernelVersion: v5.1
Contact: nvdimm@lists.linux.dev
Description:
(RO) The target-node attribute is the Linux numa-node that a
device-dax instance may create when it is online. Prior to
being online the device's 'numa_node' property reflects the
closest online cpu node which is the typical expectation of a
device 'numa_node'. Once it is online it becomes its own
distinct numa node.
What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/available_size
Date: October, 2020
KernelVersion: v5.10
Contact: nvdimm@lists.linux.dev
Description:
(RO) The available_size attribute tracks available dax region
capacity. This only applies to volatile hmem devices, not pmem
devices, since pmem devices are defined by nvdimm namespace
boundaries.
What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/size
Date: July, 2017
KernelVersion: v5.1
Contact: nvdimm@lists.linux.dev
Description:
(RO) The size attribute indicates the size of a given dax region
in bytes.
What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/align
Date: October, 2020
KernelVersion: v5.10
Contact: nvdimm@lists.linux.dev
Description:
(RO) The align attribute indicates alignment of the dax region.
Changes on align may not always be valid, when say certain
mappings were created with 2M and then we switch to 1G. This
validates all ranges against the new value being attempted, post
resizing.
What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/seed
Date: October, 2020
KernelVersion: v5.10
Contact: nvdimm@lists.linux.dev
Description:
(RO) The seed device is a concept for dynamic dax regions to be
able to split the region amongst multiple sub-instances. The
seed device, similar to libnvdimm seed devices, is a device
that starts with zero capacity allocated and unbound to a
driver.
What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/create
Date: October, 2020
KernelVersion: v5.10
Contact: nvdimm@lists.linux.dev
Description:
(RW) The create interface to the dax region provides a way to
create a new unconfigured dax device under the given region, which
can then be configured (with a size etc.) and then probed.
What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/delete
Date: October, 2020
KernelVersion: v5.10
Contact: nvdimm@lists.linux.dev
Description:
(WO) The delete interface for a dax region provides for deletion
of any 0-sized and idle dax devices.
What: $(readlink -f /sys/bus/dax/devices/daxX.Y)/../dax_region/id
Date: July, 2017
KernelVersion: v5.1
Contact: nvdimm@lists.linux.dev
Description:
(RO) The id attribute indicates the region id of a dax region.
What: /sys/bus/dax/devices/daxX.Y/memmap_on_memory
Date: January, 2024
KernelVersion: v6.8
Contact: nvdimm@lists.linux.dev
Description:
(RW) Control the memmap_on_memory setting if the dax device
were to be hotplugged as system memory. This determines whether
the 'altmap' for the hotplugged memory will be placed on the
device being hotplugged (memmap_on_memory=1) or if it will be
placed on regular memory (memmap_on_memory=0). This attribute
must be set before the device is handed over to the 'kmem'
driver (i.e. hotplugged into system-ram). Additionally, this
depends on CONFIG_MHP_MEMMAP_ON_MEMORY, and a globally enabled
memmap_on_memory parameter for memory_hotplug. This is
typically set on the kernel command line -
memory_hotplug.memmap_on_memory set to 'true' or 'force'."


@ -23,3 +23,9 @@ Date: Feb 2021
Contact: Minchan Kim <minchan@kernel.org>
Description:
the number of pages CMA API failed to allocate
What: /sys/kernel/mm/cma/<cma-heap-name>/release_pages_success
Date: Feb 2024
Contact: Anshuman Khandual <anshuman.khandual@arm.com>
Description:
the number of pages CMA API succeeded to release


@ -34,7 +34,9 @@ Description: Writing 'on' or 'off' to this file makes the kdamond starts or
kdamond. Writing 'update_schemes_tried_bytes' to the file
updates only '.../tried_regions/total_bytes' files of this
kdamond. Writing 'clear_schemes_tried_regions' to the file
removes contents of the 'tried_regions' directory.
removes contents of the 'tried_regions' directory. Writing
'update_schemes_effective_quotas' to the file updates
'.../quotas/effective_bytes' files of this kdamond.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/pid
Date: Mar 2022
@ -208,6 +210,12 @@ Contact: SeongJae Park <sj@kernel.org>
Description: Writing to and reading from this file sets and gets the size
quota of the scheme in bytes.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/effective_bytes
Date: Feb 2024
Contact: SeongJae Park <sj@kernel.org>
Description: Reading from this file gets the effective size quota of the
scheme in bytes, which adjusted for the time quota and goals.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/reset_interval_ms
Date: Mar 2022
Contact: SeongJae Park <sj@kernel.org>
@ -221,6 +229,12 @@ Description: Writing a number 'N' to this file creates the number of
directories for setting automatic tuning of the scheme's
aggressiveness named '0' to 'N-1' under the goals/ directory.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/goals/<G>/target_metric
Date: Feb 2024
Contact: SeongJae Park <sj@kernel.org>
Description: Writing to and reading from this file sets and gets the quota
auto-tuning goal metric.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/goals/<G>/target_value
Date: Nov 2023
Contact: SeongJae Park <sj@kernel.org>


@ -0,0 +1,4 @@
What: /sys/kernel/mm/mempolicy/
Date: January 2024
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Interface for Mempolicy


@ -0,0 +1,25 @@
What: /sys/kernel/mm/mempolicy/weighted_interleave/
Date: January 2024
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Configuration Interface for the Weighted Interleave policy
What: /sys/kernel/mm/mempolicy/weighted_interleave/nodeN
Date: January 2024
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Weight configuration interface for nodeN
The interleave weight for a memory node (N). These weights are
utilized by tasks which have set their mempolicy to
MPOL_WEIGHTED_INTERLEAVE.
These weights only affect new allocations, and changes at runtime
will not cause migrations on already allocated pages.
The minimum weight for a node is always 1.
Minimum weight: 1
Maximum weight: 255
Writing an empty string or `0` will reset the weight to the
system default. The system default may be set by the kernel
or drivers at boot or during hotplug events.


@ -65,11 +65,11 @@ Defines the beginning of the text section. In general, _stext indicates
the kernel start address. Used to convert a virtual address from the
direct kernel map to a physical address.
vmap_area_list
--------------
VMALLOC_START
-------------
Stores the virtual area list. makedumpfile gets the vmalloc start value
from this variable and its value is necessary for vmalloc translation.
Stores the base address of vmalloc area. makedumpfile gets this value
since is necessary for vmalloc translation.
mem_map
-------


@ -117,6 +117,33 @@ milliseconds.
1 second by default.
quota_mem_pressure_us
---------------------
Desired level of memory pressure-stall time in microseconds.
While keeping the caps that set by other quotas, DAMON_RECLAIM automatically
increases and decreases the effective level of the quota aiming this level of
memory pressure is incurred. System-wide ``some`` memory PSI in microseconds
per quota reset interval (``quota_reset_interval_ms``) is collected and
compared to this value to see if the aim is satisfied. Value zero means
disabling this auto-tuning feature.
Disabled by default.
quota_autotune_feedback
-----------------------
User-specifiable feedback for auto-tuning of the effective quota.
While keeping the caps that set by other quotas, DAMON_RECLAIM automatically
increases and decreases the effective level of the quota aiming receiving this
feedback of value ``10,000`` from the user. DAMON_RECLAIM assumes the feedback
value and the quota are positively proportional. Value zero means disabling
this auto-tuning feature.
Disabled by default.
wmarks_interval
---------------


@ -83,10 +83,10 @@ comma (",").
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ │ :ref:`quotas <sysfs_quotas>`/ms,bytes,reset_interval_ms
│ │ │ │ │ │ │ :ref:`quotas <sysfs_quotas>`/ms,bytes,reset_interval_ms,effective_bytes
│ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ │ │ :ref:`goals <sysfs_schemes_quota_goals>`/nr_goals
│ │ │ │ │ │ │ │ │ 0/target_value,current_value
│ │ │ │ │ │ │ │ │ 0/target_metric,target_value,current_value
│ │ │ │ │ │ │ :ref:`watermarks <sysfs_watermarks>`/metric,interval_us,high,mid,low
│ │ │ │ │ │ │ :ref:`filters <sysfs_filters>`/nr_filters
│ │ │ │ │ │ │ │ 0/type,matching,memcg_id
@ -153,6 +153,9 @@ Users can write below commands for the kdamond to the ``state`` file.
- ``clear_schemes_tried_regions``: Clear the DAMON-based operating scheme
action tried regions directory for each DAMON-based operation scheme of the
kdamond.
- ``update_schemes_effective_bytes``: Update the contents of
``effective_bytes`` files for each DAMON-based operation scheme of the
kdamond. For more details, refer to :ref:`quotas directory <sysfs_quotas>`.
If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread.
@ -180,19 +183,14 @@ In each context directory, two files (``avail_operations`` and ``operations``)
and three directories (``monitoring_attrs``, ``targets``, and ``schemes``)
exist.
DAMON supports multiple types of monitoring operations, including those for
virtual address space and the physical address space. You can get the list of
available monitoring operations set on the currently running kernel by reading
DAMON supports multiple types of :ref:`monitoring operations
<damon_design_configurable_operations_set>`, including those for virtual address
space and the physical address space. You can get the list of available
monitoring operations set on the currently running kernel by reading
``avail_operations`` file. Based on the kernel configuration, the file will
list some or all of below keywords.
- vaddr: Monitor virtual address spaces of specific processes
- fvaddr: Monitor fixed virtual address ranges
- paddr: Monitor the physical address space of the system
Please refer to :ref:`regions sysfs directory <sysfs_regions>` for detailed
differences between the operations sets in terms of the monitoring target
regions.
list different available operation sets. Please refer to the :ref:`design
<damon_operations_set>` for the list of all available operation sets and their
brief explanations.
You can set and get what type of monitoring operations DAMON will use for the
context by writing one of the keywords listed in ``avail_operations`` file and
@ -247,17 +245,11 @@ process to the ``pid_target`` file.
targets/<N>/regions
-------------------
When ``vaddr`` monitoring operations set is being used (``vaddr`` is written to
the ``contexts/<N>/operations`` file), DAMON automatically sets and updates the
monitoring target regions so that entire memory mappings of target processes
can be covered. However, users could want to set the initial monitoring region
to specific address ranges.
In contrast, DAMON do not automatically sets and updates the monitoring target
regions when ``fvaddr`` or ``paddr`` monitoring operations sets are being used
(``fvaddr`` or ``paddr`` have written to the ``contexts/<N>/operations``).
Therefore, users should set the monitoring target regions by themselves in the
cases.
In case of ``fvaddr`` or ``paddr`` monitoring operations sets, users are
required to set the monitoring target address ranges. In case of ``vaddr``
operations set, it is not mandatory, but users can optionally set the initial
monitoring region to specific address ranges. Please refer to the :ref:`design
<damon_design_vaddr_target_regions_construction>` for more details.
For such cases, users can explicitly set the initial monitoring target regions
as they want, by writing proper values to the files under this directory.
@ -302,27 +294,8 @@ In each scheme directory, five directories (``access_pattern``, ``quotas``,
The ``action`` file is for setting and getting the scheme's :ref:`action
<damon_design_damos_action>`. The keywords that can be written to and read
from the file and their meaning are as below.
Note that support of each action depends on the running DAMON operations set
:ref:`implementation <sysfs_context>`.
- ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
Supported by ``vaddr`` and ``fvaddr`` operations set.
- ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``.
Supported by ``vaddr`` and ``fvaddr`` operations set.
- ``pageout``: Call ``madvise()`` for the region with ``MADV_PAGEOUT``.
Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
- ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``.
Supported by ``vaddr`` and ``fvaddr`` operations set.
- ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``.
Supported by ``vaddr`` and ``fvaddr`` operations set.
- ``lru_prio``: Prioritize the region on its LRU lists.
Supported by ``paddr`` operations set.
- ``lru_deprio``: Deprioritize the region on its LRU lists.
Supported by ``paddr`` operations set.
- ``stat``: Do nothing but count the statistics.
Supported by all operations sets.
from the file and their meaning are same to those of the list on
:ref:`design doc <damon_design_damos_action>`.
The ``apply_interval_us`` file is for setting and getting the scheme's
:ref:`apply_interval <damon_design_damos>` in microseconds.
@ -350,8 +323,9 @@ schemes/<N>/quotas/
The directory for the :ref:`quotas <damon_design_damos_quotas>` of the given
DAMON-based operation scheme.
Under ``quotas`` directory, three files (``ms``, ``bytes``,
``reset_interval_ms``) and two directores (``weights`` and ``goals``) exist.
Under ``quotas`` directory, four files (``ms``, ``bytes``,
``reset_interval_ms``, ``effective_bytes``) and two directores (``weights`` and
``goals``) exist.
You can set the ``time quota`` in milliseconds, ``size quota`` in bytes, and
``reset interval`` in milliseconds by writing the values to the three files,
@ -359,7 +333,17 @@ respectively. Then, DAMON tries to use only up to ``time quota`` milliseconds
for applying the ``action`` to memory regions of the ``access_pattern``, and to
apply the action to only up to ``bytes`` bytes of memory regions within the
``reset_interval_ms``. Setting both ``ms`` and ``bytes`` zero disables the
quota limits.
quota limits unless at least one :ref:`goal <sysfs_schemes_quota_goals>` is
set.
The time quota is internally transformed to a size quota. Between the
transformed size quota and user-specified size quota, smaller one is applied.
Based on the user-specified :ref:`goal <sysfs_schemes_quota_goals>`, the
effective size quota is further adjusted. Reading ``effective_bytes`` returns
the current effective size quota. The file is not updated in real time, so
users should ask DAMON sysfs interface to update the content of the file for
the stats by writing a special keyword, ``update_schemes_effective_bytes`` to
the relevant ``kdamonds/<N>/state`` file.
Under ``weights`` directory, three files (``sz_permil``,
``nr_accesses_permil``, and ``age_permil``) exist.
@ -382,11 +366,11 @@ number (``N``) to the file creates the number of child directories named ``0``
to ``N-1``. Each directory represents each goal and current achievement.
Among the multiple feedback, the best one is used.
Each goal directory contains two files, namely ``target_value`` and
``current_value``. Users can set and get any number to those files to set the
feedback. User space main workload's latency or throughput, system metrics
like free memory ratio or memory pressure stall time (PSI) could be example
metrics for the values. Note that users should write
Each goal directory contains three files, namely ``target_metric``,
``target_value`` and ``current_value``. Users can set and get the three
parameters for the quota auto-tuning goals that specified on the :ref:`design
doc <damon_design_damos_quotas_auto_tuning>` by writing to and reading from each
of the files. Note that users should further write
``commit_schemes_quota_goals`` to the ``state`` file of the :ref:`kdamond
directory <sysfs_kdamond>` to pass the feedback to DAMON.
@ -579,11 +563,11 @@ monitoring results recording.
While the monitoring is turned on, you could record the tracepoint events and
show results using tracepoint supporting tools like ``perf``. For example::
# echo on > monitor_on
# echo on > kdamonds/0/state
# perf record -e damon:damon_aggregated &
# sleep 5
# kill 9 $(pidof perf)
# echo off > monitor_on
# echo off > kdamonds/0/state
# perf script
kdamond.0 46568 [027] 79357.842179: damon:damon_aggregated: target_id=0 nr_regions=11 122509119488-135708762112: 0 864
[...]
@ -628,9 +612,17 @@ debugfs Interface (DEPRECATED!)
move, please report your usecase to damon@lists.linux.dev and
linux-mm@kvack.org.
DAMON exports eight files, ``attrs``, ``target_ids``, ``init_regions``,
``schemes``, ``monitor_on``, ``kdamond_pid``, ``mk_contexts`` and
``rm_contexts`` under its debugfs directory, ``<debugfs>/damon/``.
DAMON exports nine files, ``DEPRECATED``, ``attrs``, ``target_ids``,
``init_regions``, ``schemes``, ``monitor_on_DEPRECATED``, ``kdamond_pid``,
``mk_contexts`` and ``rm_contexts`` under its debugfs directory,
``<debugfs>/damon/``.
``DEPRECATED`` is a read-only file for the DAMON debugfs interface deprecation
notice. Reading it returns the deprecation notice, as below::
# cat DEPRECATED
DAMON debugfs interface is deprecated, so users should move to DAMON_SYSFS. If you cannot, please report your usecase to damon@lists.linux.dev and linux-mm@kvack.org.
Attributes
@ -755,19 +747,17 @@ Action
~~~~~~
The ``<action>`` is a predefined integer for memory management :ref:`actions
<damon_design_damos_action>`. The supported numbers and their meanings are as
below.
<damon_design_damos_action>`. The mapping between the ``<action>`` values and
the memory management actions is as below. For the detailed meaning of the
action and DAMON operations set supporting each action, please refer to the
list on :ref:`design doc <damon_design_damos_action>`.
- 0: Call ``madvise()`` for the region with ``MADV_WILLNEED``. Ignored if
``target`` is ``paddr``.
- 1: Call ``madvise()`` for the region with ``MADV_COLD``. Ignored if
``target`` is ``paddr``.
- 2: Call ``madvise()`` for the region with ``MADV_PAGEOUT``.
- 3: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``. Ignored if
``target`` is ``paddr``.
- 4: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``. Ignored if
``target`` is ``paddr``.
- 5: Do nothing but count the statistics
- 0: ``willneed``
- 1: ``cold``
- 2: ``pageout``
- 3: ``hugepage``
- 4: ``nohugepage``
- 5: ``stat``
Quota
~~~~~
@ -848,16 +838,16 @@ Turning On/Off
Setting the files as described above doesn't incur effect unless you explicitly
start the monitoring. You can start, stop, and check the current status of the
monitoring by writing to and reading from the ``monitor_on`` file. Writing
``on`` to the file starts the monitoring of the targets with the attributes.
Writing ``off`` to the file stops those. DAMON also stops if every target
process is terminated. Below example commands turn on, off, and check the
status of DAMON::
monitoring by writing to and reading from the ``monitor_on_DEPRECATED`` file.
Writing ``on`` to the file starts the monitoring of the targets with the
attributes. Writing ``off`` to the file stops those. DAMON also stops if
every target process is terminated. Below example commands turn on, off, and
check the status of DAMON::
# cd <debugfs>/damon
# echo on > monitor_on
# echo off > monitor_on
# cat monitor_on
# echo on > monitor_on_DEPRECATED
# echo off > monitor_on_DEPRECATED
# cat monitor_on_DEPRECATED
off
Please note that you cannot write to the above-mentioned debugfs files while
@ -873,11 +863,11 @@ can get the pid of the thread by reading the ``kdamond_pid`` file. When the
monitoring is turned off, reading the file returns ``none``. ::
# cd <debugfs>/damon
# cat monitor_on
# cat monitor_on_DEPRECATED
off
# cat kdamond_pid
none
# echo on > monitor_on
# echo on > monitor_on_DEPRECATED
# cat kdamond_pid
18594
@ -907,5 +897,5 @@ directory by putting the name of the context to the ``rm_contexts`` file. ::
# ls foo
# ls: cannot access 'foo': No such file or directory
Note that ``mk_contexts``, ``rm_contexts``, and ``monitor_on`` files are in the
root directory only.
Note that ``mk_contexts``, ``rm_contexts``, and ``monitor_on_DEPRECATED`` files
are in the root directory only.


@ -250,6 +250,15 @@ MPOL_PREFERRED_MANY
can fall back to all existing numa nodes. This is effectively
MPOL_PREFERRED allowed for a mask rather than a single node.
MPOL_WEIGHTED_INTERLEAVE
This mode operates the same as MPOL_INTERLEAVE, except that
interleaving behavior is executed based on weights set in
/sys/kernel/mm/mempolicy/weighted_interleave/
Weighted interleave allocates pages on nodes according to a
weight. For example if nodes [0,1] are weighted [5,2], 5 pages
will be allocated on node0 for every 2 pages allocated on node1.
NUMA memory policy supports the following optional mode flags:
MPOL_F_STATIC_NODES

View File

@ -169,7 +169,7 @@ Error reports
A typical KASAN report looks like this::
==================================================================
BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [test_kasan]
BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [kasan_test]
Write of size 1 at addr ffff8801f44ec37b by task insmod/2760
CPU: 1 PID: 2760 Comm: insmod Not tainted 4.19.0-rc3+ #698
@ -179,8 +179,8 @@ A typical KASAN report looks like this::
print_address_description+0x73/0x280
kasan_report+0x144/0x187
__asan_report_store1_noabort+0x17/0x20
kmalloc_oob_right+0xa8/0xbc [test_kasan]
kmalloc_tests_init+0x16/0x700 [test_kasan]
kmalloc_oob_right+0xa8/0xbc [kasan_test]
kmalloc_tests_init+0x16/0x700 [kasan_test]
do_one_initcall+0xa5/0x3ae
do_init_module+0x1b6/0x547
load_module+0x75df/0x8070
@ -200,8 +200,8 @@ A typical KASAN report looks like this::
save_stack+0x43/0xd0
kasan_kmalloc+0xa7/0xd0
kmem_cache_alloc_trace+0xe1/0x1b0
kmalloc_oob_right+0x56/0xbc [test_kasan]
kmalloc_tests_init+0x16/0x700 [test_kasan]
kmalloc_oob_right+0x56/0xbc [kasan_test]
kmalloc_tests_init+0x16/0x700 [kasan_test]
do_one_initcall+0xa5/0x3ae
do_init_module+0x1b6/0x547
load_module+0x75df/0x8070
@ -531,15 +531,15 @@ When a test passes::
When a test fails due to a failed ``kmalloc``::
# kmalloc_large_oob_right: ASSERTION FAILED at lib/test_kasan.c:163
# kmalloc_large_oob_right: ASSERTION FAILED at mm/kasan/kasan_test.c:245
Expected ptr is not null, but is
not ok 4 - kmalloc_large_oob_right
not ok 5 - kmalloc_large_oob_right
When a test fails due to a missing KASAN report::
# kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:974
# kmalloc_double_kzfree: EXPECTATION FAILED at mm/kasan/kasan_test.c:709
KASAN failure expected in "kfree_sensitive(ptr)", but none occurred
not ok 44 - kmalloc_double_kzfree
not ok 28 - kmalloc_double_kzfree
At the end the cumulative status of all KASAN tests is printed. On success::
@ -555,7 +555,7 @@ There are a few ways to run KUnit-compatible KASAN tests.
1. Loadable module
With ``CONFIG_KUNIT`` enabled, KASAN-KUnit tests can be built as a loadable
module and run by loading ``test_kasan.ko`` with ``insmod`` or ``modprobe``.
module and run by loading ``kasan_test.ko`` with ``insmod`` or ``modprobe``.
2. Built-In


@ -31,6 +31,8 @@ DAMON subsystem is configured with three layers including
interfaces for the user space, on top of the core layer.
.. _damon_design_configurable_operations_set:
Configurable Operations Set
---------------------------
@ -63,6 +65,8 @@ modules that built on top of the core layer using the API, which can be easily
used by the user space end users.
.. _damon_operations_set:
Operations Set Layer
====================
@ -71,16 +75,26 @@ The monitoring operations are defined in two parts:
1. Identification of the monitoring target address range for the address space.
2. Access check of specific address range in the target space.
DAMON currently provides the implementations of the operations for the physical
and virtual address spaces. Below two subsections describe how those work.
DAMON currently provides below three operation sets. Below two subsections
describe how those work.
- vaddr: Monitor virtual address spaces of specific processes
- fvaddr: Monitor fixed virtual address ranges
- paddr: Monitor the physical address space of the system
.. _damon_design_vaddr_target_regions_construction:
VMA-based Target Address Range Construction
-------------------------------------------
This is only for the virtual address space monitoring operations
implementation. That for the physical address space simply asks users to
manually set the monitoring target address ranges.
A mechanism of ``vaddr`` DAMON operations set that automatically initializes
and updates the monitoring target address regions so that entire memory
mappings of the target processes can be covered.
This mechanism is only for the ``vaddr`` operations set. In cases of
``fvaddr`` and ``paddr`` operation sets, users are asked to manually set the
monitoring target address ranges.
Only small parts in the super-huge virtual address space of the processes are
mapped to the physical memory and accessed. Thus, tracking the unmapped
@ -294,9 +308,29 @@ not mandated to support all actions of the list. Hence, the availability of
specific DAMOS action depends on what operations set is selected to be used
together.
Applying an action to a region is considered as changing the region's
characteristics. Hence, DAMOS resets the age of regions when an action is
applied to those.
The list of the supported actions, their meaning, and DAMON operations sets
that supports each action are as below.
- ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
Supported by ``vaddr`` and ``fvaddr`` operations set.
- ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``.
Supported by ``vaddr`` and ``fvaddr`` operations set.
- ``pageout``: Reclaim the region.
Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
- ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``.
Supported by ``vaddr`` and ``fvaddr`` operations set.
- ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``.
Supported by ``vaddr`` and ``fvaddr`` operations set.
- ``lru_prio``: Prioritize the region on its LRU lists.
Supported by ``paddr`` operations set.
- ``lru_deprio``: Deprioritize the region on its LRU lists.
Supported by ``paddr`` operations set.
- ``stat``: Do nothing but count the statistics.
Supported by all operations sets.
Applying the actions except ``stat`` to a region is considered as changing the
region's characteristics. Hence, DAMOS resets the age of regions when any such
actions are applied to those.
.. _damon_design_damos_access_pattern:
@ -364,12 +398,28 @@ Aim-oriented Feedback-driven Auto-tuning
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Automatic feedback-driven quota tuning. Instead of setting the absolute quota
value, users can repeatedly provide numbers representing how much of their goal
for the scheme is achieved as feedback. DAMOS then automatically tunes the
value, users can specify the metric of their interest, and what target value
they want the metric value to be. DAMOS then automatically tunes the
aggressiveness (the quota) of the corresponding scheme. For example, if DAMOS
is under achieving the goal, DAMOS automatically increases the quota. If DAMOS
is over achieving the goal, it decreases the quota.
The goal can be specified with three parameters, namely ``target_metric``,
``target_value``, and ``current_value``. The auto-tuning mechanism tries to
make ``current_value`` of ``target_metric`` be same to ``target_value``.
Currently, two ``target_metric`` are provided.
- ``user_input``: User-provided value. Users could use any metric that they
has interest in for the value. Use space main workload's latency or
throughput, system metrics like free memory ratio or memory pressure stall
time (PSI) could be examples. Note that users should explicitly set
``current_value`` on their own in this case. In other words, users should
repeatedly provide the feedback.
- ``some_mem_psi_us``: System-wide ``some`` memory pressure stall information
in microseconds that measured from last quota reset to next quota reset.
DAMOS does the measurement on its own, so only ``target_value`` need to be
set by users at the initial time. In other words, DAMOS does self-feedback.
.. _damon_design_damos_watermarks:


@ -21,8 +21,8 @@ be queued in mm-stable [3]_ , and finally pull-requested to the mainline by the
memory management subsystem maintainer.
Note again the patches for review should be made against the mm-unstable
tree[1] whenever possible. damon/next is only for preview of others' works in
progress.
tree [1]_ whenever possible. damon/next is only for preview of others' works
in progress.
Submit checklist addendum
-------------------------
@ -41,8 +41,8 @@ Further doing below and putting the results will be helpful.
Key cycle dates
---------------
Patches can be sent anytime. Key cycle dates of the mm-unstable[1] and
mm-stable[3] trees depend on the memory management subsystem maintainer.
Patches can be sent anytime. Key cycle dates of the mm-unstable [1]_ and
mm-stable [3]_ trees depend on the memory management subsystem maintainer.
Review cadence
--------------


@ -24,6 +24,11 @@ fragmentation statistics can be obtained through gfp flag information of
each page. It is already implemented and activated if page owner is
enabled. Other usages are more than welcome.
It can also be used to show all the stacks and their outstanding
allocations, which gives us a quick overview of where the memory is going
without the need to screen through all the pages and match the allocation
and free operation.
page owner is disabled by default. So, if you'd like to use it, you need
to add "page_owner=on" to your boot cmdline. If the kernel is built
with page owner and page owner is disabled in runtime due to not enabling
@ -68,6 +73,46 @@ Usage
4) Analyze information from page owner::
cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks.txt
cat stacks.txt
prep_new_page+0xa9/0x120
get_page_from_freelist+0x7e6/0x2140
__alloc_pages+0x18a/0x370
new_slab+0xc8/0x580
___slab_alloc+0x1f2/0xaf0
__slab_alloc.isra.86+0x22/0x40
kmem_cache_alloc+0x31b/0x350
__khugepaged_enter+0x39/0x100
dup_mmap+0x1c7/0x5ce
copy_process+0x1afe/0x1c90
kernel_clone+0x9a/0x3c0
__do_sys_clone+0x66/0x90
do_syscall_64+0x7f/0x160
entry_SYSCALL_64_after_hwframe+0x6c/0x74
stack_count: 234
...
...
echo 7000 > /sys/kernel/debug/page_owner_stacks/count_threshold
cat /sys/kernel/debug/page_owner_stacks/show_stacks> stacks_7000.txt
cat stacks_7000.txt
prep_new_page+0xa9/0x120
get_page_from_freelist+0x7e6/0x2140
__alloc_pages+0x18a/0x370
alloc_pages_mpol+0xdf/0x1e0
folio_alloc+0x14/0x50
filemap_alloc_folio+0xb0/0x100
page_cache_ra_unbounded+0x97/0x180
filemap_fault+0x4b4/0x1200
__do_fault+0x2d/0x110
do_pte_missing+0x4b0/0xa30
__handle_mm_fault+0x7fa/0xb70
handle_mm_fault+0x125/0x300
do_user_addr_fault+0x3c9/0x840
exc_page_fault+0x68/0x150
asm_exc_page_fault+0x22/0x30
stack_count: 8248
...
cat /sys/kernel/debug/page_owner > page_owner_full.txt
./page_owner_sort page_owner_full.txt sorted_page_owner.txt


@ -344,7 +344,7 @@ debugfs接口
:ref:`sysfs接口<sysfs_interface>`
DAMON导出了八个文件, ``attrs``, ``target_ids``, ``init_regions``,
``schemes``, ``monitor_on``, ``kdamond_pid``, ``mk_contexts``
``schemes``, ``monitor_on_DEPRECATED``, ``kdamond_pid``, ``mk_contexts``
``rm_contexts`` under its debugfs directory, ``<debugfs>/damon/``.
@ -521,15 +521,15 @@ DAMON导出了八个文件, ``attrs``, ``target_ids``, ``init_regions``,
开关
----
除非你明确地启动监测,否则如上所述的文件设置不会产生效果。你可以通过写入和读取 ``monitor_on``
除非你明确地启动监测,否则如上所述的文件设置不会产生效果。你可以通过写入和读取 ``monitor_on_DEPRECATED``
文件来启动、停止和检查监测的当前状态。写入 ``on`` 该文件可以启动对有属性的目标的监测。写入
``off`` 该文件则停止这些目标。如果每个目标进程被终止DAMON也会停止。下面的示例命令开启、关
闭和检查DAMON的状态::
# cd <debugfs>/damon
# echo on > monitor_on
# echo off > monitor_on
# cat monitor_on
# echo on > monitor_on_DEPRECATED
# echo off > monitor_on_DEPRECATED
# cat monitor_on_DEPRECATED
off
请注意当监测开启时你不能写到上述的debugfs文件。如果你在DAMON运行时写到这些文件将会返
@ -543,11 +543,11 @@ DAMON通过一个叫做kdamond的内核线程来进行请求监测。你可以
得该线程的 ``pid`` 。当监测被 ``关闭`` 时,读取该文件不会返回任何信息::
# cd <debugfs>/damon
# cat monitor_on
# cat monitor_on_DEPRECATED
off
# cat kdamond_pid
none
# echo on > monitor_on
# echo on > monitor_on_DEPRECATED
# cat kdamond_pid
18594
@ -574,7 +574,7 @@ DAMON通过一个叫做kdamond的内核线程来进行请求监测。你可以
# ls foo
# ls: cannot access 'foo': No such file or directory
注意, ``mk_contexts````rm_contexts````monitor_on`` 文件只在根目录下。
注意, ``mk_contexts````rm_contexts````monitor_on_DEPRECATED`` 文件只在根目录下。
监测结果的监测点
@ -583,9 +583,9 @@ DAMON通过一个叫做kdamond的内核线程来进行请求监测。你可以
DAMON通过一个tracepoint ``damon:damon_aggregated`` 提供监测结果. 当监测开启时,你可
以记录追踪点事件并使用追踪点支持工具如perf显示结果。比如说::
# echo on > monitor_on
# echo on > monitor_on_DEPRECATED
# perf record -e damon:damon_aggregated &
# sleep 5
# kill 9 $(pidof perf)
# echo off > monitor_on
# echo off > monitor_on_DEPRECATED
# perf script


@ -137,7 +137,7 @@ KASAN受到通用 ``panic_on_warn`` 命令行参数的影响。当它被启用
典型的KASAN报告如下所示::
==================================================================
BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [test_kasan]
BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [kasan_test]
Write of size 1 at addr ffff8801f44ec37b by task insmod/2760
CPU: 1 PID: 2760 Comm: insmod Not tainted 4.19.0-rc3+ #698
@ -147,8 +147,8 @@ KASAN受到通用 ``panic_on_warn`` 命令行参数的影响。当它被启用
print_address_description+0x73/0x280
kasan_report+0x144/0x187
__asan_report_store1_noabort+0x17/0x20
kmalloc_oob_right+0xa8/0xbc [test_kasan]
kmalloc_tests_init+0x16/0x700 [test_kasan]
kmalloc_oob_right+0xa8/0xbc [kasan_test]
kmalloc_tests_init+0x16/0x700 [kasan_test]
do_one_initcall+0xa5/0x3ae
do_init_module+0x1b6/0x547
load_module+0x75df/0x8070
@ -168,8 +168,8 @@ KASAN受到通用 ``panic_on_warn`` 命令行参数的影响。当它被启用
save_stack+0x43/0xd0
kasan_kmalloc+0xa7/0xd0
kmem_cache_alloc_trace+0xe1/0x1b0
kmalloc_oob_right+0x56/0xbc [test_kasan]
kmalloc_tests_init+0x16/0x700 [test_kasan]
kmalloc_oob_right+0x56/0xbc [kasan_test]
kmalloc_tests_init+0x16/0x700 [kasan_test]
do_one_initcall+0xa5/0x3ae
do_init_module+0x1b6/0x547
load_module+0x75df/0x8070
@ -421,15 +421,15 @@ KASAN连接到vmap基础架构以懒清理未使用的影子内存。
当由于 ``kmalloc`` 失败而导致测试失败时::
# kmalloc_large_oob_right: ASSERTION FAILED at lib/test_kasan.c:163
# kmalloc_large_oob_right: ASSERTION FAILED at mm/kasan/kasan_test.c:245
Expected ptr is not null, but is
not ok 4 - kmalloc_large_oob_right
not ok 5 - kmalloc_large_oob_right
当由于缺少KASAN报告而导致测试失败时::
# kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:974
# kmalloc_double_kzfree: EXPECTATION FAILED at mm/kasan/kasan_test.c:709
KASAN failure expected in "kfree_sensitive(ptr)", but none occurred
not ok 44 - kmalloc_double_kzfree
not ok 28 - kmalloc_double_kzfree
最后打印所有KASAN测试的累积状态。成功::
@ -445,7 +445,7 @@ KASAN连接到vmap基础架构以懒清理未使用的影子内存。
1. 可加载模块
启用 ``CONFIG_KUNIT``KASAN-KUnit测试可以构建为可加载模块并通过使用
``insmod````modprobe`` 加载 ``test_kasan.ko`` 来运行。
``insmod````modprobe`` 加载 ``kasan_test.ko`` 来运行。
2. 内置


@ -344,7 +344,7 @@ debugfs接口
:ref:`sysfs接口<sysfs_interface>`
DAMON導出了八個文件, ``attrs``, ``target_ids``, ``init_regions``,
``schemes``, ``monitor_on``, ``kdamond_pid``, ``mk_contexts``
``schemes``, ``monitor_on_DEPRECATED``, ``kdamond_pid``, ``mk_contexts``
``rm_contexts`` under its debugfs directory, ``<debugfs>/damon/``.
@ -521,15 +521,15 @@ DAMON導出了八個文件, ``attrs``, ``target_ids``, ``init_regions``,
開關
----
除非你明確地啓動監測,否則如上所述的文件設置不會產生效果。你可以通過寫入和讀取 ``monitor_on``
除非你明確地啓動監測,否則如上所述的文件設置不會產生效果。你可以通過寫入和讀取 ``monitor_on_DEPRECATED``
文件來啓動、停止和檢查監測的當前狀態。寫入 ``on`` 該文件可以啓動對有屬性的目標的監測。寫入
``off`` 該文件則停止這些目標。如果每個目標進程被終止DAMON也會停止。下面的示例命令開啓、關
閉和檢查DAMON的狀態::
# cd <debugfs>/damon
# echo on > monitor_on
# echo off > monitor_on
# cat monitor_on
# echo on > monitor_on_DEPRECATED
# echo off > monitor_on_DEPRECATED
# cat monitor_on_DEPRECATED
off
請注意當監測開啓時你不能寫到上述的debugfs文件。如果你在DAMON運行時寫到這些文件將會返
@ -543,11 +543,11 @@ DAMON通過一個叫做kdamond的內核線程來進行請求監測。你可以
得該線程的 ``pid`` 。當監測被 ``關閉`` 時,讀取該文件不會返回任何信息::
# cd <debugfs>/damon
# cat monitor_on
# cat monitor_on_DEPRECATED
off
# cat kdamond_pid
none
# echo on > monitor_on
# echo on > monitor_on_DEPRECATED
# cat kdamond_pid
18594
@ -574,7 +574,7 @@ DAMON通過一個叫做kdamond的內核線程來進行請求監測。你可以
# ls foo
# ls: cannot access 'foo': No such file or directory
注意, ``mk_contexts````rm_contexts````monitor_on`` 文件只在根目錄下。
注意, ``mk_contexts````rm_contexts````monitor_on_DEPRECATED`` 文件只在根目錄下。
監測結果的監測點
@ -583,10 +583,10 @@ DAMON通過一個叫做kdamond的內核線程來進行請求監測。你可以
DAMON通過一個tracepoint ``damon:damon_aggregated`` 提供監測結果. 當監測開啓時,你可
以記錄追蹤點事件並使用追蹤點支持工具如perf顯示結果。比如說::
# echo on > monitor_on
# echo on > monitor_on_DEPRECATED
# perf record -e damon:damon_aggregated &
# sleep 5
# kill 9 $(pidof perf)
# echo off > monitor_on
# echo off > monitor_on_DEPRECATED
# perf script


@ -137,7 +137,7 @@ KASAN受到通用 ``panic_on_warn`` 命令行參數的影響。當它被啓用
典型的KASAN報告如下所示::
==================================================================
BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [test_kasan]
BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [kasan_test]
Write of size 1 at addr ffff8801f44ec37b by task insmod/2760
CPU: 1 PID: 2760 Comm: insmod Not tainted 4.19.0-rc3+ #698
@ -147,8 +147,8 @@ KASAN受到通用 ``panic_on_warn`` 命令行參數的影響。當它被啓用
print_address_description+0x73/0x280
kasan_report+0x144/0x187
__asan_report_store1_noabort+0x17/0x20
kmalloc_oob_right+0xa8/0xbc [test_kasan]
kmalloc_tests_init+0x16/0x700 [test_kasan]
kmalloc_oob_right+0xa8/0xbc [kasan_test]
kmalloc_tests_init+0x16/0x700 [kasan_test]
do_one_initcall+0xa5/0x3ae
do_init_module+0x1b6/0x547
load_module+0x75df/0x8070
@ -168,8 +168,8 @@ KASAN受到通用 ``panic_on_warn`` 命令行參數的影響。當它被啓用
save_stack+0x43/0xd0
kasan_kmalloc+0xa7/0xd0
kmem_cache_alloc_trace+0xe1/0x1b0
kmalloc_oob_right+0x56/0xbc [test_kasan]
kmalloc_tests_init+0x16/0x700 [test_kasan]
kmalloc_oob_right+0x56/0xbc [kasan_test]
kmalloc_tests_init+0x16/0x700 [kasan_test]
do_one_initcall+0xa5/0x3ae
do_init_module+0x1b6/0x547
load_module+0x75df/0x8070
@ -421,15 +421,15 @@ KASAN連接到vmap基礎架構以懶清理未使用的影子內存。
當由於 ``kmalloc`` 失敗而導致測試失敗時::
# kmalloc_large_oob_right: ASSERTION FAILED at lib/test_kasan.c:163
# kmalloc_large_oob_right: ASSERTION FAILED at mm/kasan/kasan_test.c:245
Expected ptr is not null, but is
not ok 4 - kmalloc_large_oob_right
not ok 5 - kmalloc_large_oob_right
當由於缺少KASAN報告而導致測試失敗時::
# kmalloc_double_kzfree: EXPECTATION FAILED at lib/test_kasan.c:974
# kmalloc_double_kzfree: EXPECTATION FAILED at mm/kasan/kasan_test.c:709
KASAN failure expected in "kfree_sensitive(ptr)", but none occurred
not ok 44 - kmalloc_double_kzfree
not ok 28 - kmalloc_double_kzfree
At the end, the cumulative status of all KASAN tests is printed. On success::
@ -445,7 +445,7 @@ KASAN hooks into the vmap infrastructure to lazily clean up unused shadow memory.
1. Loadable module
With ``CONFIG_KUNIT`` enabled, the KASAN-KUnit tests can be built as a loadable module and run by loading
``test_kasan.ko`` with ``insmod`` or ``modprobe``.
``kasan_test.ko`` with ``insmod`` or ``modprobe``.
2. Built-in


@ -5413,6 +5413,7 @@ R: Muchun Song <muchun.song@linux.dev>
L: cgroups@vger.kernel.org
L: linux-mm@kvack.org
S: Maintained
F: include/linux/memcontrol.h
F: mm/memcontrol.c
F: mm/swap_cgroup.c
F: samples/cgroup/*
@ -14144,15 +14145,24 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
T: quilt git://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new
F: include/linux/gfp.h
F: include/linux/gfp_types.h
F: include/linux/memfd.h
F: include/linux/memory.h
F: include/linux/memory_hotplug.h
F: include/linux/memory-tiers.h
F: include/linux/mempolicy.h
F: include/linux/mempool.h
F: include/linux/memremap.h
F: include/linux/mm.h
F: include/linux/mm_*.h
F: include/linux/mmzone.h
F: include/linux/mmu_notifier.h
F: include/linux/pagewalk.h
F: include/linux/rmap.h
F: include/trace/events/ksm.h
F: mm/
F: tools/mm/
F: tools/testing/selftests/mm/
N: include/linux/page[-_]*
MEMORY MAPPING
M: Andrew Morton <akpm@linux-foundation.org>
@ -24447,6 +24457,7 @@ ZSWAP COMPRESSED SWAP CACHING
M: Johannes Weiner <hannes@cmpxchg.org>
M: Yosry Ahmed <yosryahmed@google.com>
M: Nhat Pham <nphamcs@gmail.com>
R: Chengming Zhou <chengming.zhou@linux.dev>
L: linux-mm@kvack.org
S: Maintained
F: Documentation/admin-guide/mm/zswap.rst
@ -24454,6 +24465,7 @@ F: include/linux/zpool.h
F: include/linux/zswap.h
F: mm/zpool.c
F: mm/zswap.c
F: tools/testing/selftests/cgroup/test_zswap.c
THE REST
M: Linus Torvalds <torvalds@linux-foundation.org>


@ -6,6 +6,7 @@
config ARC
def_bool y
select ARC_TIMERS
select ARCH_HAS_CPU_CACHE_ALIASING
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DMA_PREP_COHERENT


@ -0,0 +1,9 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __ASM_ARC_CACHETYPE_H
#define __ASM_ARC_CACHETYPE_H
#include <linux/types.h>
#define cpu_dcache_is_aliasing() true
#endif


@ -5,6 +5,7 @@ config ARM
select ARCH_32BIT_OFF_T
select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE if HAVE_KRETPROBES && FRAME_POINTER && !ARM_UNWIND
select ARCH_HAS_BINFMT_FLAT
select ARCH_HAS_CPU_CACHE_ALIASING
select ARCH_HAS_CPU_FINALIZE_INIT if MMU
select ARCH_HAS_CURRENT_STACK_POINTER
select ARCH_HAS_DEBUG_VIRTUAL if MMU


@ -17,7 +17,7 @@ config ARM_PTDUMP_DEBUGFS
kernel.
If in doubt, say "N"
config DEBUG_WX
config ARM_DEBUG_WX
bool "Warn on W+X mappings at boot"
depends on MMU
select ARM_PTDUMP_CORE


@ -252,7 +252,7 @@ CONFIG_DEBUG_INFO_REDUCED=y
CONFIG_GDB_SCRIPTS=y
CONFIG_STRIP_ASM_SYMS=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_WX=y
CONFIG_ARM_DEBUG_WX=y
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_TIMEOUT=-1


@ -302,7 +302,7 @@ CONFIG_DEBUG_INFO_REDUCED=y
CONFIG_GDB_SCRIPTS=y
CONFIG_STRIP_ASM_SYMS=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_WX=y
CONFIG_ARM_DEBUG_WX=y
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_TIMEOUT=-1


@ -20,6 +20,8 @@ extern unsigned int cacheid;
#define icache_is_vipt_aliasing() cacheid_is(CACHEID_VIPT_I_ALIASING)
#define icache_is_pipt() cacheid_is(CACHEID_PIPT)
#define cpu_dcache_is_aliasing() (cache_is_vivt() || cache_is_vipt_aliasing())
/*
* __LINUX_ARM_ARCH__ is the minimum supported CPU architecture
* Mask out support which will never be present on newer CPUs.


@ -213,7 +213,6 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
#define pmd_pfn(pmd) (__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
#define pmd_large(pmd) (pmd_val(pmd) & 2)
#define pmd_leaf(pmd) (pmd_val(pmd) & 2)
#define pmd_bad(pmd) (pmd_val(pmd) & 2)
#define pmd_present(pmd) (pmd_val(pmd))


@ -118,7 +118,6 @@
PMD_TYPE_TABLE)
#define pmd_sect(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == \
PMD_TYPE_SECT)
#define pmd_large(pmd) pmd_sect(pmd)
#define pmd_leaf(pmd) pmd_sect(pmd)
#define pud_clear(pudp) \


@ -209,6 +209,8 @@ static inline void __sync_icache_dcache(pte_t pteval)
extern void __sync_icache_dcache(pte_t pteval);
#endif
#define PFN_PTE_SHIFT PAGE_SHIFT
void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pteval, unsigned int nr);
#define set_ptes set_ptes


@ -32,10 +32,10 @@ void ptdump_check_wx(void);
#endif /* CONFIG_ARM_PTDUMP_CORE */
#ifdef CONFIG_DEBUG_WX
#define debug_checkwx() ptdump_check_wx()
#ifdef CONFIG_ARM_DEBUG_WX
#define arm_debug_checkwx() ptdump_check_wx()
#else
#define debug_checkwx() do { } while (0)
#define arm_debug_checkwx() do { } while (0)
#endif
#endif /* __ASM_PTDUMP_H */


@ -60,6 +60,7 @@ obj-$(CONFIG_DYNAMIC_FTRACE) += ftrace.o insn.o patch.o
obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o insn.o patch.o
obj-$(CONFIG_JUMP_LABEL) += jump_label.o insn.o patch.o
obj-$(CONFIG_KEXEC_CORE) += machine_kexec.o relocate_kernel.o
obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o
# Main staffs in KPROBES are in arch/arm/probes/ .
obj-$(CONFIG_KPROBES) += patch.o insn.o
obj-$(CONFIG_OABI_COMPAT) += sys_oabi-compat.o


@ -198,10 +198,3 @@ void machine_kexec(struct kimage *image)
soft_restart(reboot_entry_phys);
}
void arch_crash_save_vmcoreinfo(void)
{
#ifdef CONFIG_ARM_LPAE
VMCOREINFO_CONFIG(ARM_LPAE);
#endif
}


@ -979,7 +979,7 @@ static int __init init_machine_late(void)
}
late_initcall(init_machine_late);
#ifdef CONFIG_KEXEC
#ifdef CONFIG_CRASH_RESERVE
/*
* The crash region must be aligned to 128MB to avoid
* zImage relocating below the reserved region.
@ -1066,7 +1066,7 @@ static void __init reserve_crashkernel(void)
}
#else
static inline void reserve_crashkernel(void) {}
#endif /* CONFIG_KEXEC */
#endif /* CONFIG_CRASH_RESERVE */
void __init hyp_mode_check(void)
{


@ -0,0 +1,10 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/vmcore_info.h>
void arch_crash_save_vmcoreinfo(void)
{
#ifdef CONFIG_ARM_LPAE
VMCOREINFO_CONFIG(ARM_LPAE);
#endif
}


@ -349,12 +349,12 @@ static void walk_pmd(struct pg_state *st, pud_t *pud, unsigned long start)
for (i = 0; i < PTRS_PER_PMD; i++, pmd++) {
addr = start + i * PMD_SIZE;
domain = get_domain_name(pmd);
if (pmd_none(*pmd) || pmd_large(*pmd) || !pmd_present(*pmd))
if (pmd_none(*pmd) || pmd_leaf(*pmd) || !pmd_present(*pmd))
note_page(st, addr, 4, pmd_val(*pmd), domain);
else
walk_pte(st, pmd, addr, domain);
if (SECTION_SIZE < PMD_SIZE && pmd_large(pmd[1])) {
if (SECTION_SIZE < PMD_SIZE && pmd_leaf(pmd[1])) {
addr += SECTION_SIZE;
pmd++;
domain = get_domain_name(pmd);


@ -458,7 +458,7 @@ static int __mark_rodata_ro(void *unused)
void mark_rodata_ro(void)
{
stop_machine(__mark_rodata_ro, NULL, NULL);
debug_checkwx();
arm_debug_checkwx();
}
#else


@ -1814,6 +1814,6 @@ void set_ptes(struct mm_struct *mm, unsigned long addr,
if (--nr == 0)
break;
ptep++;
pte_val(pteval) += PAGE_SIZE;
pteval = pte_next_pfn(pteval);
}
}


@ -1519,7 +1519,7 @@ config ARCH_SUPPORTS_CRASH_DUMP
def_bool y
config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
def_bool CRASH_CORE
def_bool CRASH_RESERVE
config TRANS_TABLE
def_bool y
@ -2229,6 +2229,15 @@ config UNWIND_PATCH_PAC_INTO_SCS
select UNWIND_TABLES
select DYNAMIC_SCS
config ARM64_CONTPTE
bool "Contiguous PTE mappings for user memory" if EXPERT
depends on TRANSPARENT_HUGEPAGE
default y
help
When enabled, user mappings are configured using the PTE contiguous
bit, for any mappings that meet the size and alignment requirements.
This reduces TLB pressure and improves performance.
endmenu # "Kernel Features"
menu "Boot options"
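To make the size and alignment requirement in the ARM64_CONTPTE help text above concrete: a block of ptes can carry the contiguous bit only when both the virtual range and the backing physical range cover a whole, naturally aligned contpte block. A minimal sketch of that qualifying test, assuming the usual arm64 CONT_PTE_MASK and PAGE_SHIFT definitions (the helper name is hypothetical; the same expression appears later in this series in contpte_set_ptes())::

	/* True if [addr, next) maps pfn onwards as one whole contpte block. */
	static inline bool contpte_range_suitable(unsigned long addr,
						  unsigned long next,
						  unsigned long pfn)
	{
		return ((addr | next | (pfn << PAGE_SHIFT)) & ~CONT_PTE_MASK) == 0;
	}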


@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0-only */
#ifndef _ARM64_CRASH_CORE_H
#define _ARM64_CRASH_CORE_H
#ifndef _ARM64_CRASH_RESERVE_H
#define _ARM64_CRASH_RESERVE_H
/* Current arm64 boot protocol requires 2MB alignment */
#define CRASH_ALIGN SZ_2M


@ -80,7 +80,7 @@ static inline void crash_setup_regs(struct pt_regs *newregs,
}
}
#if defined(CONFIG_KEXEC_CORE) && defined(CONFIG_HIBERNATION)
#if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_HIBERNATION)
extern bool crash_is_nosave(unsigned long pfn);
extern void crash_prepare_suspend(void);
extern void crash_post_resume(void);


@ -98,7 +98,8 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
__pte(__phys_to_pte_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
#define pte_none(pte) (!pte_val(pte))
#define pte_clear(mm,addr,ptep) set_pte(ptep, __pte(0))
#define __pte_clear(mm, addr, ptep) \
__set_pte(ptep, __pte(0))
#define pte_page(pte) (pfn_to_page(pte_pfn(pte)))
/*
@ -137,12 +138,16 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
*/
#define pte_valid_not_user(pte) \
((pte_val(pte) & (PTE_VALID | PTE_USER | PTE_UXN)) == (PTE_VALID | PTE_UXN))
/*
* Returns true if the pte is valid and has the contiguous bit set.
*/
#define pte_valid_cont(pte) (pte_valid(pte) && pte_cont(pte))
/*
* Could the pte be present in the TLB? We must check mm_tlb_flush_pending
* so that we don't erroneously return false for pages that have been
* remapped as PROT_NONE but are yet to be flushed from the TLB.
* Note that we can't make any assumptions based on the state of the access
* flag, since ptep_clear_flush_young() elides a DSB when invalidating the
* flag, since __ptep_clear_flush_young() elides a DSB when invalidating the
* TLB.
*/
#define pte_accessible(mm, pte) \
@ -266,7 +271,7 @@ static inline pte_t pte_mkdevmap(pte_t pte)
return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
}
static inline void set_pte(pte_t *ptep, pte_t pte)
static inline void __set_pte(pte_t *ptep, pte_t pte)
{
WRITE_ONCE(*ptep, pte);
@ -280,6 +285,11 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
}
}
static inline pte_t __ptep_get(pte_t *ptep)
{
return READ_ONCE(*ptep);
}
extern void __sync_icache_dcache(pte_t pteval);
bool pgattr_change_is_safe(u64 old, u64 new);
@ -307,7 +317,7 @@ static inline void __check_safe_pte_update(struct mm_struct *mm, pte_t *ptep,
if (!IS_ENABLED(CONFIG_DEBUG_VM))
return;
old_pte = READ_ONCE(*ptep);
old_pte = __ptep_get(ptep);
if (!pte_valid(old_pte) || !pte_valid(pte))
return;
@ -316,7 +326,7 @@ static inline void __check_safe_pte_update(struct mm_struct *mm, pte_t *ptep,
/*
* Check for potential race with hardware updates of the pte
* (ptep_set_access_flags safely changes valid ptes without going
* (__ptep_set_access_flags safely changes valid ptes without going
* through an invalid entry).
*/
VM_WARN_ONCE(!pte_young(pte),
@ -346,23 +356,38 @@ static inline void __sync_cache_and_tags(pte_t pte, unsigned int nr_pages)
mte_sync_tags(pte, nr_pages);
}
static inline void set_ptes(struct mm_struct *mm,
unsigned long __always_unused addr,
pte_t *ptep, pte_t pte, unsigned int nr)
/*
* Select all bits except the pfn
*/
static inline pgprot_t pte_pgprot(pte_t pte)
{
unsigned long pfn = pte_pfn(pte);
return __pgprot(pte_val(pfn_pte(pfn, __pgprot(0))) ^ pte_val(pte));
}
#define pte_advance_pfn pte_advance_pfn
static inline pte_t pte_advance_pfn(pte_t pte, unsigned long nr)
{
return pfn_pte(pte_pfn(pte) + nr, pte_pgprot(pte));
}
static inline void __set_ptes(struct mm_struct *mm,
unsigned long __always_unused addr,
pte_t *ptep, pte_t pte, unsigned int nr)
{
page_table_check_ptes_set(mm, ptep, pte, nr);
__sync_cache_and_tags(pte, nr);
for (;;) {
__check_safe_pte_update(mm, ptep, pte);
set_pte(ptep, pte);
__set_pte(ptep, pte);
if (--nr == 0)
break;
ptep++;
pte_val(pte) += PAGE_SIZE;
pte = pte_advance_pfn(pte, 1);
}
}
#define set_ptes set_ptes
/*
* Huge pte definitions.
@ -438,16 +463,6 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
return clear_pte_bit(pte, __pgprot(PTE_SWP_EXCLUSIVE));
}
/*
* Select all bits except the pfn
*/
static inline pgprot_t pte_pgprot(pte_t pte)
{
unsigned long pfn = pte_pfn(pte);
return __pgprot(pte_val(pfn_pte(pfn, __pgprot(0))) ^ pte_val(pte));
}
#ifdef CONFIG_NUMA_BALANCING
/*
* See the comment in include/linux/pgtable.h
@ -539,7 +554,7 @@ static inline void __set_pte_at(struct mm_struct *mm,
{
__sync_cache_and_tags(pte, nr);
__check_safe_pte_update(mm, ptep, pte);
set_pte(ptep, pte);
__set_pte(ptep, pte);
}
static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
@ -1033,8 +1048,7 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
return pte_pmd(pte_modify(pmd_pte(pmd), newprot));
}
#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
extern int ptep_set_access_flags(struct vm_area_struct *vma,
extern int __ptep_set_access_flags(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep,
pte_t entry, int dirty);
@ -1044,7 +1058,8 @@ static inline int pmdp_set_access_flags(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp,
pmd_t entry, int dirty)
{
return ptep_set_access_flags(vma, address, (pte_t *)pmdp, pmd_pte(entry), dirty);
return __ptep_set_access_flags(vma, address, (pte_t *)pmdp,
pmd_pte(entry), dirty);
}
static inline int pud_devmap(pud_t pud)
@ -1078,12 +1093,13 @@ static inline bool pud_user_accessible_page(pud_t pud)
/*
* Atomic pte/pmd modifications.
*/
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
static inline int __ptep_test_and_clear_young(pte_t *ptep)
static inline int __ptep_test_and_clear_young(struct vm_area_struct *vma,
unsigned long address,
pte_t *ptep)
{
pte_t old_pte, pte;
pte = READ_ONCE(*ptep);
pte = __ptep_get(ptep);
do {
old_pte = pte;
pte = pte_mkold(pte);
@ -1094,18 +1110,10 @@ static inline int __ptep_test_and_clear_young(pte_t *ptep)
return pte_young(pte);
}
static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
unsigned long address,
pte_t *ptep)
{
return __ptep_test_and_clear_young(ptep);
}
#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
static inline int __ptep_clear_flush_young(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep)
{
int young = ptep_test_and_clear_young(vma, address, ptep);
int young = __ptep_test_and_clear_young(vma, address, ptep);
if (young) {
/*
@ -1128,12 +1136,11 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
unsigned long address,
pmd_t *pmdp)
{
return ptep_test_and_clear_young(vma, address, (pte_t *)pmdp);
return __ptep_test_and_clear_young(vma, address, (pte_t *)pmdp);
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
unsigned long address, pte_t *ptep)
{
pte_t pte = __pte(xchg_relaxed(&pte_val(*ptep), 0));
@ -1143,6 +1150,37 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
return pte;
}
static inline void __clear_full_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned int nr, int full)
{
for (;;) {
__ptep_get_and_clear(mm, addr, ptep);
if (--nr == 0)
break;
ptep++;
addr += PAGE_SIZE;
}
}
static inline pte_t __get_and_clear_full_ptes(struct mm_struct *mm,
unsigned long addr, pte_t *ptep,
unsigned int nr, int full)
{
pte_t pte, tmp_pte;
pte = __ptep_get_and_clear(mm, addr, ptep);
while (--nr) {
ptep++;
addr += PAGE_SIZE;
tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
if (pte_dirty(tmp_pte))
pte = pte_mkdirty(pte);
if (pte_young(tmp_pte))
pte = pte_mkyoung(pte);
}
return pte;
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
@ -1156,16 +1194,12 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
/*
* ptep_set_wrprotect - mark read-only while transferring potential hardware
* dirty status (PTE_DBM && !PTE_RDONLY) to the software PTE_DIRTY bit.
*/
#define __HAVE_ARCH_PTEP_SET_WRPROTECT
static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep)
static inline void ___ptep_set_wrprotect(struct mm_struct *mm,
unsigned long address, pte_t *ptep,
pte_t pte)
{
pte_t old_pte, pte;
pte_t old_pte;
pte = READ_ONCE(*ptep);
do {
old_pte = pte;
pte = pte_wrprotect(pte);
@ -1174,12 +1208,31 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres
} while (pte_val(pte) != pte_val(old_pte));
}
/*
* __ptep_set_wrprotect - mark read-only while transferring potential hardware
* dirty status (PTE_DBM && !PTE_RDONLY) to the software PTE_DIRTY bit.
*/
static inline void __ptep_set_wrprotect(struct mm_struct *mm,
unsigned long address, pte_t *ptep)
{
___ptep_set_wrprotect(mm, address, ptep, __ptep_get(ptep));
}
static inline void __wrprotect_ptes(struct mm_struct *mm, unsigned long address,
pte_t *ptep, unsigned int nr)
{
unsigned int i;
for (i = 0; i < nr; i++, address += PAGE_SIZE, ptep++)
__ptep_set_wrprotect(mm, address, ptep);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define __HAVE_ARCH_PMDP_SET_WRPROTECT
static inline void pmdp_set_wrprotect(struct mm_struct *mm,
unsigned long address, pmd_t *pmdp)
{
ptep_set_wrprotect(mm, address, (pte_t *)pmdp);
__ptep_set_wrprotect(mm, address, (pte_t *)pmdp);
}
#define pmdp_establish pmdp_establish
@ -1257,7 +1310,7 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
#endif /* CONFIG_ARM64_MTE */
/*
* On AArch64, the cache coherency is handled via the set_pte_at() function.
* On AArch64, the cache coherency is handled via the __set_ptes() function.
*/
static inline void update_mmu_cache_range(struct vm_fault *vmf,
struct vm_area_struct *vma, unsigned long addr, pte_t *ptep,
@ -1309,6 +1362,282 @@ extern pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
extern void ptep_modify_prot_commit(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep,
pte_t old_pte, pte_t new_pte);
#ifdef CONFIG_ARM64_CONTPTE
/*
* The contpte APIs are used to transparently manage the contiguous bit in ptes
* where it is possible and makes sense to do so. The PTE_CONT bit is considered
* a private implementation detail of the public ptep API (see below).
*/
extern void __contpte_try_fold(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte);
extern void __contpte_try_unfold(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte);
extern pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte);
extern pte_t contpte_ptep_get_lockless(pte_t *orig_ptep);
extern void contpte_set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned int nr);
extern void contpte_clear_full_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned int nr, int full);
extern pte_t contpte_get_and_clear_full_ptes(struct mm_struct *mm,
unsigned long addr, pte_t *ptep,
unsigned int nr, int full);
extern int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep);
extern int contpte_ptep_clear_flush_young(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep);
extern void contpte_wrprotect_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned int nr);
extern int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep,
pte_t entry, int dirty);
static __always_inline void contpte_try_fold(struct mm_struct *mm,
unsigned long addr, pte_t *ptep, pte_t pte)
{
/*
* Only bother trying if both the virtual and physical addresses are
* aligned and correspond to the last entry in a contig range. The core
* code mostly modifies ranges from low to high, so this is likely
* the last modification in the contig range, and a good time to fold.
* We can't fold special mappings, because there is no associated folio.
*/
const unsigned long contmask = CONT_PTES - 1;
bool valign = ((addr >> PAGE_SHIFT) & contmask) == contmask;
if (unlikely(valign)) {
bool palign = (pte_pfn(pte) & contmask) == contmask;
if (unlikely(palign &&
pte_valid(pte) && !pte_cont(pte) && !pte_special(pte)))
__contpte_try_fold(mm, addr, ptep, pte);
}
}
static __always_inline void contpte_try_unfold(struct mm_struct *mm,
unsigned long addr, pte_t *ptep, pte_t pte)
{
if (unlikely(pte_valid_cont(pte)))
__contpte_try_unfold(mm, addr, ptep, pte);
}
#define pte_batch_hint pte_batch_hint
static inline unsigned int pte_batch_hint(pte_t *ptep, pte_t pte)
{
if (!pte_valid_cont(pte))
return 1;
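/*
 * sizeof(pte_t) is 8 on arm64, so "ptep >> 3" is the index of this entry
 * in its page table; masking with CONT_PTES - 1 gives its offset inside
 * the contpte block, and the difference below is how many entries remain
 * in the block for the caller to batch over.
 */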
return CONT_PTES - (((unsigned long)ptep >> 3) & (CONT_PTES - 1));
}
/*
* The below functions constitute the public API that arm64 presents to the
* core-mm to manipulate PTE entries within their page tables (or at least this
* is the subset of the API that arm64 needs to implement). These public
* versions will automatically and transparently apply the contiguous bit where
* it makes sense to do so. Therefore any users that are contig-aware (e.g.
* hugetlb, kernel mapper) should NOT use these APIs, but instead use the
* private versions, which are prefixed with double underscore. All of these
* APIs except for ptep_get_lockless() are expected to be called with the PTL
* held. Although the contiguous bit is considered private to the
* implementation, it is deliberately allowed to leak through the getters (e.g.
* ptep_get()), back to core code. This is required so that pte_leaf_size() can
* provide an accurate size for perf_get_pgtable_size(). But this leakage means
* it's possible a pte will be passed to a setter with the contiguous bit set, so
* we explicitly clear the contiguous bit in those cases to prevent accidentally
* setting it in the pgtable.
*/
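/*
 * For example (illustrative only, not part of this patch): a generic
 * core-mm caller uses ptep_get()/set_ptes() and never sees PTE_CONT,
 * while contig-aware users such as hugetlb or the kernel mapper, which
 * manage contiguity themselves, call __ptep_get()/__set_ptes() directly.
 */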
#define ptep_get ptep_get
static inline pte_t ptep_get(pte_t *ptep)
{
pte_t pte = __ptep_get(ptep);
if (likely(!pte_valid_cont(pte)))
return pte;
return contpte_ptep_get(ptep, pte);
}
#define ptep_get_lockless ptep_get_lockless
static inline pte_t ptep_get_lockless(pte_t *ptep)
{
pte_t pte = __ptep_get(ptep);
if (likely(!pte_valid_cont(pte)))
return pte;
return contpte_ptep_get_lockless(ptep);
}
static inline void set_pte(pte_t *ptep, pte_t pte)
{
/*
* We don't have the mm or vaddr so cannot unfold contig entries (since
* it requires tlb maintenance). set_pte() is not used in core code, so
* this should never even be called. Regardless do our best to service
* any call and emit a warning if there is any attempt to set a pte on
* top of an existing contig range.
*/
pte_t orig_pte = __ptep_get(ptep);
WARN_ON_ONCE(pte_valid_cont(orig_pte));
__set_pte(ptep, pte_mknoncont(pte));
}
#define set_ptes set_ptes
static __always_inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned int nr)
{
pte = pte_mknoncont(pte);
if (likely(nr == 1)) {
contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
__set_ptes(mm, addr, ptep, pte, 1);
contpte_try_fold(mm, addr, ptep, pte);
} else {
contpte_set_ptes(mm, addr, ptep, pte, nr);
}
}
static inline void pte_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
__pte_clear(mm, addr, ptep);
}
#define clear_full_ptes clear_full_ptes
static inline void clear_full_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned int nr, int full)
{
if (likely(nr == 1)) {
contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
__clear_full_ptes(mm, addr, ptep, nr, full);
} else {
contpte_clear_full_ptes(mm, addr, ptep, nr, full);
}
}
#define get_and_clear_full_ptes get_and_clear_full_ptes
static inline pte_t get_and_clear_full_ptes(struct mm_struct *mm,
unsigned long addr, pte_t *ptep,
unsigned int nr, int full)
{
pte_t pte;
if (likely(nr == 1)) {
contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
pte = __get_and_clear_full_ptes(mm, addr, ptep, nr, full);
} else {
pte = contpte_get_and_clear_full_ptes(mm, addr, ptep, nr, full);
}
return pte;
}
#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
return __ptep_get_and_clear(mm, addr, ptep);
}
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
pte_t orig_pte = __ptep_get(ptep);
if (likely(!pte_valid_cont(orig_pte)))
return __ptep_test_and_clear_young(vma, addr, ptep);
return contpte_ptep_test_and_clear_young(vma, addr, ptep);
}
#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
pte_t orig_pte = __ptep_get(ptep);
if (likely(!pte_valid_cont(orig_pte)))
return __ptep_clear_flush_young(vma, addr, ptep);
return contpte_ptep_clear_flush_young(vma, addr, ptep);
}
#define wrprotect_ptes wrprotect_ptes
static __always_inline void wrprotect_ptes(struct mm_struct *mm,
unsigned long addr, pte_t *ptep, unsigned int nr)
{
if (likely(nr == 1)) {
/*
* Optimization: wrprotect_ptes() can only be called for present
* ptes so we only need to check contig bit as condition for
* unfold, and we can remove the contig bit from the pte we read
* to avoid re-reading. This speeds up fork() which is sensitive
* for order-0 folios. Equivalent to contpte_try_unfold().
*/
pte_t orig_pte = __ptep_get(ptep);
if (unlikely(pte_cont(orig_pte))) {
__contpte_try_unfold(mm, addr, ptep, orig_pte);
orig_pte = pte_mknoncont(orig_pte);
}
___ptep_set_wrprotect(mm, addr, ptep, orig_pte);
} else {
contpte_wrprotect_ptes(mm, addr, ptep, nr);
}
}
#define __HAVE_ARCH_PTEP_SET_WRPROTECT
static inline void ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
wrprotect_ptes(mm, addr, ptep, 1);
}
#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
static inline int ptep_set_access_flags(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep,
pte_t entry, int dirty)
{
pte_t orig_pte = __ptep_get(ptep);
entry = pte_mknoncont(entry);
if (likely(!pte_valid_cont(orig_pte)))
return __ptep_set_access_flags(vma, addr, ptep, entry, dirty);
return contpte_ptep_set_access_flags(vma, addr, ptep, entry, dirty);
}
#else /* CONFIG_ARM64_CONTPTE */
#define ptep_get __ptep_get
#define set_pte __set_pte
#define set_ptes __set_ptes
#define pte_clear __pte_clear
#define clear_full_ptes __clear_full_ptes
#define get_and_clear_full_ptes __get_and_clear_full_ptes
#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
#define ptep_get_and_clear __ptep_get_and_clear
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
#define ptep_test_and_clear_young __ptep_test_and_clear_young
#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
#define ptep_clear_flush_young __ptep_clear_flush_young
#define __HAVE_ARCH_PTEP_SET_WRPROTECT
#define ptep_set_wrprotect __ptep_set_wrprotect
#define wrprotect_ptes __wrprotect_ptes
#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
#define ptep_set_access_flags __ptep_set_access_flags
#endif /* CONFIG_ARM64_CONTPTE */
#endif /* !__ASSEMBLY__ */
#endif /* __ASM_PGTABLE_H */
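For orientation, here is a hedged sketch of how a core-mm caller might drive the batched public API declared above; the wrapper function is hypothetical and only get_and_clear_full_ptes() comes from this patch::

	/*
	 * Hypothetical zap-style helper: tear down nr consecutive ptes of one
	 * folio in a single call, so access/dirty bits are gathered across
	 * the whole (possibly contpte-folded) range and returned in one pte.
	 */
	static pte_t example_zap_folio_ptes(struct mm_struct *mm, unsigned long addr,
					    pte_t *ptep, unsigned int nr)
	{
		return get_and_clear_full_ptes(mm, addr, ptep, nr, /* full */ 0);
	}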


@ -29,13 +29,6 @@ void __init ptdump_debugfs_register(struct ptdump_info *info, const char *name);
static inline void ptdump_debugfs_register(struct ptdump_info *info,
const char *name) { }
#endif
void ptdump_check_wx(void);
#endif /* CONFIG_PTDUMP_CORE */
#ifdef CONFIG_DEBUG_WX
#define debug_checkwx() ptdump_check_wx()
#else
#define debug_checkwx() do { } while (0)
#endif
#endif /* __ASM_PTDUMP_H */


@ -422,7 +422,7 @@ do { \
#define __flush_s2_tlb_range_op(op, start, pages, stride, tlb_level) \
__flush_tlb_range_op(op, start, pages, stride, 0, tlb_level, false, kvm_lpa2_is_enabled());
static inline void __flush_tlb_range(struct vm_area_struct *vma,
static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
unsigned long start, unsigned long end,
unsigned long stride, bool last_level,
int tlb_level)
@ -456,10 +456,19 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
__flush_tlb_range_op(vae1is, start, pages, stride, asid,
tlb_level, true, lpa2_is_enabled());
dsb(ish);
mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end);
}
static inline void __flush_tlb_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end,
unsigned long stride, bool last_level,
int tlb_level)
{
__flush_tlb_range_nosync(vma, start, end, stride,
last_level, tlb_level);
dsb(ish);
}
static inline void flush_tlb_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{


@ -65,7 +65,7 @@ obj-$(CONFIG_KEXEC_FILE) += machine_kexec_file.o kexec_image.o
obj-$(CONFIG_ARM64_RELOC_TEST) += arm64-reloc-test.o
arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o
obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
obj-$(CONFIG_CRASH_CORE) += crash_core.o
obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o
obj-$(CONFIG_ARM_SDE_INTERFACE) += sdei.o
obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o
obj-$(CONFIG_ARM64_MTE) += mte.o


@ -103,7 +103,7 @@ static int __init set_permissions(pte_t *ptep, unsigned long addr, void *data)
{
struct set_perm_data *spd = data;
const efi_memory_desc_t *md = spd->md;
pte_t pte = READ_ONCE(*ptep);
pte_t pte = __ptep_get(ptep);
if (md->attribute & EFI_MEMORY_RO)
pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));
@ -111,7 +111,7 @@ static int __init set_permissions(pte_t *ptep, unsigned long addr, void *data)
pte = set_pte_bit(pte, __pgprot(PTE_PXN));
else if (system_supports_bti_kernel() && spd->has_bti)
pte = set_pte_bit(pte, __pgprot(PTE_GP));
set_pte(ptep, pte);
__set_pte(ptep, pte);
return 0;
}


@ -255,7 +255,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
pr_info("Starting crashdump kernel...\n");
}
#ifdef CONFIG_HIBERNATION
#if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_HIBERNATION)
/*
* To preserve the crash dump kernel image, the relevant memory segments
* should be mapped again around the hibernation.


@ -39,6 +39,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
return kexec_image_post_load_cleanup_default(image);
}
#ifdef CONFIG_CRASH_DUMP
static int prepare_elf_headers(void **addr, unsigned long *sz)
{
struct crash_mem *cmem;
@ -80,6 +81,7 @@ out:
kfree(cmem);
return ret;
}
#endif
/*
* Tries to add the initrd and DTB to the image. If it is not possible to find
@ -93,8 +95,8 @@ int load_other_segments(struct kimage *image,
char *cmdline)
{
struct kexec_buf kbuf;
void *headers, *dtb = NULL;
unsigned long headers_sz, initrd_load_addr = 0, dtb_len,
void *dtb = NULL;
unsigned long initrd_load_addr = 0, dtb_len,
orig_segments = image->nr_segments;
int ret = 0;
@ -102,7 +104,10 @@ int load_other_segments(struct kimage *image,
/* not allocate anything below the kernel */
kbuf.buf_min = kernel_load_addr + kernel_size;
#ifdef CONFIG_CRASH_DUMP
/* load elf core header */
void *headers;
unsigned long headers_sz;
if (image->type == KEXEC_TYPE_CRASH) {
ret = prepare_elf_headers(&headers, &headers_sz);
if (ret) {
@ -130,6 +135,7 @@ int load_other_segments(struct kimage *image,
kexec_dprintk("Loaded elf core header at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
image->elf_load_addr, kbuf.bufsz, kbuf.memsz);
}
#endif
/* load initrd */
if (initrd) {


@ -67,7 +67,7 @@ int memcmp_pages(struct page *page1, struct page *page2)
/*
* If the page content is identical but at least one of the pages is
* tagged, return non-zero to avoid KSM merging. If only one of the
* pages is tagged, set_pte_at() may zero or change the tags of the
* pages is tagged, __set_ptes() may zero or change the tags of the
* other page via mte_sync_tags().
*/
if (page_mte_tagged(page1) || page_mte_tagged(page2))


@ -4,7 +4,7 @@
* Copyright (C) Huawei Futurewei Technologies.
*/
#include <linux/crash_core.h>
#include <linux/vmcore_info.h>
#include <asm/cpufeature.h>
#include <asm/memory.h>
#include <asm/pgtable-hwdef.h>
@ -23,7 +23,6 @@ void arch_crash_save_vmcoreinfo(void)
/* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */
vmcoreinfo_append_str("NUMBER(MODULES_VADDR)=0x%lx\n", MODULES_VADDR);
vmcoreinfo_append_str("NUMBER(MODULES_END)=0x%lx\n", MODULES_END);
vmcoreinfo_append_str("NUMBER(VMALLOC_START)=0x%lx\n", VMALLOC_START);
vmcoreinfo_append_str("NUMBER(VMALLOC_END)=0x%lx\n", VMALLOC_END);
vmcoreinfo_append_str("NUMBER(VMEMMAP_START)=0x%lx\n", VMEMMAP_START);
vmcoreinfo_append_str("NUMBER(VMEMMAP_END)=0x%lx\n", VMEMMAP_END);


@ -1072,7 +1072,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
} else {
/*
* Only locking to serialise with a concurrent
* set_pte_at() in the VMM but still overriding the
* __set_ptes() in the VMM but still overriding the
* tags, hence ignoring the return value.
*/
try_page_mte_tagging(page);


@ -3,6 +3,7 @@ obj-y := dma-mapping.o extable.o fault.o init.o \
cache.o copypage.o flush.o \
ioremap.o mmap.o pgd.o mmu.o \
context.o proc.o pageattr.o fixmap.o
obj-$(CONFIG_ARM64_CONTPTE) += contpte.o
obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
obj-$(CONFIG_PTDUMP_CORE) += ptdump.o
obj-$(CONFIG_PTDUMP_DEBUGFS) += ptdump_debugfs.o

arch/arm64/mm/contpte.c (new file, 408 lines)

@ -0,0 +1,408 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (C) 2023 ARM Ltd.
*/
#include <linux/mm.h>
#include <linux/efi.h>
#include <linux/export.h>
#include <asm/tlbflush.h>
static inline bool mm_is_user(struct mm_struct *mm)
{
/*
* Don't attempt to apply the contig bit to kernel mappings, because
* dynamically adding/removing the contig bit can cause page faults.
* These racing faults are ok for user space, since they get serialized
* on the PTL. But kernel mappings can't tolerate faults.
*/
if (unlikely(mm_is_efi(mm)))
return false;
return mm != &init_mm;
}
static inline pte_t *contpte_align_down(pte_t *ptep)
{
return PTR_ALIGN_DOWN(ptep, sizeof(*ptep) * CONT_PTES);
}
static void contpte_try_unfold_partial(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned int nr)
{
/*
* Unfold any partially covered contpte block at the beginning and end
* of the range.
*/
if (ptep != contpte_align_down(ptep) || nr < CONT_PTES)
contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
if (ptep + nr != contpte_align_down(ptep + nr)) {
unsigned long last_addr = addr + PAGE_SIZE * (nr - 1);
pte_t *last_ptep = ptep + nr - 1;
contpte_try_unfold(mm, last_addr, last_ptep,
__ptep_get(last_ptep));
}
}
static void contpte_convert(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte)
{
struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
unsigned long start_addr;
pte_t *start_ptep;
int i;
start_ptep = ptep = contpte_align_down(ptep);
start_addr = addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
pte = pfn_pte(ALIGN_DOWN(pte_pfn(pte), CONT_PTES), pte_pgprot(pte));
for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE) {
pte_t ptent = __ptep_get_and_clear(mm, addr, ptep);
if (pte_dirty(ptent))
pte = pte_mkdirty(pte);
if (pte_young(ptent))
pte = pte_mkyoung(pte);
}
__flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);
__set_ptes(mm, start_addr, start_ptep, pte, CONT_PTES);
}
void __contpte_try_fold(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte)
{
/*
* We have already checked that the virtual and physical addresses are
* correctly aligned for a contpte mapping in contpte_try_fold() so the
* remaining checks are to ensure that the contpte range is fully
* covered by a single folio, and ensure that all the ptes are valid
* with contiguous PFNs and matching prots. We ignore the state of the
* access and dirty bits for the purpose of deciding if it's a contiguous
* range; the folding process will generate a single contpte entry which
* has a single access and dirty bit. Those 2 bits are the logical OR of
* their respective bits in the constituent pte entries. In order to
* ensure the contpte range is covered by a single folio, we must
* recover the folio from the pfn, but special mappings don't have a
* folio backing them. Fortunately contpte_try_fold() already checked
* that the pte is not special - we never try to fold special mappings.
* Note we can't use vm_normal_page() for this since we don't have the
* vma.
*/
unsigned long folio_start, folio_end;
unsigned long cont_start, cont_end;
pte_t expected_pte, subpte;
struct folio *folio;
struct page *page;
unsigned long pfn;
pte_t *orig_ptep;
pgprot_t prot;
int i;
if (!mm_is_user(mm))
return;
page = pte_page(pte);
folio = page_folio(page);
folio_start = addr - (page - &folio->page) * PAGE_SIZE;
folio_end = folio_start + folio_nr_pages(folio) * PAGE_SIZE;
cont_start = ALIGN_DOWN(addr, CONT_PTE_SIZE);
cont_end = cont_start + CONT_PTE_SIZE;
if (folio_start > cont_start || folio_end < cont_end)
return;
pfn = ALIGN_DOWN(pte_pfn(pte), CONT_PTES);
prot = pte_pgprot(pte_mkold(pte_mkclean(pte)));
expected_pte = pfn_pte(pfn, prot);
orig_ptep = ptep;
ptep = contpte_align_down(ptep);
for (i = 0; i < CONT_PTES; i++) {
subpte = pte_mkold(pte_mkclean(__ptep_get(ptep)));
if (!pte_same(subpte, expected_pte))
return;
expected_pte = pte_advance_pfn(expected_pte, 1);
ptep++;
}
pte = pte_mkcont(pte);
contpte_convert(mm, addr, orig_ptep, pte);
}
EXPORT_SYMBOL_GPL(__contpte_try_fold);
void __contpte_try_unfold(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte)
{
/*
* We have already checked that the ptes are contiguous in
* contpte_try_unfold(), so just check that the mm is user space.
*/
if (!mm_is_user(mm))
return;
pte = pte_mknoncont(pte);
contpte_convert(mm, addr, ptep, pte);
}
EXPORT_SYMBOL_GPL(__contpte_try_unfold);
pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
{
/*
* Gather access/dirty bits, which may be populated in any of the ptes
* of the contig range. We are guaranteed to be holding the PTL, so any
* contiguous range cannot be unfolded or otherwise modified under our
* feet.
*/
pte_t pte;
int i;
ptep = contpte_align_down(ptep);
for (i = 0; i < CONT_PTES; i++, ptep++) {
pte = __ptep_get(ptep);
if (pte_dirty(pte))
orig_pte = pte_mkdirty(orig_pte);
if (pte_young(pte))
orig_pte = pte_mkyoung(orig_pte);
}
return orig_pte;
}
EXPORT_SYMBOL_GPL(contpte_ptep_get);
pte_t contpte_ptep_get_lockless(pte_t *orig_ptep)
{
/*
* The ptep_get_lockless() API requires us to read and return *orig_ptep
* so that it is self-consistent, without the PTL held, so we may be
* racing with other threads modifying the pte. Usually a READ_ONCE()
* would suffice, but for the contpte case, we also need to gather the
* access and dirty bits from across all ptes in the contiguous block,
* and we can't read all of those neighbouring ptes atomically, so any
* contiguous range may be unfolded/modified/refolded under our feet.
* Therefore we ensure we read a _consistent_ contpte range by checking
* that all ptes in the range are valid and have CONT_PTE set, that all
* pfns are contiguous and that all pgprots are the same (ignoring
* access/dirty). If we find a pte that is not consistent, then we must
* be racing with an update so start again. If the target pte does not
* have CONT_PTE set then that is considered consistent on its own
* because it is not part of a contpte range.
*/
pgprot_t orig_prot;
unsigned long pfn;
pte_t orig_pte;
pgprot_t prot;
pte_t *ptep;
pte_t pte;
int i;
retry:
orig_pte = __ptep_get(orig_ptep);
if (!pte_valid_cont(orig_pte))
return orig_pte;
orig_prot = pte_pgprot(pte_mkold(pte_mkclean(orig_pte)));
ptep = contpte_align_down(orig_ptep);
pfn = pte_pfn(orig_pte) - (orig_ptep - ptep);
for (i = 0; i < CONT_PTES; i++, ptep++, pfn++) {
pte = __ptep_get(ptep);
prot = pte_pgprot(pte_mkold(pte_mkclean(pte)));
if (!pte_valid_cont(pte) ||
pte_pfn(pte) != pfn ||
pgprot_val(prot) != pgprot_val(orig_prot))
goto retry;
if (pte_dirty(pte))
orig_pte = pte_mkdirty(orig_pte);
if (pte_young(pte))
orig_pte = pte_mkyoung(orig_pte);
}
return orig_pte;
}
EXPORT_SYMBOL_GPL(contpte_ptep_get_lockless);
void contpte_set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned int nr)
{
unsigned long next;
unsigned long end;
unsigned long pfn;
pgprot_t prot;
/*
* The set_ptes() spec guarantees that when nr > 1, the initial state of
* all ptes is not-present. Therefore we never need to unfold or
* otherwise invalidate a range before we set the new ptes.
* contpte_set_ptes() should never be called for nr < 2.
*/
VM_WARN_ON(nr == 1);
if (!mm_is_user(mm))
return __set_ptes(mm, addr, ptep, pte, nr);
end = addr + (nr << PAGE_SHIFT);
pfn = pte_pfn(pte);
prot = pte_pgprot(pte);
do {
next = pte_cont_addr_end(addr, end);
nr = (next - addr) >> PAGE_SHIFT;
pte = pfn_pte(pfn, prot);
if (((addr | next | (pfn << PAGE_SHIFT)) & ~CONT_PTE_MASK) == 0)
pte = pte_mkcont(pte);
else
pte = pte_mknoncont(pte);
__set_ptes(mm, addr, ptep, pte, nr);
addr = next;
ptep += nr;
pfn += nr;
} while (addr != end);
}
EXPORT_SYMBOL_GPL(contpte_set_ptes);
void contpte_clear_full_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned int nr, int full)
{
contpte_try_unfold_partial(mm, addr, ptep, nr);
__clear_full_ptes(mm, addr, ptep, nr, full);
}
EXPORT_SYMBOL_GPL(contpte_clear_full_ptes);
pte_t contpte_get_and_clear_full_ptes(struct mm_struct *mm,
unsigned long addr, pte_t *ptep,
unsigned int nr, int full)
{
contpte_try_unfold_partial(mm, addr, ptep, nr);
return __get_and_clear_full_ptes(mm, addr, ptep, nr, full);
}
EXPORT_SYMBOL_GPL(contpte_get_and_clear_full_ptes);
int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
/*
* ptep_clear_flush_young() technically requires us to clear the access
* flag for a _single_ pte. However, the core-mm code actually tracks
* access/dirty per folio, not per page. And since we only create a
* contig range when the range is covered by a single folio, we can get
* away with clearing young for the whole contig range here, so we avoid
* having to unfold.
*/
int young = 0;
int i;
ptep = contpte_align_down(ptep);
addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE)
young |= __ptep_test_and_clear_young(vma, addr, ptep);
return young;
}
EXPORT_SYMBOL_GPL(contpte_ptep_test_and_clear_young);
int contpte_ptep_clear_flush_young(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
int young;
young = contpte_ptep_test_and_clear_young(vma, addr, ptep);
if (young) {
/*
* See comment in __ptep_clear_flush_young(); same rationale for
* eliding the trailing DSB applies here.
*/
addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
__flush_tlb_range_nosync(vma, addr, addr + CONT_PTE_SIZE,
PAGE_SIZE, true, 3);
}
return young;
}
EXPORT_SYMBOL_GPL(contpte_ptep_clear_flush_young);
void contpte_wrprotect_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned int nr)
{
/*
* If wrprotecting an entire contig range, we can avoid unfolding. Just
* set wrprotect and wait for the later mmu_gather flush to invalidate
* the tlb. Until the flush, the page may or may not be wrprotected.
* After the flush, it is guaranteed wrprotected. If it's a partial
* range though, we must unfold, because we can't have a case where
* CONT_PTE is set but wrprotect applies to a subset of the PTEs; this
* would cause it to continue to be unpredictable after the flush.
*/
contpte_try_unfold_partial(mm, addr, ptep, nr);
__wrprotect_ptes(mm, addr, ptep, nr);
}
EXPORT_SYMBOL_GPL(contpte_wrprotect_ptes);
int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep,
pte_t entry, int dirty)
{
unsigned long start_addr;
pte_t orig_pte;
int i;
/*
* Gather the access/dirty bits for the contiguous range. If nothing has
* changed, it's a no-op.
*/
orig_pte = pte_mknoncont(ptep_get(ptep));
if (pte_val(orig_pte) == pte_val(entry))
return 0;
/*
* We can fix up access/dirty bits without having to unfold the contig
* range. But if the write bit is changing, we must unfold.
*/
if (pte_write(orig_pte) == pte_write(entry)) {
/*
* For HW access management, we technically only need to update
* the flag on a single pte in the range. But for SW access
* management, we need to update all the ptes to prevent extra
* faults. Avoid per-page tlb flush in __ptep_set_access_flags()
* and instead flush the whole range at the end.
*/
ptep = contpte_align_down(ptep);
start_addr = addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE)
__ptep_set_access_flags(vma, addr, ptep, entry, 0);
if (dirty)
__flush_tlb_range(vma, start_addr, addr,
PAGE_SIZE, true, 3);
} else {
__contpte_try_unfold(vma->vm_mm, addr, ptep, orig_pte);
__ptep_set_access_flags(vma, addr, ptep, entry, dirty);
}
return 1;
}
EXPORT_SYMBOL_GPL(contpte_ptep_set_access_flags);


@ -191,7 +191,7 @@ static void show_pte(unsigned long addr)
if (!ptep)
break;
pte = READ_ONCE(*ptep);
pte = __ptep_get(ptep);
pr_cont(", pte=%016llx", pte_val(pte));
pte_unmap(ptep);
} while(0);
@ -205,16 +205,16 @@ static void show_pte(unsigned long addr)
*
* It needs to cope with hardware update of the accessed/dirty state by other
* agents in the system and can safely skip the __sync_icache_dcache() call as,
* like set_pte_at(), the PTE is never changed from no-exec to exec here.
* like __set_ptes(), the PTE is never changed from no-exec to exec here.
*
* Returns whether or not the PTE actually changed.
*/
int ptep_set_access_flags(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep,
pte_t entry, int dirty)
int __ptep_set_access_flags(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep,
pte_t entry, int dirty)
{
pteval_t old_pteval, pteval;
pte_t pte = READ_ONCE(*ptep);
pte_t pte = __ptep_get(ptep);
if (pte_same(pte, entry))
return 0;


@ -124,9 +124,9 @@ void __set_fixmap(enum fixed_addresses idx,
ptep = fixmap_pte(addr);
if (pgprot_val(flags)) {
set_pte(ptep, pfn_pte(phys >> PAGE_SHIFT, flags));
__set_pte(ptep, pfn_pte(phys >> PAGE_SHIFT, flags));
} else {
pte_clear(&init_mm, addr, ptep);
__pte_clear(&init_mm, addr, ptep);
flush_tlb_kernel_range(addr, addr+PAGE_SIZE);
}
}


@ -45,13 +45,6 @@ void __init arm64_hugetlb_cma_reserve(void)
else
order = CONT_PMD_SHIFT - PAGE_SHIFT;
/*
* HugeTLB CMA reservation is required for gigantic
* huge pages which could not be allocated via the
* page allocator. Just warn if there is any change
* breaking this assumption.
*/
WARN_ON(order <= MAX_PAGE_ORDER);
hugetlb_cma_reserve(order);
}
#endif /* CONFIG_CMA */
@ -152,14 +145,14 @@ pte_t huge_ptep_get(pte_t *ptep)
{
int ncontig, i;
size_t pgsize;
pte_t orig_pte = ptep_get(ptep);
pte_t orig_pte = __ptep_get(ptep);
if (!pte_present(orig_pte) || !pte_cont(orig_pte))
return orig_pte;
ncontig = num_contig_ptes(page_size(pte_page(orig_pte)), &pgsize);
for (i = 0; i < ncontig; i++, ptep++) {
pte_t pte = ptep_get(ptep);
pte_t pte = __ptep_get(ptep);
if (pte_dirty(pte))
orig_pte = pte_mkdirty(orig_pte);
@ -184,11 +177,11 @@ static pte_t get_clear_contig(struct mm_struct *mm,
unsigned long pgsize,
unsigned long ncontig)
{
pte_t orig_pte = ptep_get(ptep);
pte_t orig_pte = __ptep_get(ptep);
unsigned long i;
for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
pte_t pte = ptep_get_and_clear(mm, addr, ptep);
pte_t pte = __ptep_get_and_clear(mm, addr, ptep);
/*
* If HW_AFDBM is enabled, then the HW could turn on
@ -236,7 +229,7 @@ static void clear_flush(struct mm_struct *mm,
unsigned long i, saddr = addr;
for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
ptep_clear(mm, addr, ptep);
__ptep_get_and_clear(mm, addr, ptep);
flush_tlb_range(&vma, saddr, addr);
}
@ -254,12 +247,12 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
if (!pte_present(pte)) {
for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
set_pte_at(mm, addr, ptep, pte);
__set_ptes(mm, addr, ptep, pte, 1);
return;
}
if (!pte_cont(pte)) {
set_pte_at(mm, addr, ptep, pte);
__set_ptes(mm, addr, ptep, pte, 1);
return;
}
@ -270,7 +263,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
clear_flush(mm, addr, ptep, pgsize, ncontig);
for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);
}
pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
@ -400,7 +393,7 @@ void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
ncontig = num_contig_ptes(sz, &pgsize);
for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
pte_clear(mm, addr, ptep);
__pte_clear(mm, addr, ptep);
}
pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
@ -408,10 +401,10 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
{
int ncontig;
size_t pgsize;
pte_t orig_pte = ptep_get(ptep);
pte_t orig_pte = __ptep_get(ptep);
if (!pte_cont(orig_pte))
return ptep_get_and_clear(mm, addr, ptep);
return __ptep_get_and_clear(mm, addr, ptep);
ncontig = find_num_contig(mm, addr, ptep, &pgsize);
@ -431,11 +424,11 @@ static int __cont_access_flags_changed(pte_t *ptep, pte_t pte, int ncontig)
{
int i;
if (pte_write(pte) != pte_write(ptep_get(ptep)))
if (pte_write(pte) != pte_write(__ptep_get(ptep)))
return 1;
for (i = 0; i < ncontig; i++) {
pte_t orig_pte = ptep_get(ptep + i);
pte_t orig_pte = __ptep_get(ptep + i);
if (pte_dirty(pte) != pte_dirty(orig_pte))
return 1;
@ -459,7 +452,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
pte_t orig_pte;
if (!pte_cont(pte))
return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
return __ptep_set_access_flags(vma, addr, ptep, pte, dirty);
ncontig = find_num_contig(mm, addr, ptep, &pgsize);
dpfn = pgsize >> PAGE_SHIFT;
@ -478,7 +471,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
hugeprot = pte_pgprot(pte);
for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);
return 1;
}
@ -492,8 +485,8 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
size_t pgsize;
pte_t pte;
if (!pte_cont(READ_ONCE(*ptep))) {
ptep_set_wrprotect(mm, addr, ptep);
if (!pte_cont(__ptep_get(ptep))) {
__ptep_set_wrprotect(mm, addr, ptep);
return;
}
@ -507,7 +500,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
pfn = pte_pfn(pte);
for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);
}
pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
@ -517,7 +510,7 @@ pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
size_t pgsize;
int ncontig;
if (!pte_cont(READ_ONCE(*ptep)))
if (!pte_cont(__ptep_get(ptep)))
return ptep_clear_flush(vma, addr, ptep);
ncontig = find_num_contig(mm, addr, ptep, &pgsize);
@ -550,7 +543,7 @@ pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr
* when the permission changes from executable to non-executable
* in cases where cpu is affected with errata #2645198.
*/
if (pte_user_exec(READ_ONCE(*ptep)))
if (pte_user_exec(__ptep_get(ptep)))
return huge_ptep_clear_flush(vma, addr, ptep);
}
return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep);


@ -100,7 +100,7 @@ static void __init arch_reserve_crashkernel(void)
bool high = false;
int ret;
if (!IS_ENABLED(CONFIG_KEXEC_CORE))
if (!IS_ENABLED(CONFIG_CRASH_RESERVE))
return;
ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),


@ -125,8 +125,8 @@ static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr,
if (!early)
memset(__va(page_phys), KASAN_SHADOW_INIT, PAGE_SIZE);
next = addr + PAGE_SIZE;
set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL));
} while (ptep++, addr = next, addr != end && pte_none(READ_ONCE(*ptep)));
__set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL));
} while (ptep++, addr = next, addr != end && pte_none(__ptep_get(ptep)));
}
static void __init kasan_pmd_populate(pud_t *pudp, unsigned long addr,
@ -366,7 +366,7 @@ static void __init kasan_init_shadow(void)
* so we should make sure that it maps the zero page read-only.
*/
for (i = 0; i < PTRS_PER_PTE; i++)
set_pte(&kasan_early_shadow_pte[i],
__set_pte(&kasan_early_shadow_pte[i],
pfn_pte(sym_to_pfn(kasan_early_shadow_page),
PAGE_KERNEL_RO));


@ -179,16 +179,16 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
ptep = pte_set_fixmap_offset(pmdp, addr);
do {
pte_t old_pte = READ_ONCE(*ptep);
pte_t old_pte = __ptep_get(ptep);
set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
__set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
/*
* After the PTE entry has been populated once, we
* only allow updates to the permission attributes.
*/
BUG_ON(!pgattr_change_is_safe(pte_val(old_pte),
READ_ONCE(pte_val(*ptep))));
pte_val(__ptep_get(ptep))));
phys += PAGE_SIZE;
} while (ptep++, addr += PAGE_SIZE, addr != end);
@ -682,8 +682,6 @@ void mark_rodata_ro(void)
WRITE_ONCE(rodata_is_rw, false);
update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
section_size, PAGE_KERNEL_RO);
debug_checkwx();
}
static void __init declare_vma(struct vm_struct *vma,
@ -846,12 +844,12 @@ static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
do {
ptep = pte_offset_kernel(pmdp, addr);
pte = READ_ONCE(*ptep);
pte = __ptep_get(ptep);
if (pte_none(pte))
continue;
WARN_ON(!pte_present(pte));
pte_clear(&init_mm, addr, ptep);
__pte_clear(&init_mm, addr, ptep);
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
if (free_mapped)
free_hotplug_page_range(pte_page(pte),
@ -979,7 +977,7 @@ static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
do {
ptep = pte_offset_kernel(pmdp, addr);
pte = READ_ONCE(*ptep);
pte = __ptep_get(ptep);
/*
* This is just a sanity check here which verifies that
@ -998,7 +996,7 @@ static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
*/
ptep = pte_offset_kernel(pmdp, 0UL);
for (i = 0; i < PTRS_PER_PTE; i++) {
if (!pte_none(READ_ONCE(ptep[i])))
if (!pte_none(__ptep_get(&ptep[i])))
return;
}
@ -1494,7 +1492,7 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte
* when the permission changes from executable to non-executable
* in cases where cpu is affected with errata #2645198.
*/
if (pte_user_exec(READ_ONCE(*ptep)))
if (pte_user_exec(ptep_get(ptep)))
return ptep_clear_flush(vma, addr, ptep);
}
return ptep_get_and_clear(vma->vm_mm, addr, ptep);


@ -36,12 +36,12 @@ bool can_set_direct_map(void)
static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
{
struct page_change_data *cdata = data;
pte_t pte = READ_ONCE(*ptep);
pte_t pte = __ptep_get(ptep);
pte = clear_pte_bit(pte, cdata->clear_mask);
pte = set_pte_bit(pte, cdata->set_mask);
set_pte(ptep, pte);
__set_pte(ptep, pte);
return 0;
}
@ -245,5 +245,5 @@ bool kernel_page_present(struct page *page)
return true;
ptep = pte_offset_kernel(pmdp, addr);
return pte_valid(READ_ONCE(*ptep));
return pte_valid(__ptep_get(ptep));
}


@ -322,7 +322,7 @@ static struct ptdump_info kernel_ptdump_info __ro_after_init = {
.mm = &init_mm,
};
void ptdump_check_wx(void)
bool ptdump_check_wx(void)
{
struct pg_state st = {
.seq = NULL,
@ -343,11 +343,16 @@ void ptdump_check_wx(void)
ptdump_walk_pgd(&st.ptdump, &init_mm, NULL);
if (st.wx_pages || st.uxn_pages)
if (st.wx_pages || st.uxn_pages) {
pr_warn("Checked W+X mappings: FAILED, %lu W+X pages found, %lu non-UXN pages found\n",
st.wx_pages, st.uxn_pages);
else
return false;
} else {
pr_info("Checked W+X mappings: passed, no W+X pages found\n");
return true;
}
}
static int __init ptdump_init(void)


@ -33,7 +33,7 @@ static void *trans_alloc(struct trans_pgd_info *info)
static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
{
pte_t pte = READ_ONCE(*src_ptep);
pte_t pte = __ptep_get(src_ptep);
if (pte_valid(pte)) {
/*
@ -41,7 +41,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
* read only (code, rodata). Clear the RDONLY bit from
* the temporary mappings we use during restore.
*/
set_pte(dst_ptep, pte_mkwrite_novma(pte));
__set_pte(dst_ptep, pte_mkwrite_novma(pte));
} else if ((debug_pagealloc_enabled() ||
is_kfence_address((void *)addr)) && !pte_none(pte)) {
/*
@ -55,7 +55,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
*/
BUG_ON(!pfn_valid(pte_pfn(pte)));
set_pte(dst_ptep, pte_mkpresent(pte_mkwrite_novma(pte)));
__set_pte(dst_ptep, pte_mkpresent(pte_mkwrite_novma(pte)));
}
}


@ -2,6 +2,7 @@
config CSKY
def_bool y
select ARCH_32BIT_OFF_T
select ARCH_HAS_CPU_CACHE_ALIASING
select ARCH_HAS_DMA_PREP_COHERENT
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_SYNC_DMA_FOR_CPU


@ -0,0 +1,9 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __ASM_CSKY_CACHETYPE_H
#define __ASM_CSKY_CACHETYPE_H
#include <linux/types.h>
#define cpu_dcache_is_aliasing() true
#endif


@ -260,7 +260,7 @@ static void __init arch_reserve_crashkernel(void)
char *cmdline = boot_command_line;
bool high = false;
if (!IS_ENABLED(CONFIG_KEXEC_CORE))
if (!IS_ENABLED(CONFIG_CRASH_RESERVE))
return;
ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),


@ -723,7 +723,7 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
/*
* Read each entry once. As above, a non-leaf entry can be promoted to
* a huge page _during_ this walk. Re-reading the entry could send the
* walk into the weeds, e.g. p*d_large() returns false (sees the old
* walk into the weeds, e.g. p*d_leaf() returns false (sees the old
* value) and then p*d_offset() walks into the target huge page instead
* of the old page table (sees the new value).
*/

@ -3,6 +3,7 @@ config M68K
bool
default y
select ARCH_32BIT_OFF_T
select ARCH_HAS_CPU_CACHE_ALIASING
select ARCH_HAS_BINFMT_FLAT
select ARCH_HAS_CPU_FINALIZE_INIT if MMU
select ARCH_HAS_CURRENT_STACK_POINTER

@ -0,0 +1,9 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __ASM_M68K_CACHETYPE_H
#define __ASM_M68K_CACHETYPE_H
#include <linux/types.h>
#define cpu_dcache_is_aliasing() true
#endif

@ -4,6 +4,7 @@ config MIPS
default y
select ARCH_32BIT_OFF_T if !64BIT
select ARCH_BINFMT_ELF_STATE if MIPS_FP_SUPPORT
select ARCH_HAS_CPU_CACHE_ALIASING
select ARCH_HAS_CPU_FINALIZE_INIT
select ARCH_HAS_CURRENT_STACK_POINTER if !CC_IS_CLANG || CLANG_VERSION >= 140000
select ARCH_HAS_DEBUG_VIRTUAL if !64BIT

@ -0,0 +1,9 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __ASM_MIPS_CACHETYPE_H
#define __ASM_MIPS_CACHETYPE_H
#include <asm/cpu-features.h>
#define cpu_dcache_is_aliasing() cpu_has_dc_aliases
#endif

@ -442,8 +442,6 @@ static void __init mips_reserve_vmcore(void)
#endif
}
#ifdef CONFIG_KEXEC
/* 64M alignment for crash kernel regions */
#define CRASH_ALIGN SZ_64M
#define CRASH_ADDR_MAX SZ_512M
@ -454,6 +452,9 @@ static void __init mips_parse_crashkernel(void)
unsigned long long crash_size, crash_base;
int ret;
if (!IS_ENABLED(CONFIG_CRASH_RESERVE))
return;
total_mem = memblock_phys_mem_size();
ret = parse_crashkernel(boot_command_line, total_mem,
&crash_size, &crash_base,
@ -489,6 +490,9 @@ static void __init request_crashkernel(struct resource *res)
{
int ret;
if (!IS_ENABLED(CONFIG_CRASH_RESERVE))
return;
if (crashk_res.start == crashk_res.end)
return;
@ -498,15 +502,6 @@ static void __init request_crashkernel(struct resource *res)
(unsigned long)(resource_size(&crashk_res) >> 20),
(unsigned long)(crashk_res.start >> 20));
}
#else /* !defined(CONFIG_KEXEC) */
static void __init mips_parse_crashkernel(void)
{
}
static void __init request_crashkernel(struct resource *res)
{
}
#endif /* !defined(CONFIG_KEXEC) */
static void __init check_kernel_sections_mem(void)
{
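
The MIPS crashkernel code above replaces its empty #ifdef CONFIG_KEXEC stubs with IS_ENABLED(CONFIG_CRASH_RESERVE) early returns, so the functions always compile and the optimizer discards the dead bodies. The same pattern in miniature, with reserve_example() as an invented name:

#include <linux/init.h>
#include <linux/kernel.h>

static void __init reserve_example(void)
{
	/* Compiled unconditionally; the body is optimized away when the option is off. */
	if (!IS_ENABLED(CONFIG_CRASH_RESERVE))
		return;

	/* the actual reservation work would go here */
}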

@ -2,6 +2,7 @@
config NIOS2
def_bool y
select ARCH_32BIT_OFF_T
select ARCH_HAS_CPU_CACHE_ALIASING
select ARCH_HAS_DMA_PREP_COHERENT
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE

@ -0,0 +1,10 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __ASM_NIOS2_CACHETYPE_H
#define __ASM_NIOS2_CACHETYPE_H
#include <asm/page.h>
#include <asm/cache.h>
#define cpu_dcache_is_aliasing() (NIOS2_DCACHE_SIZE > PAGE_SIZE)
#endif

@ -178,6 +178,8 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
*ptep = pteval;
}
#define PFN_PTE_SHIFT 0
static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned int nr)
{
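
nios2 now advertises PFN_PTE_SHIFT (defined as 0 here, so the PFN field starts at bit 0 of the PTE value), which is what lets generic code step a PTE to the next page frame when laying down a batch of entries. The sketch below only loosely mirrors the generic helpers; the *_sketch names are not kernel functions, the unused mm/addr parameters are kept only to match the usual signature, and bookkeeping such as page_table_check is omitted.

#include <linux/pgtable.h>

/* Advance to the PTE for the next page frame by bumping the PFN field. */
static inline pte_t pte_next_pfn_sketch(pte_t pte)
{
	return __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
}

/* Install 'nr' consecutive PTEs starting at 'pte'. */
static inline void set_ptes_sketch(struct mm_struct *mm, unsigned long addr,
				   pte_t *ptep, pte_t pte, unsigned int nr)
{
	for (;;) {
		set_pte(ptep, pte);
		if (--nr == 0)
			break;
		ptep++;
		pte = pte_next_pfn_sketch(pte);
	}
}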

@ -8,6 +8,7 @@ config PARISC
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_SYSCALL_TRACEPOINTS
select ARCH_WANT_FRAME_POINTERS
select ARCH_HAS_CPU_CACHE_ALIASING
select ARCH_HAS_DMA_ALLOC if PA11
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_STRICT_KERNEL_RWX

@ -0,0 +1,9 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __ASM_PARISC_CACHETYPE_H
#define __ASM_PARISC_CACHETYPE_H
#include <linux/types.h>
#define cpu_dcache_is_aliasing() true
#endif

@ -608,6 +608,11 @@ config PPC64_SUPPORTS_MEMORY_FAILURE
config ARCH_SUPPORTS_KEXEC
def_bool PPC_BOOK3S || PPC_E500 || (44x && !SMP)
config ARCH_SELECTS_KEXEC
def_bool y
depends on KEXEC
select CRASH_DUMP
config ARCH_SUPPORTS_KEXEC_FILE
def_bool PPC64
@ -618,6 +623,7 @@ config ARCH_SELECTS_KEXEC_FILE
def_bool y
depends on KEXEC_FILE
select KEXEC_ELF
select CRASH_DUMP
select HAVE_IMA_KEXEC if IMA
config PPC64_BIG_ENDIAN_ELF_ABI_V2
@ -690,7 +696,6 @@ config ARCH_SELECTS_CRASH_DUMP
config FA_DUMP
bool "Firmware-assisted dump"
depends on PPC64 && (PPC_RTAS || PPC_POWERNV)
select CRASH_CORE
select CRASH_DUMP
help
A robust mechanism to get reliable kernel crash dump with

@ -1157,20 +1157,6 @@ pud_hugepage_update(struct mm_struct *mm, unsigned long addr, pud_t *pudp,
return pud_val(*pudp);
}
/*
* returns true for pmd migration entries, THP, devmap, hugetlb
* But compile time dependent on THP config
*/
static inline int pmd_large(pmd_t pmd)
{
return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
}
static inline int pud_large(pud_t pud)
{
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
}
/*
* For radix we should always find H_PAGE_HASHPTE zero. Hence
* the below will work for radix too
@ -1451,18 +1437,16 @@ static inline bool is_pte_rw_upgrade(unsigned long old_val, unsigned long new_va
}
/*
* Like pmd_huge() and pmd_large(), but works regardless of config options
* Like pmd_huge(), but works regardless of config options
*/
#define pmd_is_leaf pmd_is_leaf
#define pmd_leaf pmd_is_leaf
static inline bool pmd_is_leaf(pmd_t pmd)
#define pmd_leaf pmd_leaf
static inline bool pmd_leaf(pmd_t pmd)
{
return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
}
#define pud_is_leaf pud_is_leaf
#define pud_leaf pud_is_leaf
static inline bool pud_is_leaf(pud_t pud)
#define pud_leaf pud_leaf
static inline bool pud_leaf(pud_t pud)
{
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
}
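
Renaming the powerpc-private pmd_is_leaf()/pud_is_leaf() helpers to the generic pmd_leaf()/pud_leaf() spellings lets arch-independent code stop a page-table walk at a huge mapping without per-architecture wrappers. Below is an illustrative, deliberately simplified walker using the generic names; probe_mapping_size() is not a function from the tree and all locking is ignored.

#include <linux/mm.h>
#include <linux/pgtable.h>

static unsigned long probe_mapping_size(struct mm_struct *mm, unsigned long addr)
{
	pgd_t *pgdp = pgd_offset(mm, addr);
	p4d_t *p4dp;
	pud_t *pudp;
	pmd_t *pmdp;

	if (pgd_none(*pgdp))
		return 0;

	p4dp = p4d_offset(pgdp, addr);
	if (p4d_none(*p4dp))
		return 0;
	if (p4d_leaf(*p4dp))
		return P4D_SIZE;	/* huge mapping at the P4D level */

	pudp = pud_offset(p4dp, addr);
	if (pud_none(*pudp))
		return 0;
	if (pud_leaf(*pudp))
		return PUD_SIZE;	/* e.g. a 1G mapping */

	pmdp = pmd_offset(pudp, addr);
	if (pmd_none(*pmdp))
		return 0;
	if (pmd_leaf(*pmdp))
		return PMD_SIZE;	/* e.g. a 2M mapping */

	return PAGE_SIZE;
}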

@ -41,6 +41,8 @@ struct mm_struct;
#ifndef __ASSEMBLY__
#define PFN_PTE_SHIFT PTE_RPN_SHIFT
void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
pte_t pte, unsigned int nr);
#define set_ptes set_ptes
@ -99,10 +101,6 @@ void poking_init(void);
extern unsigned long ioremap_bot;
extern const pgprot_t protection_map[16];
#ifndef CONFIG_TRANSPARENT_HUGEPAGE
#define pmd_large(pmd) 0
#endif
/* can we use this in kvm */
unsigned long vmalloc_to_phys(void *vmalloc_addr);
@ -180,30 +178,6 @@ static inline void pte_frag_set(mm_context_t *ctx, void *p)
}
#endif
#ifndef pmd_is_leaf
#define pmd_is_leaf pmd_is_leaf
static inline bool pmd_is_leaf(pmd_t pmd)
{
return false;
}
#endif
#ifndef pud_is_leaf
#define pud_is_leaf pud_is_leaf
static inline bool pud_is_leaf(pud_t pud)
{
return false;
}
#endif
#ifndef p4d_is_leaf
#define p4d_is_leaf p4d_is_leaf
static inline bool p4d_is_leaf(p4d_t p4d)
{
return false;
}
#endif
#define pmd_pgtable pmd_pgtable
static inline pgtable_t pmd_pgtable(pmd_t pmd)
{

@ -19,6 +19,8 @@
#include <linux/pagemap.h>
static inline void __tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep,
unsigned long address);
#define __tlb_remove_tlb_entry __tlb_remove_tlb_entry
#define tlb_flush tlb_flush

@ -109,7 +109,7 @@ int ppc_do_canonicalize_irqs;
EXPORT_SYMBOL(ppc_do_canonicalize_irqs);
#endif
#ifdef CONFIG_CRASH_CORE
#ifdef CONFIG_VMCORE_INFO
/* This keeps a track of which one is the crashing cpu. */
int crashing_cpu = -1;
#endif

@ -8,6 +8,7 @@ obj-y += core.o crash.o core_$(BITS).o
obj-$(CONFIG_PPC32) += relocate_32.o
obj-$(CONFIG_KEXEC_FILE) += file_load.o ranges.o file_load_$(BITS).o elf_$(BITS).o
obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o
# Disable GCOV, KCOV & sanitizers in odd or sensitive code
GCOV_PROFILE_core_$(BITS).o := n

@ -53,34 +53,6 @@ void machine_kexec_cleanup(struct kimage *image)
{
}
void arch_crash_save_vmcoreinfo(void)
{
#ifdef CONFIG_NUMA
VMCOREINFO_SYMBOL(node_data);
VMCOREINFO_LENGTH(node_data, MAX_NUMNODES);
#endif
#ifndef CONFIG_NUMA
VMCOREINFO_SYMBOL(contig_page_data);
#endif
#if defined(CONFIG_PPC64) && defined(CONFIG_SPARSEMEM_VMEMMAP)
VMCOREINFO_SYMBOL(vmemmap_list);
VMCOREINFO_SYMBOL(mmu_vmemmap_psize);
VMCOREINFO_SYMBOL(mmu_psize_defs);
VMCOREINFO_STRUCT_SIZE(vmemmap_backing);
VMCOREINFO_OFFSET(vmemmap_backing, list);
VMCOREINFO_OFFSET(vmemmap_backing, phys);
VMCOREINFO_OFFSET(vmemmap_backing, virt_addr);
VMCOREINFO_STRUCT_SIZE(mmu_psize_def);
VMCOREINFO_OFFSET(mmu_psize_def, shift);
#endif
VMCOREINFO_SYMBOL(cur_cpu_spec);
VMCOREINFO_OFFSET(cpu_spec, cpu_features);
VMCOREINFO_OFFSET(cpu_spec, mmu_features);
vmcoreinfo_append_str("NUMBER(RADIX_MMU)=%d\n", early_radix_enabled());
vmcoreinfo_append_str("KERNELOFFSET=%lx\n", kaslr_offset());
}
/*
* Do not allocate memory (or fail in any way) in machine_kexec().
* We are past the point of no return, committed to rebooting now.

@ -0,0 +1,32 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/vmcore_info.h>
#include <asm/pgalloc.h>
void arch_crash_save_vmcoreinfo(void)
{
#ifdef CONFIG_NUMA
VMCOREINFO_SYMBOL(node_data);
VMCOREINFO_LENGTH(node_data, MAX_NUMNODES);
#endif
#ifndef CONFIG_NUMA
VMCOREINFO_SYMBOL(contig_page_data);
#endif
#if defined(CONFIG_PPC64) && defined(CONFIG_SPARSEMEM_VMEMMAP)
VMCOREINFO_SYMBOL(vmemmap_list);
VMCOREINFO_SYMBOL(mmu_vmemmap_psize);
VMCOREINFO_SYMBOL(mmu_psize_defs);
VMCOREINFO_STRUCT_SIZE(vmemmap_backing);
VMCOREINFO_OFFSET(vmemmap_backing, list);
VMCOREINFO_OFFSET(vmemmap_backing, phys);
VMCOREINFO_OFFSET(vmemmap_backing, virt_addr);
VMCOREINFO_STRUCT_SIZE(mmu_psize_def);
VMCOREINFO_OFFSET(mmu_psize_def, shift);
#endif
VMCOREINFO_SYMBOL(cur_cpu_spec);
VMCOREINFO_OFFSET(cpu_spec, cpu_features);
VMCOREINFO_OFFSET(cpu_spec, mmu_features);
vmcoreinfo_append_str("NUMBER(RADIX_MMU)=%d\n", early_radix_enabled());
vmcoreinfo_append_str("KERNELOFFSET=%lx\n", kaslr_offset());
}

@ -503,7 +503,7 @@ static void kvmppc_unmap_free_pmd(struct kvm *kvm, pmd_t *pmd, bool full,
for (im = 0; im < PTRS_PER_PMD; ++im, ++p) {
if (!pmd_present(*p))
continue;
if (pmd_is_leaf(*p)) {
if (pmd_leaf(*p)) {
if (full) {
pmd_clear(p);
} else {
@ -532,7 +532,7 @@ static void kvmppc_unmap_free_pud(struct kvm *kvm, pud_t *pud,
for (iu = 0; iu < PTRS_PER_PUD; ++iu, ++p) {
if (!pud_present(*p))
continue;
if (pud_is_leaf(*p)) {
if (pud_leaf(*p)) {
pud_clear(p);
} else {
pmd_t *pmd;
@ -635,12 +635,12 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
new_pud = pud_alloc_one(kvm->mm, gpa);
pmd = NULL;
if (pud && pud_present(*pud) && !pud_is_leaf(*pud))
if (pud && pud_present(*pud) && !pud_leaf(*pud))
pmd = pmd_offset(pud, gpa);
else if (level <= 1)
new_pmd = kvmppc_pmd_alloc();
if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_is_leaf(*pmd)))
if (level == 0 && !(pmd && pmd_present(*pmd) && !pmd_leaf(*pmd)))
new_ptep = kvmppc_pte_alloc();
/* Check if we might have been invalidated; let the guest retry if so */
@ -658,7 +658,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
new_pud = NULL;
}
pud = pud_offset(p4d, gpa);
if (pud_is_leaf(*pud)) {
if (pud_leaf(*pud)) {
unsigned long hgpa = gpa & PUD_MASK;
/* Check if we raced and someone else has set the same thing */
@ -709,7 +709,7 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
new_pmd = NULL;
}
pmd = pmd_offset(pud, gpa);
if (pmd_is_leaf(*pmd)) {
if (pmd_leaf(*pmd)) {
unsigned long lgpa = gpa & PMD_MASK;
/* Check if we raced and someone else has set the same thing */

@ -113,7 +113,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
WARN_ON(pte_hw_valid(pmd_pte(*pmdp)) && !pte_protnone(pmd_pte(*pmdp)));
assert_spin_locked(pmd_lockptr(mm, pmdp));
WARN_ON(!(pmd_large(pmd)));
WARN_ON(!(pmd_leaf(pmd)));
#endif
trace_hugepage_set_pmd(addr, pmd_val(pmd));
return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
@ -130,7 +130,7 @@ void set_pud_at(struct mm_struct *mm, unsigned long addr,
WARN_ON(pte_hw_valid(pud_pte(*pudp)));
assert_spin_locked(pud_lockptr(mm, pudp));
WARN_ON(!(pud_large(pud)));
WARN_ON(!(pud_leaf(pud)));
#endif
trace_hugepage_set_pud(addr, pud_val(pud));
return set_pte_at(mm, addr, pudp_ptep(pudp), pud_pte(pud));

@ -204,14 +204,14 @@ static void radix__change_memory_range(unsigned long start, unsigned long end,
pudp = pud_alloc(&init_mm, p4dp, idx);
if (!pudp)
continue;
if (pud_is_leaf(*pudp)) {
if (pud_leaf(*pudp)) {
ptep = (pte_t *)pudp;
goto update_the_pte;
}
pmdp = pmd_alloc(&init_mm, pudp, idx);
if (!pmdp)
continue;
if (pmd_is_leaf(*pmdp)) {
if (pmd_leaf(*pmdp)) {
ptep = pmdp_ptep(pmdp);
goto update_the_pte;
}
@ -767,7 +767,7 @@ static void __meminit remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
if (!pmd_present(*pmd))
continue;
if (pmd_is_leaf(*pmd)) {
if (pmd_leaf(*pmd)) {
if (IS_ALIGNED(addr, PMD_SIZE) &&
IS_ALIGNED(next, PMD_SIZE)) {
if (!direct)
@ -807,7 +807,7 @@ static void __meminit remove_pud_table(pud_t *pud_start, unsigned long addr,
if (!pud_present(*pud))
continue;
if (pud_is_leaf(*pud)) {
if (pud_leaf(*pud)) {
if (!IS_ALIGNED(addr, PUD_SIZE) ||
!IS_ALIGNED(next, PUD_SIZE)) {
WARN_ONCE(1, "%s: unaligned range\n", __func__);
@ -845,7 +845,7 @@ remove_pagetable(unsigned long start, unsigned long end, bool direct,
if (!p4d_present(*p4d))
continue;
if (p4d_is_leaf(*p4d)) {
if (p4d_leaf(*p4d)) {
if (!IS_ALIGNED(addr, P4D_SIZE) ||
!IS_ALIGNED(next, P4D_SIZE)) {
WARN_ONCE(1, "%s: unaligned range\n", __func__);
@ -924,7 +924,7 @@ bool vmemmap_can_optimize(struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
unsigned long addr, unsigned long next)
{
int large = pmd_large(*pmdp);
int large = pmd_leaf(*pmdp);
if (large)
vmemmap_verify(pmdp_ptep(pmdp), node, addr, next);
@ -1554,7 +1554,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
int pud_clear_huge(pud_t *pud)
{
if (pud_is_leaf(*pud)) {
if (pud_leaf(*pud)) {
pud_clear(pud);
return 1;
}
@ -1601,7 +1601,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
int pmd_clear_huge(pmd_t *pmd)
{
if (pmd_is_leaf(*pmd)) {
if (pmd_leaf(*pmd)) {
pmd_clear(pmd);
return 1;
}

@ -226,7 +226,7 @@ static int __init pseries_alloc_bootmem_huge_page(struct hstate *hstate)
return 0;
m = phys_to_virt(gpage_freearray[--nr_gpages]);
gpage_freearray[nr_gpages] = 0;
list_add(&m->list, &huge_boot_pages);
list_add(&m->list, &huge_boot_pages[0]);
m->hstate = hstate;
return 1;
}
@ -614,8 +614,6 @@ void __init gigantic_hugetlb_cma_reserve(void)
*/
order = mmu_psize_to_shift(MMU_PAGE_16G) - PAGE_SHIFT;
if (order) {
VM_WARN_ON(order <= MAX_PAGE_ORDER);
if (order)
hugetlb_cma_reserve(order);
}
}

@ -171,12 +171,6 @@ static inline void mmu_mark_rodata_ro(void) { }
void __init mmu_mapin_immr(void);
#endif
#ifdef CONFIG_DEBUG_WX
void ptdump_check_wx(void);
#else
static inline void ptdump_check_wx(void) { }
#endif
static inline bool debug_pagealloc_enabled_or_kfence(void)
{
return IS_ENABLED(CONFIG_KFENCE) || debug_pagealloc_enabled();

@ -13,7 +13,7 @@
#include <linux/delay.h>
#include <linux/memblock.h>
#include <linux/libfdt.h>
#include <linux/crash_core.h>
#include <linux/crash_reserve.h>
#include <linux/of.h>
#include <linux/of_fdt.h>
#include <asm/cacheflush.h>
@ -173,7 +173,7 @@ static __init bool overlaps_region(const void *fdt, u32 start,
static void __init get_crash_kernel(void *fdt, unsigned long size)
{
#ifdef CONFIG_CRASH_CORE
#ifdef CONFIG_CRASH_RESERVE
unsigned long long crash_size, crash_base;
int ret;

@ -220,10 +220,7 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
break;
ptep++;
addr += PAGE_SIZE;
/*
* increment the pfn.
*/
pte = pfn_pte(pte_pfn(pte) + 1, pte_pgprot((pte)));
pte = pte_next_pfn(pte);
}
}
@ -413,7 +410,7 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
if (p4d_none(p4d))
return NULL;
if (p4d_is_leaf(p4d)) {
if (p4d_leaf(p4d)) {
ret_pte = (pte_t *)p4dp;
goto out;
}
@ -435,7 +432,7 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
if (pud_none(pud))
return NULL;
if (pud_is_leaf(pud)) {
if (pud_leaf(pud)) {
ret_pte = (pte_t *)pudp;
goto out;
}
@ -474,7 +471,7 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
goto out;
}
if (pmd_is_leaf(pmd)) {
if (pmd_leaf(pmd)) {
ret_pte = (pte_t *)pmdp;
goto out;
}

@ -153,7 +153,6 @@ void mark_rodata_ro(void)
if (v_block_mapped((unsigned long)_stext + 1)) {
mmu_mark_rodata_ro();
ptdump_check_wx();
return;
}
@ -166,9 +165,6 @@ void mark_rodata_ro(void)
PFN_DOWN((unsigned long)_stext);
set_memory_ro((unsigned long)_stext, numpages);
// mark_initmem_nx() should have already run by now
ptdump_check_wx();
}
#endif

@ -100,7 +100,7 @@ EXPORT_SYMBOL(__pte_frag_size_shift);
/* 4 level page table */
struct page *p4d_page(p4d_t p4d)
{
if (p4d_is_leaf(p4d)) {
if (p4d_leaf(p4d)) {
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
VM_WARN_ON(!p4d_huge(p4d));
return pte_page(p4d_pte(p4d));
@ -111,7 +111,7 @@ struct page *p4d_page(p4d_t p4d)
struct page *pud_page(pud_t pud)
{
if (pud_is_leaf(pud)) {
if (pud_leaf(pud)) {
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
VM_WARN_ON(!pud_huge(pud));
return pte_page(pud_pte(pud));
@ -125,14 +125,14 @@ struct page *pud_page(pud_t pud)
*/
struct page *pmd_page(pmd_t pmd)
{
if (pmd_is_leaf(pmd)) {
if (pmd_leaf(pmd)) {
/*
* vmalloc_to_page may be called on any vmap address (not only
* vmalloc), and it uses pmd_page() etc., when huge vmap is
* enabled so these checks can't be used.
*/
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
VM_WARN_ON(!(pmd_large(pmd) || pmd_huge(pmd)));
VM_WARN_ON(!(pmd_leaf(pmd) || pmd_huge(pmd)));
return pte_page(pmd_pte(pmd));
}
return virt_to_page(pmd_page_vaddr(pmd));
@ -150,9 +150,6 @@ void mark_rodata_ro(void)
radix__mark_rodata_ro();
else
hash__mark_rodata_ro();
// mark_initmem_nx() should have already run by now
ptdump_check_wx();
}
void mark_initmem_nx(void)

@ -184,13 +184,14 @@ static void note_prot_wx(struct pg_state *st, unsigned long addr)
{
pte_t pte = __pte(st->current_flags);
if (!IS_ENABLED(CONFIG_DEBUG_WX) || !st->check_wx)
if (!st->check_wx)
return;
if (!pte_write(pte) || !pte_exec(pte))
return;
WARN_ONCE(1, "powerpc/mm: Found insecure W+X mapping at address %p/%pS\n",
WARN_ONCE(IS_ENABLED(CONFIG_DEBUG_WX),
"powerpc/mm: Found insecure W+X mapping at address %p/%pS\n",
(void *)st->start_address, (void *)st->start_address);
st->wx_pages += (addr - st->start_address) / PAGE_SIZE;
@ -326,8 +327,7 @@ static void __init build_pgtable_complete_mask(void)
pg_level[i].mask |= pg_level[i].flag[j].mask;
}
#ifdef CONFIG_DEBUG_WX
void ptdump_check_wx(void)
bool ptdump_check_wx(void)
{
struct pg_state st = {
.seq = NULL,
@ -343,15 +343,22 @@ void ptdump_check_wx(void)
}
};
if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !mmu_has_feature(MMU_FTR_KERNEL_RO))
return true;
ptdump_walk_pgd(&st.ptdump, &init_mm, NULL);
if (st.wx_pages)
if (st.wx_pages) {
pr_warn("Checked W+X mappings: FAILED, %lu W+X pages found\n",
st.wx_pages);
else
return false;
} else {
pr_info("Checked W+X mappings: passed, no W+X pages found\n");
return true;
}
}
#endif
static int __init ptdump_init(void)
{
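
In note_prot_wx() above, the W+X scan is no longer compiled out: offending pages are always counted when a check is requested, and IS_ENABLED(CONFIG_DEBUG_WX) moves into the WARN_ONCE() condition so the backtrace only fires on DEBUG_WX builds. A tiny illustration of that idiom follows; report_wx() is an invented helper.

#include <linux/bug.h>
#include <linux/kernel.h>

static unsigned long report_wx(unsigned long wx_pages)
{
	/* Counting stays unconditional; the splat is opt-in via CONFIG_DEBUG_WX. */
	WARN_ONCE(IS_ENABLED(CONFIG_DEBUG_WX) && wx_pages,
		  "%lu W+X mappings found\n", wx_pages);
	return wx_pages;
}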

@ -16,7 +16,7 @@
#include <linux/kobject.h>
#include <linux/sysfs.h>
#include <linux/slab.h>
#include <linux/crash_core.h>
#include <linux/vmcore_info.h>
#include <linux/of.h>
#include <asm/page.h>

@ -3342,7 +3342,7 @@ static void show_pte(unsigned long addr)
return;
}
if (p4d_is_leaf(*p4dp)) {
if (p4d_leaf(*p4dp)) {
format_pte(p4dp, p4d_val(*p4dp));
return;
}
@ -3356,7 +3356,7 @@ static void show_pte(unsigned long addr)
return;
}
if (pud_is_leaf(*pudp)) {
if (pud_leaf(*pudp)) {
format_pte(pudp, pud_val(*pudp));
return;
}
@ -3370,7 +3370,7 @@ static void show_pte(unsigned long addr)
return;
}
if (pmd_is_leaf(*pmdp)) {
if (pmd_leaf(*pmdp)) {
format_pte(pmdp, pmd_val(*pmdp));
return;
}

@ -767,7 +767,7 @@ config ARCH_SUPPORTS_CRASH_DUMP
def_bool y
config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
def_bool CRASH_CORE
def_bool CRASH_RESERVE
config COMPAT
bool "Kernel support for 32-bit U-mode"

@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0-only */
#ifndef _RISCV_CRASH_CORE_H
#define _RISCV_CRASH_CORE_H
#ifndef _RISCV_CRASH_RESERVE_H
#define _RISCV_CRASH_RESERVE_H
#define CRASH_ALIGN PMD_SIZE

@ -190,7 +190,7 @@ static inline int pud_bad(pud_t pud)
}
#define pud_leaf pud_leaf
static inline int pud_leaf(pud_t pud)
static inline bool pud_leaf(pud_t pud)
{
return pud_present(pud) && (pud_val(pud) & _PAGE_LEAF);
}

@ -241,7 +241,7 @@ static inline int pmd_bad(pmd_t pmd)
}
#define pmd_leaf pmd_leaf
static inline int pmd_leaf(pmd_t pmd)
static inline bool pmd_leaf(pmd_t pmd)
{
return pmd_present(pmd) && (pmd_val(pmd) & _PAGE_LEAF);
}
@ -527,6 +527,8 @@ static inline void __set_pte_at(pte_t *ptep, pte_t pteval)
set_pte(ptep, pteval);
}
#define PFN_PTE_SHIFT _PAGE_PFN_SHIFT
static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pteval, unsigned int nr)
{

@ -1,22 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright (C) 2019 SiFive
*/
#ifndef _ASM_RISCV_PTDUMP_H
#define _ASM_RISCV_PTDUMP_H
void ptdump_check_wx(void);
#ifdef CONFIG_DEBUG_WX
static inline void debug_checkwx(void)
{
ptdump_check_wx();
}
#else
static inline void debug_checkwx(void)
{
}
#endif
#endif /* _ASM_RISCV_PTDUMP_H */

@ -94,7 +94,7 @@ obj-$(CONFIG_KGDB) += kgdb.o
obj-$(CONFIG_KEXEC_CORE) += kexec_relocate.o crash_save_regs.o machine_kexec.o
obj-$(CONFIG_KEXEC_FILE) += elf_kexec.o machine_kexec_file.o
obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
obj-$(CONFIG_CRASH_CORE) += crash_core.o
obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o
obj-$(CONFIG_JUMP_LABEL) += jump_label.o

Some files were not shown because too many files have changed in this diff.