Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
 
 - Kemeng Shi has contributed some compaction maintenance work in the
   series "Fixes and cleanups to compaction".
 
 - Joel Fernandes has a patchset ("Optimize mremap during mutual
   alignment within PMD") which fixes an obscure issue with mremap()'s
   pagetable handling during a subsequent exec(), based upon an
   implementation which Linus suggested.
 
 - More DAMON/DAMOS maintenance and feature work from SeongJae Park in the
   following patch series:
 
 	mm/damon: misc fixups for documents, comments and its tracepoint
 	mm/damon: add a tracepoint for damos apply target regions
 	mm/damon: provide pseudo-moving sum based access rate
 	mm/damon: implement DAMOS apply intervals
 	mm/damon/core-test: Fix memory leaks in core-test
 	mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval
 
 - In the series "Do not try to access unaccepted memory" Adrian Hunter
   provides some fixups for the recently-added "unaccepted memory" feature,
   increasing the feature's checking coverage: "Plug a few gaps where
   RAM is exposed without checking if it is unaccepted memory".
 
 - In the series "cleanups for lockless slab shrink" Qi Zheng has done
   some maintenance work which is preparation for the lockless slab
   shrinking code.
 
 - Qi Zheng has redone the earlier (and reverted) attempt to make slab
   shrinking lockless in the series "use refcount+RCU method to implement
   lockless slab shrink".
 
 - David Hildenbrand contributes some maintenance work for the rmap code
   in the series "Anon rmap cleanups".
 
 - Kefeng Wang does more folio conversions and some maintenance work in
   the migration code.  Series "mm: migrate: more folio conversion and
   unification".
 
 - Matthew Wilcox has fixed an issue in the buffer_head code which was
   causing long stalls under some heavy memory/IO loads.  Some cleanups
   were added on the way.  Series "Add and use bdev_getblk()".
 
 - In the series "Use nth_page() in place of direct struct page
   manipulation" Zi Yan has fixed a potential issue with the direct
   manipulation of hugetlb page frames.
 
 - In the series "mm: hugetlb: Skip initialization of gigantic tail
   struct pages if freed by HVO" has improved our handling of gigantic
   pages in the hugetlb vmmemmep optimizaton code.  This provides
   significant boot time improvements when significant amounts of gigantic
   pages are in use.
 
 - Matthew Wilcox has sent the series "Small hugetlb cleanups" - code
   rationalization and folio conversions in the hugetlb code.
 
 - Yin Fengwei has improved mlock()'s handling of large folios in the
   series "support large folio for mlock"
 
 - In the series "Expose swapcache stat for memcg v1" Liu Shixin has
   added statistics for memcg v1 users which are available (and useful)
   under memcg v2.
 
 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
   prctl so that userspace may direct the kernel to not automatically
   propagate the denial to child processes.  The series is named "MDWE
   without inheritance".
 
 - Kefeng Wang has provided the series "mm: convert numa balancing
   functions to use a folio" which does what it says.
 
 - In the series "mm/ksm: add fork-exec support for prctl" Stefan Roesch
   makes it possible for a process to propagate KSM treatment across
   exec().
 
 - Huang Ying has enhanced memory tiering's calculation of memory
   distances.  This is used to permit the dax/kmem driver to use "high
   bandwidth memory" in addition to Optane Data Center Persistent Memory
   Modules (DCPMM).  The series is named "memory tiering: calculate
   abstract distance based on ACPI HMAT".
 
 - In the series "Smart scanning mode for KSM" Stefan Roesch has
   optimized KSM by teaching it to retain and use some historical
   information from previous scans.
 
 - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the
   series "mm: memcg: fix tracking of pending stats updates values".
 
 - In the series "Implement IOCTL to get and optionally clear info about
   PTEs" Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits
   us to atomically read-then-clear page softdirty state.  This is mainly
   used by CRIU.
 
 - Hugh Dickins contributed the series "shmem,tmpfs: general maintenance"
   - a bunch of relatively minor maintenance tweaks to this code.
 
 - Matthew Wilcox has increased the use of the VMA lock over file-backed
   page faults in the series "Handle more faults under the VMA lock".  Some
   rationalizations of the fault path became possible as a result.
 
 - In the series "mm/rmap: convert page_move_anon_rmap() to
   folio_move_anon_rmap()" David Hildenbrand has implemented some cleanups
   and folio conversions.
 
 - In the series "various improvements to the GUP interface" Lorenzo
   Stoakes has simplified and improved the GUP interface with an eye to
   providing groundwork for future improvements.
 
 - Andrey Konovalov has sent along the series "kasan: assorted fixes and
   improvements" which does those things.
 
 - Some page allocator maintenance work from Kemeng Shi in the series
   "Two minor cleanups to break_down_buddy_pages".
 
 - In the series "New selftest for mm" Breno Leitao has developed
   another MM self test which tickles a race we had between madvise() and
   page faults.
 
 - In the series "Add folio_end_read" Matthew Wilcox provides cleanups
   and an optimization to the core pagecache code.
 
 - Nhat Pham has added memcg accounting for hugetlb memory in the series
   "hugetlb memcg accounting".
 
 - Cleanups and rationalizations to the pagemap code from Lorenzo
   Stoakes, in the series "Abstract vma_merge() and split_vma()".
 
 - Audra Mitchell has fixed issues in the procfs page_owner code's new
   timestamping feature which was causing some misbehaviours.  In the
   series "Fix page_owner's use of free timestamps".
 
 - Lorenzo Stoakes has fixed the handling of new mappings of sealed files
   in the series "permit write-sealed memfd read-only shared mappings".
 
 - Mike Kravetz has optimized the hugetlb vmemmap optimization in the
   series "Batch hugetlb vmemmap modification operations".
 
 - Some buffer_head folio conversions and cleanups from Matthew Wilcox in
   the series "Finish the create_empty_buffers() transition".
 
 - As a page allocator performance optimization Huang Ying has added
   automatic tuning to the allocator's per-cpu-pages feature, in the series
   "mm: PCP high auto-tuning".
 
 - Roman Gushchin has contributed the patchset "mm: improve performance
   of accounted kernel memory allocations" which improves their performance
   by ~30% as measured by a micro-benchmark.
 
 - folio conversions from Kefeng Wang in the series "mm: convert page
   cpupid functions to folios".
 
 - Some kmemleak fixups in Liu Shixin's series "Some bugfix about
   kmemleak".
 
 - Qi Zheng has improved our handling of memoryless nodes by keeping them
   off the allocation fallback list.  This is done in the series "handle
   memoryless nodes more appropriately".
 
 - khugepaged conversions from Vishal Moola in the series "Some
   khugepaged folio conversions".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZULEMwAKCRDdBJ7gKXxA
 jhQHAQCYpD3g849x69DmHnHWHm/EHQLvQmRMDeYZI+nx/sCJOwEAw4AKg0Oemv9y
 FgeUPAD1oasg6CP+INZvCj34waNxwAc=
 =E+Y4
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Many singleton patches against the MM code. The patch series which are
  included in this merge do the following:

   - Kemeng Shi has contributed some compaction maintenance work in the
     series 'Fixes and cleanups to compaction'

   - Joel Fernandes has a patchset ('Optimize mremap during mutual
     alignment within PMD') which fixes an obscure issue with mremap()'s
     pagetable handling during a subsequent exec(), based upon an
     implementation which Linus suggested

   - More DAMON/DAMOS maintenance and feature work from SeongJae Park in
     the following patch series:

	mm/damon: misc fixups for documents, comments and its tracepoint
	mm/damon: add a tracepoint for damos apply target regions
	mm/damon: provide pseudo-moving sum based access rate
	mm/damon: implement DAMOS apply intervals
	mm/damon/core-test: Fix memory leaks in core-test
	mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval

   - In the series 'Do not try to access unaccepted memory' Adrian
     Hunter provides some fixups for the recently-added 'unaccepted
     memory' feature, increasing the feature's checking coverage: 'Plug
     a few gaps where RAM is exposed without checking if it is
     unaccepted memory'

   - In the series 'cleanups for lockless slab shrink' Qi Zheng has done
     some maintenance work which is preparation for the lockless slab
     shrinking code

   - Qi Zheng has redone the earlier (and reverted) attempt to make slab
     shrinking lockless in the series 'use refcount+RCU method to
     implement lockless slab shrink'

   - David Hildenbrand contributes some maintenance work for the rmap
     code in the series 'Anon rmap cleanups'

   - Kefeng Wang does more folio conversions and some maintenance work
     in the migration code. Series 'mm: migrate: more folio conversion
     and unification'

   - Matthew Wilcox has fixed an issue in the buffer_head code which was
     causing long stalls under some heavy memory/IO loads. Some cleanups
     were added on the way. Series 'Add and use bdev_getblk()'

   - In the series 'Use nth_page() in place of direct struct page
     manipulation' Zi Yan has fixed a potential issue with the direct
     manipulation of hugetlb page frames

   - The series 'mm: hugetlb: Skip initialization of gigantic tail
     struct pages if freed by HVO' has improved our handling of gigantic
     pages in the hugetlb vmemmap optimization code. This provides
     significant boot time improvements when large numbers of gigantic
     pages are in use

   - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code
     rationalization and folio conversions in the hugetlb code

   - Yin Fengwei has improved mlock()'s handling of large folios in the
     series 'support large folio for mlock'

   - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has
     added statistics for memcg v1 users which are available (and
     useful) under memcg v2

   - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
     prctl so that userspace may direct the kernel to not automatically
     propagate the denial to child processes. The series is named 'MDWE
     without inheritance'

   - Kefeng Wang has provided the series 'mm: convert numa balancing
     functions to use a folio' which does what it says

   - In the series 'mm/ksm: add fork-exec support for prctl' Stefan
     Roesch makes it possible for a process to propagate KSM treatment
     across exec()

   - Huang Ying has enhanced memory tiering's calculation of memory
     distances. This is used to permit the dax/kmem driver to use 'high
     bandwidth memory' in addition to Optane Data Center Persistent
     Memory Modules (DCPMM). The series is named 'memory tiering:
     calculate abstract distance based on ACPI HMAT'

   - In the series 'Smart scanning mode for KSM' Stefan Roesch has
     optimized KSM by teaching it to retain and use some historical
     information from previous scans

   - Yosry Ahmed has fixed some inconsistencies in memcg statistics in
     the series 'mm: memcg: fix tracking of pending stats updates
     values'

   - In the series 'Implement IOCTL to get and optionally clear info
     about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap
     which permits us to atomically read-then-clear page softdirty
     state. This is mainly used by CRIU

   - Hugh Dickins contributed the series 'shmem,tmpfs: general
     maintenance', a bunch of relatively minor maintenance tweaks to
     this code

   - Matthew Wilcox has increased the use of the VMA lock over
     file-backed page faults in the series 'Handle more faults under the
     VMA lock'. Some rationalizations of the fault path became possible
     as a result

   - In the series 'mm/rmap: convert page_move_anon_rmap() to
     folio_move_anon_rmap()' David Hildenbrand has implemented some
     cleanups and folio conversions

   - In the series 'various improvements to the GUP interface' Lorenzo
     Stoakes has simplified and improved the GUP interface with an eye
     to providing groundwork for future improvements

   - Andrey Konovalov has sent along the series 'kasan: assorted fixes
     and improvements' which does those things

   - Some page allocator maintenance work from Kemeng Shi in the series
     'Two minor cleanups to break_down_buddy_pages'

   - In the series 'New selftest for mm' Breno Leitao has developed
     another MM self test which tickles a race we had between madvise()
     and page faults

   - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups
     and an optimization to the core pagecache code

   - Nhat Pham has added memcg accounting for hugetlb memory in the
     series 'hugetlb memcg accounting'

   - Cleanups and rationalizations to the pagemap code from Lorenzo
     Stoakes, in the series 'Abstract vma_merge() and split_vma()'

   - Audra Mitchell has fixed issues in the procfs page_owner code's new
     timestamping feature which was causing some misbehaviours. In the
     series 'Fix page_owner's use of free timestamps'

   - Lorenzo Stoakes has fixed the handling of new mappings of sealed
     files in the series 'permit write-sealed memfd read-only shared
     mappings'

   - Mike Kravetz has optimized the hugetlb vmemmap optimization in the
     series 'Batch hugetlb vmemmap modification operations'

   - Some buffer_head folio conversions and cleanups from Matthew Wilcox
     in the series 'Finish the create_empty_buffers() transition'

   - As a page allocator performance optimization Huang Ying has added
     automatic tuning to the allocator's per-cpu-pages feature, in the
     series 'mm: PCP high auto-tuning'

   - Roman Gushchin has contributed the patchset 'mm: improve
     performance of accounted kernel memory allocations' which improves
     their performance by ~30% as measured by a micro-benchmark

   - folio conversions from Kefeng Wang in the series 'mm: convert page
     cpupid functions to folios'

   - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about
     kmemleak'

   - Qi Zheng has improved our handling of memoryless nodes by keeping
     them off the allocation fallback list. This is done in the series
     'handle memoryless nodes more appropriately'

   - khugepaged conversions from Vishal Moola in the series 'Some
     khugepaged folio conversions'"

[ bcachefs conflicts with the dynamically allocated shrinkers have been
  resolved as per Stephen Rothwell in

     https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/

  with help from Qi Zheng.

  The clone3 test filtering conflict was half-arsed by yours truly ]

* tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits)
  mm/damon/sysfs: update monitoring target regions for online input commit
  mm/damon/sysfs: remove requested targets when online-commit inputs
  selftests: add a sanity check for zswap
  Documentation: maple_tree: fix word spelling error
  mm/vmalloc: fix the unchecked dereference warning in vread_iter()
  zswap: export compression failure stats
  Documentation: ubsan: drop "the" from article title
  mempolicy: migration attempt to match interleave nodes
  mempolicy: mmap_lock is not needed while migrating folios
  mempolicy: alloc_pages_mpol() for NUMA policy without vma
  mm: add page_rmappable_folio() wrapper
  mempolicy: remove confusing MPOL_MF_LAZY dead code
  mempolicy: mpol_shared_policy_init() without pseudo-vma
  mempolicy trivia: use pgoff_t in shared mempolicy tree
  mempolicy trivia: slightly more consistent naming
  mempolicy trivia: delete those ancient pr_debug()s
  mempolicy: fix migrate_pages(2) syscall return nr_failed
  kernfs: drop shared NUMA mempolicy hooks
  hugetlbfs: drop shared NUMA mempolicy pretence
  mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()
  ...
Linus Torvalds 2023-11-02 19:38:47 -10:00
commit ecae0bd517
281 changed files with 11698 additions and 5292 deletions

View File

@ -151,6 +151,13 @@ Contact: SeongJae Park <sj@kernel.org>
Description: Writing to and reading from this file sets and gets the action
of the scheme.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/apply_interval_us
Date: Sep 2023
Contact: SeongJae Park <sj@kernel.org>
Description: Writing a value to this file sets the action apply interval of
the scheme in microseconds. Reading this file returns the
value.
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/access_pattern/sz/min
Date: Mar 2022
Contact: SeongJae Park <sj@kernel.org>

View File

@ -551,6 +551,7 @@ memory.stat file includes following statistics:
event happens each time a page is unaccounted from the
cgroup.
swap # of bytes of swap usage
swapcached # of bytes of swap cached in memory
dirty # of bytes that are waiting to get written back to the disk.
writeback # of bytes of file/anon cache that are queued for syncing to
disk.

View File

@ -210,6 +210,35 @@ cgroup v2 currently supports the following mount options.
relying on the original semantics (e.g. specifying bogusly
high 'bypass' protection values at higher tree levels).
memory_hugetlb_accounting
Count HugeTLB memory usage towards the cgroup's overall
memory usage for the memory controller (for the purpose of
statistics reporting and memory protection). This is a new
behavior that could regress existing setups, so it must be
explicitly opted in with this mount option.
A few caveats to keep in mind:
* There is no HugeTLB pool management involved in the memory
controller. The pre-allocated pool does not belong to anyone.
Specifically, when a new HugeTLB folio is allocated to
the pool, it is not accounted for from the perspective of the
memory controller. It is only charged to a cgroup when it is
actually used (for e.g at page fault time). Host memory
overcommit management has to consider this when configuring
hard limits. In general, HugeTLB pool management should be
done via other mechanisms (such as the HugeTLB controller).
* Failure to charge a HugeTLB folio to the memory controller
results in SIGBUS. This could happen even if the HugeTLB pool
still has pages available (but the cgroup limit is hit and
reclaim attempt fails).
* Charging HugeTLB memory towards the memory controller affects
memory protection and reclaim dynamics. Any userspace tuning
(of low, min limits for e.g) needs to take this into account.
* HugeTLB pages utilized while this option is not selected
will not be tracked by the memory controller (even if cgroup
v2 is remounted later on).
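As a rough, editorial illustration of opting in (not part of the patch
itself), the sketch below remounts the cgroup v2 hierarchy with the
option described above; the mount point ``/sys/fs/cgroup`` and the use
of a remount are assumptions about the local setup::

	#include <stdio.h>
	#include <sys/mount.h>

	int main(void)
	{
		/*
		 * Remount cgroup2 with memory_hugetlb_accounting so that
		 * newly faulted-in HugeTLB folios are charged to the memory
		 * controller, subject to the caveats listed above.
		 */
		if (mount(NULL, "/sys/fs/cgroup", "cgroup2", MS_REMOUNT,
			  "memory_hugetlb_accounting")) {
			perror("remount cgroup2");
			return 1;
		}
		return 0;
	}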
Organizing Processes and Threads
--------------------------------
@ -1539,6 +1568,15 @@ PAGE_SIZE multiple when read back.
collapsing an existing range of pages. This counter is not
present when CONFIG_TRANSPARENT_HUGEPAGE is not set.
thp_swpout (npn)
Number of transparent hugepages which are swapped out in one piece
without splitting.
thp_swpout_fallback (npn)
Number of transparent hugepages which were split before swapout,
usually because contiguous swap space could not be allocated for
the huge page.
memory.numa_stat
A read-only nested-keyed file which exists on non-root cgroups.

View File

@ -20,18 +20,18 @@ DAMON provides below interfaces for different users.
you can write and use your personalized DAMON sysfs wrapper programs that
reads/writes the sysfs files instead of you. The `DAMON user space tool
<https://github.com/awslabs/damo>`_ is one example of such programs.
- *debugfs interface. (DEPRECATED!)*
:ref:`This <debugfs_interface>` is almost identical to :ref:`sysfs interface
<sysfs_interface>`. This is deprecated, so users should move to the
:ref:`sysfs interface <sysfs_interface>`. If you depend on this and cannot
move, please report your usecase to damon@lists.linux.dev and
linux-mm@kvack.org.
- *Kernel Space Programming Interface.*
:doc:`This </mm/damon/api>` is for kernel space programmers. Using this,
users can utilize every feature of DAMON most flexibly and efficiently by
writing kernel space DAMON application programs for you. You can even extend
DAMON for various address spaces. For detail, please refer to the interface
:doc:`document </mm/damon/api>`.
- *debugfs interface. (DEPRECATED!)*
:ref:`This <debugfs_interface>` is almost identical to :ref:`sysfs interface
<sysfs_interface>`. This is deprecated, so users should move to the
:ref:`sysfs interface <sysfs_interface>`. If you depend on this and cannot
move, please report your usecase to damon@lists.linux.dev and
linux-mm@kvack.org.
.. _sysfs_interface:
@ -76,7 +76,7 @@ comma (","). ::
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ 0/action,apply_interval_us
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
@ -105,14 +105,12 @@ having the root permission could use this directory.
kdamonds/
---------
The monitoring-related information including request specifications and results
are called DAMON context. DAMON executes each context with a kernel thread
called kdamond, and multiple kdamonds could run in parallel.
Under the ``admin`` directory, one directory, ``kdamonds``, which has files for
controlling the kdamonds exist. In the beginning, this directory has only one
file, ``nr_kdamonds``. Writing a number (``N``) to the file creates the number
of child directories named ``0`` to ``N-1``. Each directory represents each
controlling the kdamonds (refer to
:ref:`design <damon_design_execution_model_and_data_structures>` for more
details) exists. In the beginning, this directory has only one file,
``nr_kdamonds``. Writing a number (``N``) to the file creates the number of
child directories named ``0`` to ``N-1``. Each directory represents each
kdamond.
kdamonds/<N>/
@ -150,9 +148,10 @@ kdamonds/<N>/contexts/
In the beginning, this directory has only one file, ``nr_contexts``. Writing a
number (``N``) to the file creates the number of child directories named as
``0`` to ``N-1``. Each directory represents each monitoring context. At the
moment, only one context per kdamond is supported, so only ``0`` or ``1`` can
be written to the file.
``0`` to ``N-1``. Each directory represents each monitoring context (refer to
:ref:`design <damon_design_execution_model_and_data_structures>` for more
details). At the moment, only one context per kdamond is supported, so only
``0`` or ``1`` can be written to the file.
.. _sysfs_contexts:
@ -270,8 +269,8 @@ schemes/<N>/
------------
In each scheme directory, five directories (``access_pattern``, ``quotas``,
``watermarks``, ``filters``, ``stats``, and ``tried_regions``) and one file
(``action``) exist.
``watermarks``, ``filters``, ``stats``, and ``tried_regions``) and two files
(``action`` and ``apply_interval``) exist.
The ``action`` file is for setting and getting the scheme's :ref:`action
<damon_design_damos_action>`. The keywords that can be written to and read
@ -297,6 +296,9 @@ Note that support of each action depends on the running DAMON operations set
- ``stat``: Do nothing but count the statistics.
Supported by all operations sets.
The ``apply_interval_us`` file is for setting and getting the scheme's
:ref:`apply_interval <damon_design_damos>` in microseconds.
schemes/<N>/access_pattern/
---------------------------
@ -392,7 +394,7 @@ pages of all memory cgroups except ``/having_care_already``.::
echo N > 1/matching
Note that ``anon`` and ``memcg`` filters are currently supported only when
``paddr`` `implementation <sysfs_contexts>` is being used.
``paddr`` :ref:`implementation <sysfs_contexts>` is being used.
Also, memory regions that are filtered out by ``addr`` or ``target`` filters
are not counted as the scheme has tried to those, while regions that filtered
@ -430,9 +432,9 @@ that reading it returns the total size of the scheme tried regions, and creates
directories named integer starting from ``0`` under this directory. Each
directory contains files exposing detailed information about each of the memory
region that the corresponding scheme's ``action`` has tried to be applied under
this directory, during next :ref:`aggregation interval
<sysfs_monitoring_attrs>`. The information includes address range,
``nr_accesses``, and ``age`` of the region.
this directory, during next :ref:`apply interval <damon_design_damos>` of the
corresponding scheme. The information includes address range, ``nr_accesses``,
and ``age`` of the region.
Writing ``update_schemes_tried_bytes`` to the relevant ``kdamonds/<N>/state``
file will only update the ``total_bytes`` file, and will not create the
@ -495,6 +497,62 @@ Please note that it's highly recommended to use user space tools like `damo
<https://github.com/awslabs/damo>`_ rather than manually reading and writing
the files as above. Above is only for an example.
.. _tracepoint:
Tracepoints for Monitoring Results
==================================
Users can get the monitoring results via the :ref:`tried_regions
<sysfs_schemes_tried_regions>`. The interface is useful for getting a
snapshot, but it could be inefficient for fully recording all the monitoring
results. For that purpose, two tracepoints, namely ``damon:damon_aggregated``
and ``damon:damos_before_apply``, are provided. ``damon:damon_aggregated``
provides the whole monitoring results, while ``damon:damos_before_apply``
provides the monitoring results for the regions to which each DAMON-based
Operation Scheme (:ref:`DAMOS <damon_design_damos>`) is about to be applied. Hence,
``damon:damos_before_apply`` is more useful for recording internal behavior of
DAMOS, or DAMOS target access
:ref:`pattern <damon_design_damos_access_pattern>` based query-like efficient
monitoring results recording.
While the monitoring is turned on, you could record the tracepoint events and
show results using tracepoint supporting tools like ``perf``. For example::
# echo on > monitor_on
# perf record -e damon:damon_aggregated &
# sleep 5
# kill 9 $(pidof perf)
# echo off > monitor_on
# perf script
kdamond.0 46568 [027] 79357.842179: damon:damon_aggregated: target_id=0 nr_regions=11 122509119488-135708762112: 0 864
[...]
Each line of the perf script output represents each monitoring region. The
first five fields are as usual other tracepoint outputs. The sixth field
(``target_id=X``) shows the id of the monitoring target of the region. The
seventh field (``nr_regions=X``) shows the total number of monitoring regions
for the target. The eighth field (``X-Y:``) shows the start (``X``) and end
(``Y``) addresses of the region in bytes. The ninth field (``X``) shows the
``nr_accesses`` of the region (refer to
:ref:`design <damon_design_region_based_sampling>` for more details of the
counter). Finally the tenth field (``X``) shows the ``age`` of the region
(refer to :ref:`design <damon_design_age_tracking>` for more details of the
counter).
If the event was ``damon:damos_before_apply``, the ``perf script`` output would
be somewhat like below::
kdamond.0 47293 [000] 80801.060214: damon:damos_before_apply: ctx_idx=0 scheme_idx=0 target_idx=0 nr_regions=11 121932607488-135128711168: 0 136
[...]
Each line of the output represents each monitoring region that each DAMON-based
Operation Scheme was about to be applied at the traced time. The first five
fields are as usual. It shows the index of the DAMON context (``ctx_idx=X``)
of the scheme in the list of the contexts of the context's kdamond, the index
of the scheme (``scheme_idx=X``) in the list of the schemes of the context, in
addition to the output of ``damon_aggregated`` tracepoint.
.. _debugfs_interface:
debugfs Interface (DEPRECATED!)
@ -790,23 +848,3 @@ directory by putting the name of the context to the ``rm_contexts`` file. ::
Note that ``mk_contexts``, ``rm_contexts``, and ``monitor_on`` files are in the
root directory only.
.. _tracepoint:
Tracepoint for Monitoring Results
=================================
Users can get the monitoring results via the :ref:`tried_regions
<sysfs_schemes_tried_regions>` or a tracepoint, ``damon:damon_aggregated``.
While the tried regions directory is useful for getting a snapshot, the
tracepoint is useful for getting a full record of the results. While the
monitoring is turned on, you could record the tracepoint events and show
results using tracepoint supporting tools like ``perf``. For example::
# echo on > monitor_on
# perf record -e damon:damon_aggregated &
# sleep 5
# kill 9 $(pidof perf)
# echo off > monitor_on
# perf script

View File

@ -155,6 +155,15 @@ stable_node_chains_prune_millisecs
scan. It's a noop if not a single KSM page hit the
``max_page_sharing`` yet.
smart_scan
Historically KSM checked every candidate page for each scan. It did
not take into account historic information. When smart scan is
enabled, pages that have previously not been de-duplicated get
skipped. How often these pages are skipped depends on how often
de-duplication has already been tried and failed. By default this
optimization is enabled. The ``pages_skipped`` metric shows how
effective the setting is.
The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``:
general_profit
@ -169,6 +178,8 @@ pages_unshared
how many pages unique but repeatedly checked for merging
pages_volatile
how many pages changing too fast to be placed in a tree
pages_skipped
how many pages did the "smart" page scanning algorithm skip
full_scans
how many times all mergeable areas have been scanned
stable_node_chains
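For illustration only (not part of this change), a minimal userspace
sketch that enables the ``smart_scan`` knob and reads the
``pages_skipped`` counter described above; it assumes root privileges
and only the sysfs paths named in this document::

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		char buf[64];
		ssize_t n;
		int fd;

		/* Enable the smart scan optimization. */
		fd = open("/sys/kernel/mm/ksm/smart_scan", O_WRONLY);
		if (fd < 0 || write(fd, "1", 1) != 1)
			perror("enable smart_scan");
		if (fd >= 0)
			close(fd);

		/* See how many candidate pages the optimization has skipped. */
		fd = open("/sys/kernel/mm/ksm/pages_skipped", O_RDONLY);
		if (fd >= 0 && (n = read(fd, buf, sizeof(buf) - 1)) > 0) {
			buf[n] = '\0';
			printf("pages_skipped: %s", buf);
		}
		if (fd >= 0)
			close(fd);
		return 0;
	}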

View File

@ -227,3 +227,92 @@ Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is
always 12 at most architectures). Since Linux 3.11 their meaning changes
after first clear of soft-dirty bits. Since Linux 4.2 they are used for
flags unconditionally.
Pagemap Scan IOCTL
==================
The ``PAGEMAP_SCAN`` IOCTL on the pagemap file can be used to get or optionally
clear the info about page table entries. The following operations are supported
in this IOCTL:
- Scan the address range and get the memory ranges matching the provided criteria.
This is performed when the output buffer is specified.
- Write-protect the pages. The ``PM_SCAN_WP_MATCHING`` is used to write-protect
the pages of interest. The ``PM_SCAN_CHECK_WPASYNC`` aborts the operation if
non-Async Write Protected pages are found. The ``PM_SCAN_WP_MATCHING`` can be
used with or without ``PM_SCAN_CHECK_WPASYNC``.
- Both of those operations can be combined into one atomic operation where we can
get and write protect the pages as well.
Following flags about pages are currently supported:
- ``PAGE_IS_WPALLOWED`` - Page has async-write-protection enabled
- ``PAGE_IS_WRITTEN`` - Page has been written to from the time it was write protected
- ``PAGE_IS_FILE`` - Page is file backed
- ``PAGE_IS_PRESENT`` - Page is present in the memory
- ``PAGE_IS_SWAPPED`` - Page is swapped out
- ``PAGE_IS_PFNZERO`` - Page has zero PFN
- ``PAGE_IS_HUGE`` - Page is THP or Hugetlb backed
The ``struct pm_scan_arg`` is used as the argument of the IOCTL.
1. The size of the ``struct pm_scan_arg`` must be specified in the ``size``
field. This field will be helpful in recognizing the structure if extensions
are done later.
2. The flags can be specified in the ``flags`` field. The ``PM_SCAN_WP_MATCHING``
and ``PM_SCAN_CHECK_WPASYNC`` are the only added flags at this time. The get
operation is optionally performed depending upon if the output buffer is
provided or not.
3. The range is specified through ``start`` and ``end``.
4. The walk can abort before visiting the complete range, for example when the
user buffer gets full. The walk ending address is specified in ``end_walk``.
5. The output buffer of ``struct page_region`` array and size is specified in
``vec`` and ``vec_len``.
6. The optional maximum requested pages are specified in the ``max_pages``.
7. The masks are specified in ``category_mask``, ``category_anyof_mask``,
``category_inverted`` and ``return_mask``.
Find pages which have been written and WP them as well::
struct pm_scan_arg arg = {
.size = sizeof(arg),
.flags = PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC,
..
.category_mask = PAGE_IS_WRITTEN,
.return_mask = PAGE_IS_WRITTEN,
};
Find pages which have been written, are file backed, not swapped and either
present or huge::
struct pm_scan_arg arg = {
.size = sizeof(arg),
.flags = 0,
..
.category_mask = PAGE_IS_WRITTEN | PAGE_IS_SWAPPED,
.category_inverted = PAGE_IS_SWAPPED,
.category_anyof_mask = PAGE_IS_PRESENT | PAGE_IS_HUGE,
.return_mask = PAGE_IS_WRITTEN | PAGE_IS_SWAPPED |
PAGE_IS_PRESENT | PAGE_IS_HUGE,
};
The ``PAGE_IS_WRITTEN`` flag can be considered a better-performing alternative
to the soft-dirty flag. It is not affected by VMA merging in the kernel, so the
user can find the true soft-dirty pages for normal pages. (There may still be
extra dirty pages reported for THP or Hugetlb pages.)
"PAGE_IS_WRITTEN" category is used with uffd write protect-enabled ranges to
implement memory dirty tracking in userspace:
1. The userfaultfd file descriptor is created with ``userfaultfd`` syscall.
2. The ``UFFD_FEATURE_WP_UNPOPULATED`` and ``UFFD_FEATURE_WP_ASYNC`` features
are set by ``UFFDIO_API`` IOCTL.
3. The memory range is registered with ``UFFDIO_REGISTER_MODE_WP`` mode
through ``UFFDIO_REGISTER`` IOCTL.
4. Then any part of the registered memory or the whole memory region must
be write protected using ``PAGEMAP_SCAN`` IOCTL with flag ``PM_SCAN_WP_MATCHING``
or the ``UFFDIO_WRITEPROTECT`` IOCTL can be used. Both of these perform the
same operation. The former is better in terms of performance.
5. Now the ``PAGEMAP_SCAN`` IOCTL can be used to either just find pages which
have been written to since they were last marked and/or optionally write protect
the pages as well.
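A condensed, editorial sketch of steps 1-5 is shown below. It uses only the
UAPI names quoted in this document (assumed to come from
``<linux/userfaultfd.h>`` and ``<linux/fs.h>`` on a kernel that includes these
changes) and omits most error handling; treat it as an illustration rather
than a reference implementation::

	#include <fcntl.h>
	#include <linux/fs.h>             /* PAGEMAP_SCAN, struct pm_scan_arg */
	#include <linux/userfaultfd.h>    /* UFFDIO_*, UFFD_FEATURE_* */
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main(void)
	{
		size_t len = 16 * 4096;
		char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		/* Steps 1-2: create the userfaultfd and enable async WP mode. */
		int uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
		struct uffdio_api api = {
			.api = UFFD_API,
			.features = UFFD_FEATURE_WP_UNPOPULATED | UFFD_FEATURE_WP_ASYNC,
		};
		ioctl(uffd, UFFDIO_API, &api);

		/* Step 3: register the range in write-protect mode. */
		struct uffdio_register reg = {
			.range = { .start = (unsigned long)mem, .len = len },
			.mode = UFFDIO_REGISTER_MODE_WP,
		};
		ioctl(uffd, UFFDIO_REGISTER, &reg);

		/* Step 4: write-protect the whole range to start tracking. */
		struct uffdio_writeprotect wp = {
			.range = { .start = (unsigned long)mem, .len = len },
			.mode = UFFDIO_WRITEPROTECT_MODE_WP,
		};
		ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);

		mem[0] = 1;		/* dirty one page */

		/* Step 5: report written pages and write-protect them again. */
		struct page_region vec[16];
		struct pm_scan_arg arg = {
			.size = sizeof(arg),
			.flags = PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC,
			.start = (unsigned long)mem,
			.end = (unsigned long)mem + len,
			.vec = (unsigned long)vec,
			.vec_len = 16,
			.category_mask = PAGE_IS_WRITTEN,
			.return_mask = PAGE_IS_WRITTEN,
		};
		int pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
		long ret = ioctl(pagemap_fd, PAGEMAP_SCAN, &arg);

		/* On success the ioctl is expected to return the number of
		 * page_region entries written to vec. */
		printf("written regions: %ld\n", ret);
		return ret < 0;
	}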

View File

@ -244,6 +244,41 @@ write-protected (so future writes will also result in a WP fault). These ioctls
support a mode flag (``UFFDIO_COPY_MODE_WP`` or ``UFFDIO_CONTINUE_MODE_WP``
respectively) to configure the mapping this way.
If the userfaultfd context has ``UFFD_FEATURE_WP_ASYNC`` feature bit set,
any vma registered with write-protection will work in async mode rather
than the default sync mode.
In async mode, there will be no message generated when a write operation
happens, meanwhile the write-protection will be resolved automatically by
the kernel. It can be seen as a more accurate version of soft-dirty
tracking and it can be different in a few ways:
- The dirty result will not be affected by vma changes (e.g. vma
merging) because the dirty is only tracked by the pte.
- It supports range operations by default, so one can enable tracking on
any range of memory as long as page aligned.
- Dirty information will not get lost if the pte was zapped due to
various reasons (e.g. during split of a shmem transparent huge page).
- Due to a reverted meaning of soft-dirty (page clean when uffd-wp bit
set; dirty when uffd-wp bit cleared), it has different semantics on
some of the memory operations. For example: ``MADV_DONTNEED`` on
anonymous (or ``MADV_REMOVE`` on a file mapping) will be treated as
dirtying of memory by dropping uffd-wp bit during the procedure.
The user application can collect the "written/dirty" status by looking up the
uffd-wp bit for the pages of interest in /proc/pagemap.
The page will not be under track of uffd-wp async mode until the page is
explicitly write-protected by ``ioctl(UFFDIO_WRITEPROTECT)`` with the mode
flag ``UFFDIO_WRITEPROTECT_MODE_WP`` set. Trying to resolve a page fault
that was tracked by async mode userfaultfd-wp is invalid.
When userfaultfd-wp async mode is used alone, it can be applied to all
kinds of memory.
Memory Poisoning Emulation
---------------------------

View File

@ -175,7 +175,7 @@ will return the previous entry which occurs before the entry at index.
mas_find() will find the first entry which exists at or above index on
the first call, and the next entry from every subsequent calls.
mas_find_rev() will find the fist entry which exists at or below the last on
mas_find_rev() will find the first entry which exists at or below the last on
the first call, and the previous entry from every subsequent calls.
If the user needs to yield the lock during an operation, then the maple state

View File

@ -1,5 +1,8 @@
The Kernel Address Sanitizer (KASAN)
====================================
.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2023, Google LLC.
Kernel Address Sanitizer (KASAN)
================================
Overview
--------

View File

@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2019, Google LLC.
The Kernel Concurrency Sanitizer (KCSAN)
========================================
Kernel Concurrency Sanitizer (KCSAN)
====================================
The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector, which
relies on compile-time instrumentation, and uses a watchpoint-based sampling

View File

@ -1,9 +1,9 @@
.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2022, Google LLC.
===================================
The Kernel Memory Sanitizer (KMSAN)
===================================
===============================
Kernel Memory Sanitizer (KMSAN)
===============================
KMSAN is a dynamic error detector aimed at finding uses of uninitialized
values. It is based on compiler instrumentation, and is quite similar to the

View File

@ -1,5 +1,7 @@
The Undefined Behavior Sanitizer - UBSAN
========================================
.. SPDX-License-Identifier: GPL-2.0
Undefined Behavior Sanitizer - UBSAN
====================================
UBSAN is a runtime undefined behaviour checker.

View File

@ -154,6 +154,8 @@ The monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
.. _damon_design_region_based_sampling:
Region Based Sampling
~~~~~~~~~~~~~~~~~~~~~
@ -163,9 +165,10 @@ assumption (pages in a region have the same access frequencies) is kept, only
one page in the region is required to be checked. Thus, for each ``sampling
interval``, DAMON randomly picks one page in each region, waits for one
``sampling interval``, checks whether the page is accessed meanwhile, and
increases the access frequency of the region if so. Therefore, the monitoring
overhead is controllable by setting the number of regions. DAMON allows users
to set the minimum and the maximum number of regions for the trade-off.
increases the access frequency counter of the region if so. The counter is
called ``nr_accesses`` of the region. Therefore, the monitoring overhead is
controllable by setting the number of regions. DAMON allows users to set the
minimum and the maximum number of regions for the trade-off.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed.
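As a purely illustrative aid (not DAMON's actual implementation), the
per-interval sampling step described above can be sketched roughly as
follows; ``page_accessed()`` stands in for the Accessed-bit check that
the real operations sets perform::

	#include <stdbool.h>
	#include <stdio.h>
	#include <stdlib.h>

	#define PAGE_SIZE 4096UL

	struct region {
		unsigned long start, end;	/* address range of the region */
		unsigned int nr_accesses;	/* access frequency counter */
	};

	/* Stand-in for the real Accessed-bit check of the operations set. */
	static bool page_accessed(unsigned long addr)
	{
		(void)addr;
		return rand() & 1;
	}

	/* One sampling interval: check a single random page per region. */
	static void damon_sample_once(struct region *regions, int nr_regions)
	{
		for (int i = 0; i < nr_regions; i++) {
			struct region *r = &regions[i];
			unsigned long nr_pages = (r->end - r->start) / PAGE_SIZE;
			unsigned long addr = r->start +
				(rand() % nr_pages) * PAGE_SIZE;

			if (page_accessed(addr))
				r->nr_accesses++;
		}
	}

	int main(void)
	{
		struct region r[2] = {
			{ 0x100000, 0x200000, 0 },
			{ 0x200000, 0x400000, 0 },
		};

		for (int i = 0; i < 10; i++)	/* ten sampling intervals */
			damon_sample_once(r, 2);
		printf("nr_accesses: %u %u\n", r[0].nr_accesses, r[1].nr_accesses);
		return 0;
	}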
@ -190,6 +193,8 @@ In this way, DAMON provides its best-effort quality and minimal overhead while
keeping the bounds users set for their trade-off.
.. _damon_design_age_tracking:
Age Tracking
~~~~~~~~~~~~
@ -254,7 +259,8 @@ works, DAMON provides a feature called Data Access Monitoring-based Operation
Schemes (DAMOS). It lets users specify their desired schemes at a high
level. For such specifications, DAMON starts monitoring, finds regions having
the access pattern of interest, and applies the user-desired operation actions
to the regions as soon as found.
to the regions, for every user-specified time interval called
``apply_interval``.
.. _damon_design_damos_action:
@ -471,3 +477,15 @@ modules for proactive reclamation and LRU lists manipulation are provided. For
more detail, please read the usage documents for those
(:doc:`/admin-guide/mm/damon/reclaim` and
:doc:`/admin-guide/mm/damon/lru_sort`).
.. _damon_design_execution_model_and_data_structures:
Execution Model and Data Structures
===================================
The monitoring-related information including the monitoring request
specification and DAMON-based operation schemes are stored in a data structure
called DAMON ``context``. DAMON executes each context with a kernel thread
called ``kdamond``. Multiple kdamonds could run in parallel, for different
types of monitoring.

View File

@ -107,14 +107,14 @@ GetOptions(
);
# Defaults for dynamically discovered regex's
my $regex_direct_begin_default = 'order=([0-9]*) may_writepage=([0-9]*) gfp_flags=([A-Z_|]*)';
my $regex_direct_begin_default = 'order=([0-9]*) gfp_flags=([A-Z_|]*)';
my $regex_direct_end_default = 'nr_reclaimed=([0-9]*)';
my $regex_kswapd_wake_default = 'nid=([0-9]*) order=([0-9]*)';
my $regex_kswapd_sleep_default = 'nid=([0-9]*)';
my $regex_wakeup_kswapd_default = 'nid=([0-9]*) zid=([0-9]*) order=([0-9]*) gfp_flags=([A-Z_|]*)';
my $regex_lru_isolate_default = 'isolate_mode=([0-9]*) classzone_idx=([0-9]*) order=([0-9]*) nr_requested=([0-9]*) nr_scanned=([0-9]*) nr_skipped=([0-9]*) nr_taken=([0-9]*) lru=([a-z_]*)';
my $regex_wakeup_kswapd_default = 'nid=([0-9]*) order=([0-9]*) gfp_flags=([A-Z_|]*)';
my $regex_lru_isolate_default = 'classzone=([0-9]*) order=([0-9]*) nr_requested=([0-9]*) nr_scanned=([0-9]*) nr_skipped=([0-9]*) nr_taken=([0-9]*) lru=([a-z_]*)';
my $regex_lru_shrink_inactive_default = 'nid=([0-9]*) nr_scanned=([0-9]*) nr_reclaimed=([0-9]*) nr_dirty=([0-9]*) nr_writeback=([0-9]*) nr_congested=([0-9]*) nr_immediate=([0-9]*) nr_activate_anon=([0-9]*) nr_activate_file=([0-9]*) nr_ref_keep=([0-9]*) nr_unmap_fail=([0-9]*) priority=([0-9]*) flags=([A-Z_|]*)';
my $regex_lru_shrink_active_default = 'lru=([A-Z_]*) nr_scanned=([0-9]*) nr_rotated=([0-9]*) priority=([0-9]*)';
my $regex_lru_shrink_active_default = 'lru=([A-Z_]*) nr_taken=([0-9]*) nr_active=([0-9]*) nr_deactivated=([0-9]*) nr_referenced=([0-9]*) priority=([0-9]*) flags=([A-Z_|]*)' ;
my $regex_writepage_default = 'page=([0-9a-f]*) pfn=([0-9]*) flags=([A-Z_|]*)';
# Dyanically discovered regex
@ -184,8 +184,7 @@ sub generate_traceevent_regex {
$regex_direct_begin = generate_traceevent_regex(
"vmscan/mm_vmscan_direct_reclaim_begin",
$regex_direct_begin_default,
"order", "may_writepage",
"gfp_flags");
"order", "gfp_flags");
$regex_direct_end = generate_traceevent_regex(
"vmscan/mm_vmscan_direct_reclaim_end",
$regex_direct_end_default,
@ -201,11 +200,11 @@ $regex_kswapd_sleep = generate_traceevent_regex(
$regex_wakeup_kswapd = generate_traceevent_regex(
"vmscan/mm_vmscan_wakeup_kswapd",
$regex_wakeup_kswapd_default,
"nid", "zid", "order", "gfp_flags");
"nid", "order", "gfp_flags");
$regex_lru_isolate = generate_traceevent_regex(
"vmscan/mm_vmscan_lru_isolate",
$regex_lru_isolate_default,
"isolate_mode", "classzone_idx", "order",
"classzone", "order",
"nr_requested", "nr_scanned", "nr_skipped", "nr_taken",
"lru");
$regex_lru_shrink_inactive = generate_traceevent_regex(
@ -218,11 +217,10 @@ $regex_lru_shrink_inactive = generate_traceevent_regex(
$regex_lru_shrink_active = generate_traceevent_regex(
"vmscan/mm_vmscan_lru_shrink_active",
$regex_lru_shrink_active_default,
"nid", "zid",
"lru",
"nr_scanned", "nr_rotated", "priority");
"nid", "nr_taken", "nr_active", "nr_deactivated", "nr_referenced",
"priority", "flags");
$regex_writepage = generate_traceevent_regex(
"vmscan/mm_vmscan_writepage",
"vmscan/mm_vmscan_write_folio",
$regex_writepage_default,
"page", "pfn", "flags");
@ -371,7 +369,7 @@ EVENT_PROCESS:
print " $regex_wakeup_kswapd\n";
next;
}
my $order = $3;
my $order = $2;
$perprocesspid{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD_PERORDER}[$order]++;
} elsif ($tracepoint eq "mm_vmscan_lru_isolate") {
$details = $6;
@ -381,18 +379,14 @@ EVENT_PROCESS:
print " $regex_lru_isolate/o\n";
next;
}
my $isolate_mode = $1;
my $nr_scanned = $5;
my $file = $8;
my $nr_scanned = $4;
my $lru = $7;
# To closer match vmstat scanning statistics, only count isolate_both
# and isolate_inactive as scanning. isolate_active is rotation
# isolate_inactive == 1
# isolate_active == 2
# isolate_both == 3
if ($isolate_mode != 2) {
# To closer match vmstat scanning statistics, only count
# inactive lru as scanning
if ($lru =~ /inactive_/) {
$perprocesspid{$process_pid}->{HIGH_NR_SCANNED} += $nr_scanned;
if ($file =~ /_file/) {
if ($lru =~ /_file/) {
$perprocesspid{$process_pid}->{HIGH_NR_FILE_SCANNED} += $nr_scanned;
} else {
$perprocesspid{$process_pid}->{HIGH_NR_ANON_SCANNED} += $nr_scanned;

View File

@ -5332,6 +5332,7 @@ S: Maintained
F: mm/memcontrol.c
F: mm/swap_cgroup.c
F: tools/testing/selftests/cgroup/memcg_protection.m
F: tools/testing/selftests/cgroup/test_hugetlb_memcg.c
F: tools/testing/selftests/cgroup/test_kmem.c
F: tools/testing/selftests/cgroup/test_memcontrol.c
@ -9754,6 +9755,7 @@ F: include/linux/hugetlb.h
F: mm/hugetlb.c
F: mm/hugetlb_vmemmap.c
F: mm/hugetlb_vmemmap.h
F: tools/testing/selftests/cgroup/test_hugetlb_memcg.c
HVA ST MEDIA DRIVER
M: Jean-Christophe Trotin <jean-christophe.trotin@foss.st.com>

View File

@ -286,6 +286,26 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
#define arch_test_bit generic_test_bit
#define arch_test_bit_acquire generic_test_bit_acquire
static inline bool xor_unlock_is_negative_byte(unsigned long mask,
volatile unsigned long *p)
{
unsigned long temp, old;
__asm__ __volatile__(
"1: ldl_l %0,%4\n"
" mov %0,%2\n"
" xor %0,%3,%0\n"
" stl_c %0,%1\n"
" beq %0,2f\n"
".subsection 2\n"
"2: br 1b\n"
".previous"
:"=&r" (temp), "=m" (*p), "=&r" (old)
:"Ir" (mask), "m" (*p));
return (old & BIT(7)) != 0;
}
/*
* ffz = Find First Zero in word. Undefined if no zero exists,
* so code should check against ~0UL first..

View File

@ -96,7 +96,10 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
unsigned long addr)
{
tlb_remove_ptdesc(tlb, virt_to_ptdesc(pudp));
struct ptdesc *ptdesc = virt_to_ptdesc(pudp);
pagetable_pud_dtor(ptdesc);
tlb_remove_ptdesc(tlb, ptdesc);
}
#endif

View File

@ -411,8 +411,8 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
struct page *page = get_user_page_vma_remote(mm, addr,
gup_flags, &vma);
if (IS_ERR_OR_NULL(page)) {
err = page == NULL ? -EIO : PTR_ERR(page);
if (IS_ERR(page)) {
err = PTR_ERR(page);
break;
}

View File

@ -300,7 +300,11 @@ void __init kasan_init(void)
kasan_init_shadow();
kasan_init_depth();
#if defined(CONFIG_KASAN_GENERIC)
/* CONFIG_KASAN_SW_TAGS also requires kasan_init_sw_tags(). */
/*
* Generic KASAN is now fully initialized.
* Software and Hardware Tag-Based modes still require
* kasan_init_sw_tags() and kasan_init_hw_tags() correspondingly.
*/
pr_info("KernelAddressSanitizer initialized (generic)\n");
#endif
}

View File

@ -84,6 +84,7 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
if (!ptdesc)
return NULL;
pagetable_pud_ctor(ptdesc);
pud = ptdesc_address(ptdesc);
pud_init(pud);

View File

@ -319,6 +319,27 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
return test_and_change_bit(nr, addr);
}
static inline bool xor_unlock_is_negative_byte(unsigned long mask,
volatile unsigned long *p)
{
#ifdef CONFIG_COLDFIRE
__asm__ __volatile__ ("eorl %1, %0"
: "+m" (*p)
: "d" (mask)
: "memory");
return *p & (1 << 7);
#else
char result;
char *cp = (char *)p + 3; /* m68k is big-endian */
__asm__ __volatile__ ("eor.b %1, %2; smi %0"
: "=d" (result)
: "di" (mask), "o" (*cp)
: "memory");
return result;
#endif
}
/*
* The true 68020 and more advanced processors support the "bfffo"
* instruction for finding bits. ColdFire and simple 68000 parts

View File

@ -73,7 +73,8 @@ int __mips_test_and_clear_bit(unsigned long nr,
volatile unsigned long *addr);
int __mips_test_and_change_bit(unsigned long nr,
volatile unsigned long *addr);
bool __mips_xor_is_negative_byte(unsigned long mask,
volatile unsigned long *addr);
/*
* set_bit - Atomically set a bit in memory
@ -279,6 +280,28 @@ static inline int test_and_change_bit(unsigned long nr,
return res;
}
static inline bool xor_unlock_is_negative_byte(unsigned long mask,
volatile unsigned long *p)
{
unsigned long orig;
bool res;
smp_mb__before_atomic();
if (!kernel_uses_llsc) {
res = __mips_xor_is_negative_byte(mask, p);
} else {
orig = __test_bit_op(*p, "%0",
"xor\t%1, %0, %3",
"ir"(mask));
res = (orig & BIT(7)) != 0;
}
smp_llsc_mb();
return res;
}
#undef __bit_op
#undef __test_bit_op

View File

@ -95,6 +95,7 @@ static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
if (!ptdesc)
return NULL;
pagetable_pud_ctor(ptdesc);
pud = ptdesc_address(ptdesc);
pud_init(pud);

View File

@ -146,3 +146,17 @@ int __mips_test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
return res;
}
EXPORT_SYMBOL(__mips_test_and_change_bit);
bool __mips_xor_is_negative_byte(unsigned long mask,
volatile unsigned long *addr)
{
unsigned long flags;
unsigned long data;
raw_local_irq_save(flags);
data = *addr;
*addr = data ^ mask;
raw_local_irq_restore(flags);
return (data & BIT(7)) != 0;
}

View File

@ -117,7 +117,7 @@ void __flush_dcache_pages(struct page *page, unsigned int nr)
* get faulted into the tlb (and thus flushed) anyways.
*/
for (i = 0; i < nr; i++) {
addr = (unsigned long)kmap_local_page(page + i);
addr = (unsigned long)kmap_local_page(nth_page(page, i));
flush_data_cache_page(addr);
kunmap_local((void *)addr);
}

View File

@ -233,35 +233,24 @@ static inline int arch_test_and_change_bit(unsigned long nr,
return test_and_change_bits(BIT_MASK(nr), addr + BIT_WORD(nr)) != 0;
}
#ifdef CONFIG_PPC64
static inline unsigned long
clear_bit_unlock_return_word(int nr, volatile unsigned long *addr)
static inline bool arch_xor_unlock_is_negative_byte(unsigned long mask,
volatile unsigned long *p)
{
unsigned long old, t;
unsigned long *p = (unsigned long *)addr + BIT_WORD(nr);
unsigned long mask = BIT_MASK(nr);
__asm__ __volatile__ (
PPC_RELEASE_BARRIER
"1:" PPC_LLARX "%0,0,%3,0\n"
"andc %1,%0,%2\n"
"xor %1,%0,%2\n"
PPC_STLCX "%1,0,%3\n"
"bne- 1b\n"
: "=&r" (old), "=&r" (t)
: "r" (mask), "r" (p)
: "cc", "memory");
return old;
return (old & BIT_MASK(7)) != 0;
}
/*
* This is a special function for mm/filemap.c
* Bit 7 corresponds to PG_waiters.
*/
#define arch_clear_bit_unlock_is_negative_byte(nr, addr) \
(clear_bit_unlock_return_word(nr, addr) & BIT_MASK(7))
#endif /* CONFIG_PPC64 */
#define arch_xor_unlock_is_negative_byte arch_xor_unlock_is_negative_byte
#include <asm-generic/bitops/non-atomic.h>

View File

@ -191,6 +191,18 @@ static inline void __clear_bit_unlock(
clear_bit_unlock(nr, addr);
}
static inline bool xor_unlock_is_negative_byte(unsigned long mask,
volatile unsigned long *addr)
{
unsigned long res;
__asm__ __volatile__ (
__AMO(xor) ".rl %0, %2, %1"
: "=r" (res), "+A" (*addr)
: "r" (__NOP(mask))
: "memory");
return (res & BIT(7)) != 0;
}
#undef __test_and_op_bit
#undef __op_bit
#undef __NOP

View File

@ -201,6 +201,16 @@ static inline void arch___clear_bit_unlock(unsigned long nr,
arch___clear_bit(nr, ptr);
}
static inline bool arch_xor_unlock_is_negative_byte(unsigned long mask,
volatile unsigned long *ptr)
{
unsigned long old;
old = __atomic64_xor_barrier(mask, (long *)ptr);
return old & BIT(7);
}
#define arch_xor_unlock_is_negative_byte arch_xor_unlock_is_negative_byte
#include <asm-generic/bitops/instrumented-atomic.h>
#include <asm-generic/bitops/instrumented-non-atomic.h>
#include <asm-generic/bitops/instrumented-lock.h>

View File

@ -94,18 +94,17 @@ arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
asm volatile(__ASM_SIZE(btr) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
}
static __always_inline bool
arch_clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr)
static __always_inline bool arch_xor_unlock_is_negative_byte(unsigned long mask,
volatile unsigned long *addr)
{
bool negative;
asm volatile(LOCK_PREFIX "andb %2,%1"
asm volatile(LOCK_PREFIX "xorb %2,%1"
CC_SET(s)
: CC_OUT(s) (negative), WBYTE_ADDR(addr)
: "ir" ((char) ~(1 << nr)) : "memory");
: "iq" ((char)mask) : "memory");
return negative;
}
#define arch_clear_bit_unlock_is_negative_byte \
arch_clear_bit_unlock_is_negative_byte
#define arch_xor_unlock_is_negative_byte arch_xor_unlock_is_negative_byte
static __always_inline void
arch___clear_bit_unlock(long nr, volatile unsigned long *addr)

View File

@ -6800,11 +6800,7 @@ static unsigned long mmu_shrink_count(struct shrinker *shrink,
return percpu_counter_read_positive(&kvm_total_used_mmu_pages);
}
static struct shrinker mmu_shrinker = {
.count_objects = mmu_shrink_count,
.scan_objects = mmu_shrink_scan,
.seeks = DEFAULT_SEEKS * 10,
};
static struct shrinker *mmu_shrinker;
static void mmu_destroy_caches(void)
{
@ -6937,10 +6933,16 @@ int kvm_mmu_vendor_module_init(void)
if (percpu_counter_init(&kvm_total_used_mmu_pages, 0, GFP_KERNEL))
goto out;
ret = register_shrinker(&mmu_shrinker, "x86-mmu");
if (ret)
mmu_shrinker = shrinker_alloc(0, "x86-mmu");
if (!mmu_shrinker)
goto out_shrinker;
mmu_shrinker->count_objects = mmu_shrink_count;
mmu_shrinker->scan_objects = mmu_shrink_scan;
mmu_shrinker->seeks = DEFAULT_SEEKS * 10;
shrinker_register(mmu_shrinker);
return 0;
out_shrinker:
@ -6962,7 +6964,7 @@ void kvm_mmu_vendor_module_exit(void)
{
mmu_destroy_caches();
percpu_counter_destroy(&kvm_total_used_mmu_pages);
unregister_shrinker(&mmu_shrinker);
shrinker_free(mmu_shrinker);
}
/*

View File

@ -76,6 +76,9 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
#if CONFIG_PGTABLE_LEVELS > 3
void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
{
struct ptdesc *ptdesc = virt_to_ptdesc(pud);
pagetable_pud_dtor(ptdesc);
paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
paravirt_tlb_remove_table(tlb, virt_to_page(pud));
}

View File

@ -101,7 +101,7 @@ static void round_robin_cpu(unsigned int tsk_index)
for_each_cpu(cpu, pad_busy_cpus)
cpumask_or(tmp, tmp, topology_sibling_cpumask(cpu));
cpumask_andnot(tmp, cpu_online_mask, tmp);
/* avoid HT sibilings if possible */
/* avoid HT siblings if possible */
if (cpumask_empty(tmp))
cpumask_andnot(tmp, cpu_online_mask, pad_busy_cpus);
if (cpumask_empty(tmp)) {

View File

@ -24,6 +24,7 @@
#include <linux/node.h>
#include <linux/sysfs.h>
#include <linux/dax.h>
#include <linux/memory-tiers.h>
static u8 hmat_revision;
static int hmat_disable __initdata;
@ -582,28 +583,25 @@ static int initiators_to_nodemask(unsigned long *p_nodes)
return 0;
}
static void hmat_register_target_initiators(struct memory_target *target)
static void hmat_update_target_attrs(struct memory_target *target,
unsigned long *p_nodes, int access)
{
static DECLARE_BITMAP(p_nodes, MAX_NUMNODES);
struct memory_initiator *initiator;
unsigned int mem_nid, cpu_nid;
unsigned int cpu_nid;
struct memory_locality *loc = NULL;
u32 best = 0;
bool access0done = false;
int i;
mem_nid = pxm_to_node(target->memory_pxm);
bitmap_zero(p_nodes, MAX_NUMNODES);
/*
* If the Address Range Structure provides a local processor pxm, link
* If the Address Range Structure provides a local processor pxm, set
* only that one. Otherwise, find the best performance attributes and
* register all initiators that match.
* collect all initiators that match.
*/
if (target->processor_pxm != PXM_INVAL) {
cpu_nid = pxm_to_node(target->processor_pxm);
register_memory_node_under_compute_node(mem_nid, cpu_nid, 0);
access0done = true;
if (node_state(cpu_nid, N_CPU)) {
register_memory_node_under_compute_node(mem_nid, cpu_nid, 1);
if (access == 0 || node_state(cpu_nid, N_CPU)) {
set_bit(target->processor_pxm, p_nodes);
return;
}
}
@ -617,47 +615,10 @@ static void hmat_register_target_initiators(struct memory_target *target)
* We'll also use the sorting to prime the candidate nodes with known
* initiators.
*/
bitmap_zero(p_nodes, MAX_NUMNODES);
list_sort(NULL, &initiators, initiator_cmp);
if (initiators_to_nodemask(p_nodes) < 0)
return;
if (!access0done) {
for (i = WRITE_LATENCY; i <= READ_BANDWIDTH; i++) {
loc = localities_types[i];
if (!loc)
continue;
best = 0;
list_for_each_entry(initiator, &initiators, node) {
u32 value;
if (!test_bit(initiator->processor_pxm, p_nodes))
continue;
value = hmat_initiator_perf(target, initiator,
loc->hmat_loc);
if (hmat_update_best(loc->hmat_loc->data_type, value, &best))
bitmap_clear(p_nodes, 0, initiator->processor_pxm);
if (value != best)
clear_bit(initiator->processor_pxm, p_nodes);
}
if (best)
hmat_update_target_access(target, loc->hmat_loc->data_type,
best, 0);
}
for_each_set_bit(i, p_nodes, MAX_NUMNODES) {
cpu_nid = pxm_to_node(i);
register_memory_node_under_compute_node(mem_nid, cpu_nid, 0);
}
}
/* Access 1 ignores Generic Initiators */
bitmap_zero(p_nodes, MAX_NUMNODES);
if (initiators_to_nodemask(p_nodes) < 0)
return;
for (i = WRITE_LATENCY; i <= READ_BANDWIDTH; i++) {
loc = localities_types[i];
if (!loc)
@ -667,7 +628,7 @@ static void hmat_register_target_initiators(struct memory_target *target)
list_for_each_entry(initiator, &initiators, node) {
u32 value;
if (!initiator->has_cpu) {
if (access == 1 && !initiator->has_cpu) {
clear_bit(initiator->processor_pxm, p_nodes);
continue;
}
@ -681,14 +642,33 @@ static void hmat_register_target_initiators(struct memory_target *target)
clear_bit(initiator->processor_pxm, p_nodes);
}
if (best)
hmat_update_target_access(target, loc->hmat_loc->data_type, best, 1);
hmat_update_target_access(target, loc->hmat_loc->data_type, best, access);
}
}
static void __hmat_register_target_initiators(struct memory_target *target,
unsigned long *p_nodes,
int access)
{
unsigned int mem_nid, cpu_nid;
int i;
mem_nid = pxm_to_node(target->memory_pxm);
hmat_update_target_attrs(target, p_nodes, access);
for_each_set_bit(i, p_nodes, MAX_NUMNODES) {
cpu_nid = pxm_to_node(i);
register_memory_node_under_compute_node(mem_nid, cpu_nid, 1);
register_memory_node_under_compute_node(mem_nid, cpu_nid, access);
}
}
static void hmat_register_target_initiators(struct memory_target *target)
{
static DECLARE_BITMAP(p_nodes, MAX_NUMNODES);
__hmat_register_target_initiators(target, p_nodes, 0);
__hmat_register_target_initiators(target, p_nodes, 1);
}
static void hmat_register_target_cache(struct memory_target *target)
{
unsigned mem_nid = pxm_to_node(target->memory_pxm);
@ -780,6 +760,61 @@ static int hmat_callback(struct notifier_block *self,
return NOTIFY_OK;
}
static int hmat_set_default_dram_perf(void)
{
int rc;
int nid, pxm;
struct memory_target *target;
struct node_hmem_attrs *attrs;
if (!default_dram_type)
return -EIO;
for_each_node_mask(nid, default_dram_type->nodes) {
pxm = node_to_pxm(nid);
target = find_mem_target(pxm);
if (!target)
continue;
attrs = &target->hmem_attrs[1];
rc = mt_set_default_dram_perf(nid, attrs, "ACPI HMAT");
if (rc)
return rc;
}
return 0;
}
static int hmat_calculate_adistance(struct notifier_block *self,
unsigned long nid, void *data)
{
static DECLARE_BITMAP(p_nodes, MAX_NUMNODES);
struct memory_target *target;
struct node_hmem_attrs *perf;
int *adist = data;
int pxm;
pxm = node_to_pxm(nid);
target = find_mem_target(pxm);
if (!target)
return NOTIFY_OK;
mutex_lock(&target_lock);
hmat_update_target_attrs(target, p_nodes, 1);
mutex_unlock(&target_lock);
perf = &target->hmem_attrs[1];
if (mt_perf_to_adistance(perf, adist))
return NOTIFY_OK;
return NOTIFY_STOP;
}
static struct notifier_block hmat_adist_nb __meminitdata = {
.notifier_call = hmat_calculate_adistance,
.priority = 100,
};
static __init void hmat_free_structures(void)
{
struct memory_target *target, *tnext;
@ -862,8 +897,13 @@ static __init int hmat_init(void)
hmat_register_targets();
/* Keep the table and structures if the notifier may use them */
if (!hotplug_memory_notifier(hmat_callback, HMAT_CALLBACK_PRI))
return 0;
if (hotplug_memory_notifier(hmat_callback, HMAT_CALLBACK_PRI))
goto out_put;
if (!hmat_set_default_dram_perf())
register_mt_adistance_algorithm(&hmat_adist_nb);
return 0;
out_put:
hmat_free_structures();
acpi_put_table(tbl);


@ -1053,11 +1053,7 @@ binder_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
NULL, sc->nr_to_scan);
}
static struct shrinker binder_shrinker = {
.count_objects = binder_shrink_count,
.scan_objects = binder_shrink_scan,
.seeks = DEFAULT_SEEKS,
};
static struct shrinker *binder_shrinker;
/**
* binder_alloc_init() - called by binder_open() for per-proc initialization
@ -1077,19 +1073,29 @@ void binder_alloc_init(struct binder_alloc *alloc)
int binder_alloc_shrinker_init(void)
{
int ret = list_lru_init(&binder_alloc_lru);
int ret;
if (ret == 0) {
ret = register_shrinker(&binder_shrinker, "android-binder");
if (ret)
list_lru_destroy(&binder_alloc_lru);
ret = list_lru_init(&binder_alloc_lru);
if (ret)
return ret;
binder_shrinker = shrinker_alloc(0, "android-binder");
if (!binder_shrinker) {
list_lru_destroy(&binder_alloc_lru);
return -ENOMEM;
}
return ret;
binder_shrinker->count_objects = binder_shrink_count;
binder_shrinker->scan_objects = binder_shrink_scan;
shrinker_register(binder_shrinker);
return 0;
}
void binder_alloc_shrinker_exit(void)
{
unregister_shrinker(&binder_shrinker);
shrinker_free(binder_shrinker);
list_lru_destroy(&binder_alloc_lru);
}


@ -898,6 +898,48 @@ err:
return rc;
}
/*
* Calculate the size of the per-CPU data cache slice. This can be
* used to estimate the size of the data cache slice that can be used
* by one CPU under ideal circumstances. UNIFIED caches are counted
* in addition to DATA caches. So, please consider code cache usage
* when using the result.
*
* Because the cache inclusive/non-inclusive information isn't
* available, we just use the size of the per-CPU slice of LLC to make
* the result more predictable across architectures.
*/
static void update_per_cpu_data_slice_size_cpu(unsigned int cpu)
{
struct cpu_cacheinfo *ci;
struct cacheinfo *llc;
unsigned int nr_shared;
if (!last_level_cache_is_valid(cpu))
return;
ci = ci_cacheinfo(cpu);
llc = per_cpu_cacheinfo_idx(cpu, cache_leaves(cpu) - 1);
if (llc->type != CACHE_TYPE_DATA && llc->type != CACHE_TYPE_UNIFIED)
return;
nr_shared = cpumask_weight(&llc->shared_cpu_map);
if (nr_shared)
ci->per_cpu_data_slice_size = llc->size / nr_shared;
}
static void update_per_cpu_data_slice_size(bool cpu_online, unsigned int cpu)
{
unsigned int icpu;
for_each_online_cpu(icpu) {
if (!cpu_online && icpu == cpu)
continue;
update_per_cpu_data_slice_size_cpu(icpu);
}
}
static int cacheinfo_cpu_online(unsigned int cpu)
{
int rc = detect_cache_attributes(cpu);
@ -906,7 +948,12 @@ static int cacheinfo_cpu_online(unsigned int cpu)
return rc;
rc = cache_add_dev(cpu);
if (rc)
free_cache_attributes(cpu);
goto err;
update_per_cpu_data_slice_size(true, cpu);
setup_pcp_cacheinfo();
return 0;
err:
free_cache_attributes(cpu);
return rc;
}
@ -916,6 +963,8 @@ static int cacheinfo_cpu_pre_down(unsigned int cpu)
cpu_cache_sysfs_exit(cpu);
free_cache_attributes(cpu);
update_per_cpu_data_slice_size(false, cpu);
setup_pcp_cacheinfo();
return 0;
}

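As a worked example of the calculation in update_per_cpu_data_slice_size_cpu() above (the cache geometry is hypothetical, not from any particular machine):

/*
 * Hypothetical machine: 32 MiB unified LLC shared by 16 CPUs.
 *
 *	nr_shared                   = cpumask_weight(&llc->shared_cpu_map) = 16
 *	ci->per_cpu_data_slice_size = llc->size / nr_shared
 *	                            = 32 MiB / 16 = 2 MiB
 *
 * The setup_pcp_cacheinfo() calls added above then propagate this value so
 * the per-CPU page-allocator lists can be sized against it.
 */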

@ -49,14 +49,52 @@ struct dax_kmem_data {
struct resource *res[];
};
static struct memory_dev_type *dax_slowmem_type;
static DEFINE_MUTEX(kmem_memory_type_lock);
static LIST_HEAD(kmem_memory_types);
static struct memory_dev_type *kmem_find_alloc_memory_type(int adist)
{
bool found = false;
struct memory_dev_type *mtype;
mutex_lock(&kmem_memory_type_lock);
list_for_each_entry(mtype, &kmem_memory_types, list) {
if (mtype->adistance == adist) {
found = true;
break;
}
}
if (!found) {
mtype = alloc_memory_type(adist);
if (!IS_ERR(mtype))
list_add(&mtype->list, &kmem_memory_types);
}
mutex_unlock(&kmem_memory_type_lock);
return mtype;
}
static void kmem_put_memory_types(void)
{
struct memory_dev_type *mtype, *mtn;
mutex_lock(&kmem_memory_type_lock);
list_for_each_entry_safe(mtype, mtn, &kmem_memory_types, list) {
list_del(&mtype->list);
put_memory_type(mtype);
}
mutex_unlock(&kmem_memory_type_lock);
}
static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
{
struct device *dev = &dev_dax->dev;
unsigned long total_len = 0;
struct dax_kmem_data *data;
struct memory_dev_type *mtype;
int i, rc, mapped = 0;
int numa_node;
int adist = MEMTIER_DEFAULT_DAX_ADISTANCE;
/*
* Ensure good NUMA information for the persistent memory.
@ -71,6 +109,11 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
return -EINVAL;
}
mt_calc_adistance(numa_node, &adist);
mtype = kmem_find_alloc_memory_type(adist);
if (IS_ERR(mtype))
return PTR_ERR(mtype);
for (i = 0; i < dev_dax->nr_range; i++) {
struct range range;
@ -88,7 +131,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
return -EINVAL;
}
init_node_memory_type(numa_node, dax_slowmem_type);
init_node_memory_type(numa_node, mtype);
rc = -ENOMEM;
data = kzalloc(struct_size(data, res, dev_dax->nr_range), GFP_KERNEL);
@ -167,7 +210,7 @@ err_reg_mgid:
err_res_name:
kfree(data);
err_dax_kmem_data:
clear_node_memory_type(numa_node, dax_slowmem_type);
clear_node_memory_type(numa_node, mtype);
return rc;
}
@ -219,7 +262,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
* for that. This implies this reference will be around
* till next reboot.
*/
clear_node_memory_type(node, dax_slowmem_type);
clear_node_memory_type(node, NULL);
}
}
#else
@ -251,12 +294,6 @@ static int __init dax_kmem_init(void)
if (!kmem_name)
return -ENOMEM;
dax_slowmem_type = alloc_memory_type(MEMTIER_DEFAULT_DAX_ADISTANCE);
if (IS_ERR(dax_slowmem_type)) {
rc = PTR_ERR(dax_slowmem_type);
goto err_dax_slowmem_type;
}
rc = dax_driver_register(&device_dax_kmem_driver);
if (rc)
goto error_dax_driver;
@ -264,8 +301,7 @@ static int __init dax_kmem_init(void)
return rc;
error_dax_driver:
put_memory_type(dax_slowmem_type);
err_dax_slowmem_type:
kmem_put_memory_types();
kfree_const(kmem_name);
return rc;
}
@ -275,7 +311,7 @@ static void __exit dax_kmem_exit(void)
dax_driver_unregister(&device_dax_kmem_driver);
if (!any_hotremove_failed)
kfree_const(kmem_name);
put_memory_type(dax_slowmem_type);
kmem_put_memory_types();
}
MODULE_AUTHOR("Intel Corporation");

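Taken together with the HMAT changes earlier in this series, the dax/kmem probe flow above reduces to roughly the following sketch (error paths trimmed; only calls that appear in the hunks above are used):

	int adist = MEMTIER_DEFAULT_DAX_ADISTANCE;
	struct memory_dev_type *mtype;

	/* Let registered algorithms (e.g. the HMAT notifier) refine the
	 * default abstract distance from the node's measured performance. */
	mt_calc_adistance(numa_node, &adist);

	/* Reuse an existing memory_dev_type with this abstract distance,
	 * or allocate one and remember it for later tear-down. */
	mtype = kmem_find_alloc_memory_type(adist);
	if (IS_ERR(mtype))
		return PTR_ERR(mtype);

	/* Bind the node to that type so it lands in the matching tier. */
	init_node_memory_type(numa_node, mtype);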

@ -3,6 +3,7 @@
#include <linux/efi.h>
#include <linux/memblock.h>
#include <linux/spinlock.h>
#include <linux/crash_dump.h>
#include <asm/unaccepted_memory.h>
/* Protects unaccepted memory bitmap and accepting_list */
@ -201,3 +202,22 @@ bool range_contains_unaccepted_memory(phys_addr_t start, phys_addr_t end)
return ret;
}
#ifdef CONFIG_PROC_VMCORE
static bool unaccepted_memory_vmcore_pfn_is_ram(struct vmcore_cb *cb,
unsigned long pfn)
{
return !pfn_is_unaccepted_memory(pfn);
}
static struct vmcore_cb vmcore_cb = {
.pfn_is_ram = unaccepted_memory_vmcore_pfn_is_ram,
};
static int __init unaccepted_memory_init_kdump(void)
{
register_vmcore_cb(&vmcore_cb);
return 0;
}
core_initcall(unaccepted_memory_init_kdump);
#endif /* CONFIG_PROC_VMCORE */


@ -288,8 +288,7 @@ unsigned long i915_gem_shrink_all(struct drm_i915_private *i915)
static unsigned long
i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
{
struct drm_i915_private *i915 =
container_of(shrinker, struct drm_i915_private, mm.shrinker);
struct drm_i915_private *i915 = shrinker->private_data;
unsigned long num_objects;
unsigned long count;
@ -306,8 +305,8 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
if (num_objects) {
unsigned long avg = 2 * count / num_objects;
i915->mm.shrinker.batch =
max((i915->mm.shrinker.batch + avg) >> 1,
i915->mm.shrinker->batch =
max((i915->mm.shrinker->batch + avg) >> 1,
128ul /* default SHRINK_BATCH */);
}
@ -317,8 +316,7 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
static unsigned long
i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
{
struct drm_i915_private *i915 =
container_of(shrinker, struct drm_i915_private, mm.shrinker);
struct drm_i915_private *i915 = shrinker->private_data;
unsigned long freed;
sc->nr_scanned = 0;
@ -430,12 +428,17 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr
void i915_gem_driver_register__shrinker(struct drm_i915_private *i915)
{
i915->mm.shrinker.scan_objects = i915_gem_shrinker_scan;
i915->mm.shrinker.count_objects = i915_gem_shrinker_count;
i915->mm.shrinker.seeks = DEFAULT_SEEKS;
i915->mm.shrinker.batch = 4096;
drm_WARN_ON(&i915->drm, register_shrinker(&i915->mm.shrinker,
"drm-i915_gem"));
i915->mm.shrinker = shrinker_alloc(0, "drm-i915_gem");
if (!i915->mm.shrinker) {
drm_WARN_ON(&i915->drm, 1);
} else {
i915->mm.shrinker->scan_objects = i915_gem_shrinker_scan;
i915->mm.shrinker->count_objects = i915_gem_shrinker_count;
i915->mm.shrinker->batch = 4096;
i915->mm.shrinker->private_data = i915;
shrinker_register(i915->mm.shrinker);
}
i915->mm.oom_notifier.notifier_call = i915_gem_shrinker_oom;
drm_WARN_ON(&i915->drm, register_oom_notifier(&i915->mm.oom_notifier));
@ -451,7 +454,7 @@ void i915_gem_driver_unregister__shrinker(struct drm_i915_private *i915)
unregister_vmap_purge_notifier(&i915->mm.vmap_notifier));
drm_WARN_ON(&i915->drm,
unregister_oom_notifier(&i915->mm.oom_notifier));
unregister_shrinker(&i915->mm.shrinker);
shrinker_free(i915->mm.shrinker);
}
void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915,


@ -163,7 +163,7 @@ struct i915_gem_mm {
struct notifier_block oom_notifier;
struct notifier_block vmap_notifier;
struct shrinker shrinker;
struct shrinker *shrinker;
#ifdef CONFIG_MMU_NOTIFIER
/**


@ -265,7 +265,9 @@ static int msm_drm_init(struct device *dev, const struct drm_driver *drv)
if (ret)
goto err_deinit_vram;
msm_gem_shrinker_init(ddev);
ret = msm_gem_shrinker_init(ddev);
if (ret)
goto err_msm_uninit;
if (priv->kms_init) {
ret = msm_drm_kms_init(dev, drv);


@ -218,7 +218,7 @@ struct msm_drm_private {
} vram;
struct notifier_block vmap_notifier;
struct shrinker shrinker;
struct shrinker *shrinker;
struct drm_atomic_state *pm_state;
@ -280,7 +280,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
unsigned long msm_gem_shrinker_shrink(struct drm_device *dev, unsigned long nr_to_scan);
#endif
void msm_gem_shrinker_init(struct drm_device *dev);
int msm_gem_shrinker_init(struct drm_device *dev);
void msm_gem_shrinker_cleanup(struct drm_device *dev);
struct sg_table *msm_gem_prime_get_sg_table(struct drm_gem_object *obj);


@ -34,8 +34,7 @@ static bool can_block(struct shrink_control *sc)
static unsigned long
msm_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
{
struct msm_drm_private *priv =
container_of(shrinker, struct msm_drm_private, shrinker);
struct msm_drm_private *priv = shrinker->private_data;
unsigned count = priv->lru.dontneed.count;
if (can_swap())
@ -100,8 +99,7 @@ active_evict(struct drm_gem_object *obj)
static unsigned long
msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
{
struct msm_drm_private *priv =
container_of(shrinker, struct msm_drm_private, shrinker);
struct msm_drm_private *priv = shrinker->private_data;
struct {
struct drm_gem_lru *lru;
bool (*shrink)(struct drm_gem_object *obj);
@ -148,10 +146,11 @@ msm_gem_shrinker_shrink(struct drm_device *dev, unsigned long nr_to_scan)
struct shrink_control sc = {
.nr_to_scan = nr_to_scan,
};
int ret;
unsigned long ret = SHRINK_STOP;
fs_reclaim_acquire(GFP_KERNEL);
ret = msm_gem_shrinker_scan(&priv->shrinker, &sc);
if (priv->shrinker)
ret = msm_gem_shrinker_scan(priv->shrinker, &sc);
fs_reclaim_release(GFP_KERNEL);
return ret;
@ -210,16 +209,24 @@ msm_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr)
*
* This function registers and sets up the msm shrinker.
*/
void msm_gem_shrinker_init(struct drm_device *dev)
int msm_gem_shrinker_init(struct drm_device *dev)
{
struct msm_drm_private *priv = dev->dev_private;
priv->shrinker.count_objects = msm_gem_shrinker_count;
priv->shrinker.scan_objects = msm_gem_shrinker_scan;
priv->shrinker.seeks = DEFAULT_SEEKS;
WARN_ON(register_shrinker(&priv->shrinker, "drm-msm_gem"));
priv->shrinker = shrinker_alloc(0, "drm-msm_gem");
if (!priv->shrinker)
return -ENOMEM;
priv->shrinker->count_objects = msm_gem_shrinker_count;
priv->shrinker->scan_objects = msm_gem_shrinker_scan;
priv->shrinker->private_data = priv;
shrinker_register(priv->shrinker);
priv->vmap_notifier.notifier_call = msm_gem_shrinker_vmap;
WARN_ON(register_vmap_purge_notifier(&priv->vmap_notifier));
return 0;
}
/**
@ -232,8 +239,8 @@ void msm_gem_shrinker_cleanup(struct drm_device *dev)
{
struct msm_drm_private *priv = dev->dev_private;
if (priv->shrinker.nr_deferred) {
if (priv->shrinker) {
WARN_ON(unregister_vmap_purge_notifier(&priv->vmap_notifier));
unregister_shrinker(&priv->shrinker);
shrinker_free(priv->shrinker);
}
}


@ -119,7 +119,7 @@ struct panfrost_device {
struct mutex shrinker_lock;
struct list_head shrinker_list;
struct shrinker shrinker;
struct shrinker *shrinker;
struct panfrost_devfreq pfdevfreq;


@ -659,10 +659,14 @@ static int panfrost_probe(struct platform_device *pdev)
if (err < 0)
goto err_out1;
panfrost_gem_shrinker_init(ddev);
err = panfrost_gem_shrinker_init(ddev);
if (err)
goto err_out2;
return 0;
err_out2:
drm_dev_unregister(ddev);
err_out1:
pm_runtime_disable(pfdev->dev);
panfrost_device_fini(pfdev);


@ -86,7 +86,7 @@ panfrost_gem_mapping_get(struct panfrost_gem_object *bo,
void panfrost_gem_mapping_put(struct panfrost_gem_mapping *mapping);
void panfrost_gem_teardown_mappings_locked(struct panfrost_gem_object *bo);
void panfrost_gem_shrinker_init(struct drm_device *dev);
int panfrost_gem_shrinker_init(struct drm_device *dev);
void panfrost_gem_shrinker_cleanup(struct drm_device *dev);
#endif /* __PANFROST_GEM_H__ */


@ -18,8 +18,7 @@
static unsigned long
panfrost_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
{
struct panfrost_device *pfdev =
container_of(shrinker, struct panfrost_device, shrinker);
struct panfrost_device *pfdev = shrinker->private_data;
struct drm_gem_shmem_object *shmem;
unsigned long count = 0;
@ -65,8 +64,7 @@ unlock_mappings:
static unsigned long
panfrost_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
{
struct panfrost_device *pfdev =
container_of(shrinker, struct panfrost_device, shrinker);
struct panfrost_device *pfdev = shrinker->private_data;
struct drm_gem_shmem_object *shmem, *tmp;
unsigned long freed = 0;
@ -97,13 +95,21 @@ panfrost_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
*
* This function registers and sets up the panfrost shrinker.
*/
void panfrost_gem_shrinker_init(struct drm_device *dev)
int panfrost_gem_shrinker_init(struct drm_device *dev)
{
struct panfrost_device *pfdev = dev->dev_private;
pfdev->shrinker.count_objects = panfrost_gem_shrinker_count;
pfdev->shrinker.scan_objects = panfrost_gem_shrinker_scan;
pfdev->shrinker.seeks = DEFAULT_SEEKS;
WARN_ON(register_shrinker(&pfdev->shrinker, "drm-panfrost"));
pfdev->shrinker = shrinker_alloc(0, "drm-panfrost");
if (!pfdev->shrinker)
return -ENOMEM;
pfdev->shrinker->count_objects = panfrost_gem_shrinker_count;
pfdev->shrinker->scan_objects = panfrost_gem_shrinker_scan;
pfdev->shrinker->private_data = pfdev;
shrinker_register(pfdev->shrinker);
return 0;
}
/**
@ -116,7 +122,6 @@ void panfrost_gem_shrinker_cleanup(struct drm_device *dev)
{
struct panfrost_device *pfdev = dev->dev_private;
if (pfdev->shrinker.nr_deferred) {
unregister_shrinker(&pfdev->shrinker);
}
if (pfdev->shrinker)
shrinker_free(pfdev->shrinker);
}


@ -73,7 +73,8 @@ static struct ttm_pool_type global_dma32_uncached[MAX_ORDER + 1];
static spinlock_t shrinker_lock;
static struct list_head shrinker_list;
static struct shrinker mm_shrinker;
static struct shrinker *mm_shrinker;
static DECLARE_RWSEM(pool_shrink_rwsem);
/* Allocate pages of size 1 << order with the given gfp_flags */
static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
@ -317,6 +318,7 @@ static unsigned int ttm_pool_shrink(void)
unsigned int num_pages;
struct page *p;
down_read(&pool_shrink_rwsem);
spin_lock(&shrinker_lock);
pt = list_first_entry(&shrinker_list, typeof(*pt), shrinker_list);
list_move_tail(&pt->shrinker_list, &shrinker_list);
@ -329,6 +331,7 @@ static unsigned int ttm_pool_shrink(void)
} else {
num_pages = 0;
}
up_read(&pool_shrink_rwsem);
return num_pages;
}
@ -572,6 +575,18 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
}
EXPORT_SYMBOL(ttm_pool_init);
/**
* ttm_pool_synchronize_shrinkers - Wait for all running shrinkers to complete.
*
* This is useful to guarantee that all shrinker invocations have seen an
* update, before freeing memory, similar to rcu.
*/
static void ttm_pool_synchronize_shrinkers(void)
{
down_write(&pool_shrink_rwsem);
up_write(&pool_shrink_rwsem);
}
/**
* ttm_pool_fini - Cleanup a pool
*
@ -593,7 +608,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
/* We removed the pool types from the LRU, but we need to also make sure
* that no shrinker is concurrently freeing pages from the pool.
*/
synchronize_shrinkers();
ttm_pool_synchronize_shrinkers();
}
EXPORT_SYMBOL(ttm_pool_fini);
@ -734,8 +749,8 @@ static int ttm_pool_debugfs_shrink_show(struct seq_file *m, void *data)
struct shrink_control sc = { .gfp_mask = GFP_NOFS };
fs_reclaim_acquire(GFP_KERNEL);
seq_printf(m, "%lu/%lu\n", ttm_pool_shrinker_count(&mm_shrinker, &sc),
ttm_pool_shrinker_scan(&mm_shrinker, &sc));
seq_printf(m, "%lu/%lu\n", ttm_pool_shrinker_count(mm_shrinker, &sc),
ttm_pool_shrinker_scan(mm_shrinker, &sc));
fs_reclaim_release(GFP_KERNEL);
return 0;
@ -779,10 +794,17 @@ int ttm_pool_mgr_init(unsigned long num_pages)
&ttm_pool_debugfs_shrink_fops);
#endif
mm_shrinker.count_objects = ttm_pool_shrinker_count;
mm_shrinker.scan_objects = ttm_pool_shrinker_scan;
mm_shrinker.seeks = 1;
return register_shrinker(&mm_shrinker, "drm-ttm_pool");
mm_shrinker = shrinker_alloc(0, "drm-ttm_pool");
if (!mm_shrinker)
return -ENOMEM;
mm_shrinker->count_objects = ttm_pool_shrinker_count;
mm_shrinker->scan_objects = ttm_pool_shrinker_scan;
mm_shrinker->seeks = 1;
shrinker_register(mm_shrinker);
return 0;
}
/**
@ -802,6 +824,6 @@ void ttm_pool_mgr_fini(void)
ttm_pool_type_fini(&global_dma32_uncached[i]);
}
unregister_shrinker(&mm_shrinker);
shrinker_free(mm_shrinker);
WARN_ON(!list_empty(&shrinker_list));
}


@ -543,7 +543,7 @@ struct cache_set {
struct bio_set bio_split;
/* For the btree cache */
struct shrinker shrink;
struct shrinker *shrink;
/* For the btree cache and anything allocation related */
struct mutex bucket_lock;


@ -667,7 +667,7 @@ out_unlock:
static unsigned long bch_mca_scan(struct shrinker *shrink,
struct shrink_control *sc)
{
struct cache_set *c = container_of(shrink, struct cache_set, shrink);
struct cache_set *c = shrink->private_data;
struct btree *b, *t;
unsigned long i, nr = sc->nr_to_scan;
unsigned long freed = 0;
@ -734,7 +734,7 @@ out:
static unsigned long bch_mca_count(struct shrinker *shrink,
struct shrink_control *sc)
{
struct cache_set *c = container_of(shrink, struct cache_set, shrink);
struct cache_set *c = shrink->private_data;
if (c->shrinker_disabled)
return 0;
@ -752,8 +752,8 @@ void bch_btree_cache_free(struct cache_set *c)
closure_init_stack(&cl);
if (c->shrink.list.next)
unregister_shrinker(&c->shrink);
if (c->shrink)
shrinker_free(c->shrink);
mutex_lock(&c->bucket_lock);
@ -828,14 +828,19 @@ int bch_btree_cache_alloc(struct cache_set *c)
c->verify_data = NULL;
#endif
c->shrink.count_objects = bch_mca_count;
c->shrink.scan_objects = bch_mca_scan;
c->shrink.seeks = 4;
c->shrink.batch = c->btree_pages * 2;
c->shrink = shrinker_alloc(0, "md-bcache:%pU", c->set_uuid);
if (!c->shrink) {
pr_warn("bcache: %s: could not allocate shrinker\n", __func__);
return 0;
}
if (register_shrinker(&c->shrink, "md-bcache:%pU", c->set_uuid))
pr_warn("bcache: %s: could not register shrinker\n",
__func__);
c->shrink->count_objects = bch_mca_count;
c->shrink->scan_objects = bch_mca_scan;
c->shrink->seeks = 4;
c->shrink->batch = c->btree_pages * 2;
c->shrink->private_data = c;
shrinker_register(c->shrink);
return 0;
}


@ -866,7 +866,8 @@ STORE(__bch_cache_set)
sc.gfp_mask = GFP_KERNEL;
sc.nr_to_scan = strtoul_or_return(buf);
c->shrink.scan_objects(&c->shrink, &sc);
if (c->shrink)
c->shrink->scan_objects(c->shrink, &sc);
}
sysfs_strtoul_clamp(congested_read_threshold_us,


@ -963,7 +963,7 @@ struct dm_bufio_client {
sector_t start;
struct shrinker shrinker;
struct shrinker *shrinker;
struct work_struct shrink_work;
atomic_long_t need_shrink;
@ -2368,7 +2368,7 @@ static unsigned long dm_bufio_shrink_scan(struct shrinker *shrink, struct shrink
{
struct dm_bufio_client *c;
c = container_of(shrink, struct dm_bufio_client, shrinker);
c = shrink->private_data;
atomic_long_add(sc->nr_to_scan, &c->need_shrink);
queue_work(dm_bufio_wq, &c->shrink_work);
@ -2377,7 +2377,7 @@ static unsigned long dm_bufio_shrink_scan(struct shrinker *shrink, struct shrink
static unsigned long dm_bufio_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
{
struct dm_bufio_client *c = container_of(shrink, struct dm_bufio_client, shrinker);
struct dm_bufio_client *c = shrink->private_data;
unsigned long count = cache_total(&c->cache);
unsigned long retain_target = get_retain_buffers(c);
unsigned long queued_for_cleanup = atomic_long_read(&c->need_shrink);
@ -2490,14 +2490,20 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign
INIT_WORK(&c->shrink_work, shrink_work);
atomic_long_set(&c->need_shrink, 0);
c->shrinker.count_objects = dm_bufio_shrink_count;
c->shrinker.scan_objects = dm_bufio_shrink_scan;
c->shrinker.seeks = 1;
c->shrinker.batch = 0;
r = register_shrinker(&c->shrinker, "dm-bufio:(%u:%u)",
MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev));
if (r)
c->shrinker = shrinker_alloc(0, "dm-bufio:(%u:%u)",
MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev));
if (!c->shrinker) {
r = -ENOMEM;
goto bad;
}
c->shrinker->count_objects = dm_bufio_shrink_count;
c->shrinker->scan_objects = dm_bufio_shrink_scan;
c->shrinker->seeks = 1;
c->shrinker->batch = 0;
c->shrinker->private_data = c;
shrinker_register(c->shrinker);
mutex_lock(&dm_bufio_clients_lock);
dm_bufio_client_count++;
@ -2537,7 +2543,7 @@ void dm_bufio_client_destroy(struct dm_bufio_client *c)
drop_buffers(c);
unregister_shrinker(&c->shrinker);
shrinker_free(c->shrinker);
flush_work(&c->shrink_work);
mutex_lock(&dm_bufio_clients_lock);


@ -1828,7 +1828,7 @@ int dm_cache_metadata_abort(struct dm_cache_metadata *cmd)
* Replacement block manager (new_bm) is created and old_bm destroyed outside of
* cmd root_lock to avoid ABBA deadlock that would result (due to life-cycle of
* shrinker associated with the block manager's bufio client vs cmd root_lock).
* - must take shrinker_rwsem without holding cmd->root_lock
* - must take shrinker_mutex without holding cmd->root_lock
*/
new_bm = dm_block_manager_create(cmd->bdev, DM_CACHE_METADATA_BLOCK_SIZE << SECTOR_SHIFT,
CACHE_MAX_CONCURRENT_LOCKS);


@ -187,7 +187,7 @@ struct dmz_metadata {
struct rb_root mblk_rbtree;
struct list_head mblk_lru_list;
struct list_head mblk_dirty_list;
struct shrinker mblk_shrinker;
struct shrinker *mblk_shrinker;
/* Zone allocation management */
struct mutex map_lock;
@ -615,7 +615,7 @@ static unsigned long dmz_shrink_mblock_cache(struct dmz_metadata *zmd,
static unsigned long dmz_mblock_shrinker_count(struct shrinker *shrink,
struct shrink_control *sc)
{
struct dmz_metadata *zmd = container_of(shrink, struct dmz_metadata, mblk_shrinker);
struct dmz_metadata *zmd = shrink->private_data;
return atomic_read(&zmd->nr_mblks);
}
@ -626,7 +626,7 @@ static unsigned long dmz_mblock_shrinker_count(struct shrinker *shrink,
static unsigned long dmz_mblock_shrinker_scan(struct shrinker *shrink,
struct shrink_control *sc)
{
struct dmz_metadata *zmd = container_of(shrink, struct dmz_metadata, mblk_shrinker);
struct dmz_metadata *zmd = shrink->private_data;
unsigned long count;
spin_lock(&zmd->mblk_lock);
@ -2936,19 +2936,23 @@ int dmz_ctr_metadata(struct dmz_dev *dev, int num_dev,
*/
zmd->min_nr_mblks = 2 + zmd->nr_map_blocks + zmd->zone_nr_bitmap_blocks * 16;
zmd->max_nr_mblks = zmd->min_nr_mblks + 512;
zmd->mblk_shrinker.count_objects = dmz_mblock_shrinker_count;
zmd->mblk_shrinker.scan_objects = dmz_mblock_shrinker_scan;
zmd->mblk_shrinker.seeks = DEFAULT_SEEKS;
/* Metadata cache shrinker */
ret = register_shrinker(&zmd->mblk_shrinker, "dm-zoned-meta:(%u:%u)",
MAJOR(dev->bdev->bd_dev),
MINOR(dev->bdev->bd_dev));
if (ret) {
dmz_zmd_err(zmd, "Register metadata cache shrinker failed");
zmd->mblk_shrinker = shrinker_alloc(0, "dm-zoned-meta:(%u:%u)",
MAJOR(dev->bdev->bd_dev),
MINOR(dev->bdev->bd_dev));
if (!zmd->mblk_shrinker) {
ret = -ENOMEM;
dmz_zmd_err(zmd, "Allocate metadata cache shrinker failed");
goto err;
}
zmd->mblk_shrinker->count_objects = dmz_mblock_shrinker_count;
zmd->mblk_shrinker->scan_objects = dmz_mblock_shrinker_scan;
zmd->mblk_shrinker->private_data = zmd;
shrinker_register(zmd->mblk_shrinker);
dmz_zmd_info(zmd, "DM-Zoned metadata version %d", zmd->sb_version);
for (i = 0; i < zmd->nr_devs; i++)
dmz_print_dev(zmd, i);
@ -2995,7 +2999,7 @@ err:
*/
void dmz_dtr_metadata(struct dmz_metadata *zmd)
{
unregister_shrinker(&zmd->mblk_shrinker);
shrinker_free(zmd->mblk_shrinker);
dmz_cleanup_metadata(zmd);
kfree(zmd);
}


@ -7378,7 +7378,7 @@ static void free_conf(struct r5conf *conf)
log_exit(conf);
unregister_shrinker(&conf->shrinker);
shrinker_free(conf->shrinker);
free_thread_groups(conf);
shrink_stripes(conf);
raid5_free_percpu(conf);
@ -7426,7 +7426,7 @@ static int raid5_alloc_percpu(struct r5conf *conf)
static unsigned long raid5_cache_scan(struct shrinker *shrink,
struct shrink_control *sc)
{
struct r5conf *conf = container_of(shrink, struct r5conf, shrinker);
struct r5conf *conf = shrink->private_data;
unsigned long ret = SHRINK_STOP;
if (mutex_trylock(&conf->cache_size_mutex)) {
@ -7447,7 +7447,7 @@ static unsigned long raid5_cache_scan(struct shrinker *shrink,
static unsigned long raid5_cache_count(struct shrinker *shrink,
struct shrink_control *sc)
{
struct r5conf *conf = container_of(shrink, struct r5conf, shrinker);
struct r5conf *conf = shrink->private_data;
if (conf->max_nr_stripes < conf->min_nr_stripes)
/* unlikely, but not impossible */
@ -7682,18 +7682,22 @@ static struct r5conf *setup_conf(struct mddev *mddev)
* it reduces the queue depth and so can hurt throughput.
* So set it rather large, scaled by number of devices.
*/
conf->shrinker.seeks = DEFAULT_SEEKS * conf->raid_disks * 4;
conf->shrinker.scan_objects = raid5_cache_scan;
conf->shrinker.count_objects = raid5_cache_count;
conf->shrinker.batch = 128;
conf->shrinker.flags = 0;
ret = register_shrinker(&conf->shrinker, "md-raid5:%s", mdname(mddev));
if (ret) {
pr_warn("md/raid:%s: couldn't register shrinker.\n",
conf->shrinker = shrinker_alloc(0, "md-raid5:%s", mdname(mddev));
if (!conf->shrinker) {
ret = -ENOMEM;
pr_warn("md/raid:%s: couldn't allocate shrinker.\n",
mdname(mddev));
goto abort;
}
conf->shrinker->seeks = DEFAULT_SEEKS * conf->raid_disks * 4;
conf->shrinker->scan_objects = raid5_cache_scan;
conf->shrinker->count_objects = raid5_cache_count;
conf->shrinker->batch = 128;
conf->shrinker->private_data = conf;
shrinker_register(conf->shrinker);
sprintf(pers_name, "raid%d", mddev->new_level);
rcu_assign_pointer(conf->thread,
md_register_thread(raid5d, mddev, pers_name));


@ -670,7 +670,7 @@ struct r5conf {
wait_queue_head_t wait_for_stripe;
wait_queue_head_t wait_for_overlap;
unsigned long cache_state;
struct shrinker shrinker;
struct shrinker *shrinker;
int pool_size; /* number of disks in stripeheads in pool */
spinlock_t device_lock;
struct disk_info *disks;


@ -380,16 +380,7 @@ struct vmballoon {
/**
* @shrinker: shrinker interface that is used to avoid over-inflation.
*/
struct shrinker shrinker;
/**
* @shrinker_registered: whether the shrinker was registered.
*
* The shrinker interface does not handle gracefully the removal of
* shrinker that was not registered before. This indication allows to
* simplify the unregistration process.
*/
bool shrinker_registered;
struct shrinker *shrinker;
};
static struct vmballoon balloon;
@ -1568,29 +1559,27 @@ static unsigned long vmballoon_shrinker_count(struct shrinker *shrinker,
static void vmballoon_unregister_shrinker(struct vmballoon *b)
{
if (b->shrinker_registered)
unregister_shrinker(&b->shrinker);
b->shrinker_registered = false;
shrinker_free(b->shrinker);
b->shrinker = NULL;
}
static int vmballoon_register_shrinker(struct vmballoon *b)
{
int r;
/* Do nothing if the shrinker is not enabled */
if (!vmwballoon_shrinker_enable)
return 0;
b->shrinker.scan_objects = vmballoon_shrinker_scan;
b->shrinker.count_objects = vmballoon_shrinker_count;
b->shrinker.seeks = DEFAULT_SEEKS;
b->shrinker = shrinker_alloc(0, "vmw-balloon");
if (!b->shrinker)
return -ENOMEM;
r = register_shrinker(&b->shrinker, "vmw-balloon");
b->shrinker->scan_objects = vmballoon_shrinker_scan;
b->shrinker->count_objects = vmballoon_shrinker_count;
b->shrinker->private_data = b;
if (r == 0)
b->shrinker_registered = true;
shrinker_register(b->shrinker);
return r;
return 0;
}
/*
@ -1883,7 +1872,7 @@ static int __init vmballoon_init(void)
error = vmballoon_register_shrinker(&balloon);
if (error)
goto fail;
return error;
/*
* Initialization of compaction must be done after the call to
@ -1905,9 +1894,6 @@ static int __init vmballoon_init(void)
vmballoon_debugfs_init(&balloon);
return 0;
fail:
vmballoon_unregister_shrinker(&balloon);
return error;
}
/*


@ -111,7 +111,7 @@ struct virtio_balloon {
struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
/* Shrinker to return free pages - VIRTIO_BALLOON_F_FREE_PAGE_HINT */
struct shrinker shrinker;
struct shrinker *shrinker;
/* OOM notifier to deflate on OOM - VIRTIO_BALLOON_F_DEFLATE_ON_OOM */
struct notifier_block oom_nb;
@ -820,8 +820,7 @@ static unsigned long shrink_free_pages(struct virtio_balloon *vb,
static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
struct shrink_control *sc)
{
struct virtio_balloon *vb = container_of(shrinker,
struct virtio_balloon, shrinker);
struct virtio_balloon *vb = shrinker->private_data;
return shrink_free_pages(vb, sc->nr_to_scan);
}
@ -829,8 +828,7 @@ static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
struct shrink_control *sc)
{
struct virtio_balloon *vb = container_of(shrinker,
struct virtio_balloon, shrinker);
struct virtio_balloon *vb = shrinker->private_data;
return vb->num_free_page_blocks * VIRTIO_BALLOON_HINT_BLOCK_PAGES;
}
@ -851,16 +849,22 @@ static int virtio_balloon_oom_notify(struct notifier_block *nb,
static void virtio_balloon_unregister_shrinker(struct virtio_balloon *vb)
{
unregister_shrinker(&vb->shrinker);
shrinker_free(vb->shrinker);
}
static int virtio_balloon_register_shrinker(struct virtio_balloon *vb)
{
vb->shrinker.scan_objects = virtio_balloon_shrinker_scan;
vb->shrinker.count_objects = virtio_balloon_shrinker_count;
vb->shrinker.seeks = DEFAULT_SEEKS;
vb->shrinker = shrinker_alloc(0, "virtio-balloon");
if (!vb->shrinker)
return -ENOMEM;
return register_shrinker(&vb->shrinker, "virtio-balloon");
vb->shrinker->scan_objects = virtio_balloon_shrinker_scan;
vb->shrinker->count_objects = virtio_balloon_shrinker_count;
vb->shrinker->private_data = vb;
shrinker_register(vb->shrinker);
return 0;
}
static int virtballoon_probe(struct virtio_device *vdev)


@ -284,13 +284,9 @@ static unsigned long backend_shrink_memory_count(struct shrinker *shrinker,
return 0;
}
static struct shrinker backend_memory_shrinker = {
.count_objects = backend_shrink_memory_count,
.seeks = DEFAULT_SEEKS,
};
static int __init xenbus_probe_backend_init(void)
{
struct shrinker *backend_memory_shrinker;
static struct notifier_block xenstore_notifier = {
.notifier_call = backend_probe_and_watch
};
@ -305,8 +301,15 @@ static int __init xenbus_probe_backend_init(void)
register_xenstore_notifier(&xenstore_notifier);
if (register_shrinker(&backend_memory_shrinker, "xen-backend"))
pr_warn("shrinker registration failed\n");
backend_memory_shrinker = shrinker_alloc(0, "xen-backend");
if (!backend_memory_shrinker) {
pr_warn("shrinker allocation failed\n");
return 0;
}
backend_memory_shrinker->count_objects = backend_shrink_memory_count;
shrinker_register(backend_memory_shrinker);
return 0;
}


@ -285,8 +285,7 @@ static int btree_node_write_and_reclaim(struct bch_fs *c, struct btree *b)
static unsigned long bch2_btree_cache_scan(struct shrinker *shrink,
struct shrink_control *sc)
{
struct bch_fs *c = container_of(shrink, struct bch_fs,
btree_cache.shrink);
struct bch_fs *c = shrink->private_data;
struct btree_cache *bc = &c->btree_cache;
struct btree *b, *t;
unsigned long nr = sc->nr_to_scan;
@ -384,8 +383,7 @@ out_nounlock:
static unsigned long bch2_btree_cache_count(struct shrinker *shrink,
struct shrink_control *sc)
{
struct bch_fs *c = container_of(shrink, struct bch_fs,
btree_cache.shrink);
struct bch_fs *c = shrink->private_data;
struct btree_cache *bc = &c->btree_cache;
if (bch2_btree_shrinker_disabled)
@ -400,7 +398,7 @@ void bch2_fs_btree_cache_exit(struct bch_fs *c)
struct btree *b;
unsigned i, flags;
unregister_shrinker(&bc->shrink);
shrinker_free(bc->shrink);
/* vfree() can allocate memory: */
flags = memalloc_nofs_save();
@ -454,6 +452,7 @@ void bch2_fs_btree_cache_exit(struct bch_fs *c)
int bch2_fs_btree_cache_init(struct bch_fs *c)
{
struct btree_cache *bc = &c->btree_cache;
struct shrinker *shrink;
unsigned i;
int ret = 0;
@ -473,12 +472,15 @@ int bch2_fs_btree_cache_init(struct bch_fs *c)
mutex_init(&c->verify_lock);
bc->shrink.count_objects = bch2_btree_cache_count;
bc->shrink.scan_objects = bch2_btree_cache_scan;
bc->shrink.seeks = 4;
ret = register_shrinker(&bc->shrink, "%s/btree_cache", c->name);
if (ret)
shrink = shrinker_alloc(0, "%s/btree_cache", c->name);
if (!shrink)
goto err;
bc->shrink = shrink;
shrink->count_objects = bch2_btree_cache_count;
shrink->scan_objects = bch2_btree_cache_scan;
shrink->seeks = 4;
shrink->private_data = c;
shrinker_register(shrink);
return 0;
err:


@ -834,8 +834,7 @@ void bch2_btree_key_cache_drop(struct btree_trans *trans,
static unsigned long bch2_btree_key_cache_scan(struct shrinker *shrink,
struct shrink_control *sc)
{
struct bch_fs *c = container_of(shrink, struct bch_fs,
btree_key_cache.shrink);
struct bch_fs *c = shrink->private_data;
struct btree_key_cache *bc = &c->btree_key_cache;
struct bucket_table *tbl;
struct bkey_cached *ck, *t;
@ -932,8 +931,7 @@ out:
static unsigned long bch2_btree_key_cache_count(struct shrinker *shrink,
struct shrink_control *sc)
{
struct bch_fs *c = container_of(shrink, struct bch_fs,
btree_key_cache.shrink);
struct bch_fs *c = shrink->private_data;
struct btree_key_cache *bc = &c->btree_key_cache;
long nr = atomic_long_read(&bc->nr_keys) -
atomic_long_read(&bc->nr_dirty);
@ -953,7 +951,7 @@ void bch2_fs_btree_key_cache_exit(struct btree_key_cache *bc)
int cpu;
#endif
unregister_shrinker(&bc->shrink);
shrinker_free(bc->shrink);
mutex_lock(&bc->lock);
@ -1027,6 +1025,7 @@ void bch2_fs_btree_key_cache_init_early(struct btree_key_cache *c)
int bch2_fs_btree_key_cache_init(struct btree_key_cache *bc)
{
struct bch_fs *c = container_of(bc, struct bch_fs, btree_key_cache);
struct shrinker *shrink;
#ifdef __KERNEL__
bc->pcpu_freed = alloc_percpu(struct btree_key_cache_freelist);
@ -1039,11 +1038,15 @@ int bch2_fs_btree_key_cache_init(struct btree_key_cache *bc)
bc->table_init_done = true;
bc->shrink.seeks = 0;
bc->shrink.count_objects = bch2_btree_key_cache_count;
bc->shrink.scan_objects = bch2_btree_key_cache_scan;
if (register_shrinker(&bc->shrink, "%s/btree_key_cache", c->name))
shrink = shrinker_alloc(0, "%s/btree_key_cache", c->name);
if (!shrink)
return -BCH_ERR_ENOMEM_fs_btree_cache_init;
bc->shrink = shrink;
shrink->seeks = 0;
shrink->count_objects = bch2_btree_key_cache_count;
shrink->scan_objects = bch2_btree_key_cache_scan;
shrink->private_data = c;
shrinker_register(shrink);
return 0;
}


@ -163,7 +163,7 @@ struct btree_cache {
unsigned used;
unsigned reserve;
atomic_t dirty;
struct shrinker shrink;
struct shrinker *shrink;
/*
* If we need to allocate memory for a new btree node and that
@ -321,7 +321,7 @@ struct btree_key_cache {
bool table_init_done;
struct list_head freed_pcpu;
struct list_head freed_nonpcpu;
struct shrinker shrink;
struct shrinker *shrink;
unsigned shrink_iter;
struct btree_key_cache_freelist __percpu *pcpu_freed;


@ -1904,7 +1904,7 @@ got_sb:
sb->s_flags |= SB_POSIXACL;
#endif
sb->s_shrink.seeks = 0;
sb->s_shrink->seeks = 0;
vinode = bch2_vfs_inode_get(c, BCACHEFS_ROOT_SUBVOL_INUM);
ret = PTR_ERR_OR_ZERO(vinode);


@ -494,7 +494,7 @@ STORE(bch2_fs)
sc.gfp_mask = GFP_KERNEL;
sc.nr_to_scan = strtoul_or_return(buf);
c->btree_cache.shrink.scan_objects(&c->btree_cache.shrink, &sc);
c->btree_cache.shrink->scan_objects(c->btree_cache.shrink, &sc);
}
if (attr == &sysfs_btree_wakeup)


@ -1472,7 +1472,7 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
error = -EBUSY;
} else {
snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev);
shrinker_debugfs_rename(&s->s_shrink, "sb-%s:%s", fs_type->name,
shrinker_debugfs_rename(s->s_shrink, "sb-%s:%s", fs_type->name,
s->s_id);
btrfs_sb(s)->bdev_holder = fs_type;
error = btrfs_fill_super(s, fs_devices, data);


@ -282,13 +282,7 @@ static void end_buffer_async_read(struct buffer_head *bh, int uptodate)
} while (tmp != bh);
spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
/*
* If all of the buffers are uptodate then we can set the page
* uptodate.
*/
if (folio_uptodate)
folio_mark_uptodate(folio);
folio_unlock(folio);
folio_end_read(folio, folio_uptodate);
return;
still_busy:
@ -915,16 +909,12 @@ int remove_inode_buffers(struct inode *inode)
* which may not fail from ordinary buffer allocations.
*/
struct buffer_head *folio_alloc_buffers(struct folio *folio, unsigned long size,
bool retry)
gfp_t gfp)
{
struct buffer_head *bh, *head;
gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
long offset;
struct mem_cgroup *memcg, *old_memcg;
if (retry)
gfp |= __GFP_NOFAIL;
/* The folio lock pins the memcg */
memcg = folio_memcg(folio);
old_memcg = set_active_memcg(memcg);
@ -967,7 +957,11 @@ EXPORT_SYMBOL_GPL(folio_alloc_buffers);
struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
bool retry)
{
return folio_alloc_buffers(page_folio(page), size, retry);
gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
if (retry)
gfp |= __GFP_NOFAIL;
return folio_alloc_buffers(page_folio(page), size, gfp);
}
EXPORT_SYMBOL_GPL(alloc_page_buffers);
@ -1043,20 +1037,11 @@ grow_dev_page(struct block_device *bdev, sector_t block,
struct buffer_head *bh;
sector_t end_block;
int ret = 0;
gfp_t gfp_mask;
gfp_mask = mapping_gfp_constraint(inode->i_mapping, ~__GFP_FS) | gfp;
/*
* XXX: __getblk_slow() can not really deal with failure and
* will endlessly loop on improvised global reclaim. Prefer
* looping in the allocator rather than here, at least that
* code knows what it's doing.
*/
gfp_mask |= __GFP_NOFAIL;
folio = __filemap_get_folio(inode->i_mapping, index,
FGP_LOCK | FGP_ACCESSED | FGP_CREAT, gfp_mask);
FGP_LOCK | FGP_ACCESSED | FGP_CREAT, gfp);
if (IS_ERR(folio))
return PTR_ERR(folio);
bh = folio_buffers(folio);
if (bh) {
@ -1069,7 +1054,10 @@ grow_dev_page(struct block_device *bdev, sector_t block,
goto failed;
}
bh = folio_alloc_buffers(folio, size, true);
ret = -ENOMEM;
bh = folio_alloc_buffers(folio, size, gfp | __GFP_ACCOUNT);
if (!bh)
goto failed;
/*
* Link the folio to the buffers and initialise them. Take the
@ -1420,33 +1408,36 @@ __find_get_block(struct block_device *bdev, sector_t block, unsigned size)
}
EXPORT_SYMBOL(__find_get_block);
/*
* __getblk_gfp() will locate (and, if necessary, create) the buffer_head
* which corresponds to the passed block_device, block and size. The
* returned buffer has its reference count incremented.
/**
* bdev_getblk - Get a buffer_head in a block device's buffer cache.
* @bdev: The block device.
* @block: The block number.
* @size: The size of buffer_heads for this @bdev.
* @gfp: The memory allocation flags to use.
*
* __getblk_gfp() will lock up the machine if grow_dev_page's
* try_to_free_buffers() attempt is failing. FIXME, perhaps?
* Return: The buffer head, or NULL if memory could not be allocated.
*/
struct buffer_head *
__getblk_gfp(struct block_device *bdev, sector_t block,
unsigned size, gfp_t gfp)
struct buffer_head *bdev_getblk(struct block_device *bdev, sector_t block,
unsigned size, gfp_t gfp)
{
struct buffer_head *bh = __find_get_block(bdev, block, size);
might_sleep();
if (bh == NULL)
bh = __getblk_slow(bdev, block, size, gfp);
return bh;
might_alloc(gfp);
if (bh)
return bh;
return __getblk_slow(bdev, block, size, gfp);
}
EXPORT_SYMBOL(__getblk_gfp);
EXPORT_SYMBOL(bdev_getblk);
/*
* Do async read-ahead on a buffer..
*/
void __breadahead(struct block_device *bdev, sector_t block, unsigned size)
{
struct buffer_head *bh = __getblk(bdev, block, size);
struct buffer_head *bh = bdev_getblk(bdev, block, size,
GFP_NOWAIT | __GFP_MOVABLE);
if (likely(bh)) {
bh_readahead(bh, REQ_RAHEAD);
brelse(bh);
@ -1470,7 +1461,17 @@ struct buffer_head *
__bread_gfp(struct block_device *bdev, sector_t block,
unsigned size, gfp_t gfp)
{
struct buffer_head *bh = __getblk_gfp(bdev, block, size, gfp);
struct buffer_head *bh;
gfp |= mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
/*
* Prefer looping in the allocator rather than here, at least that
* code knows what it's doing.
*/
gfp |= __GFP_NOFAIL;
bh = bdev_getblk(bdev, block, size, gfp);
if (likely(bh) && !buffer_uptodate(bh))
bh = __bread_slow(bh);
@ -1640,12 +1641,13 @@ EXPORT_SYMBOL(block_invalidate_folio);
* block_dirty_folio() via private_lock. try_to_free_buffers
* is already excluded via the folio lock.
*/
void folio_create_empty_buffers(struct folio *folio, unsigned long blocksize,
unsigned long b_state)
struct buffer_head *create_empty_buffers(struct folio *folio,
unsigned long blocksize, unsigned long b_state)
{
struct buffer_head *bh, *head, *tail;
gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT | __GFP_NOFAIL;
head = folio_alloc_buffers(folio, blocksize, true);
head = folio_alloc_buffers(folio, blocksize, gfp);
bh = head;
do {
bh->b_state |= b_state;
@ -1667,13 +1669,8 @@ void folio_create_empty_buffers(struct folio *folio, unsigned long blocksize,
}
folio_attach_private(folio, head);
spin_unlock(&folio->mapping->private_lock);
}
EXPORT_SYMBOL(folio_create_empty_buffers);
void create_empty_buffers(struct page *page,
unsigned long blocksize, unsigned long b_state)
{
folio_create_empty_buffers(page_folio(page), blocksize, b_state);
return head;
}
EXPORT_SYMBOL(create_empty_buffers);
@ -1768,13 +1765,15 @@ static struct buffer_head *folio_create_buffers(struct folio *folio,
struct inode *inode,
unsigned int b_state)
{
struct buffer_head *bh;
BUG_ON(!folio_test_locked(folio));
if (!folio_buffers(folio))
folio_create_empty_buffers(folio,
1 << READ_ONCE(inode->i_blkbits),
b_state);
return folio_buffers(folio);
bh = folio_buffers(folio);
if (!bh)
bh = create_empty_buffers(folio,
1 << READ_ONCE(inode->i_blkbits), b_state);
return bh;
}
/*
@ -2425,12 +2424,10 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
if (!nr) {
/*
* All buffers are uptodate - we can set the folio uptodate
* as well. But not if get_block() returned an error.
* All buffers are uptodate or get_block() returned an
* error when trying to map them - we can finish the read.
*/
if (!page_error)
folio_mark_uptodate(folio);
folio_unlock(folio);
folio_end_read(folio, !page_error);
return 0;
}
@ -2676,10 +2673,8 @@ int block_truncate_page(struct address_space *mapping,
return PTR_ERR(folio);
bh = folio_buffers(folio);
if (!bh) {
folio_create_empty_buffers(folio, blocksize, 0);
bh = folio_buffers(folio);
}
if (!bh)
bh = create_empty_buffers(folio, blocksize, 0);
/* Find the buffer that contains "offset" */
offset = offset_in_folio(folio, from);

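For callers being converted away from __getblk_gfp(), a hedged usage sketch of the bdev_getblk() helper introduced above, modelled on the readahead paths in these hunks (block and size stand for whatever the caller already has):

	struct buffer_head *bh;

	/* No implicit __GFP_NOFAIL any more: the lookup may simply fail. */
	bh = bdev_getblk(bdev, block, size, GFP_NOWAIT | __GFP_NOWARN);
	if (!bh)
		return;			/* acceptable for opportunistic readahead */

	bh_readahead(bh, REQ_RAHEAD);
	brelse(bh);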

@ -412,23 +412,23 @@ static struct page *dax_busy_page(void *entry)
return NULL;
}
/*
* dax_lock_page - Lock the DAX entry corresponding to a page
* @page: The page whose entry we want to lock
/**
* dax_lock_folio - Lock the DAX entry corresponding to a folio
* @folio: The folio whose entry we want to lock
*
* Context: Process context.
* Return: A cookie to pass to dax_unlock_page() or 0 if the entry could
* Return: A cookie to pass to dax_unlock_folio() or 0 if the entry could
* not be locked.
*/
dax_entry_t dax_lock_page(struct page *page)
dax_entry_t dax_lock_folio(struct folio *folio)
{
XA_STATE(xas, NULL, 0);
void *entry;
/* Ensure page->mapping isn't freed while we look at it */
/* Ensure folio->mapping isn't freed while we look at it */
rcu_read_lock();
for (;;) {
struct address_space *mapping = READ_ONCE(page->mapping);
struct address_space *mapping = READ_ONCE(folio->mapping);
entry = NULL;
if (!mapping || !dax_mapping(mapping))
@ -447,11 +447,11 @@ dax_entry_t dax_lock_page(struct page *page)
xas.xa = &mapping->i_pages;
xas_lock_irq(&xas);
if (mapping != page->mapping) {
if (mapping != folio->mapping) {
xas_unlock_irq(&xas);
continue;
}
xas_set(&xas, page->index);
xas_set(&xas, folio->index);
entry = xas_load(&xas);
if (dax_is_locked(entry)) {
rcu_read_unlock();
@ -467,10 +467,10 @@ dax_entry_t dax_lock_page(struct page *page)
return (dax_entry_t)entry;
}
void dax_unlock_page(struct page *page, dax_entry_t cookie)
void dax_unlock_folio(struct folio *folio, dax_entry_t cookie)
{
struct address_space *mapping = page->mapping;
XA_STATE(xas, &mapping->i_pages, page->index);
struct address_space *mapping = folio->mapping;
XA_STATE(xas, &mapping->i_pages, folio->index);
if (S_ISCHR(mapping->host->i_mode))
return;

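A short usage sketch for the renamed DAX locking helpers above (the caller shown is illustrative; the main in-tree user is the memory-failure path):

	dax_entry_t cookie;

	cookie = dax_lock_folio(folio);
	if (!cookie)
		return;		/* entry could not be locked */

	/* folio->mapping and folio->index are stable while the cookie is held */

	dax_unlock_folio(folio, cookie);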

@ -264,19 +264,24 @@ static unsigned long erofs_shrink_scan(struct shrinker *shrink,
return freed;
}
static struct shrinker erofs_shrinker_info = {
.scan_objects = erofs_shrink_scan,
.count_objects = erofs_shrink_count,
.seeks = DEFAULT_SEEKS,
};
static struct shrinker *erofs_shrinker_info;
int __init erofs_init_shrinker(void)
{
return register_shrinker(&erofs_shrinker_info, "erofs-shrinker");
erofs_shrinker_info = shrinker_alloc(0, "erofs-shrinker");
if (!erofs_shrinker_info)
return -ENOMEM;
erofs_shrinker_info->count_objects = erofs_shrink_count;
erofs_shrinker_info->scan_objects = erofs_shrink_scan;
shrinker_register(erofs_shrinker_info);
return 0;
}
void erofs_exit_shrinker(void)
{
unregister_shrinker(&erofs_shrinker_info);
shrinker_free(erofs_shrinker_info);
}
#endif /* !CONFIG_EROFS_FS_ZIP */


@ -713,7 +713,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
* process cleanup to remove whatever mess we made.
*/
if (length != move_page_tables(vma, old_start,
vma, new_start, length, false))
vma, new_start, length, false, true))
return -ENOMEM;
lru_add_drain();
@ -986,8 +986,6 @@ static int exec_mmap(struct mm_struct *mm)
tsk = current;
old_mm = current->mm;
exec_mm_release(tsk, old_mm);
if (old_mm)
sync_mm_rss(old_mm);
ret = down_write_killable(&tsk->signal->exec_update_lock);
if (ret)


@ -1664,7 +1664,7 @@ struct ext4_sb_info {
__u32 s_csum_seed;
/* Reclaim extents from extent status tree */
struct shrinker s_es_shrinker;
struct shrinker *s_es_shrinker;
struct list_head s_es_list; /* List of inodes with reclaimable extents */
long s_es_nr_inode;
struct ext4_es_stats s_es_stats;


@ -1632,7 +1632,7 @@ static unsigned long ext4_es_count(struct shrinker *shrink,
unsigned long nr;
struct ext4_sb_info *sbi;
sbi = container_of(shrink, struct ext4_sb_info, s_es_shrinker);
sbi = shrink->private_data;
nr = percpu_counter_read_positive(&sbi->s_es_stats.es_stats_shk_cnt);
trace_ext4_es_shrink_count(sbi->s_sb, sc->nr_to_scan, nr);
return nr;
@ -1641,8 +1641,7 @@ static unsigned long ext4_es_count(struct shrinker *shrink,
static unsigned long ext4_es_scan(struct shrinker *shrink,
struct shrink_control *sc)
{
struct ext4_sb_info *sbi = container_of(shrink,
struct ext4_sb_info, s_es_shrinker);
struct ext4_sb_info *sbi = shrink->private_data;
int nr_to_scan = sc->nr_to_scan;
int ret, nr_shrunk;
@ -1726,13 +1725,17 @@ int ext4_es_register_shrinker(struct ext4_sb_info *sbi)
if (err)
goto err3;
sbi->s_es_shrinker.scan_objects = ext4_es_scan;
sbi->s_es_shrinker.count_objects = ext4_es_count;
sbi->s_es_shrinker.seeks = DEFAULT_SEEKS;
err = register_shrinker(&sbi->s_es_shrinker, "ext4-es:%s",
sbi->s_sb->s_id);
if (err)
sbi->s_es_shrinker = shrinker_alloc(0, "ext4-es:%s", sbi->s_sb->s_id);
if (!sbi->s_es_shrinker) {
err = -ENOMEM;
goto err4;
}
sbi->s_es_shrinker->scan_objects = ext4_es_scan;
sbi->s_es_shrinker->count_objects = ext4_es_count;
sbi->s_es_shrinker->private_data = sbi;
shrinker_register(sbi->s_es_shrinker);
return 0;
err4:
@ -1752,7 +1755,7 @@ void ext4_es_unregister_shrinker(struct ext4_sb_info *sbi)
percpu_counter_destroy(&sbi->s_es_stats.es_stats_cache_misses);
percpu_counter_destroy(&sbi->s_es_stats.es_stats_all_cnt);
percpu_counter_destroy(&sbi->s_es_stats.es_stats_shk_cnt);
unregister_shrinker(&sbi->s_es_shrinker);
shrinker_free(sbi->s_es_shrinker);
}
/*


@ -1032,10 +1032,8 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
BUG_ON(from > to);
head = folio_buffers(folio);
if (!head) {
create_empty_buffers(&folio->page, blocksize, 0);
head = folio_buffers(folio);
}
if (!head)
head = create_empty_buffers(folio, blocksize, 0);
bbits = ilog2(blocksize);
block = (sector_t)folio->index << (PAGE_SHIFT - bbits);
@ -1165,7 +1163,7 @@ retry_grab:
* starting the handle.
*/
if (!folio_buffers(folio))
create_empty_buffers(&folio->page, inode->i_sb->s_blocksize, 0);
create_empty_buffers(folio, inode->i_sb->s_blocksize, 0);
folio_unlock(folio);
@ -3655,10 +3653,8 @@ static int __ext4_block_zero_page_range(handle_t *handle,
iblock = index << (PAGE_SHIFT - inode->i_sb->s_blocksize_bits);
bh = folio_buffers(folio);
if (!bh) {
create_empty_buffers(&folio->page, blocksize, 0);
bh = folio_buffers(folio);
}
if (!bh)
bh = create_empty_buffers(folio, blocksize, 0);
/* Find the buffer that contains "offset" */
pos = blocksize;


@ -183,10 +183,8 @@ mext_page_mkuptodate(struct folio *folio, unsigned from, unsigned to)
blocksize = i_blocksize(inode);
head = folio_buffers(folio);
if (!head) {
create_empty_buffers(&folio->page, blocksize, 0);
head = folio_buffers(folio);
}
if (!head)
head = create_empty_buffers(folio, blocksize, 0);
block = (sector_t)folio->index << (PAGE_SHIFT - inode->i_blkbits);
for (bh = head, block_start = 0; bh != head || !block_start;
@ -380,9 +378,10 @@ data_copy:
}
/* Perform all necessary steps similar write_begin()/write_end()
* but keeping in mind that i_size will not change */
if (!folio_buffers(folio[0]))
create_empty_buffers(&folio[0]->page, 1 << orig_inode->i_blkbits, 0);
bh = folio_buffers(folio[0]);
if (!bh)
bh = create_empty_buffers(folio[0],
1 << orig_inode->i_blkbits, 0);
for (i = 0; i < data_offset_in_page; i++)
bh = bh->b_this_page;
for (i = 0; i < block_len_in_page; i++) {


@ -70,15 +70,8 @@ static void __read_end_io(struct bio *bio)
{
struct folio_iter fi;
bio_for_each_folio_all(fi, bio) {
struct folio *folio = fi.folio;
if (bio->bi_status)
folio_clear_uptodate(folio);
else
folio_mark_uptodate(folio);
folio_unlock(folio);
}
bio_for_each_folio_all(fi, bio)
folio_end_read(fi.folio, bio->bi_status == 0);
if (bio->bi_private)
mempool_free(bio->bi_private, bio_post_read_ctx_pool);
bio_put(bio);
@ -336,8 +329,7 @@ int ext4_mpage_readpages(struct inode *inode,
if (ext4_need_verity(inode, folio->index) &&
!fsverity_verify_folio(folio))
goto set_error_page;
folio_mark_uptodate(folio);
folio_unlock(folio);
folio_end_read(folio, true);
continue;
}
} else if (fully_mapped) {

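The folio_end_read() calls introduced in the read-completion paths above collapse the removed mark-uptodate-then-unlock sequence into one call; roughly:

	/* Old pattern, as removed above: */
	if (success)
		folio_mark_uptodate(folio);
	folio_unlock(folio);

	/* becomes */
	folio_end_read(folio, success);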

@ -244,18 +244,25 @@ static struct buffer_head *__ext4_sb_bread_gfp(struct super_block *sb,
struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block,
blk_opf_t op_flags)
{
return __ext4_sb_bread_gfp(sb, block, op_flags, __GFP_MOVABLE);
gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_inode->i_mapping,
~__GFP_FS) | __GFP_MOVABLE;
return __ext4_sb_bread_gfp(sb, block, op_flags, gfp);
}
struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb,
sector_t block)
{
return __ext4_sb_bread_gfp(sb, block, 0, 0);
gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_inode->i_mapping,
~__GFP_FS);
return __ext4_sb_bread_gfp(sb, block, 0, gfp);
}
void ext4_sb_breadahead_unmovable(struct super_block *sb, sector_t block)
{
struct buffer_head *bh = sb_getblk_gfp(sb, block, 0);
struct buffer_head *bh = bdev_getblk(sb->s_bdev, block,
sb->s_blocksize, GFP_NOWAIT | __GFP_NOWARN);
if (likely(bh)) {
if (trylock_buffer(bh))

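The gfp handling above follows the common mapping_gfp_constraint() idiom; as a sketch of what the mask means (mapping here stands for the bdev's i_mapping):

	/*
	 * mapping_gfp_constraint(mapping, mask) is mapping_gfp_mask(mapping) & mask.
	 * Clearing __GFP_FS keeps reclaim from re-entering the filesystem while
	 * it reads its own metadata; __GFP_MOVABLE allows the page to come from
	 * ZONE_MOVABLE/CMA when that is acceptable.
	 */
	gfp_t gfp = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_MOVABLE;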

@ -83,11 +83,26 @@ void f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned int rate,
#endif
/* f2fs-wide shrinker description */
static struct shrinker f2fs_shrinker_info = {
.scan_objects = f2fs_shrink_scan,
.count_objects = f2fs_shrink_count,
.seeks = DEFAULT_SEEKS,
};
static struct shrinker *f2fs_shrinker_info;
static int __init f2fs_init_shrinker(void)
{
f2fs_shrinker_info = shrinker_alloc(0, "f2fs-shrinker");
if (!f2fs_shrinker_info)
return -ENOMEM;
f2fs_shrinker_info->count_objects = f2fs_shrink_count;
f2fs_shrinker_info->scan_objects = f2fs_shrink_scan;
shrinker_register(f2fs_shrinker_info);
return 0;
}
static void f2fs_exit_shrinker(void)
{
shrinker_free(f2fs_shrinker_info);
}
enum {
Opt_gc_background,
@ -4940,7 +4955,7 @@ static int __init init_f2fs_fs(void)
err = f2fs_init_sysfs();
if (err)
goto free_garbage_collection_cache;
err = register_shrinker(&f2fs_shrinker_info, "f2fs-shrinker");
err = f2fs_init_shrinker();
if (err)
goto free_sysfs;
err = register_filesystem(&f2fs_fs_type);
@ -4985,7 +5000,7 @@ free_root_stats:
f2fs_destroy_root_stats();
unregister_filesystem(&f2fs_fs_type);
free_shrinker:
unregister_shrinker(&f2fs_shrinker_info);
f2fs_exit_shrinker();
free_sysfs:
f2fs_exit_sysfs();
free_garbage_collection_cache:
@ -5017,7 +5032,7 @@ static void __exit exit_f2fs_fs(void)
f2fs_destroy_post_read_processing();
f2fs_destroy_root_stats();
unregister_filesystem(&f2fs_fs_type);
unregister_shrinker(&f2fs_shrinker_info);
f2fs_exit_shrinker();
f2fs_exit_sysfs();
f2fs_destroy_garbage_collection_cache();
f2fs_destroy_extent_cache();
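This f2fs hunk is the first of several conversions in this diff (gfs2, jbd2, mbcache, NFS and nfsd follow) from a static struct shrinker plus register_shrinker() to a dynamically allocated shrinker. A hedged sketch of the API surface those hunks assume, with prototypes inferred from the call sites; see include/linux/shrinker.h for the authoritative declarations:

/* allocate, publish and tear down a shrinker */
struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...);
void shrinker_register(struct shrinker *shrinker);
void shrinker_free(struct shrinker *shrinker);

/*
 * Callers fill in ->count_objects and ->scan_objects between alloc and
 * register; the ->private_data field carries the owner pointer that the
 * jbd2, mbcache, nfsd and NFS hunks below use in place of container_of().
 */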


@ -130,7 +130,7 @@ static int __gfs2_jdata_write_folio(struct folio *folio,
if (folio_test_checked(folio)) {
folio_clear_checked(folio);
if (!folio_buffers(folio)) {
folio_create_empty_buffers(folio,
create_empty_buffers(folio,
inode->i_sb->s_blocksize,
BIT(BH_Dirty)|BIT(BH_Uptodate));
}


@ -43,53 +43,51 @@ struct metapath {
static int punch_hole(struct gfs2_inode *ip, u64 offset, u64 length);
/**
* gfs2_unstuffer_page - unstuff a stuffed inode into a block cached by a page
* gfs2_unstuffer_folio - unstuff a stuffed inode into a block cached by a folio
* @ip: the inode
* @dibh: the dinode buffer
* @block: the block number that was allocated
* @page: The (optional) page. This is looked up if @page is NULL
* @folio: The folio.
*
* Returns: errno
*/
static int gfs2_unstuffer_page(struct gfs2_inode *ip, struct buffer_head *dibh,
u64 block, struct page *page)
static int gfs2_unstuffer_folio(struct gfs2_inode *ip, struct buffer_head *dibh,
u64 block, struct folio *folio)
{
struct inode *inode = &ip->i_inode;
if (!PageUptodate(page)) {
void *kaddr = kmap(page);
if (!folio_test_uptodate(folio)) {
void *kaddr = kmap_local_folio(folio, 0);
u64 dsize = i_size_read(inode);
memcpy(kaddr, dibh->b_data + sizeof(struct gfs2_dinode), dsize);
memset(kaddr + dsize, 0, PAGE_SIZE - dsize);
kunmap(page);
memset(kaddr + dsize, 0, folio_size(folio) - dsize);
kunmap_local(kaddr);
SetPageUptodate(page);
folio_mark_uptodate(folio);
}
if (gfs2_is_jdata(ip)) {
struct buffer_head *bh;
struct buffer_head *bh = folio_buffers(folio);
if (!page_has_buffers(page))
create_empty_buffers(page, BIT(inode->i_blkbits),
BIT(BH_Uptodate));
if (!bh)
bh = create_empty_buffers(folio,
BIT(inode->i_blkbits), BIT(BH_Uptodate));
bh = page_buffers(page);
if (!buffer_mapped(bh))
map_bh(bh, inode->i_sb, block);
set_buffer_uptodate(bh);
gfs2_trans_add_data(ip->i_gl, bh);
} else {
set_page_dirty(page);
folio_mark_dirty(folio);
gfs2_ordered_add_inode(ip);
}
return 0;
}
static int __gfs2_unstuff_inode(struct gfs2_inode *ip, struct page *page)
static int __gfs2_unstuff_inode(struct gfs2_inode *ip, struct folio *folio)
{
struct buffer_head *bh, *dibh;
struct gfs2_dinode *di;
@ -118,7 +116,7 @@ static int __gfs2_unstuff_inode(struct gfs2_inode *ip, struct page *page)
dibh, sizeof(struct gfs2_dinode));
brelse(bh);
} else {
error = gfs2_unstuffer_page(ip, dibh, block, page);
error = gfs2_unstuffer_folio(ip, dibh, block, folio);
if (error)
goto out_brelse;
}
@ -157,17 +155,17 @@ out_brelse:
int gfs2_unstuff_dinode(struct gfs2_inode *ip)
{
struct inode *inode = &ip->i_inode;
struct page *page;
struct folio *folio;
int error;
down_write(&ip->i_rw_mutex);
page = grab_cache_page(inode->i_mapping, 0);
error = -ENOMEM;
if (!page)
folio = filemap_grab_folio(inode->i_mapping, 0);
error = PTR_ERR(folio);
if (IS_ERR(folio))
goto out;
error = __gfs2_unstuff_inode(ip, page);
unlock_page(page);
put_page(page);
error = __gfs2_unstuff_inode(ip, folio);
folio_unlock(folio);
folio_put(folio);
out:
up_write(&ip->i_rw_mutex);
return error;
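gfs2_unstuffer_folio() above also switches from kmap()/kunmap() to the local kmap helpers; a minimal generic sketch of that pairing (placeholder names, not gfs2 code):

/*
 * Map the folio at byte offset 0 and copy into it; the address returned
 * by kmap_local_folio() is what must be handed back to kunmap_local().
 * 'data' and 'len' are placeholders for this sketch.
 */
void *kaddr = kmap_local_folio(folio, 0);
memcpy(kaddr, data, len);
kunmap_local(kaddr);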


@ -2041,11 +2041,7 @@ static unsigned long gfs2_glock_shrink_count(struct shrinker *shrink,
return vfs_pressure_ratio(atomic_read(&lru_count));
}
static struct shrinker glock_shrinker = {
.seeks = DEFAULT_SEEKS,
.count_objects = gfs2_glock_shrink_count,
.scan_objects = gfs2_glock_shrink_scan,
};
static struct shrinker *glock_shrinker;
/**
* glock_hash_walk - Call a function for glock in a hash bucket
@ -2465,13 +2461,18 @@ int __init gfs2_glock_init(void)
return -ENOMEM;
}
ret = register_shrinker(&glock_shrinker, "gfs2-glock");
if (ret) {
glock_shrinker = shrinker_alloc(0, "gfs2-glock");
if (!glock_shrinker) {
destroy_workqueue(glock_workqueue);
rhashtable_destroy(&gl_hash_table);
return ret;
return -ENOMEM;
}
glock_shrinker->count_objects = gfs2_glock_shrink_count;
glock_shrinker->scan_objects = gfs2_glock_shrink_scan;
shrinker_register(glock_shrinker);
for (i = 0; i < GLOCK_WAIT_TABLE_SIZE; i++)
init_waitqueue_head(glock_wait_table + i);
@ -2480,7 +2481,7 @@ int __init gfs2_glock_init(void)
void gfs2_glock_exit(void)
{
unregister_shrinker(&glock_shrinker);
shrinker_free(glock_shrinker);
rhashtable_destroy(&gl_hash_table);
destroy_workqueue(glock_workqueue);
}


@ -147,7 +147,7 @@ static int __init init_gfs2_fs(void)
if (!gfs2_trans_cachep)
goto fail_cachep8;
error = register_shrinker(&gfs2_qd_shrinker, "gfs2-qd");
error = gfs2_qd_shrinker_init();
if (error)
goto fail_shrinker;
@ -196,7 +196,7 @@ fail_wq3:
fail_wq2:
destroy_workqueue(gfs2_recovery_wq);
fail_wq1:
unregister_shrinker(&gfs2_qd_shrinker);
gfs2_qd_shrinker_exit();
fail_shrinker:
kmem_cache_destroy(gfs2_trans_cachep);
fail_cachep8:
@ -229,7 +229,7 @@ fail_lru:
static void __exit exit_gfs2_fs(void)
{
unregister_shrinker(&gfs2_qd_shrinker);
gfs2_qd_shrinker_exit();
gfs2_glock_exit();
gfs2_unregister_debugfs();
unregister_filesystem(&gfs2_fs_type);


@ -115,7 +115,7 @@ struct buffer_head *gfs2_getbuf(struct gfs2_glock *gl, u64 blkno, int create)
{
struct address_space *mapping = gfs2_glock2aspace(gl);
struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
struct page *page;
struct folio *folio;
struct buffer_head *bh;
unsigned int shift;
unsigned long index;
@ -129,36 +129,31 @@ struct buffer_head *gfs2_getbuf(struct gfs2_glock *gl, u64 blkno, int create)
bufnum = blkno - (index << shift); /* block buf index within page */
if (create) {
for (;;) {
page = grab_cache_page(mapping, index);
if (page)
break;
yield();
}
if (!page_has_buffers(page))
create_empty_buffers(page, sdp->sd_sb.sb_bsize, 0);
folio = __filemap_get_folio(mapping, index,
FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
mapping_gfp_mask(mapping) | __GFP_NOFAIL);
bh = folio_buffers(folio);
if (!bh)
bh = create_empty_buffers(folio,
sdp->sd_sb.sb_bsize, 0);
} else {
page = find_get_page_flags(mapping, index,
FGP_LOCK|FGP_ACCESSED);
if (!page)
folio = __filemap_get_folio(mapping, index,
FGP_LOCK | FGP_ACCESSED, 0);
if (IS_ERR(folio))
return NULL;
if (!page_has_buffers(page)) {
bh = NULL;
goto out_unlock;
}
bh = folio_buffers(folio);
}
/* Locate header for our buffer within our page */
for (bh = page_buffers(page); bufnum--; bh = bh->b_this_page)
/* Do nothing */;
get_bh(bh);
if (!bh)
goto out_unlock;
bh = get_nth_bh(bh, bufnum);
if (!buffer_mapped(bh))
map_bh(bh, sdp->sd_vfs, blkno);
out_unlock:
unlock_page(page);
put_page(page);
folio_unlock(folio);
folio_put(folio);
return bh;
}
@ -405,26 +400,20 @@ static struct buffer_head *gfs2_getjdatabuf(struct gfs2_inode *ip, u64 blkno)
{
struct address_space *mapping = ip->i_inode.i_mapping;
struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
struct page *page;
struct folio *folio;
struct buffer_head *bh;
unsigned int shift = PAGE_SHIFT - sdp->sd_sb.sb_bsize_shift;
unsigned long index = blkno >> shift; /* convert block to page */
unsigned int bufnum = blkno - (index << shift);
page = find_get_page_flags(mapping, index, FGP_LOCK|FGP_ACCESSED);
if (!page)
folio = __filemap_get_folio(mapping, index, FGP_LOCK | FGP_ACCESSED, 0);
if (IS_ERR(folio))
return NULL;
if (!page_has_buffers(page)) {
unlock_page(page);
put_page(page);
return NULL;
}
/* Locate header for our buffer within our page */
for (bh = page_buffers(page); bufnum--; bh = bh->b_this_page)
/* Do nothing */;
get_bh(bh);
unlock_page(page);
put_page(page);
bh = folio_buffers(folio);
if (bh)
bh = get_nth_bh(bh, bufnum);
folio_unlock(folio);
folio_put(folio);
return bh;
}
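Both gfs2 helpers above move from grab_cache_page()/find_get_page_flags(), which signal failure with NULL, to the folio lookup functions, which signal it with an ERR_PTR. A small sketch of the convention, reproduced from these call sites:

/* lookup-only path: absence is reported as an ERR_PTR, not NULL */
folio = __filemap_get_folio(mapping, index, FGP_LOCK | FGP_ACCESSED, 0);
if (IS_ERR(folio))
	return NULL;

/*
 * Create-if-missing path: filemap_grab_folio() appears throughout this
 * diff as the FGP_LOCK | FGP_ACCESSED | FGP_CREAT shorthand, and its
 * failure is propagated with PTR_ERR().
 */
folio = filemap_grab_folio(mapping, index);
if (IS_ERR(folio))
	return PTR_ERR(folio);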


@ -196,13 +196,26 @@ static unsigned long gfs2_qd_shrink_count(struct shrinker *shrink,
return vfs_pressure_ratio(list_lru_shrink_count(&gfs2_qd_lru, sc));
}
struct shrinker gfs2_qd_shrinker = {
.count_objects = gfs2_qd_shrink_count,
.scan_objects = gfs2_qd_shrink_scan,
.seeks = DEFAULT_SEEKS,
.flags = SHRINKER_NUMA_AWARE,
};
static struct shrinker *gfs2_qd_shrinker;
int __init gfs2_qd_shrinker_init(void)
{
gfs2_qd_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE, "gfs2-qd");
if (!gfs2_qd_shrinker)
return -ENOMEM;
gfs2_qd_shrinker->count_objects = gfs2_qd_shrink_count;
gfs2_qd_shrinker->scan_objects = gfs2_qd_shrink_scan;
shrinker_register(gfs2_qd_shrinker);
return 0;
}
void gfs2_qd_shrinker_exit(void)
{
shrinker_free(gfs2_qd_shrinker);
}
static u64 qd2index(struct gfs2_quota_data *qd)
{
@ -736,7 +749,7 @@ static int gfs2_write_buf_to_page(struct gfs2_sbd *sdp, unsigned long index,
struct gfs2_inode *ip = GFS2_I(sdp->sd_quota_inode);
struct inode *inode = &ip->i_inode;
struct address_space *mapping = inode->i_mapping;
struct page *page;
struct folio *folio;
struct buffer_head *bh;
u64 blk;
unsigned bsize = sdp->sd_sb.sb_bsize, bnum = 0, boff = 0;
@ -745,15 +758,15 @@ static int gfs2_write_buf_to_page(struct gfs2_sbd *sdp, unsigned long index,
blk = index << (PAGE_SHIFT - sdp->sd_sb.sb_bsize_shift);
boff = off % bsize;
page = grab_cache_page(mapping, index);
if (!page)
return -ENOMEM;
if (!page_has_buffers(page))
create_empty_buffers(page, bsize, 0);
folio = filemap_grab_folio(mapping, index);
if (IS_ERR(folio))
return PTR_ERR(folio);
bh = folio_buffers(folio);
if (!bh)
bh = create_empty_buffers(folio, bsize, 0);
bh = page_buffers(page);
for(;;) {
/* Find the beginning block within the page */
for (;;) {
/* Find the beginning block within the folio */
if (pg_off >= ((bnum * bsize) + bsize)) {
bh = bh->b_this_page;
bnum++;
@ -766,9 +779,10 @@ static int gfs2_write_buf_to_page(struct gfs2_sbd *sdp, unsigned long index,
goto unlock_out;
/* If it's a newly allocated disk block, zero it */
if (buffer_new(bh))
zero_user(page, bnum * bsize, bh->b_size);
folio_zero_range(folio, bnum * bsize,
bh->b_size);
}
if (PageUptodate(page))
if (folio_test_uptodate(folio))
set_buffer_uptodate(bh);
if (bh_read(bh, REQ_META | REQ_PRIO) < 0)
goto unlock_out;
@ -784,17 +798,17 @@ static int gfs2_write_buf_to_page(struct gfs2_sbd *sdp, unsigned long index,
break;
}
/* Write to the page, now that we have setup the buffer(s) */
memcpy_to_page(page, off, buf, bytes);
flush_dcache_page(page);
unlock_page(page);
put_page(page);
/* Write to the folio, now that we have setup the buffer(s) */
memcpy_to_folio(folio, off, buf, bytes);
flush_dcache_folio(folio);
folio_unlock(folio);
folio_put(folio);
return 0;
unlock_out:
unlock_page(page);
put_page(page);
folio_unlock(folio);
folio_put(folio);
return -EIO;
}


@ -60,7 +60,8 @@ static inline int gfs2_quota_lock_check(struct gfs2_inode *ip,
}
extern const struct quotactl_ops gfs2_quotactl_ops;
extern struct shrinker gfs2_qd_shrinker;
int __init gfs2_qd_shrinker_init(void);
void gfs2_qd_shrinker_exit(void);
extern struct list_lru gfs2_qd_lru;
extern void __init gfs2_quota_hash_init(void);


@ -83,29 +83,6 @@ static const struct fs_parameter_spec hugetlb_fs_parameters[] = {
{}
};
#ifdef CONFIG_NUMA
static inline void hugetlb_set_vma_policy(struct vm_area_struct *vma,
struct inode *inode, pgoff_t index)
{
vma->vm_policy = mpol_shared_policy_lookup(&HUGETLBFS_I(inode)->policy,
index);
}
static inline void hugetlb_drop_vma_policy(struct vm_area_struct *vma)
{
mpol_cond_put(vma->vm_policy);
}
#else
static inline void hugetlb_set_vma_policy(struct vm_area_struct *vma,
struct inode *inode, pgoff_t index)
{
}
static inline void hugetlb_drop_vma_policy(struct vm_area_struct *vma)
{
}
#endif
/*
* Mask used when checking the page offset value passed in via system
* calls. This value will be converted to a loff_t which is signed.
@ -135,7 +112,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
vm_flags_set(vma, VM_HUGETLB | VM_DONTEXPAND);
vma->vm_ops = &hugetlb_vm_ops;
ret = seal_check_future_write(info->seals, vma);
ret = seal_check_write(info->seals, vma);
if (ret)
return ret;
@ -295,7 +272,7 @@ static size_t adjust_range_hwpoison(struct page *page, size_t offset, size_t byt
size_t res = 0;
/* First subpage to start the loop. */
page += offset / PAGE_SIZE;
page = nth_page(page, offset / PAGE_SIZE);
offset %= PAGE_SIZE;
while (1) {
if (is_raw_hwpoison_page_in_hugepage(page))
@ -309,7 +286,7 @@ static size_t adjust_range_hwpoison(struct page *page, size_t offset, size_t byt
break;
offset += n;
if (offset == PAGE_SIZE) {
page++;
page = nth_page(page, 1);
offset = 0;
}
}
@ -334,7 +311,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
ssize_t retval = 0;
while (iov_iter_count(to)) {
struct page *page;
struct folio *folio;
size_t nr, copied, want;
/* nr is the maximum number of bytes to copy from this page */
@ -352,18 +329,18 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
}
nr = nr - offset;
/* Find the page */
page = find_lock_page(mapping, index);
if (unlikely(page == NULL)) {
/* Find the folio */
folio = filemap_lock_hugetlb_folio(h, mapping, index);
if (IS_ERR(folio)) {
/*
* We have a HOLE, zero out the user-buffer for the
* length of the hole or request.
*/
copied = iov_iter_zero(nr, to);
} else {
unlock_page(page);
folio_unlock(folio);
if (!PageHWPoison(page))
if (!folio_test_has_hwpoisoned(folio))
want = nr;
else {
/*
@ -371,19 +348,19 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
* touching the 1st raw HWPOISON subpage after
* offset.
*/
want = adjust_range_hwpoison(page, offset, nr);
want = adjust_range_hwpoison(&folio->page, offset, nr);
if (want == 0) {
put_page(page);
folio_put(folio);
retval = -EIO;
break;
}
}
/*
* We have the page, copy it to user space buffer.
* We have the folio, copy it to user space buffer.
*/
copied = copy_page_to_iter(page, offset, want, to);
put_page(page);
copied = copy_folio_to_iter(folio, offset, want, to);
folio_put(folio);
}
offset += copied;
retval += copied;
@ -661,21 +638,20 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
{
struct hstate *h = hstate_inode(inode);
struct address_space *mapping = &inode->i_data;
const pgoff_t start = lstart >> huge_page_shift(h);
const pgoff_t end = lend >> huge_page_shift(h);
const pgoff_t end = lend >> PAGE_SHIFT;
struct folio_batch fbatch;
pgoff_t next, index;
int i, freed = 0;
bool truncate_op = (lend == LLONG_MAX);
folio_batch_init(&fbatch);
next = start;
next = lstart >> PAGE_SHIFT;
while (filemap_get_folios(mapping, &next, end - 1, &fbatch)) {
for (i = 0; i < folio_batch_count(&fbatch); ++i) {
struct folio *folio = fbatch.folios[i];
u32 hash = 0;
index = folio->index;
index = folio->index >> huge_page_order(h);
hash = hugetlb_fault_mutex_hash(mapping, index);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
@ -693,7 +669,9 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
}
if (truncate_op)
(void)hugetlb_unreserve_pages(inode, start, LONG_MAX, freed);
(void)hugetlb_unreserve_pages(inode,
lstart >> huge_page_shift(h),
LONG_MAX, freed);
}
static void hugetlbfs_evict_inode(struct inode *inode)
@ -741,7 +719,7 @@ static void hugetlbfs_zero_partial_page(struct hstate *h,
pgoff_t idx = start >> huge_page_shift(h);
struct folio *folio;
folio = filemap_lock_folio(mapping, idx);
folio = filemap_lock_hugetlb_folio(h, mapping, idx);
if (IS_ERR(folio))
return;
@ -852,8 +830,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
/*
* Initialize a pseudo vma as this is required by the huge page
* allocation routines. If NUMA is configured, use page index
* as input to create an allocation policy.
* allocation routines.
*/
vma_init(&pseudo_vma, mm);
vm_flags_init(&pseudo_vma, VM_HUGETLB | VM_MAYSHARE | VM_SHARED);
@ -886,7 +863,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
mutex_lock(&hugetlb_fault_mutex_table[hash]);
/* See if already present in mapping to avoid alloc/free */
folio = filemap_get_folio(mapping, index);
folio = filemap_get_folio(mapping, index << huge_page_order(h));
if (!IS_ERR(folio)) {
folio_put(folio);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
@ -901,9 +878,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
* folios in these areas, we need to consume the reserves
* to keep reservation accounting consistent.
*/
hugetlb_set_vma_policy(&pseudo_vma, inode, index);
folio = alloc_hugetlb_folio(&pseudo_vma, addr, 0);
hugetlb_drop_vma_policy(&pseudo_vma);
if (IS_ERR(folio)) {
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
error = PTR_ERR(folio);
@ -1282,18 +1257,6 @@ static struct inode *hugetlbfs_alloc_inode(struct super_block *sb)
hugetlbfs_inc_free_inodes(sbinfo);
return NULL;
}
/*
* Any time after allocation, hugetlbfs_destroy_inode can be called
* for the inode. mpol_free_shared_policy is unconditionally called
* as part of hugetlbfs_destroy_inode. So, initialize policy here
* in case of a quick call to destroy.
*
* Note that the policy is initialized even if we are creating a
* private inode. This simplifies hugetlbfs_destroy_inode.
*/
mpol_shared_policy_init(&p->policy, NULL);
return &p->vfs_inode;
}
@ -1305,7 +1268,6 @@ static void hugetlbfs_free_inode(struct inode *inode)
static void hugetlbfs_destroy_inode(struct inode *inode)
{
hugetlbfs_inc_free_inodes(HUGETLBFS_SB(inode->i_sb));
mpol_free_shared_policy(&HUGETLBFS_I(inode)->policy);
}
static const struct address_space_operations hugetlbfs_aops = {


@ -29,9 +29,9 @@ typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
* and I/O completions.
*/
struct iomap_folio_state {
atomic_t read_bytes_pending;
atomic_t write_bytes_pending;
spinlock_t state_lock;
unsigned int read_bytes_pending;
atomic_t write_bytes_pending;
/*
* Each block has two bits in this bitmap:
@ -57,30 +57,32 @@ static inline bool ifs_block_is_uptodate(struct iomap_folio_state *ifs,
return test_bit(block, ifs->state);
}
static void ifs_set_range_uptodate(struct folio *folio,
static bool ifs_set_range_uptodate(struct folio *folio,
struct iomap_folio_state *ifs, size_t off, size_t len)
{
struct inode *inode = folio->mapping->host;
unsigned int first_blk = off >> inode->i_blkbits;
unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
unsigned int nr_blks = last_blk - first_blk + 1;
unsigned long flags;
spin_lock_irqsave(&ifs->state_lock, flags);
bitmap_set(ifs->state, first_blk, nr_blks);
if (ifs_is_fully_uptodate(folio, ifs))
folio_mark_uptodate(folio);
spin_unlock_irqrestore(&ifs->state_lock, flags);
return ifs_is_fully_uptodate(folio, ifs);
}
static void iomap_set_range_uptodate(struct folio *folio, size_t off,
size_t len)
{
struct iomap_folio_state *ifs = folio->private;
unsigned long flags;
bool uptodate = true;
if (ifs)
ifs_set_range_uptodate(folio, ifs, off, len);
else
if (ifs) {
spin_lock_irqsave(&ifs->state_lock, flags);
uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
spin_unlock_irqrestore(&ifs->state_lock, flags);
}
if (uptodate)
folio_mark_uptodate(folio);
}
@ -181,7 +183,7 @@ static void ifs_free(struct folio *folio)
if (!ifs)
return;
WARN_ON_ONCE(atomic_read(&ifs->read_bytes_pending));
WARN_ON_ONCE(ifs->read_bytes_pending != 0);
WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending));
WARN_ON_ONCE(ifs_is_fully_uptodate(folio, ifs) !=
folio_test_uptodate(folio));
@ -248,20 +250,28 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
*lenp = plen;
}
static void iomap_finish_folio_read(struct folio *folio, size_t offset,
static void iomap_finish_folio_read(struct folio *folio, size_t off,
size_t len, int error)
{
struct iomap_folio_state *ifs = folio->private;
bool uptodate = !error;
bool finished = true;
if (unlikely(error)) {
folio_clear_uptodate(folio);
folio_set_error(folio);
} else {
iomap_set_range_uptodate(folio, offset, len);
if (ifs) {
unsigned long flags;
spin_lock_irqsave(&ifs->state_lock, flags);
if (!error)
uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
ifs->read_bytes_pending -= len;
finished = !ifs->read_bytes_pending;
spin_unlock_irqrestore(&ifs->state_lock, flags);
}
if (!ifs || atomic_sub_and_test(len, &ifs->read_bytes_pending))
folio_unlock(folio);
if (error)
folio_set_error(folio);
if (finished)
folio_end_read(folio, uptodate);
}
static void iomap_read_end_io(struct bio *bio)
@ -358,8 +368,11 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
}
ctx->cur_folio_in_bio = true;
if (ifs)
atomic_add(plen, &ifs->read_bytes_pending);
if (ifs) {
spin_lock_irq(&ifs->state_lock);
ifs->read_bytes_pending += plen;
spin_unlock_irq(&ifs->state_lock);
}
sector = iomap_sector(iomap, pos);
if (!ctx->bio ||


@ -1290,7 +1290,7 @@ static int jbd2_min_tag_size(void)
static unsigned long jbd2_journal_shrink_scan(struct shrinker *shrink,
struct shrink_control *sc)
{
journal_t *journal = container_of(shrink, journal_t, j_shrinker);
journal_t *journal = shrink->private_data;
unsigned long nr_to_scan = sc->nr_to_scan;
unsigned long nr_shrunk;
unsigned long count;
@ -1316,7 +1316,7 @@ static unsigned long jbd2_journal_shrink_scan(struct shrinker *shrink,
static unsigned long jbd2_journal_shrink_count(struct shrinker *shrink,
struct shrink_control *sc)
{
journal_t *journal = container_of(shrink, journal_t, j_shrinker);
journal_t *journal = shrink->private_data;
unsigned long count;
count = percpu_counter_read_positive(&journal->j_checkpoint_jh_count);
@ -1588,14 +1588,21 @@ static journal_t *journal_init_common(struct block_device *bdev,
goto err_cleanup;
journal->j_shrink_transaction = NULL;
journal->j_shrinker.scan_objects = jbd2_journal_shrink_scan;
journal->j_shrinker.count_objects = jbd2_journal_shrink_count;
journal->j_shrinker.seeks = DEFAULT_SEEKS;
journal->j_shrinker.batch = journal->j_max_transaction_buffers;
err = register_shrinker(&journal->j_shrinker, "jbd2-journal:(%u:%u)",
MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev));
if (err)
journal->j_shrinker = shrinker_alloc(0, "jbd2-journal:(%u:%u)",
MAJOR(bdev->bd_dev),
MINOR(bdev->bd_dev));
if (!journal->j_shrinker) {
err = -ENOMEM;
goto err_cleanup;
}
journal->j_shrinker->scan_objects = jbd2_journal_shrink_scan;
journal->j_shrinker->count_objects = jbd2_journal_shrink_count;
journal->j_shrinker->batch = journal->j_max_transaction_buffers;
journal->j_shrinker->private_data = journal;
shrinker_register(journal->j_shrinker);
return journal;
@ -2172,9 +2179,9 @@ int jbd2_journal_destroy(journal_t *journal)
brelse(journal->j_sb_buffer);
}
if (journal->j_shrinker.flags & SHRINKER_REGISTERED) {
if (journal->j_shrinker) {
percpu_counter_destroy(&journal->j_checkpoint_jh_count);
unregister_shrinker(&journal->j_shrinker);
shrinker_free(journal->j_shrinker);
}
if (journal->j_proc_entry)
jbd2_stats_proc_exit(journal);


@ -429,60 +429,11 @@ static int kernfs_vma_access(struct vm_area_struct *vma, unsigned long addr,
return ret;
}
#ifdef CONFIG_NUMA
static int kernfs_vma_set_policy(struct vm_area_struct *vma,
struct mempolicy *new)
{
struct file *file = vma->vm_file;
struct kernfs_open_file *of = kernfs_of(file);
int ret;
if (!of->vm_ops)
return 0;
if (!kernfs_get_active(of->kn))
return -EINVAL;
ret = 0;
if (of->vm_ops->set_policy)
ret = of->vm_ops->set_policy(vma, new);
kernfs_put_active(of->kn);
return ret;
}
static struct mempolicy *kernfs_vma_get_policy(struct vm_area_struct *vma,
unsigned long addr)
{
struct file *file = vma->vm_file;
struct kernfs_open_file *of = kernfs_of(file);
struct mempolicy *pol;
if (!of->vm_ops)
return vma->vm_policy;
if (!kernfs_get_active(of->kn))
return vma->vm_policy;
pol = vma->vm_policy;
if (of->vm_ops->get_policy)
pol = of->vm_ops->get_policy(vma, addr);
kernfs_put_active(of->kn);
return pol;
}
#endif
static const struct vm_operations_struct kernfs_vm_ops = {
.open = kernfs_vma_open,
.fault = kernfs_vma_fault,
.page_mkwrite = kernfs_vma_page_mkwrite,
.access = kernfs_vma_access,
#ifdef CONFIG_NUMA
.set_policy = kernfs_vma_set_policy,
.get_policy = kernfs_vma_get_policy,
#endif
};
static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)


@ -265,7 +265,7 @@ static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *k
sb->s_time_gran = 1;
/* sysfs dentries and inodes don't require IO to create */
sb->s_shrink.seeks = 0;
sb->s_shrink->seeks = 0;
/* get root inode, initialize and unlock it */
down_read(&kf_root->kernfs_rwsem);


@ -37,7 +37,7 @@ struct mb_cache {
struct list_head c_list;
/* Number of entries in cache */
unsigned long c_entry_count;
struct shrinker c_shrink;
struct shrinker *c_shrink;
/* Work for shrinking when the cache has too many entries */
struct work_struct c_shrink_work;
};
@ -293,8 +293,7 @@ EXPORT_SYMBOL(mb_cache_entry_touch);
static unsigned long mb_cache_count(struct shrinker *shrink,
struct shrink_control *sc)
{
struct mb_cache *cache = container_of(shrink, struct mb_cache,
c_shrink);
struct mb_cache *cache = shrink->private_data;
return cache->c_entry_count;
}
@ -333,8 +332,7 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
static unsigned long mb_cache_scan(struct shrinker *shrink,
struct shrink_control *sc)
{
struct mb_cache *cache = container_of(shrink, struct mb_cache,
c_shrink);
struct mb_cache *cache = shrink->private_data;
return mb_cache_shrink(cache, sc->nr_to_scan);
}
@ -377,15 +375,19 @@ struct mb_cache *mb_cache_create(int bucket_bits)
for (i = 0; i < bucket_count; i++)
INIT_HLIST_BL_HEAD(&cache->c_hash[i]);
cache->c_shrink.count_objects = mb_cache_count;
cache->c_shrink.scan_objects = mb_cache_scan;
cache->c_shrink.seeks = DEFAULT_SEEKS;
if (register_shrinker(&cache->c_shrink, "mbcache-shrinker")) {
cache->c_shrink = shrinker_alloc(0, "mbcache-shrinker");
if (!cache->c_shrink) {
kfree(cache->c_hash);
kfree(cache);
goto err_out;
}
cache->c_shrink->count_objects = mb_cache_count;
cache->c_shrink->scan_objects = mb_cache_scan;
cache->c_shrink->private_data = cache;
shrinker_register(cache->c_shrink);
INIT_WORK(&cache->c_shrink_work, mb_cache_shrink_worker);
return cache;
@ -406,7 +408,7 @@ void mb_cache_destroy(struct mb_cache *cache)
{
struct mb_cache_entry *entry, *next;
unregister_shrinker(&cache->c_shrink);
shrinker_free(cache->c_shrink);
/*
* We don't bother with any locking. Cache must not be used at this


@ -119,8 +119,7 @@ static void map_buffer_to_folio(struct folio *folio, struct buffer_head *bh,
folio_mark_uptodate(folio);
return;
}
create_empty_buffers(&folio->page, i_blocksize(inode), 0);
head = folio_buffers(folio);
head = create_empty_buffers(folio, i_blocksize(inode), 0);
}
page_bh = head;


@ -796,28 +796,9 @@ static unsigned long nfs4_xattr_cache_scan(struct shrinker *shrink,
static unsigned long nfs4_xattr_entry_scan(struct shrinker *shrink,
struct shrink_control *sc);
static struct shrinker nfs4_xattr_cache_shrinker = {
.count_objects = nfs4_xattr_cache_count,
.scan_objects = nfs4_xattr_cache_scan,
.seeks = DEFAULT_SEEKS,
.flags = SHRINKER_MEMCG_AWARE,
};
static struct shrinker nfs4_xattr_entry_shrinker = {
.count_objects = nfs4_xattr_entry_count,
.scan_objects = nfs4_xattr_entry_scan,
.seeks = DEFAULT_SEEKS,
.batch = 512,
.flags = SHRINKER_MEMCG_AWARE,
};
static struct shrinker nfs4_xattr_large_entry_shrinker = {
.count_objects = nfs4_xattr_entry_count,
.scan_objects = nfs4_xattr_entry_scan,
.seeks = 1,
.batch = 512,
.flags = SHRINKER_MEMCG_AWARE,
};
static struct shrinker *nfs4_xattr_cache_shrinker;
static struct shrinker *nfs4_xattr_entry_shrinker;
static struct shrinker *nfs4_xattr_large_entry_shrinker;
static enum lru_status
cache_lru_isolate(struct list_head *item,
@ -943,7 +924,7 @@ nfs4_xattr_entry_scan(struct shrinker *shrink, struct shrink_control *sc)
struct nfs4_xattr_entry *entry;
struct list_lru *lru;
lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
lru = (shrink == nfs4_xattr_large_entry_shrinker) ?
&nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
freed = list_lru_shrink_walk(lru, sc, entry_lru_isolate, &dispose);
@ -971,7 +952,7 @@ nfs4_xattr_entry_count(struct shrinker *shrink, struct shrink_control *sc)
unsigned long count;
struct list_lru *lru;
lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
lru = (shrink == nfs4_xattr_large_entry_shrinker) ?
&nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
count = list_lru_shrink_count(lru, sc);
@ -991,18 +972,34 @@ static void nfs4_xattr_cache_init_once(void *p)
INIT_LIST_HEAD(&cache->dispose);
}
static int nfs4_xattr_shrinker_init(struct shrinker *shrinker,
struct list_lru *lru, const char *name)
typedef unsigned long (*count_objects_cb)(struct shrinker *s,
struct shrink_control *sc);
typedef unsigned long (*scan_objects_cb)(struct shrinker *s,
struct shrink_control *sc);
static int __init nfs4_xattr_shrinker_init(struct shrinker **shrinker,
struct list_lru *lru, const char *name,
count_objects_cb count,
scan_objects_cb scan, long batch, int seeks)
{
int ret = 0;
int ret;
ret = register_shrinker(shrinker, name);
if (ret)
*shrinker = shrinker_alloc(SHRINKER_MEMCG_AWARE, name);
if (!*shrinker)
return -ENOMEM;
ret = list_lru_init_memcg(lru, *shrinker);
if (ret) {
shrinker_free(*shrinker);
return ret;
}
ret = list_lru_init_memcg(lru, shrinker);
if (ret)
unregister_shrinker(shrinker);
(*shrinker)->count_objects = count;
(*shrinker)->scan_objects = scan;
(*shrinker)->batch = batch;
(*shrinker)->seeks = seeks;
shrinker_register(*shrinker);
return ret;
}
@ -1010,7 +1007,7 @@ static int nfs4_xattr_shrinker_init(struct shrinker *shrinker,
static void nfs4_xattr_shrinker_destroy(struct shrinker *shrinker,
struct list_lru *lru)
{
unregister_shrinker(shrinker);
shrinker_free(shrinker);
list_lru_destroy(lru);
}
@ -1026,27 +1023,31 @@ int __init nfs4_xattr_cache_init(void)
return -ENOMEM;
ret = nfs4_xattr_shrinker_init(&nfs4_xattr_cache_shrinker,
&nfs4_xattr_cache_lru,
"nfs-xattr_cache");
&nfs4_xattr_cache_lru, "nfs-xattr_cache",
nfs4_xattr_cache_count,
nfs4_xattr_cache_scan, 0, DEFAULT_SEEKS);
if (ret)
goto out1;
ret = nfs4_xattr_shrinker_init(&nfs4_xattr_entry_shrinker,
&nfs4_xattr_entry_lru,
"nfs-xattr_entry");
&nfs4_xattr_entry_lru, "nfs-xattr_entry",
nfs4_xattr_entry_count,
nfs4_xattr_entry_scan, 512, DEFAULT_SEEKS);
if (ret)
goto out2;
ret = nfs4_xattr_shrinker_init(&nfs4_xattr_large_entry_shrinker,
&nfs4_xattr_large_entry_lru,
"nfs-xattr_large_entry");
"nfs-xattr_large_entry",
nfs4_xattr_entry_count,
nfs4_xattr_entry_scan, 512, 1);
if (!ret)
return 0;
nfs4_xattr_shrinker_destroy(&nfs4_xattr_entry_shrinker,
nfs4_xattr_shrinker_destroy(nfs4_xattr_entry_shrinker,
&nfs4_xattr_entry_lru);
out2:
nfs4_xattr_shrinker_destroy(&nfs4_xattr_cache_shrinker,
nfs4_xattr_shrinker_destroy(nfs4_xattr_cache_shrinker,
&nfs4_xattr_cache_lru);
out1:
kmem_cache_destroy(nfs4_xattr_cache_cachep);
@ -1056,11 +1057,11 @@ out1:
void nfs4_xattr_cache_exit(void)
{
nfs4_xattr_shrinker_destroy(&nfs4_xattr_large_entry_shrinker,
nfs4_xattr_shrinker_destroy(nfs4_xattr_large_entry_shrinker,
&nfs4_xattr_large_entry_lru);
nfs4_xattr_shrinker_destroy(&nfs4_xattr_entry_shrinker,
nfs4_xattr_shrinker_destroy(nfs4_xattr_entry_shrinker,
&nfs4_xattr_entry_lru);
nfs4_xattr_shrinker_destroy(&nfs4_xattr_cache_shrinker,
nfs4_xattr_shrinker_destroy(nfs4_xattr_cache_shrinker,
&nfs4_xattr_cache_lru);
kmem_cache_destroy(nfs4_xattr_cache_cachep);
}


@ -129,11 +129,7 @@ static void nfs_ssc_unregister_ops(void)
}
#endif /* CONFIG_NFS_V4_2 */
static struct shrinker acl_shrinker = {
.count_objects = nfs_access_cache_count,
.scan_objects = nfs_access_cache_scan,
.seeks = DEFAULT_SEEKS,
};
static struct shrinker *acl_shrinker;
/*
* Register the NFS filesystems
@ -153,9 +149,18 @@ int __init register_nfs_fs(void)
ret = nfs_register_sysctl();
if (ret < 0)
goto error_2;
ret = register_shrinker(&acl_shrinker, "nfs-acl");
if (ret < 0)
acl_shrinker = shrinker_alloc(0, "nfs-acl");
if (!acl_shrinker) {
ret = -ENOMEM;
goto error_3;
}
acl_shrinker->count_objects = nfs_access_cache_count;
acl_shrinker->scan_objects = nfs_access_cache_scan;
shrinker_register(acl_shrinker);
#ifdef CONFIG_NFS_V4_2
nfs_ssc_register_ops();
#endif
@ -175,7 +180,7 @@ error_0:
*/
void __exit unregister_nfs_fs(void)
{
unregister_shrinker(&acl_shrinker);
shrinker_free(acl_shrinker);
nfs_unregister_sysctl();
unregister_nfs4_fs();
#ifdef CONFIG_NFS_V4_2


@ -521,11 +521,7 @@ nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc)
return ret;
}
static struct shrinker nfsd_file_shrinker = {
.scan_objects = nfsd_file_lru_scan,
.count_objects = nfsd_file_lru_count,
.seeks = 1,
};
static struct shrinker *nfsd_file_shrinker;
/**
* nfsd_file_cond_queue - conditionally unhash and queue a nfsd_file
@ -746,12 +742,19 @@ nfsd_file_cache_init(void)
goto out_err;
}
ret = register_shrinker(&nfsd_file_shrinker, "nfsd-filecache");
if (ret) {
pr_err("nfsd: failed to register nfsd_file_shrinker: %d\n", ret);
nfsd_file_shrinker = shrinker_alloc(0, "nfsd-filecache");
if (!nfsd_file_shrinker) {
ret = -ENOMEM;
pr_err("nfsd: failed to allocate nfsd_file_shrinker\n");
goto out_lru;
}
nfsd_file_shrinker->count_objects = nfsd_file_lru_count;
nfsd_file_shrinker->scan_objects = nfsd_file_lru_scan;
nfsd_file_shrinker->seeks = 1;
shrinker_register(nfsd_file_shrinker);
ret = lease_register_notifier(&nfsd_file_lease_notifier);
if (ret) {
pr_err("nfsd: unable to register lease notifier: %d\n", ret);
@ -774,7 +777,7 @@ out:
out_notifier:
lease_unregister_notifier(&nfsd_file_lease_notifier);
out_shrinker:
unregister_shrinker(&nfsd_file_shrinker);
shrinker_free(nfsd_file_shrinker);
out_lru:
list_lru_destroy(&nfsd_file_lru);
out_err:
@ -891,7 +894,7 @@ nfsd_file_cache_shutdown(void)
return;
lease_unregister_notifier(&nfsd_file_lease_notifier);
unregister_shrinker(&nfsd_file_shrinker);
shrinker_free(nfsd_file_shrinker);
/*
* make sure all callers of nfsd_file_lru_cb are done before
* calling nfsd_file_cache_purge


@ -177,7 +177,7 @@ struct nfsd_net {
/* size of cache when we saw the longest hash chain */
unsigned int longest_chain_cachesize;
struct shrinker nfsd_reply_cache_shrinker;
struct shrinker *nfsd_reply_cache_shrinker;
/* tracking server-to-server copy mounts */
spinlock_t nfsd_ssc_lock;
@ -195,7 +195,7 @@ struct nfsd_net {
int nfs4_max_clients;
atomic_t nfsd_courtesy_clients;
struct shrinker nfsd_client_shrinker;
struct shrinker *nfsd_client_shrinker;
struct work_struct nfsd_shrinker_work;
};


@ -4452,8 +4452,7 @@ static unsigned long
nfsd4_state_shrinker_count(struct shrinker *shrink, struct shrink_control *sc)
{
int count;
struct nfsd_net *nn = container_of(shrink,
struct nfsd_net, nfsd_client_shrinker);
struct nfsd_net *nn = shrink->private_data;
count = atomic_read(&nn->nfsd_courtesy_clients);
if (!count)
@ -8235,12 +8234,16 @@ static int nfs4_state_create_net(struct net *net)
INIT_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker);
get_net(net);
nn->nfsd_client_shrinker.scan_objects = nfsd4_state_shrinker_scan;
nn->nfsd_client_shrinker.count_objects = nfsd4_state_shrinker_count;
nn->nfsd_client_shrinker.seeks = DEFAULT_SEEKS;
if (register_shrinker(&nn->nfsd_client_shrinker, "nfsd-client"))
nn->nfsd_client_shrinker = shrinker_alloc(0, "nfsd-client");
if (!nn->nfsd_client_shrinker)
goto err_shrinker;
nn->nfsd_client_shrinker->scan_objects = nfsd4_state_shrinker_scan;
nn->nfsd_client_shrinker->count_objects = nfsd4_state_shrinker_count;
nn->nfsd_client_shrinker->private_data = nn;
shrinker_register(nn->nfsd_client_shrinker);
return 0;
err_shrinker:
@ -8338,7 +8341,7 @@ nfs4_state_shutdown_net(struct net *net)
struct list_head *pos, *next, reaplist;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
unregister_shrinker(&nn->nfsd_client_shrinker);
shrinker_free(nn->nfsd_client_shrinker);
cancel_work(&nn->nfsd_shrinker_work);
cancel_delayed_work_sync(&nn->laundromat_work);
locks_end_grace(&nn->nfsd4_manager);


@ -201,26 +201,29 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)
{
unsigned int hashsize;
unsigned int i;
int status = 0;
nn->max_drc_entries = nfsd_cache_size_limit();
atomic_set(&nn->num_drc_entries, 0);
hashsize = nfsd_hashsize(nn->max_drc_entries);
nn->maskbits = ilog2(hashsize);
nn->nfsd_reply_cache_shrinker.scan_objects = nfsd_reply_cache_scan;
nn->nfsd_reply_cache_shrinker.count_objects = nfsd_reply_cache_count;
nn->nfsd_reply_cache_shrinker.seeks = 1;
status = register_shrinker(&nn->nfsd_reply_cache_shrinker,
"nfsd-reply:%s", nn->nfsd_name);
if (status)
return status;
nn->drc_hashtbl = kvzalloc(array_size(hashsize,
sizeof(*nn->drc_hashtbl)), GFP_KERNEL);
if (!nn->drc_hashtbl)
return -ENOMEM;
nn->nfsd_reply_cache_shrinker = shrinker_alloc(0, "nfsd-reply:%s",
nn->nfsd_name);
if (!nn->nfsd_reply_cache_shrinker)
goto out_shrinker;
nn->nfsd_reply_cache_shrinker->scan_objects = nfsd_reply_cache_scan;
nn->nfsd_reply_cache_shrinker->count_objects = nfsd_reply_cache_count;
nn->nfsd_reply_cache_shrinker->seeks = 1;
nn->nfsd_reply_cache_shrinker->private_data = nn;
shrinker_register(nn->nfsd_reply_cache_shrinker);
for (i = 0; i < hashsize; i++) {
INIT_LIST_HEAD(&nn->drc_hashtbl[i].lru_head);
spin_lock_init(&nn->drc_hashtbl[i].cache_lock);
@ -229,7 +232,7 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)
return 0;
out_shrinker:
unregister_shrinker(&nn->nfsd_reply_cache_shrinker);
kvfree(nn->drc_hashtbl);
printk(KERN_ERR "nfsd: failed to allocate reply cache\n");
return -ENOMEM;
}
@ -239,7 +242,7 @@ void nfsd_reply_cache_shutdown(struct nfsd_net *nn)
struct nfsd_cacherep *rp;
unsigned int i;
unregister_shrinker(&nn->nfsd_reply_cache_shrinker);
shrinker_free(nn->nfsd_reply_cache_shrinker);
for (i = 0; i < nn->drc_hashsize; i++) {
struct list_head *head = &nn->drc_hashtbl[i].lru_head;
@ -323,8 +326,7 @@ nfsd_prune_bucket_locked(struct nfsd_net *nn, struct nfsd_drc_bucket *b,
static unsigned long
nfsd_reply_cache_count(struct shrinker *shrink, struct shrink_control *sc)
{
struct nfsd_net *nn = container_of(shrink,
struct nfsd_net, nfsd_reply_cache_shrinker);
struct nfsd_net *nn = shrink->private_data;
return atomic_read(&nn->num_drc_entries);
}
@ -343,8 +345,7 @@ nfsd_reply_cache_count(struct shrinker *shrink, struct shrink_control *sc)
static unsigned long
nfsd_reply_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
{
struct nfsd_net *nn = container_of(shrink,
struct nfsd_net, nfsd_reply_cache_shrinker);
struct nfsd_net *nn = shrink->private_data;
unsigned long freed = 0;
LIST_HEAD(dispose);
unsigned int i;


@ -356,30 +356,28 @@ int nilfs_mdt_delete_block(struct inode *inode, unsigned long block)
*/
int nilfs_mdt_forget_block(struct inode *inode, unsigned long block)
{
pgoff_t index = (pgoff_t)block >>
(PAGE_SHIFT - inode->i_blkbits);
struct page *page;
unsigned long first_block;
pgoff_t index = block >> (PAGE_SHIFT - inode->i_blkbits);
struct folio *folio;
struct buffer_head *bh;
int ret = 0;
int still_dirty;
page = find_lock_page(inode->i_mapping, index);
if (!page)
folio = filemap_lock_folio(inode->i_mapping, index);
if (IS_ERR(folio))
return -ENOENT;
wait_on_page_writeback(page);
folio_wait_writeback(folio);
first_block = (unsigned long)index <<
(PAGE_SHIFT - inode->i_blkbits);
if (page_has_buffers(page)) {
struct buffer_head *bh;
bh = nilfs_page_get_nth_block(page, block - first_block);
bh = folio_buffers(folio);
if (bh) {
unsigned long first_block = index <<
(PAGE_SHIFT - inode->i_blkbits);
bh = get_nth_bh(bh, block - first_block);
nilfs_forget_buffer(bh);
}
still_dirty = PageDirty(page);
unlock_page(page);
put_page(page);
still_dirty = folio_test_dirty(folio);
folio_unlock(folio);
folio_put(folio);
if (still_dirty ||
invalidate_inode_pages2_range(inode->i_mapping, index, index) != 0)
@ -560,17 +558,19 @@ int nilfs_mdt_freeze_buffer(struct inode *inode, struct buffer_head *bh)
{
struct nilfs_shadow_map *shadow = NILFS_MDT(inode)->mi_shadow;
struct buffer_head *bh_frozen;
struct page *page;
struct folio *folio;
int blkbits = inode->i_blkbits;
page = grab_cache_page(shadow->inode->i_mapping, bh->b_folio->index);
if (!page)
return -ENOMEM;
folio = filemap_grab_folio(shadow->inode->i_mapping,
bh->b_folio->index);
if (IS_ERR(folio))
return PTR_ERR(folio);
if (!page_has_buffers(page))
create_empty_buffers(page, 1 << blkbits, 0);
bh_frozen = folio_buffers(folio);
if (!bh_frozen)
bh_frozen = create_empty_buffers(folio, 1 << blkbits, 0);
bh_frozen = nilfs_page_get_nth_block(page, bh_offset(bh) >> blkbits);
bh_frozen = get_nth_bh(bh_frozen, bh_offset(bh) >> blkbits);
if (!buffer_uptodate(bh_frozen))
nilfs_copy_buffer(bh_frozen, bh);
@ -582,8 +582,8 @@ int nilfs_mdt_freeze_buffer(struct inode *inode, struct buffer_head *bh)
brelse(bh_frozen); /* already frozen */
}
unlock_page(page);
put_page(page);
folio_unlock(folio);
folio_put(folio);
return 0;
}
@ -592,17 +592,19 @@ nilfs_mdt_get_frozen_buffer(struct inode *inode, struct buffer_head *bh)
{
struct nilfs_shadow_map *shadow = NILFS_MDT(inode)->mi_shadow;
struct buffer_head *bh_frozen = NULL;
struct page *page;
struct folio *folio;
int n;
page = find_lock_page(shadow->inode->i_mapping, bh->b_folio->index);
if (page) {
if (page_has_buffers(page)) {
folio = filemap_lock_folio(shadow->inode->i_mapping,
bh->b_folio->index);
if (!IS_ERR(folio)) {
bh_frozen = folio_buffers(folio);
if (bh_frozen) {
n = bh_offset(bh) >> inode->i_blkbits;
bh_frozen = nilfs_page_get_nth_block(page, n);
bh_frozen = get_nth_bh(bh_frozen, n);
}
unlock_page(page);
put_page(page);
folio_unlock(folio);
folio_put(folio);
}
return bh_frozen;
}


@ -25,19 +25,19 @@
(BIT(BH_Uptodate) | BIT(BH_Mapped) | BIT(BH_NILFS_Node) | \
BIT(BH_NILFS_Volatile) | BIT(BH_NILFS_Checked))
static struct buffer_head *
__nilfs_get_page_block(struct page *page, unsigned long block, pgoff_t index,
int blkbits, unsigned long b_state)
static struct buffer_head *__nilfs_get_folio_block(struct folio *folio,
unsigned long block, pgoff_t index, int blkbits,
unsigned long b_state)
{
unsigned long first_block;
struct buffer_head *bh;
struct buffer_head *bh = folio_buffers(folio);
if (!page_has_buffers(page))
create_empty_buffers(page, 1 << blkbits, b_state);
if (!bh)
bh = create_empty_buffers(folio, 1 << blkbits, b_state);
first_block = (unsigned long)index << (PAGE_SHIFT - blkbits);
bh = nilfs_page_get_nth_block(page, block - first_block);
bh = get_nth_bh(bh, block - first_block);
touch_buffer(bh);
wait_on_buffer(bh);
@ -51,17 +51,17 @@ struct buffer_head *nilfs_grab_buffer(struct inode *inode,
{
int blkbits = inode->i_blkbits;
pgoff_t index = blkoff >> (PAGE_SHIFT - blkbits);
struct page *page;
struct folio *folio;
struct buffer_head *bh;
page = grab_cache_page(mapping, index);
if (unlikely(!page))
folio = filemap_grab_folio(mapping, index);
if (IS_ERR(folio))
return NULL;
bh = __nilfs_get_page_block(page, blkoff, index, blkbits, b_state);
bh = __nilfs_get_folio_block(folio, blkoff, index, blkbits, b_state);
if (unlikely(!bh)) {
unlock_page(page);
put_page(page);
folio_unlock(folio);
folio_put(folio);
return NULL;
}
return bh;
@ -184,30 +184,32 @@ void nilfs_page_bug(struct page *page)
}
/**
* nilfs_copy_page -- copy the page with buffers
* @dst: destination page
* @src: source page
* @copy_dirty: flag whether to copy dirty states on the page's buffer heads.
* nilfs_copy_folio -- copy the folio with buffers
* @dst: destination folio
* @src: source folio
* @copy_dirty: flag whether to copy dirty states on the folio's buffer heads.
*
* This function is for both data pages and btnode pages. The dirty flag
* should be treated by caller. The page must not be under i/o.
* Both src and dst page must be locked
* This function is for both data folios and btnode folios. The dirty flag
* should be treated by caller. The folio must not be under i/o.
* Both src and dst folio must be locked
*/
static void nilfs_copy_page(struct page *dst, struct page *src, int copy_dirty)
static void nilfs_copy_folio(struct folio *dst, struct folio *src,
bool copy_dirty)
{
struct buffer_head *dbh, *dbufs, *sbh;
unsigned long mask = NILFS_BUFFER_INHERENT_BITS;
BUG_ON(PageWriteback(dst));
BUG_ON(folio_test_writeback(dst));
sbh = page_buffers(src);
if (!page_has_buffers(dst))
create_empty_buffers(dst, sbh->b_size, 0);
sbh = folio_buffers(src);
dbh = folio_buffers(dst);
if (!dbh)
dbh = create_empty_buffers(dst, sbh->b_size, 0);
if (copy_dirty)
mask |= BIT(BH_Dirty);
dbh = dbufs = page_buffers(dst);
dbufs = dbh;
do {
lock_buffer(sbh);
lock_buffer(dbh);
@ -218,16 +220,16 @@ static void nilfs_copy_page(struct page *dst, struct page *src, int copy_dirty)
dbh = dbh->b_this_page;
} while (dbh != dbufs);
copy_highpage(dst, src);
folio_copy(dst, src);
if (PageUptodate(src) && !PageUptodate(dst))
SetPageUptodate(dst);
else if (!PageUptodate(src) && PageUptodate(dst))
ClearPageUptodate(dst);
if (PageMappedToDisk(src) && !PageMappedToDisk(dst))
SetPageMappedToDisk(dst);
else if (!PageMappedToDisk(src) && PageMappedToDisk(dst))
ClearPageMappedToDisk(dst);
if (folio_test_uptodate(src) && !folio_test_uptodate(dst))
folio_mark_uptodate(dst);
else if (!folio_test_uptodate(src) && folio_test_uptodate(dst))
folio_clear_uptodate(dst);
if (folio_test_mappedtodisk(src) && !folio_test_mappedtodisk(dst))
folio_set_mappedtodisk(dst);
else if (!folio_test_mappedtodisk(src) && folio_test_mappedtodisk(dst))
folio_clear_mappedtodisk(dst);
do {
unlock_buffer(sbh);
@ -269,7 +271,7 @@ repeat:
NILFS_PAGE_BUG(&folio->page,
"found empty page in dat page cache");
nilfs_copy_page(&dfolio->page, &folio->page, 1);
nilfs_copy_folio(dfolio, folio, true);
filemap_dirty_folio(folio_mapping(dfolio), dfolio);
folio_unlock(dfolio);
@ -314,7 +316,7 @@ repeat:
if (!IS_ERR(dfolio)) {
/* overwrite existing folio in the destination cache */
WARN_ON(folio_test_dirty(dfolio));
nilfs_copy_page(&dfolio->page, &folio->page, 0);
nilfs_copy_folio(dfolio, folio, false);
folio_unlock(dfolio);
folio_put(dfolio);
/* Do we not need to remove folio from smap here? */


@ -52,15 +52,4 @@ unsigned long nilfs_find_uncommitted_extent(struct inode *inode,
#define NILFS_PAGE_BUG(page, m, a...) \
do { nilfs_page_bug(page); BUG(); } while (0)
static inline struct buffer_head *
nilfs_page_get_nth_block(struct page *page, unsigned int count)
{
struct buffer_head *bh = page_buffers(page);
while (count-- > 0)
bh = bh->b_this_page;
get_bh(bh);
return bh;
}
#endif /* _NILFS_PAGE_H */
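The removed nilfs_page_get_nth_block() wrapper above is what the generic get_nth_bh() calls in this diff replace; a sketch of the equivalent logic, taken directly from the removed helper (the _equiv name is only for illustration):

/*
 * Walk the circular b_this_page list 'count' hops from the head and
 * take a reference on the buffer that is returned.
 */
static inline struct buffer_head *get_nth_bh_equiv(struct buffer_head *bh,
						   unsigned int count)
{
	while (count-- > 0)
		bh = bh->b_this_page;
	get_bh(bh);
	return bh;
}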


@ -731,10 +731,9 @@ static size_t nilfs_lookup_dirty_data_buffers(struct inode *inode,
continue;
}
head = folio_buffers(folio);
if (!head) {
create_empty_buffers(&folio->page, i_blocksize(inode), 0);
head = folio_buffers(folio);
}
if (!head)
head = create_empty_buffers(folio,
i_blocksize(inode), 0);
folio_unlock(folio);
bh = head;


@ -145,13 +145,12 @@ still_busy:
}
/**
* ntfs_read_block - fill a @page of an address space with data
* @page: page cache page to fill with data
* ntfs_read_block - fill a @folio of an address space with data
* @folio: page cache folio to fill with data
*
* Fill the page @page of the address space belonging to the @page->host inode.
* We read each buffer asynchronously and when all buffers are read in, our io
* completion handler ntfs_end_buffer_read_async(), if required, automatically
* applies the mst fixups to the page before finally marking it uptodate and
* applies the mst fixups to the folio before finally marking it uptodate and
* unlocking it.
*
* We only enforce allocated_size limit because i_size is checked for in
@ -161,7 +160,7 @@ still_busy:
*
* Contains an adapted version of fs/buffer.c::block_read_full_folio().
*/
static int ntfs_read_block(struct page *page)
static int ntfs_read_block(struct folio *folio)
{
loff_t i_size;
VCN vcn;
@ -178,7 +177,7 @@ static int ntfs_read_block(struct page *page)
int i, nr;
unsigned char blocksize_bits;
vi = page->mapping->host;
vi = folio->mapping->host;
ni = NTFS_I(vi);
vol = ni->vol;
@ -188,15 +187,10 @@ static int ntfs_read_block(struct page *page)
blocksize = vol->sb->s_blocksize;
blocksize_bits = vol->sb->s_blocksize_bits;
if (!page_has_buffers(page)) {
create_empty_buffers(page, blocksize, 0);
if (unlikely(!page_has_buffers(page))) {
unlock_page(page);
return -ENOMEM;
}
}
bh = head = page_buffers(page);
BUG_ON(!bh);
head = folio_buffers(folio);
if (!head)
head = create_empty_buffers(folio, blocksize, 0);
bh = head;
/*
* We may be racing with truncate. To avoid some of the problems we
@ -205,11 +199,11 @@ static int ntfs_read_block(struct page *page)
* may leave some buffers unmapped which are now allocated. This is
* not a problem since these buffers will just get mapped when a write
* occurs. In case of a shrinking truncate, we will detect this later
* on due to the runlist being incomplete and if the page is being
* on due to the runlist being incomplete and if the folio is being
* fully truncated, truncate will throw it away as soon as we unlock
* it so no need to worry what we do with it.
*/
iblock = (s64)page->index << (PAGE_SHIFT - blocksize_bits);
iblock = (s64)folio->index << (PAGE_SHIFT - blocksize_bits);
read_lock_irqsave(&ni->size_lock, flags);
lblock = (ni->allocated_size + blocksize - 1) >> blocksize_bits;
init_size = ni->initialized_size;
@ -221,7 +215,7 @@ static int ntfs_read_block(struct page *page)
}
zblock = (init_size + blocksize - 1) >> blocksize_bits;
/* Loop through all the buffers in the page. */
/* Loop through all the buffers in the folio. */
rl = NULL;
nr = i = 0;
do {
@ -299,7 +293,7 @@ lock_retry_remap:
if (!err)
err = -EIO;
bh->b_blocknr = -1;
SetPageError(page);
folio_set_error(folio);
ntfs_error(vol->sb, "Failed to read from inode 0x%lx, "
"attribute type 0x%x, vcn 0x%llx, "
"offset 0x%x because its location on "
@ -312,13 +306,13 @@ lock_retry_remap:
/*
* Either iblock was outside lblock limits or
* ntfs_rl_vcn_to_lcn() returned error. Just zero that portion
* of the page and set the buffer uptodate.
* of the folio and set the buffer uptodate.
*/
handle_hole:
bh->b_blocknr = -1UL;
clear_buffer_mapped(bh);
handle_zblock:
zero_user(page, i * blocksize, blocksize);
folio_zero_range(folio, i * blocksize, blocksize);
if (likely(!err))
set_buffer_uptodate(bh);
} while (i++, iblock++, (bh = bh->b_this_page) != head);
@ -349,11 +343,11 @@ handle_zblock:
return 0;
}
/* No i/o was scheduled on any of the buffers. */
if (likely(!PageError(page)))
SetPageUptodate(page);
if (likely(!folio_test_error(folio)))
folio_mark_uptodate(folio);
else /* Signal synchronous i/o error. */
nr = -EIO;
unlock_page(page);
folio_unlock(folio);
return nr;
}
@ -433,7 +427,7 @@ retry_readpage:
/* NInoNonResident() == NInoIndexAllocPresent() */
if (NInoNonResident(ni)) {
/* Normal, non-resident data stream. */
return ntfs_read_block(page);
return ntfs_read_block(folio);
}
/*
* Attribute is resident, implying it is not compressed or encrypted.
@ -507,28 +501,29 @@ err_out:
#ifdef NTFS_RW
/**
* ntfs_write_block - write a @page to the backing store
* @page: page cache page to write out
* ntfs_write_block - write a @folio to the backing store
* @folio: page cache folio to write out
* @wbc: writeback control structure
*
* This function is for writing pages belonging to non-resident, non-mst
* This function is for writing folios belonging to non-resident, non-mst
* protected attributes to their backing store.
*
* For a page with buffers, map and write the dirty buffers asynchronously
* under page writeback. For a page without buffers, create buffers for the
* page, then proceed as above.
* For a folio with buffers, map and write the dirty buffers asynchronously
* under folio writeback. For a folio without buffers, create buffers for the
* folio, then proceed as above.
*
* If a page doesn't have buffers the page dirty state is definitive. If a page
* does have buffers, the page dirty state is just a hint, and the buffer dirty
* state is definitive. (A hint which has rules: dirty buffers against a clean
* page is illegal. Other combinations are legal and need to be handled. In
* particular a dirty page containing clean buffers for example.)
* If a folio doesn't have buffers the folio dirty state is definitive. If
* a folio does have buffers, the folio dirty state is just a hint,
* and the buffer dirty state is definitive. (A hint which has rules:
* dirty buffers against a clean folio is illegal. Other combinations are
* legal and need to be handled. In particular a dirty folio containing
* clean buffers for example.)
*
* Return 0 on success and -errno on error.
*
* Based on ntfs_read_block() and __block_write_full_folio().
*/
static int ntfs_write_block(struct page *page, struct writeback_control *wbc)
static int ntfs_write_block(struct folio *folio, struct writeback_control *wbc)
{
VCN vcn;
LCN lcn;
@ -546,41 +541,29 @@ static int ntfs_write_block(struct page *page, struct writeback_control *wbc)
bool need_end_writeback;
unsigned char blocksize_bits;
vi = page->mapping->host;
vi = folio->mapping->host;
ni = NTFS_I(vi);
vol = ni->vol;
ntfs_debug("Entering for inode 0x%lx, attribute type 0x%x, page index "
"0x%lx.", ni->mft_no, ni->type, page->index);
"0x%lx.", ni->mft_no, ni->type, folio->index);
BUG_ON(!NInoNonResident(ni));
BUG_ON(NInoMstProtected(ni));
blocksize = vol->sb->s_blocksize;
blocksize_bits = vol->sb->s_blocksize_bits;
if (!page_has_buffers(page)) {
BUG_ON(!PageUptodate(page));
create_empty_buffers(page, blocksize,
head = folio_buffers(folio);
if (!head) {
BUG_ON(!folio_test_uptodate(folio));
head = create_empty_buffers(folio, blocksize,
(1 << BH_Uptodate) | (1 << BH_Dirty));
if (unlikely(!page_has_buffers(page))) {
ntfs_warning(vol->sb, "Error allocating page "
"buffers. Redirtying page so we try "
"again later.");
/*
* Put the page back on mapping->dirty_pages, but leave
* its buffers' dirty state as-is.
*/
redirty_page_for_writepage(wbc, page);
unlock_page(page);
return 0;
}
}
bh = head = page_buffers(page);
BUG_ON(!bh);
bh = head;
/* NOTE: Different naming scheme to ntfs_read_block()! */
/* The first block in the page. */
block = (s64)page->index << (PAGE_SHIFT - blocksize_bits);
/* The first block in the folio. */
block = (s64)folio->index << (PAGE_SHIFT - blocksize_bits);
read_lock_irqsave(&ni->size_lock, flags);
i_size = i_size_read(vi);
@ -597,14 +580,14 @@ static int ntfs_write_block(struct page *page, struct writeback_control *wbc)
* Be very careful. We have no exclusion from block_dirty_folio
* here, and the (potentially unmapped) buffers may become dirty at
* any time. If a buffer becomes dirty here after we've inspected it
* then we just miss that fact, and the page stays dirty.
* then we just miss that fact, and the folio stays dirty.
*
* Buffers outside i_size may be dirtied by block_dirty_folio;
* handle that here by just cleaning them.
*/
/*
* Loop through all the buffers in the page, mapping all the dirty
* Loop through all the buffers in the folio, mapping all the dirty
* buffers to disk addresses and handling any aliases from the
* underlying block device's mapping.
*/
@ -616,13 +599,13 @@ static int ntfs_write_block(struct page *page, struct writeback_control *wbc)
if (unlikely(block >= dblock)) {
/*
* Mapped buffers outside i_size will occur, because
* this page can be outside i_size when there is a
* this folio can be outside i_size when there is a
* truncate in progress. The contents of such buffers
* were zeroed by ntfs_writepage().
*
* FIXME: What about the small race window where
* ntfs_writepage() has not done any clearing because
* the page was within i_size but before we get here,
* the folio was within i_size but before we get here,
* vmtruncate() modifies i_size?
*/
clear_buffer_dirty(bh);
@ -638,38 +621,38 @@ static int ntfs_write_block(struct page *page, struct writeback_control *wbc)
if (unlikely((block >= iblock) &&
(initialized_size < i_size))) {
/*
* If this page is fully outside initialized
* size, zero out all pages between the current
* initialized size and the current page. Just
* If this folio is fully outside initialized
* size, zero out all folios between the current
* initialized size and the current folio. Just
* use ntfs_read_folio() to do the zeroing
* transparently.
*/
if (block > iblock) {
// TODO:
// For each page do:
// - read_cache_page()
// Again for each page do:
// - wait_on_page_locked()
// - Check (PageUptodate(page) &&
// !PageError(page))
// For each folio do:
// - read_cache_folio()
// Again for each folio do:
// - wait_on_folio_locked()
// - Check (folio_test_uptodate(folio) &&
// !folio_test_error(folio))
// Update initialized size in the attribute and
// in the inode.
// Again, for each page do:
// Again, for each folio do:
// block_dirty_folio();
// put_page()
// folio_put()
// We don't need to wait on the writes.
// Update iblock.
}
/*
* The current page straddles initialized size. Zero
* The current folio straddles initialized size. Zero
* all non-uptodate buffers and set them uptodate (and
* dirty?). Note, there aren't any non-uptodate buffers
* if the page is uptodate.
* FIXME: For an uptodate page, the buffers may need to
* if the folio is uptodate.
* FIXME: For an uptodate folio, the buffers may need to
* be written out because they were not initialized on
* disk before.
*/
if (!PageUptodate(page)) {
if (!folio_test_uptodate(folio)) {
// TODO:
// Zero any non-uptodate buffers up to i_size.
// Set them uptodate and dirty.
@@ -727,14 +710,14 @@ lock_retry_remap:
unsigned long *bpos, *bend;
/* Check if the buffer is zero. */
kaddr = kmap_atomic(page);
bpos = (unsigned long *)(kaddr + bh_offset(bh));
bend = (unsigned long *)((u8*)bpos + blocksize);
kaddr = kmap_local_folio(folio, bh_offset(bh));
bpos = (unsigned long *)kaddr;
bend = (unsigned long *)(kaddr + blocksize);
do {
if (unlikely(*bpos))
break;
} while (likely(++bpos < bend));
kunmap_atomic(kaddr);
kunmap_local(kaddr);
if (bpos == bend) {
/*
* Buffer is zero and sparse, no need to write
@@ -774,7 +757,7 @@ lock_retry_remap:
if (err == -ENOENT || lcn == LCN_ENOENT) {
bh->b_blocknr = -1;
clear_buffer_dirty(bh);
zero_user(page, bh_offset(bh), blocksize);
folio_zero_range(folio, bh_offset(bh), blocksize);
set_buffer_uptodate(bh);
err = 0;
continue;
@@ -801,7 +784,7 @@ lock_retry_remap:
bh = head;
/* Just an optimization, so ->read_folio() is not called later. */
if (unlikely(!PageUptodate(page))) {
if (unlikely(!folio_test_uptodate(folio))) {
int uptodate = 1;
do {
if (!buffer_uptodate(bh)) {
@@ -811,7 +794,7 @@ lock_retry_remap:
}
} while ((bh = bh->b_this_page) != head);
if (uptodate)
SetPageUptodate(page);
folio_mark_uptodate(folio);
}
/* Setup all mapped, dirty buffers for async write i/o. */
@@ -826,7 +809,7 @@ lock_retry_remap:
} else if (unlikely(err)) {
/*
* For the error case. The buffer may have been set
* dirty during attachment to a dirty page.
* dirty during attachment to a dirty folio.
*/
if (err != -ENOMEM)
clear_buffer_dirty(bh);
@@ -839,20 +822,20 @@ lock_retry_remap:
err = 0;
else if (err == -ENOMEM) {
ntfs_warning(vol->sb, "Error allocating memory. "
"Redirtying page so we try again "
"Redirtying folio so we try again "
"later.");
/*
* Put the page back on mapping->dirty_pages, but
* Put the folio back on mapping->dirty_pages, but
* leave its buffer's dirty state as-is.
*/
redirty_page_for_writepage(wbc, page);
folio_redirty_for_writepage(wbc, folio);
err = 0;
} else
SetPageError(page);
folio_set_error(folio);
}
BUG_ON(PageWriteback(page));
set_page_writeback(page); /* Keeps try_to_free_buffers() away. */
BUG_ON(folio_test_writeback(folio));
folio_start_writeback(folio); /* Keeps try_to_free_buffers() away. */
/* Submit the prepared buffers for i/o. */
need_end_writeback = true;
@@ -864,11 +847,11 @@ lock_retry_remap:
}
bh = next;
} while (bh != head);
unlock_page(page);
folio_unlock(folio);
/* If no i/o was started, need to end_page_writeback(). */
/* If no i/o was started, need to end writeback here. */
if (unlikely(need_end_writeback))
end_page_writeback(page);
folio_end_writeback(folio);
ntfs_debug("Done.");
return err;
@@ -1337,8 +1320,9 @@ done:
*/
static int ntfs_writepage(struct page *page, struct writeback_control *wbc)
{
struct folio *folio = page_folio(page);
loff_t i_size;
struct inode *vi = page->mapping->host;
struct inode *vi = folio->mapping->host;
ntfs_inode *base_ni = NULL, *ni = NTFS_I(vi);
char *addr;
ntfs_attr_search_ctx *ctx = NULL;
@@ -1347,14 +1331,13 @@ static int ntfs_writepage(struct page *page, struct writeback_control *wbc)
int err;
retry_writepage:
BUG_ON(!PageLocked(page));
BUG_ON(!folio_test_locked(folio));
i_size = i_size_read(vi);
/* Is the page fully outside i_size? (truncate in progress) */
if (unlikely(page->index >= (i_size + PAGE_SIZE - 1) >>
/* Is the folio fully outside i_size? (truncate in progress) */
if (unlikely(folio->index >= (i_size + PAGE_SIZE - 1) >>
PAGE_SHIFT)) {
struct folio *folio = page_folio(page);
/*
* The page may have dirty, unmapped buffers. Make them
* The folio may have dirty, unmapped buffers. Make them
* freeable here, so the page does not leak.
*/
block_invalidate_folio(folio, 0, folio_size(folio));
@@ -1373,7 +1356,7 @@ retry_writepage:
if (ni->type != AT_INDEX_ALLOCATION) {
/* If file is encrypted, deny access, just like NT4. */
if (NInoEncrypted(ni)) {
unlock_page(page);
folio_unlock(folio);
BUG_ON(ni->type != AT_DATA);
ntfs_debug("Denying write access to encrypted file.");
return -EACCES;
@@ -1384,14 +1367,14 @@ retry_writepage:
BUG_ON(ni->name_len);
// TODO: Implement and replace this with
// return ntfs_write_compressed_block(page);
unlock_page(page);
folio_unlock(folio);
ntfs_error(vi->i_sb, "Writing to compressed files is "
"not supported yet. Sorry.");
return -EOPNOTSUPP;
}
// TODO: Implement and remove this check.
if (NInoNonResident(ni) && NInoSparse(ni)) {
unlock_page(page);
folio_unlock(folio);
ntfs_error(vi->i_sb, "Writing to sparse files is not "
"supported yet. Sorry.");
return -EOPNOTSUPP;
@@ -1400,34 +1383,34 @@ retry_writepage:
/* NInoNonResident() == NInoIndexAllocPresent() */
if (NInoNonResident(ni)) {
/* We have to zero every time due to mmap-at-end-of-file. */
if (page->index >= (i_size >> PAGE_SHIFT)) {
/* The page straddles i_size. */
unsigned int ofs = i_size & ~PAGE_MASK;
zero_user_segment(page, ofs, PAGE_SIZE);
if (folio->index >= (i_size >> PAGE_SHIFT)) {
/* The folio straddles i_size. */
unsigned int ofs = i_size & (folio_size(folio) - 1);
folio_zero_segment(folio, ofs, folio_size(folio));
}
/* Handle mst protected attributes. */
if (NInoMstProtected(ni))
return ntfs_write_mst_block(page, wbc);
/* Normal, non-resident data stream. */
return ntfs_write_block(page, wbc);
return ntfs_write_block(folio, wbc);
}
/*
* Attribute is resident, implying it is not compressed, encrypted, or
* mst protected. This also means the attribute is smaller than an mft
* record and hence smaller than a page, so can simply return error on
* any pages with index above 0. Note the attribute can actually be
* record and hence smaller than a folio, so can simply return error on
* any folios with index above 0. Note the attribute can actually be
* marked compressed but if it is resident the actual data is not
* compressed so we are ok to ignore the compressed flag here.
*/
BUG_ON(page_has_buffers(page));
BUG_ON(!PageUptodate(page));
if (unlikely(page->index > 0)) {
ntfs_error(vi->i_sb, "BUG()! page->index (0x%lx) > 0. "
"Aborting write.", page->index);
BUG_ON(PageWriteback(page));
set_page_writeback(page);
unlock_page(page);
end_page_writeback(page);
BUG_ON(folio_buffers(folio));
BUG_ON(!folio_test_uptodate(folio));
if (unlikely(folio->index > 0)) {
ntfs_error(vi->i_sb, "BUG()! folio->index (0x%lx) > 0. "
"Aborting write.", folio->index);
BUG_ON(folio_test_writeback(folio));
folio_start_writeback(folio);
folio_unlock(folio);
folio_end_writeback(folio);
return -EIO;
}
if (!NInoAttr(ni))
@@ -1460,12 +1443,12 @@ retry_writepage:
if (unlikely(err))
goto err_out;
/*
* Keep the VM happy. This must be done otherwise the radix-tree tag
* PAGECACHE_TAG_DIRTY remains set even though the page is clean.
* Keep the VM happy. This must be done otherwise
* PAGECACHE_TAG_DIRTY remains set even though the folio is clean.
*/
BUG_ON(PageWriteback(page));
set_page_writeback(page);
unlock_page(page);
BUG_ON(folio_test_writeback(folio));
folio_start_writeback(folio);
folio_unlock(folio);
attr_len = le32_to_cpu(ctx->attr->data.resident.value_length);
i_size = i_size_read(vi);
if (unlikely(attr_len > i_size)) {
@@ -1480,18 +1463,18 @@ retry_writepage:
/* Shrinking cannot fail. */
BUG_ON(err);
}
addr = kmap_atomic(page);
/* Copy the data from the page to the mft record. */
addr = kmap_local_folio(folio, 0);
/* Copy the data from the folio to the mft record. */
memcpy((u8*)ctx->attr +
le16_to_cpu(ctx->attr->data.resident.value_offset),
addr, attr_len);
/* Zero out of bounds area in the page cache page. */
memset(addr + attr_len, 0, PAGE_SIZE - attr_len);
kunmap_atomic(addr);
flush_dcache_page(page);
/* Zero out of bounds area in the page cache folio. */
memset(addr + attr_len, 0, folio_size(folio) - attr_len);
kunmap_local(addr);
flush_dcache_folio(folio);
flush_dcache_mft_record_page(ctx->ntfs_ino);
/* We are done with the page. */
end_page_writeback(page);
/* We are done with the folio. */
folio_end_writeback(folio);
/* Finally, mark the mft record dirty, so it gets written back. */
mark_mft_record_dirty(ctx->ntfs_ino);
ntfs_attr_put_search_ctx(ctx);
@@ -1502,18 +1485,18 @@ err_out:
ntfs_warning(vi->i_sb, "Error allocating memory. Redirtying "
"page so we try again later.");
/*
* Put the page back on mapping->dirty_pages, but leave its
* Put the folio back on mapping->dirty_pages, but leave its
* buffers' dirty state as-is.
*/
redirty_page_for_writepage(wbc, page);
folio_redirty_for_writepage(wbc, folio);
err = 0;
} else {
ntfs_error(vi->i_sb, "Resident attribute write failed with "
"error %i.", err);
SetPageError(page);
folio_set_error(folio);
NVolSetErrors(ni->vol);
}
unlock_page(page);
folio_unlock(folio);
if (ctx)
ntfs_attr_put_search_ctx(ctx);
if (m)
