No description
Find a file
Barry Song 73bc32875e mm: hold PTL from the first PTE while reclaiming a large folio
Within try_to_unmap_one(), page_vma_mapped_walk() races with other PTE
modifications preceded by pte clear.  While iterating over PTEs of a large
folio, it only starts acquiring PTL from the first valid (present) PTE. 
PTE modifications can temporarily set PTEs to pte_none.  Consequently, the
initial PTEs of a large folio might be skipped in try_to_unmap_one().

For example, for an anon folio, if we skip PTE0, we may have PTE0 which is
still present, while PTE1 ~ PTE(nr_pages - 1) are swap entries after
try_to_unmap_one().

So folio will be still mapped, the folio fails to be reclaimed and is put
back to LRU in this round.

This also breaks up PTEs optimization such as CONT-PTE on this large folio
and may lead to accident folio_split() afterwards.  And since a part of
PTEs are now swap entries, accessing those parts will introduce overhead -
do_swap_page.  Although the kernel can withstand all of the above issues,
the situation still seems quite awkward and warrants making it more ideal.

The same race also occurs with small folios, but they have only one PTE,
thus, it won't be possible for them to be partially unmapped.

This patch holds PTL from PTE0, allowing us to avoid reading PTE values
that are in the process of being transformed.  With stable PTE values, we
can ensure that this large folio is either completely reclaimed or that
all PTEs remain untouched in this round.

A corner case is that if we hold PTL from PTE0 and most initial PTEs have
been really unmapped before that, we may increase the duration of holding
PTL.  Thus we only apply this optimization to folios which are still
entirely mapped (not in deferred_split list).

[akpm@linux-foundation.org: rewrap comment, per Matthew]
Link: https://lkml.kernel.org/r/20240306095219.71086-1-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Chuanhua Han <hanchuanhua@oppo.com>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:08 -07:00
arch arm64: mm: swap: support THP_SWAP on hardware with MTE 2024-04-25 20:56:07 -07:00
block block-6.9-20240412 2024-04-12 10:22:33 -07:00
certs This update includes the following changes: 2023-11-02 16:15:30 -10:00
crypto This push fixes a regression that broke iwd as well as a divide by 2024-03-25 10:48:23 -07:00
Documentation docs: hugetlbpage.rst: add hugetlb migration description 2024-04-25 20:56:06 -07:00
drivers mm: change inlined allocation helpers to account at the call site 2024-04-25 20:55:59 -07:00
fs mm: change inlined allocation helpers to account at the call site 2024-04-25 20:55:59 -07:00
include arm64: mm: swap: support THP_SWAP on hardware with MTE 2024-04-25 20:56:07 -07:00
init mm: introduce slabobj_ext to support slab object extensions 2024-04-25 20:55:51 -07:00
io_uring io_uring/net: restore msg_control on sendzc retry 2024-04-08 21:48:41 -06:00
ipc sysctl changes for v6.9-rc1 2024-03-18 14:59:13 -07:00
kernel mm: free up PG_slab 2024-04-25 20:56:00 -07:00
lib alloc_tag: Tighten file permissions on /proc/allocinfo 2024-04-25 20:55:59 -07:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
mm mm: hold PTL from the first PTE while reclaiming a large folio 2024-04-25 20:56:08 -07:00
net mm: change inlined allocation helpers to account at the call site 2024-04-25 20:55:59 -07:00
rust rust: add a rust helper for krealloc() 2024-04-25 20:55:55 -07:00
samples Tracing updates for 6.9: 2024-03-18 15:11:44 -07:00
scripts lib: add allocation tagging support for memory allocation profiling 2024-04-25 20:55:52 -07:00
security security: Place security_path_post_mknod() where the original IMA call was 2024-04-03 10:21:32 -07:00
sound fix missing vmalloc.h includes 2024-04-25 20:55:49 -07:00
tools selftests/mm: parse VMA range in one go 2024-04-25 20:56:07 -07:00
usr Kbuild updates for v6.8 2024-01-18 17:57:07 -08:00
virt KVM Xen and pfncache changes for 6.9: 2024-03-11 10:42:55 -04:00
.clang-format clang-format: Update with v6.7-rc4's for_each macro list 2023-12-08 23:54:38 +01:00
.cocciconfig
.editorconfig Add .editorconfig file for basic formatting 2023-12-28 16:22:47 +09:00
.get_maintainer.ignore Add Jeff Kirsher to .get_maintainer.ignore 2024-03-08 11:36:54 +00:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore kbuild: create a list of all built DTB files 2024-02-19 18:20:39 +09:00
.mailmap MAINTAINERS: update Naoya Horiguchi's email address 2024-04-16 15:39:51 -07:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Drop Gustavo Pimentel as PCI DWC Maintainer 2024-03-27 13:41:02 -05:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS MAINTAINERS: add entries for code tagging and memory allocation profiling 2024-04-25 20:55:58 -07:00
Makefile Linux 6.9-rc4 2024-04-14 13:38:39 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.