Ryan Roberts 1ede7f1d7e mm: swap: fix race between free_swap_and_cache() and swapoff()
[ Upstream commit 82b1c07a0a ]

There was previously a theoretical window where swapoff() could run and
tear down a swap_info_struct while a call to free_swap_and_cache() was
running in another thread.  This could cause, amongst other bad
possibilities, swap_page_trans_huge_swapped() (called by
free_swap_and_cache()) to access the freed memory for swap_map.

This is a theoretical problem and I haven't been able to provoke it from a
test case.  But there has been agreement based on code review that this is
possible (see link below).

Fix it by using get_swap_device()/put_swap_device(), which will stall
swapoff().  There was an extra check in _swap_info_get() to confirm that
the swap entry was not free.  This isn't present in get_swap_device()
because it doesn't make sense in general due to the race between getting
the reference and swapoff.  So I've added an equivalent check directly in
free_swap_and_cache().
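
The shape of the fix described above can be sketched in userspace C. This
is only a single-threaded model under stated assumptions, not the kernel
patch: struct model_swap_dev and every model_* function are hypothetical
stand-ins, the reference count is a plain int rather than the kernel's
reference mechanism, and "stalling" swapoff is reduced to refusing to free
while a reference is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a swap device (not the kernel struct). */
struct model_swap_dev {
	bool live;               /* false once the model swapoff has started */
	int users;               /* references held by in-flight callers */
	unsigned char *swap_map; /* per-entry use count; 0 means "free" */
	size_t nr_entries;
};

/* Take a reference; refused if teardown started or the offset is invalid. */
static bool model_get_swap_device(struct model_swap_dev *dev, size_t offset)
{
	if (!dev->live || offset >= dev->nr_entries)
		return false;
	dev->users++;
	return true;
}

/* Drop a reference taken by model_get_swap_device(). */
static void model_put_swap_device(struct model_swap_dev *dev)
{
	assert(dev->users > 0);
	dev->users--;
}

/* Model swapoff: may only drop swap_map once no references remain. */
static bool model_swapoff_try(struct model_swap_dev *dev)
{
	dev->live = false;      /* new references are refused from now on */
	if (dev->users > 0)
		return false;   /* the kernel would wait here instead */
	dev->swap_map = NULL;   /* stands in for freeing swap_map */
	return true;
}

/* Returns 1 if the entry was released, 0 otherwise. */
static int model_free_swap_and_cache(struct model_swap_dev *dev, size_t offset)
{
	if (!model_get_swap_device(dev, offset))
		return 0;

	/* The extra check: the entry must not already be free. */
	if (dev->swap_map[offset] == 0) {
		model_put_swap_device(dev);
		return 0;
	}

	dev->swap_map[offset]--;
	model_put_swap_device(dev);
	return 1;
}
```

In this model, a caller that holds a reference keeps model_swapoff_try()
from dropping swap_map, and a call on an already-free entry is rejected
after the reference is taken and released again.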

Details of how to provoke one possible issue (thanks to David Hildenbrand
for deriving this):

--8<-----

__swap_entry_free() might be the last user and result in
"count == SWAP_HAS_CACHE".

swapoff->try_to_unuse() will stop as soon as si->inuse_pages==0.

So the question is: could someone reclaim the folio and bring
si->inuse_pages to 0 before we complete swap_page_trans_huge_swapped()?

Imagine the following: 2 MiB folio in the swapcache. Only 2 subpages are
still referenced by swap entries.

Process 1 still references subpage 0 via swap entry.
Process 2 still references subpage 1 via swap entry.

Process 1 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE
[then, preempted in the hypervisor etc.]

Process 2 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE

Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls
__try_to_reclaim_swap().

__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
swap_entry_free()->swap_range_free()->
...
WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);

What stops swapoff from succeeding after process 2 has reclaimed the swap
cache but before process 1 has finished its call to
swap_page_trans_huge_swapped()?

--8<-----
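
The interleaving above can be replayed deterministically in a small
userspace C model. This is a sketch under stated assumptions, not kernel
code: struct model_si and the model_* helpers are hypothetical, 4 KiB pages
are assumed (so a 2 MiB folio has 512 subpages), MODEL_HAS_CACHE merely
stands in for SWAP_HAS_CACHE, and swapoff is reduced to "drop swap_map once
inuse_pages is 0 and nobody holds a reference".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_NR_SUBPAGES 512   /* 2 MiB folio, assuming 4 KiB pages */
#define MODEL_HAS_CACHE   0x40  /* stand-in for SWAP_HAS_CACHE */

/* Hypothetical stand-in for the swap device state used in the scenario. */
struct model_si {
	unsigned char swap_map[MODEL_NR_SUBPAGES];
	int inuse_pages;
	int users;      /* references held by in-flight callers */
	bool map_freed; /* set when the model swapoff drops swap_map */
};

/* Drop one swap reference on an entry and return the remaining count. */
static unsigned char model_entry_free(struct model_si *si, int off)
{
	return --si->swap_map[off];
}

/* True if any subpage is still referenced by a swap entry. */
static bool model_huge_swapped(const struct model_si *si)
{
	for (int i = 0; i < MODEL_NR_SUBPAGES; i++)
		if (si->swap_map[i] & ~MODEL_HAS_CACHE)
			return true;
	return false;
}

/* Reclaim the whole folio from the swap cache and release its slots. */
static void model_reclaim(struct model_si *si)
{
	assert(si->inuse_pages >= MODEL_NR_SUBPAGES);
	memset(si->swap_map, 0, sizeof(si->swap_map));
	si->inuse_pages -= MODEL_NR_SUBPAGES;
}

/* Model swapoff: succeeds only with no pages in use and no references. */
static void model_swapoff(struct model_si *si)
{
	if (si->inuse_pages == 0 && si->users == 0)
		si->map_freed = true;
}

/* Returns true if process 1 would resume against a freed swap_map. */
static bool model_replay(bool take_reference)
{
	struct model_si si;

	memset(&si, 0, sizeof(si));
	memset(si.swap_map, MODEL_HAS_CACHE, sizeof(si.swap_map));
	si.swap_map[0] |= 1;    /* process 1 references subpage 0 */
	si.swap_map[1] |= 1;    /* process 2 references subpage 1 */
	si.inuse_pages = MODEL_NR_SUBPAGES;

	/* Process 1 quits: count drops to HAS_CACHE, then it is preempted. */
	if (take_reference)
		si.users++;
	model_entry_free(&si, 0);

	/* Process 2 quits, sees no remaining swap references, reclaims. */
	if (take_reference)
		si.users++;
	if (model_entry_free(&si, 1) == MODEL_HAS_CACHE &&
	    !model_huge_swapped(&si))
		model_reclaim(&si);
	if (take_reference)
		si.users--;

	/* swapoff runs while process 1 is still preempted. */
	model_swapoff(&si);

	/* Process 1 would now resume and read swap_map. */
	return si.map_freed;
}
```

With take_reference false, the model swapoff drops swap_map before
process 1 resumes; with take_reference true, process 1's outstanding
reference keeps swap_map alive.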

Link: https://lkml.kernel.org/r/20240306140356.3974886-1-ryan.roberts@arm.com
Fixes: 7c00bafee8 ("mm/swap: free swap slots in batch")
Closes: https://lore.kernel.org/linux-mm/65a66eb9-41f8-4790-8db2-0c70ea15979f@redhat.com/
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-03 15:19:32 +02:00
arch powerpc: xor_vmx: Add '-mhard-float' to CFLAGS 2024-04-03 15:19:31 +02:00
block block: Clear zone limits for a non-zoned stacked queue 2024-04-03 15:19:27 +02:00
certs certs: Fix build error when PKCS#11 URI contains semicolon 2023-02-09 11:28:11 +01:00
crypto crypto: jitter - fix CRYPTO_JITTERENTROPY help text 2024-03-26 18:20:50 -04:00
Documentation media: mc: Expand MUST_CONNECT flag to always require an enabled link 2024-04-03 15:19:25 +02:00
drivers dm-raid: fix lockdep waring in "pers->hot_add_disk" 2024-04-03 15:19:31 +02:00
fs btrfs: fix off-by-one chunk length calculation at contains_pending_extent() 2024-04-03 15:19:31 +02:00
include mac802154: fix llsec key resources release in mac802154_llsec_key_del 2024-04-03 15:19:31 +02:00
init modules: wait do_free_init correctly 2024-03-26 18:20:52 -04:00
io_uring io_uring/net: correct the type of variable 2024-03-26 18:20:57 -04:00
ipc ipc: fix memory leak in init_mqueue_fs() 2022-12-31 13:32:01 +01:00
kernel serial: Lock console when calling into driver before registration 2024-04-03 15:19:31 +02:00
lib pci_iounmap(): Fix MMIO mapping leak 2024-04-03 15:19:25 +02:00
LICENSES
mm mm: swap: fix race between free_swap_and_cache() and swapoff() 2024-04-03 15:19:32 +02:00
net mac802154: fix llsec key resources release in mac802154_llsec_key_del 2024-04-03 15:19:31 +02:00
rust rust: allocator: Prevent mis-aligned allocation 2023-08-11 12:08:18 +02:00
samples work around gcc bugs with 'asm goto' with outputs 2024-02-23 09:12:28 +01:00
scripts kconfig: fix infinite loop when expanding a macro at the end of file 2024-03-26 18:20:58 -04:00
security smack: Handle SMACK64TRANSMUTE in smack_inode_setsecurity() 2024-04-03 15:19:24 +02:00
sound ASoC: rockchip: i2s-tdm: Fix inaccurate sampling rates 2024-03-26 18:20:58 -04:00
tools selftests/mqueue: Set timeout to 180 seconds 2024-04-03 15:19:26 +02:00
usr
virt KVM: Always flush async #PF workqueue when vCPU is being destroyed 2024-04-03 15:19:25 +02:00
.clang-format inet: ping: use hlist_nulls rcu iterator during lookup 2022-12-01 12:42:46 +01:00
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore
.mailmap 9 hotfixes. 6 for MM, 3 for other areas. Four of these patches address 2022-12-10 17:10:52 -08:00
.rustfmt.toml
COPYING
CREDITS MAINTAINERS: Remove Michal Marek from Kbuild maintainers 2022-11-16 14:53:00 +09:00
Kbuild
Kconfig
MAINTAINERS trace: Relocate event helper files 2024-03-06 14:45:17 +00:00
Makefile Linux 6.1.83 2024-03-26 18:22:57 -04:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.