Go to file
Josef Bacik f1a0a81e72 mm: fix page leak with multiple threads mapping the same page
commit 3fe2895cfe upstream.

We have an application with a lot of threads that use a shared mmap backed
by tmpfs mounted with -o huge=within_size.  This application started
leaking loads of huge pages when we upgraded to a recent kernel.

Using the page ref tracepoints and a BPF program written by Tejun Heo we
were able to determine that these pages would have multiple refcounts from
the page fault path, but when it came to unmap time we wouldn't drop the
number of refs we had added from the faults.

I wrote a reproducer that mmap'ed a file backed by tmpfs with -o
huge=always, and then spawned 20 threads all looping faulting random
offsets in this map, while using madvise(MADV_DONTNEED) randomly for huge
page aligned ranges.  This very quickly reproduced the problem.

The problem here is that we check for the case that we have multiple
threads faulting in a range that was previously unmapped.  One thread maps
the PMD, the other thread loses the race and then returns 0.  However at
this point we already have the page, and we are no longer putting this
page into the processes address space, and so we leak the page.  We
actually did the correct thing prior to f9ce0be71d, however it looks
like Kirill copied what we do in the anonymous page case.  In the
anonymous page case we don't yet have a page, so we don't have to drop a
reference on anything.  Previously we did the correct thing for file based
faults by returning VM_FAULT_NOPAGE so we correctly drop the reference on
the page we faulted in.

Fix this by returning VM_FAULT_NOPAGE in the pmd_devmap_trans_unstable()
case, this makes us drop the ref on the page properly, and now my
reproducer no longer leaks the huge pages.

[josef@toxicpanda.com: v2]
  Link: https://lkml.kernel.org/r/e90c8f0dbae836632b669c2afc434006a00d4a67.1657721478.git.josef@toxicpanda.com
Link: https://lkml.kernel.org/r/2b798acfd95c9ab9395fe85e8d5a835e2e10a920.1657051137.git.josef@toxicpanda.com
Fixes: f9ce0be71d ("mm: Cleanup faultaround and finish_fault() codepaths")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03 12:05:17 +02:00
Documentation x86/bugs: Add retbleed=ibpb 2022-07-23 12:56:50 +02:00
LICENSES LICENSES/LGPL-2.1: Add LGPL-2.1-or-later as valid identifiers 2021-12-16 14:33:10 +01:00
arch ARM: pxa2xx: Fix GPIO descriptor tables 2022-08-03 12:05:15 +02:00
block block: pop cached rq before potentially blocking rq_qos_throttle() 2022-06-29 09:04:33 +02:00
certs certs/blacklist_hashes.c: fix const confusion in certs blacklist 2022-06-22 14:28:03 +02:00
crypto crypto: memneq - move into lib/ 2022-06-22 14:28:06 +02:00
drivers crypto: qat - re-enable registration of algorithms 2022-07-29 17:28:16 +02:00
fs fs: sendfile handles O_NONBLOCK of out_fd 2022-08-03 12:05:16 +02:00
include Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put 2022-08-03 12:05:15 +02:00
init gcc-12: disable '-Warray-bounds' universally for now 2022-06-22 14:27:55 +02:00
ipc ipc/mqueue: use get_tree_nodev() in mqueue_get_tree() 2022-06-09 10:30:30 +02:00
kernel watch-queue: remove spurious double semicolon 2022-07-29 17:28:17 +02:00
lib ida: don't use BUG_ON() for debugging 2022-07-12 16:42:25 +02:00
mm mm: fix page leak with multiple threads mapping the same page 2022-08-03 12:05:17 +02:00
net Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put 2022-08-03 12:05:15 +02:00
samples samples/landlock: Format with clang-format 2022-06-09 10:30:46 +02:00
scripts x86/retbleed: Add fine grained Kconfig knobs 2022-07-23 12:56:56 +02:00
security lockdown: Fix kexec lockdown bypass with ima policy 2022-07-29 17:27:55 +02:00
sound ASoC: SOF: Intel: disable IMR boot when resuming from ACPI S4 and S5 states 2022-07-29 17:28:17 +02:00
tools KVM: selftests: Fix target thread to be migrated in rseq_test 2022-07-29 17:28:13 +02:00
usr Kbuild updates for v5.18 2022-03-31 11:59:03 -07:00
virt KVM: Don't null dereference ops->destroy 2022-07-29 17:28:14 +02:00
.clang-format genirq/msi: Make interrupt allocation less convoluted 2021-12-16 22:22:20 +01:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: ignore only top-level modules.builtin 2021-05-02 00:43:35 +09:00
.mailmap hotfixes for 5.18-rc7 2022-05-13 10:22:37 -07:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: replace a Microchip AT91 maintainer 2022-02-09 11:30:01 +01:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS MAINTAINERS: Remove iommu@lists.linux-foundation.org 2022-07-12 16:42:15 +02:00
Makefile Linux 5.18.15 2022-07-29 17:28:18 +02:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.