Go to file
Axel Rasmussen fc71884a5f mm: userfaultfd: add new UFFDIO_POISON ioctl
The basic idea here is to "simulate" memory poisoning for VMs.  A VM
running on some host might encounter a memory error, after which some
page(s) are poisoned (i.e., future accesses SIGBUS).  They expect that
once poisoned, pages can never become "un-poisoned".  So, when we live
migrate the VM, we need to preserve the poisoned status of these pages.

When live migrating, we try to get the guest running on its new host as
quickly as possible.  So, we start it running before all memory has been
copied, and before we're certain which pages should be poisoned or not.

So the basic way to use this new feature is:

- On the new host, the guest's memory is registered with userfaultfd, in
  either MISSING or MINOR mode (doesn't really matter for this purpose).
- On any first access, we get a userfaultfd event. At this point we can
  communicate with the old host to find out if the page was poisoned.
- If so, we can respond with a UFFDIO_POISON - this places a swap marker
  so any future accesses will SIGBUS. Because the pte is now "present",
  future accesses won't generate more userfaultfd events, they'll just
  SIGBUS directly.

UFFDIO_POISON does not handle unmapping previously-present PTEs.  This
isn't needed, because during live migration we want to intercept all
accesses with userfaultfd (not just writes, so WP mode isn't useful for
this).  So whether minor or missing mode is being used (or both), the PTE
won't be present in any case, so handling that case isn't needed.

Similarly, UFFDIO_POISON won't replace existing PTE markers.  This might
be okay to do, but it seems to be safer to just refuse to overwrite any
existing entry (like a UFFD_WP PTE marker).

Link: https://lkml.kernel.org/r/20230707215540.2324998-5-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nadav Amit <namit@vmware.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:16 -07:00
Documentation memcg: drop kmem.limit_in_bytes 2023-08-18 10:12:11 -07:00
LICENSES
arch arm64: mte: simplify swap tag restoration logic 2023-08-18 10:12:02 -07:00
block block-6.5-2023-07-21 2023-07-22 11:05:15 -07:00
certs
crypto crypto: algif_hash - Fix race between MORE and non-MORE sends 2023-07-08 22:48:42 +10:00
drivers memory tier: rename destroy_memory_type() to put_memory_type() 2023-08-18 10:12:11 -07:00
fs mm: userfaultfd: add new UFFDIO_POISON ioctl 2023-08-18 10:12:16 -07:00
include mm: userfaultfd: add new UFFDIO_POISON ioctl 2023-08-18 10:12:16 -07:00
init mm: remove arguments of show_mem() 2023-08-18 10:12:02 -07:00
io_uring io_uring-6.5-2023-07-28 2023-07-28 10:19:44 -07:00
ipc
kernel mm/mm_init.c: remove obsolete macro HASH_SMALL 2023-08-18 10:12:07 -07:00
lib maple_tree: add a fast path case in mas_wr_slot_store() 2023-08-18 10:12:05 -07:00
mm mm: userfaultfd: add new UFFDIO_POISON ioctl 2023-08-18 10:12:16 -07:00
net A patch to reduce the potential for erroneous RBD exclusive lock 2023-07-28 10:47:24 -07:00
rust
samples arm64: ftrace: Add direct call trampoline samples support 2023-07-10 17:51:54 -04:00
scripts x86: 2023-07-30 11:19:08 -07:00
security security: keys: perform capable check only on privileged operations 2023-07-28 18:07:41 +00:00
sound ASoC: Fixes for v6.5 2023-07-27 14:54:23 +02:00
tools selftests/memfd: sysctl: fix MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED 2023-08-18 10:12:11 -07:00
usr
virt KVM: Grab a reference to KVM for VM and vCPU stats file descriptors 2023-07-29 11:05:28 -04:00
.clang-format
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore Revert ".gitignore: ignore *.cover and *.mbx" 2023-07-04 15:05:12 -07:00
.mailmap mailmap: update remaining active codeaurora.org email addresses 2023-07-27 13:07:05 -07:00
.rustfmt.toml
COPYING
CREDITS
Kbuild
Kconfig
MAINTAINERS USB fixes for 6.5-rc4 2023-07-30 11:57:51 -07:00
Makefile Linux 6.5-rc4 2023-07-30 13:23:47 -07:00
README

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.