linux-stable/include
Axel Rasmussen 2d5de004e0 userfaultfd: add /dev/userfaultfd for fine grained access control
Historically, it has been shown that intercepting kernel faults with
userfaultfd (thereby forcing the kernel to wait for an arbitrary amount of
time) can be exploited, or at least can make some kinds of exploits
easier.  So, in 37cd0575b8 "userfaultfd: add UFFD_USER_MODE_ONLY" we
changed things so, in order for kernel faults to be handled by
userfaultfd, either the process needs CAP_SYS_PTRACE, or this sysctl must
be configured so that any unprivileged user can do it.

In a typical implementation of a hypervisor with live migration (take
QEMU/KVM as one such example), we do indeed need to be able to handle
kernel faults.  But, both options above are less than ideal:

- Toggling the sysctl increases attack surface by allowing any
  unprivileged user to do it.

- Granting the live migration process CAP_SYS_PTRACE gives it this
  ability, but *also* the ability to "observe and control the
  execution of another process [...], and examine and change [its]
  memory and registers" (from ptrace(2)). This isn't something we need
  or want to be able to do, so granting this permission violates the
  "principle of least privilege".

This is all a long winded way to say: we want a more fine-grained way to
grant access to userfaultfd, without granting other additional permissions
at the same time.

To achieve this, add a /dev/userfaultfd misc device.  This device provides
an alternative to the userfaultfd(2) syscall for the creation of new
userfaultfds.  The idea is, any userfaultfds created this way will be able
to handle kernel faults, without the caller having any special
capabilities.  Access to this mechanism is instead restricted using e.g. 
standard filesystem permissions.

[axelrasmussen@google.com: Handle misc_register() failure properly]
  Link: https://lkml.kernel.org/r/20220819205201.658693-3-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20220808175614.3885028-3-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Nadav Amit <namit@vmware.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry V. Levin <ldv@altlinux.org>
Cc: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11 20:25:48 -07:00
..
acpi Merge branch 'acpi-properties' 2022-08-11 19:21:03 +02:00
asm-generic Seventeen hotfixes. Mostly memory management things. Ten patches are 2022-08-28 14:49:59 -07:00
clocksource
crypto for-5.20/block-2022-08-04 2022-08-04 20:00:14 -07:00
drm
dt-bindings power supply and reset changes for the v6.0 series 2022-08-12 09:37:33 -07:00
keys
kunit
kvm
linux mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse 2022-09-11 20:25:46 -07:00
math-emu
media SPDX changes for 6.0-rc1 2022-08-04 12:12:54 -07:00
memory
misc
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf 2022-08-24 19:18:10 -07:00
pcmcia
ras mm, hwpoison: enable memory error handling on 1GB hugepage 2022-08-08 18:06:44 -07:00
rdma dma-mapping updates 2022-08-06 10:56:45 -07:00
rv
scsi SCSI misc on 20220813 2022-08-13 13:41:48 -07:00
soc net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset 2022-08-17 21:58:32 -07:00
sound
target
trace mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds hugepage 2022-09-11 20:25:46 -07:00
uapi userfaultfd: add /dev/userfaultfd for fine grained access control 2022-09-11 20:25:48 -07:00
ufs scsi: ufs: core: Enable link lost interrupt 2022-08-11 22:04:32 -04:00
vdso
video
xen x86/xen: Add support for HVMOP_set_evtchn_upcall_vector 2022-08-12 11:28:21 +02:00