linux-stable/fs
Linus Torvalds 09d1c6a80f Generic:
- Use memdup_array_user() to harden against overflow.
 
 - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures.
 
 - Clean up Kconfigs that all KVM architectures were selecting
 
 - New functionality around "guest_memfd", a new userspace API that
   creates an anonymous file and returns a file descriptor that refers
   to it.  guest_memfd files are bound to their owning virtual machine,
   cannot be mapped, read, or written by userspace, and cannot be resized.
   guest_memfd files do however support PUNCH_HOLE, which can be used to
   switch a memory area between guest_memfd and regular anonymous memory.
 
 - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
   per-page attributes for a given page of guest memory; right now the
   only attribute is whether the guest expects to access memory via
   guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
   TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees
   confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM).
 
 x86:
 
 - Support for "software-protected VMs" that can use the new guest_memfd
   and page attributes infrastructure.  This is mostly useful for testing,
   since there is no pKVM-like infrastructure to provide a meaningfully
   reduced TCB.
 
 - Fix a relatively benign off-by-one error when splitting huge pages during
   CLEAR_DIRTY_LOG.
 
 - Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf
   TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE.
 
 - Use more generic lockdep assertions in paths that don't actually care
   about whether the caller is a reader or a writer.
 
 - let Xen guests opt out of having PV clock reported as "based on a stable TSC",
   because some of them don't expect the "TSC stable" bit (added to the pvclock
   ABI by KVM, but never set by Xen) to be set.
 
 - Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL.
 
 - Advertise flush-by-ASID support for nSVM unconditionally, as KVM always
   flushes on nested transitions, i.e. always satisfies flush requests.  This
   allows running bleeding edge versions of VMware Workstation on top of KVM.
 
 - Sanity check that the CPU supports flush-by-ASID when enabling SEV support.
 
 - On AMD machines with vNMI, always rely on hardware instead of intercepting
   IRET in some cases to detect unmasking of NMIs
 
 - Support for virtualizing Linear Address Masking (LAM)
 
 - Fix a variety of vPMU bugs where KVM fail to stop/reset counters and other state
   prior to refreshing the vPMU model.
 
 - Fix a double-overflow PMU bug by tracking emulated counter events using a
   dedicated field instead of snapshotting the "previous" counter.  If the
   hardware PMC count triggers overflow that is recognized in the same VM-Exit
   that KVM manually bumps an event count, KVM would pend PMIs for both the
   hardware-triggered overflow and for KVM-triggered overflow.
 
 - Turn off KVM_WERROR by default for all configs so that it's not
   inadvertantly enabled by non-KVM developers, which can be problematic for
   subsystems that require no regressions for W=1 builds.
 
 - Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL
   "features".
 
 - Don't force a masterclock update when a vCPU synchronizes to the current TSC
   generation, as updating the masterclock can cause kvmclock's time to "jump"
   unexpectedly, e.g. when userspace hotplugs a pre-created vCPU.
 
 - Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths,
   partly as a super minor optimization, but mostly to make KVM play nice with
   position independent executable builds.
 
 - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
   CONFIG_HYPERV as a minor optimization, and to self-document the code.
 
 - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation"
   at build time.
 
 ARM64:
 
 - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
   base granule sizes. Branch shared with the arm64 tree.
 
 - Large Fine-Grained Trap rework, bringing some sanity to the
   feature, although there is more to come. This comes with
   a prefix branch shared with the arm64 tree.
 
 - Some additional Nested Virtualization groundwork, mostly
   introducing the NV2 VNCR support and retargetting the NV
   support to that version of the architecture.
 
 - A small set of vgic fixes and associated cleanups.
 
 Loongarch:
 
 - Optimization for memslot hugepage checking
 
 - Cleanup and fix some HW/SW timer issues
 
 - Add LSX/LASX (128bit/256bit SIMD) support
 
 RISC-V:
 
 - KVM_GET_REG_LIST improvement for vector registers
 
 - Generate ISA extension reg_list using macros in get-reg-list selftest
 
 - Support for reporting steal time along with selftest
 
 s390:
 
 - Bugfixes
 
 Selftests:
 
 - Fix an annoying goof where the NX hugepage test prints out garbage
   instead of the magic token needed to run the test.
 
 - Fix build errors when a header is delete/moved due to a missing flag
   in the Makefile.
 
 - Detect if KVM bugged/killed a selftest's VM and print out a helpful
   message instead of complaining that a random ioctl() failed.
 
 - Annotate the guest printf/assert helpers with __printf(), and fix the
   various bugs that were lurking due to lack of said annotation.
 
 There are two non-KVM patches buried in the middle of guest_memfd support:
 
   fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
   mm: Add AS_UNMOVABLE to mark mapping as completely unmovable
 
 The first is small and mostly suggested-by Christian Brauner; the second
 a bit less so but it was written by an mm person (Vlastimil Babka).
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmWcMWkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroO15gf/WLmmg3SET6Uzw9iEq2xo28831ZA+
 6kpILfIDGKozV5safDmMvcInlc/PTnqOFrsKyyN4kDZ+rIJiafJdg/loE0kPXBML
 wdR+2ix5kYI1FucCDaGTahskBDz8Lb/xTpwGg9BFLYFNmuUeHc74o6GoNvr1uliE
 4kLZL2K6w0cSMPybUD+HqGaET80ZqPwecv+s1JL+Ia0kYZJONJifoHnvOUJ7DpEi
 rgudVdgzt3EPjG0y1z6MjvDBXTCOLDjXajErlYuZD3Ej8N8s59Dh2TxOiDNTLdP4
 a4zjRvDmgyr6H6sz+upvwc7f4M4p+DBvf+TkWF54mbeObHUYliStqURIoA==
 =66Ws
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "Generic:

   - Use memdup_array_user() to harden against overflow.

   - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all
     architectures.

   - Clean up Kconfigs that all KVM architectures were selecting

   - New functionality around "guest_memfd", a new userspace API that
     creates an anonymous file and returns a file descriptor that refers
     to it. guest_memfd files are bound to their owning virtual machine,
     cannot be mapped, read, or written by userspace, and cannot be
     resized. guest_memfd files do however support PUNCH_HOLE, which can
     be used to switch a memory area between guest_memfd and regular
     anonymous memory.

   - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
     per-page attributes for a given page of guest memory; right now the
     only attribute is whether the guest expects to access memory via
     guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
     TDX or ARM64 pKVM is checked by firmware or hypervisor that
     guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in
     the case of pKVM).

  x86:

   - Support for "software-protected VMs" that can use the new
     guest_memfd and page attributes infrastructure. This is mostly
     useful for testing, since there is no pKVM-like infrastructure to
     provide a meaningfully reduced TCB.

   - Fix a relatively benign off-by-one error when splitting huge pages
     during CLEAR_DIRTY_LOG.

   - Fix a bug where KVM could incorrectly test-and-clear dirty bits in
     non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with
     a non-huge SPTE.

   - Use more generic lockdep assertions in paths that don't actually
     care about whether the caller is a reader or a writer.

   - let Xen guests opt out of having PV clock reported as "based on a
     stable TSC", because some of them don't expect the "TSC stable" bit
     (added to the pvclock ABI by KVM, but never set by Xen) to be set.

   - Revert a bogus, made-up nested SVM consistency check for
     TLB_CONTROL.

   - Advertise flush-by-ASID support for nSVM unconditionally, as KVM
     always flushes on nested transitions, i.e. always satisfies flush
     requests. This allows running bleeding edge versions of VMware
     Workstation on top of KVM.

   - Sanity check that the CPU supports flush-by-ASID when enabling SEV
     support.

   - On AMD machines with vNMI, always rely on hardware instead of
     intercepting IRET in some cases to detect unmasking of NMIs

   - Support for virtualizing Linear Address Masking (LAM)

   - Fix a variety of vPMU bugs where KVM fail to stop/reset counters
     and other state prior to refreshing the vPMU model.

   - Fix a double-overflow PMU bug by tracking emulated counter events
     using a dedicated field instead of snapshotting the "previous"
     counter. If the hardware PMC count triggers overflow that is
     recognized in the same VM-Exit that KVM manually bumps an event
     count, KVM would pend PMIs for both the hardware-triggered overflow
     and for KVM-triggered overflow.

   - Turn off KVM_WERROR by default for all configs so that it's not
     inadvertantly enabled by non-KVM developers, which can be
     problematic for subsystems that require no regressions for W=1
     builds.

   - Advertise all of the host-supported CPUID bits that enumerate
     IA32_SPEC_CTRL "features".

   - Don't force a masterclock update when a vCPU synchronizes to the
     current TSC generation, as updating the masterclock can cause
     kvmclock's time to "jump" unexpectedly, e.g. when userspace
     hotplugs a pre-created vCPU.

   - Use RIP-relative address to read kvm_rebooting in the VM-Enter
     fault paths, partly as a super minor optimization, but mostly to
     make KVM play nice with position independent executable builds.

   - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
     CONFIG_HYPERV as a minor optimization, and to self-document the
     code.

   - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV
     "emulation" at build time.

  ARM64:

   - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base
     granule sizes. Branch shared with the arm64 tree.

   - Large Fine-Grained Trap rework, bringing some sanity to the
     feature, although there is more to come. This comes with a prefix
     branch shared with the arm64 tree.

   - Some additional Nested Virtualization groundwork, mostly
     introducing the NV2 VNCR support and retargetting the NV support to
     that version of the architecture.

   - A small set of vgic fixes and associated cleanups.

  Loongarch:

   - Optimization for memslot hugepage checking

   - Cleanup and fix some HW/SW timer issues

   - Add LSX/LASX (128bit/256bit SIMD) support

  RISC-V:

   - KVM_GET_REG_LIST improvement for vector registers

   - Generate ISA extension reg_list using macros in get-reg-list
     selftest

   - Support for reporting steal time along with selftest

  s390:

   - Bugfixes

  Selftests:

   - Fix an annoying goof where the NX hugepage test prints out garbage
     instead of the magic token needed to run the test.

   - Fix build errors when a header is delete/moved due to a missing
     flag in the Makefile.

   - Detect if KVM bugged/killed a selftest's VM and print out a helpful
     message instead of complaining that a random ioctl() failed.

   - Annotate the guest printf/assert helpers with __printf(), and fix
     the various bugs that were lurking due to lack of said annotation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits)
  x86/kvm: Do not try to disable kvmclock if it was not enabled
  KVM: x86: add missing "depends on KVM"
  KVM: fix direction of dependency on MMU notifiers
  KVM: introduce CONFIG_KVM_COMMON
  KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd
  KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache
  RISC-V: KVM: selftests: Add get-reg-list test for STA registers
  RISC-V: KVM: selftests: Add steal_time test support
  RISC-V: KVM: selftests: Add guest_sbi_probe_extension
  RISC-V: KVM: selftests: Move sbi_ecall to processor.c
  RISC-V: KVM: Implement SBI STA extension
  RISC-V: KVM: Add support for SBI STA registers
  RISC-V: KVM: Add support for SBI extension registers
  RISC-V: KVM: Add SBI STA info to vcpu_arch
  RISC-V: KVM: Add steal-update vcpu request
  RISC-V: KVM: Add SBI STA extension skeleton
  RISC-V: paravirt: Implement steal-time support
  RISC-V: Add SBI STA extension definitions
  RISC-V: paravirt: Add skeleton for pv-time support
  RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr()
  ...
2024-01-17 13:03:37 -08:00
..
9p
adfs adfs: remove writepage implementation 2023-12-29 11:58:33 -08:00
affs affs: d_obtain_alias(ERR_PTR(...)) will do the right thing 2023-12-21 12:51:02 -05:00
afs dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
autofs dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
bcachefs fix buggered locking in bch2_ioctl_subvolume_destroy() 2024-01-12 18:04:01 -08:00
befs befs: d_obtain_alias(ERR_PTR(...)) will do the right thing 2023-12-21 12:51:02 -05:00
bfs misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
btrfs for-6.8/block-2024-01-08 2024-01-11 13:58:04 -08:00
cachefiles fix directory locking scheme on rename 2024-01-11 20:00:22 -08:00
ceph dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
coda dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
configfs
cramfs
crypto fscrypt: document that CephFS supports fscrypt now 2023-12-26 22:55:42 -06:00
debugfs Merge branches 'acpi-pm', 'acpi-video', 'acpi-apei' and 'acpi-extlog' 2024-01-04 13:19:40 +01:00
devpts fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
dlm dlm: update format header reflect current format 2023-12-20 15:36:48 -06:00
ecryptfs fix directory locking scheme on rename 2024-01-11 20:00:22 -08:00
efivarfs efivarfs: automatically update super block flag 2023-12-11 11:19:18 +01:00
efs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
erofs erofs: make erofs_{err,info}() support NULL sb parameter 2024-01-10 19:59:39 +08:00
exfat exfat: do not zero the extended part 2024-01-08 21:57:22 +09:00
exportfs
ext2 fix directory locking scheme on rename 2024-01-11 20:00:22 -08:00
ext4 misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
f2fs f2fs: fix double free of f2fs_sb_info 2024-01-12 18:55:09 -08:00
fat vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
freevxfs freevxfs: lookup: fix function params kernel-doc 2023-12-20 15:02:58 -08:00
fscache
fuse vfs-6.8.rw 2024-01-08 11:11:51 -08:00
gfs2 dlm for 6.8 2024-01-10 10:17:23 -08:00
hfs hfs: really remove hfs_writepage 2023-12-29 11:58:34 -08:00
hfsplus Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
hostfs hostfs: use d_splice_alias() calling conventions to simplify failure exits 2023-12-21 12:51:00 -05:00
hpfs
hugetlbfs Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
iomap mm: add folio_fill_tail() and use it in iomap 2023-12-10 16:51:36 -08:00
isofs
jbd2 jbd2: abort journal when detecting metadata writeback error of fs dev 2024-01-04 23:42:21 -05:00
jffs2 jffs2: mark __jffs2_dbg_superblock_counts() static 2023-12-10 17:21:43 -08:00
jfs jfs: Add missing set_freezable() for freezable kthread 2024-01-02 11:06:52 -06:00
kernfs
lockd sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
minix minixfs kmap_local_page() switchover and related fixes - very similar to sysv series. 2024-01-11 19:54:18 -08:00
netfs
nfs sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
nfs_common
nfsd misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
nilfs2 misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
nls
notify dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
ntfs sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
ntfs3 vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
ocfs2 misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
omfs
openpromfs
orangefs orangefs: saner arguments passing in readdir guts 2023-12-21 12:53:36 -05:00
overlayfs dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
proc 17 hotfixes. 10 address post-6.7 issues and the other 7 are cc:stable. 2024-01-17 09:31:36 -08:00
pstore pstore: inode: Use cleanup.h for struct pstore_private 2023-12-08 14:15:44 -08:00
qnx4 qnx4: Use get_directory_fname() in qnx4_match() 2023-12-13 11:19:18 -08:00
qnx6
quota sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
ramfs mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
reiserfs misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
romfs
smb 11 ksmbd server fixes 2024-01-11 20:27:41 -08:00
squashfs Squashfs: fix variable overflow triggered by sysbot 2023-12-10 17:21:26 -08:00
sysfs
sysv sysv: remove writepage implementation 2023-12-29 11:58:35 -08:00
tracefs dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
ubifs ubifs: fix kernel-doc warnings 2024-01-06 23:49:50 +01:00
udf misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
ufs Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
unicode
vboxsf fs: vboxsf: fix a kernel-doc warning 2023-12-08 15:32:31 -07:00
verity Networking changes for 6.8. 2024-01-11 10:07:29 -08:00
xfs sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
zonefs misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
aio.c sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
anon_inodes.c Merge branch 'kvm-guestmemfd' into HEAD 2023-11-14 08:31:31 -05:00
attr.c fs: fix doc comment typo fs tree wide 2023-12-21 13:17:54 +01:00
backing-file.c fs: factor out backing_file_mmap() helper 2023-12-23 16:35:09 +02:00
bad_inode.c
binfmt_elf.c
binfmt_elf_fdpic.c
binfmt_elf_test.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
buffer.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
char_dev.c
compat_binfmt_elf.c
coredump.c fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
d_path.c
dax.c fs : Fix warning using plain integer as NULL 2023-11-18 15:00:01 +01:00
dcache.c dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
direct-io.c fs : Fix warning using plain integer as NULL 2023-11-18 15:00:01 +01:00
drop_caches.c
eventfd.c eventfd: Remove usage of the deprecated ida_simple_xx() API 2023-12-12 14:24:55 +01:00
eventpoll.c fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
exec.c sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
fcntl.c
fhandle.c
file.c file: remove __receive_fd() 2023-12-12 14:24:14 +01:00
file_table.c dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
filesystems.c
fs-writeback.c
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c
inode.c fix directory locking scheme on rename 2024-01-11 20:00:22 -08:00
internal.h dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
ioctl.c lsm: new security_file_ioctl_compat() hook 2023-12-24 15:48:03 -05:00
Kconfig Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
Kconfig.binfmt
kernel_read_file.c
libfs.c dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
locks.c fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
Makefile fs: prepare for stackable filesystems backing file helpers 2023-12-23 16:35:08 +02:00
mbcache.c
mnt_idmapping.c mnt_idmapping: decouple from namespaces 2023-11-28 14:08:47 +01:00
mount.h mounts: keep list of mounts in an rbtree 2023-11-18 14:56:16 +01:00
mpage.c fs: convert block_write_full_page to block_write_full_folio 2023-12-29 11:58:35 -08:00
namei.c fix buggered locking in bch2_ioctl_subvolume_destroy() 2024-01-12 18:04:01 -08:00
namespace.c fs: rework listmount() implementation 2024-01-13 13:06:25 +01:00
nsfs.c nsfs: use d_make_root() 2023-11-25 02:49:43 -05:00
open.c vfs-6.8.rw 2024-01-08 11:11:51 -08:00
pipe.c sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
pnode.c mounts: keep list of mounts in an rbtree 2023-11-18 14:56:16 +01:00
pnode.h
posix_acl.c fs: fix doc comment typo fs tree wide 2023-12-21 13:17:54 +01:00
proc_namespace.c namespace: extract show_path() helper 2023-11-18 14:56:16 +01:00
read_write.c fsnotify: optionally pass access range in file permission hooks 2023-12-12 16:20:02 +01:00
readdir.c fsnotify: optionally pass access range in file permission hooks 2023-12-12 16:20:02 +01:00
remap_range.c fsnotify: optionally pass access range in file permission hooks 2023-12-12 16:20:02 +01:00
select.c
seq_file.c
signalfd.c
splice.c fs: use splice_copy_file_range() inline helper 2023-12-12 16:20:02 +01:00
stack.c
stat.c vfs-6.8.mount 2024-01-08 10:57:34 -08:00
statfs.c
super.c fscrypt updates for 6.8 2024-01-10 10:24:49 -08:00
sync.c
sysctls.c fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
timerfd.c
userfaultfd.c Generic: 2024-01-17 13:03:37 -08:00
utimes.c
xattr.c