Commit graph

2537 commits

Author SHA1 Message Date
Christian Brauner
060e6c7d17
porting: document superblock as block device holder
We've changed the holder of the block device which has consequences.
Document this clearly and in detail so filesystem and vfs developers
have a proper digital paper trail.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-20 14:22:02 +02:00
Christian Brauner
2ba0dd6562
porting: document new block device opening order
We've changed the order of opening block devices and superblock
handling. Let's document this so filesystem and vfs developers have
a proper digital paper trail.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-20 14:22:02 +02:00
Linus Torvalds
3669558bdf for-6.6-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmT/hwAACgkQxWXV+ddt
 WDsn7hAAngwEMKEAH9Jvu/BtHgRYcAdsGh5Mxw34aQf1+DAaH03GGsZjN6hfHYo4
 FMsnnvoZD5VPfuaFaQVd+mS9mRzikm503W7KfZFAPAQTOjz50RZbohLnZWa3eFbI
 46OcpoHusxwoYosEmIAt+dcw/gDlT9fpj+W11dKYtwOEjCqGA/OeKoVenfk38hVJ
 r+XhLwZFf4dPIqE3Ht26UtJk87Xs2X0/LQxOX3vM1MZ+l38N4dyo7TQnwfTHlQNw
 AK9sK6vp3rpRR96rvTV1dWr9lnmE7wky+Vh36DN/jxpzbW7Wx8IVoobBpcsO4Tyk
 Vw/rdjB7g7LfBmjLFhWvvQ73jv0WjIUUzXH17RuxOeyAQJ9tXFztVMh+QoVVC/Ka
 NxwA5uqyJKR7DIA+kLL06abUnASUVgP6Krdv9Fk7rYCKWluWk1k9ls9XaFFhytvg
 eeno/UB0px1rwps5P5zfaSXLIXEl53Luy5rFhTMCCNQfXyo+Qe6PJyTafR3E0uP8
 aXJV1lPG+o7qi9Vwg+20yy//1sE5gR0dLrcTaup3/20RK6eljZ/bNSkl3GJR9mlS
 YF+J/Ccia06y8Qo0xaeCofxkoI3J/PK6KPOTt8yZDgYoetYgHhrfBRO0I7ZU4Edq
 10512hAeskzPt6+5348+/jOEENASffXKP3FJSdDEzWd33vtlaHE=
 =mHTa
 -----END PGP SIGNATURE-----

Merge tag 'for-6.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - several fixes for handling directory item (inserting, removing,
   iteration, error handling)

 - fix transaction commit stalls when auto relocation is running and
   blocks other tasks that want to commit

 - fix a build error when DEBUG is enabled

 - fix lockdep warning in inode number lookup ioctl

 - fix race when finishing block group creation

 - remove link to obsolete wiki in several files

* tag 'for-6.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  MAINTAINERS: remove links to obsolete btrfs.wiki.kernel.org
  btrfs: assert delayed node locked when removing delayed item
  btrfs: remove BUG() after failure to insert delayed dir index item
  btrfs: improve error message after failure to add delayed dir index item
  btrfs: fix a compilation error if DEBUG is defined in btree_dirty_folio
  btrfs: check for BTRFS_FS_ERROR in pending ordered assert
  btrfs: fix lockdep splat and potential deadlock after failure running delayed items
  btrfs: do not block starts waiting on previous transaction commit
  btrfs: release path before inode lookup during the ino lookup ioctl
  btrfs: fix race between finishing block group creation and its item update
2023-09-12 11:28:00 -07:00
Bhaskar Chowdhury
5facccc940 MAINTAINERS: remove links to obsolete btrfs.wiki.kernel.org
The wiki has been archived and is not updated anymore. Remove or replace
the links in files that contain it (MAINTAINERS, Kconfig, docs).

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-08 14:21:27 +02:00
Linus Torvalds
7ba2090ca6 Mixed with some fixes and cleanups, this brings in reasonably complete
fscrypt support to CephFS!  The list of things which don't work with
 encryption should be fairly short, mostly around the edges: fallocate
 (not supported well in CephFS to begin with), copy_file_range (requires
 re-encryption), non-default striping patterns.
 
 This was a multi-year effort principally by Jeff Layton with assistance
 from Xiubo Li, Luís Henriques and others, including several dependant
 changes in the MDS, netfs helper library and fscrypt framework itself.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmT4pl4THGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi5kzB/4sMgzZyUa3T1vA/G2pPvEkyy1qDxsW
 y+o4dDMWA9twcrBVpNuGd54wbXpmO/LAekHEdorjayH+f0zf10MsnP1ePz9WB3NG
 jr7RRujb+Gpd2OFYJXGSEbd3faTg8M2kpGCCrVe7SFNoyu8z9NwFItwWMog5aBjX
 ODGQrq+kA4ARA6xIqwzF5gP0zr+baT9rWhQdm7Xo9itWdosnbyDLJx1dpEfLuqBX
 te3SmifDzedn3Gw73hdNo/+ybw0kHARoK+RmXCTsoDDQw+JsoO9KxZF5Q8QcDELq
 2woPNp0Hl+Dm4MkzGnPxv56Qj8ZDViS59syXC0CfGRmu4nzF1Rw+0qn5
 =/WlE
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "Mixed with some fixes and cleanups, this brings in reasonably complete
  fscrypt support to CephFS! The list of things which don't work with
  encryption should be fairly short, mostly around the edges: fallocate
  (not supported well in CephFS to begin with), copy_file_range
  (requires re-encryption), non-default striping patterns.

  This was a multi-year effort principally by Jeff Layton with
  assistance from Xiubo Li, Luís Henriques and others, including several
  dependant changes in the MDS, netfs helper library and fscrypt
  framework itself"

* tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-client: (53 commits)
  ceph: make num_fwd and num_retry to __u32
  ceph: make members in struct ceph_mds_request_args_ext a union
  rbd: use list_for_each_entry() helper
  libceph: do not include crypto/algapi.h
  ceph: switch ceph_lookup/atomic_open() to use new fscrypt helper
  ceph: fix updating i_truncate_pagecache_size for fscrypt
  ceph: wait for OSD requests' callbacks to finish when unmounting
  ceph: drop messages from MDS when unmounting
  ceph: update documentation regarding snapshot naming limitations
  ceph: prevent snapshot creation in encrypted locked directories
  ceph: add support for encrypted snapshot names
  ceph: invalidate pages when doing direct/sync writes
  ceph: plumb in decryption during reads
  ceph: add encryption support to writepage and writepages
  ceph: add read/modify/write to ceph_sync_write
  ceph: align data in pages in ceph_sync_write
  ceph: don't use special DIO path for encrypted inodes
  ceph: add truncate size handling support for fscrypt
  ceph: add object version support for sync read
  libceph: allow ceph_osdc_new_request to accept a multi-op read
  ...
2023-09-06 12:10:15 -07:00
Linus Torvalds
65d6e954e3 gfs2 fixes
- Fix a glock state (non-)transition bug when a dlm request times out
   and is canceled, and we have locking requests that can now be granted
   immediately.
 
 - Various fixes and cleanups in how the logd and quotad daemons are
   woken up and terminated.
 
 - Fix several bugs in the quota data reference counting and shrinking.
   Free quota data objects synchronously in put_super() instead of
   letting call_rcu() run wild.
 
 - Make sure not to deallocate quota data during a withdraw; rather, defer
   quota data deallocation to put_super().  Withdraws can happen in
   contexts in which callers on the stack are holding quota data references.
 
 - Many minor quota fixes and cleanups by Bob.
 
 - Update the the mailing list address for gfs2 and dlm.  (It's the same
   list for both and we are moving it to gfs2@lists.linux.dev.)
 
 - Various other minor cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmT3T7UUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqxhw/+IWp+4cY4htNkTRG7xkheTeQ+5whG
 NU40mp7Hj+WY5GoHqsk676q1pBkVAq5mNN1kt9S/oC6lLHrdu1HLpdIkgFow2nAC
 nDqlEqx9/Da9Q4H/+K442usO90S4o1MmOXOE9xcGcvJLqK4FLDOVDXbUWa43OXrK
 4HxgjgGSNPF4itD+U0o/V4f19jQ+4cNwmo6hGLyhsYillaUHiIXJezQlH7XycByM
 qGJqlG6odJ56wE38NG8Bt9Lj+91PsLLqO1TJxSYyzpf0h9QGQ2ySvu6/esTXwxWO
 XRuT4db7yjyAUhJoJMw+YU77xWQTz0/jriIDS7VqzvR1ns3GPaWdtb31TdUTBG4H
 IvBA8ep3oxHtcYFoPzCLBXgOIDej6KjAgS3RSv51yLeaZRHFUBc21fTSXbcDTIUs
 gkusZlRNQ9ANdBCVyf8hZxbE54HnaBJ8dKMZtynOXJEHs0EtGV8YKCNIpkFLxOvE
 vZkKcRsmVtuZ9fVhX1iH7dYmcsCMPI8RNo47k7hHk2EG8dU+eqyPSbi4QCmErNFf
 DlqX+fIuiDtOkbmWcrb2qdphn6j6bMLhDaOMJGIBOmgOPi+AU9dNAfmtu1cG4u1b
 2TFyUISayiwqHJQgguzvDed15fxexYdgoLB7O9t9TMbCENxisguNa5TsAN6ZkiLQ
 0hY6h80xSR2kCPU=
 =EonA
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - Fix a glock state (non-)transition bug when a dlm request times out
   and is canceled, and we have locking requests that can now be granted
   immediately

 - Various fixes and cleanups in how the logd and quotad daemons are
   woken up and terminated

 - Fix several bugs in the quota data reference counting and shrinking.
   Free quota data objects synchronously in put_super() instead of
   letting call_rcu() run wild

 - Make sure not to deallocate quota data during a withdraw; rather,
   defer quota data deallocation to put_super(). Withdraws can happen in
   contexts in which callers on the stack are holding quota data
   references

 - Many minor quota fixes and cleanups by Bob

 - Update the the mailing list address for gfs2 and dlm. (It's the same
   list for both and we are moving it to gfs2@lists.linux.dev)

 - Various other minor cleanups

* tag 'gfs2-v6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (51 commits)
  MAINTAINERS: Update dlm mailing list
  MAINTAINERS: Update gfs2 mailing list
  gfs2: change qd_slot_count to qd_slot_ref
  gfs2: check for no eligible quota changes
  gfs2: Remove useless assignment
  gfs2: simplify slot_get
  gfs2: Simplify qd2offset
  gfs2: introduce qd_bh_get_or_undo
  gfs2: Remove quota allocation info from quota file
  gfs2: use constant for array size
  gfs2: Set qd_sync_gen in do_sync
  gfs2: Remove useless err set
  gfs2: Small gfs2_quota_lock cleanup
  gfs2: move qdsb_put and reduce redundancy
  gfs2: improvements to sysfs status
  gfs2: Don't try to sync non-changes
  gfs2: Simplify function need_sync
  gfs2: remove unneeded pg_oflow variable
  gfs2: remove unneeded variable done
  gfs2: pass sdp to gfs2_write_buf_to_page
  ...
2023-09-05 13:00:28 -07:00
Linus Torvalds
5eea5820c7 - Stefan Roesch has added ksm statistics to /proc/pid/smaps
- Also a number of singleton patches, mainly cleanups and leftovers.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZPZGXwAKCRDdBJ7gKXxA
 jkjpAP9F0t5xy3JGs8Iew47Yqva+fvvrZdUSx3aHIZ/C3HyaJwEAi7DwzqludyHi
 851+qSdyX3bWnDEuejuNeMykh2QF1wo=
 =pw9A
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-09-04-14-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

 - Stefan Roesch has added ksm statistics to /proc/pid/smaps

 - Also a number of singleton patches, mainly cleanups and leftovers

* tag 'mm-stable-2023-09-04-14-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/kmemleak: move up cond_resched() call in page scanning loop
  mm: page_alloc: remove stale CMA guard code
  MAINTAINERS: add rmap.h to mm entry
  rmap: remove anon_vma_link() nommu stub
  proc/ksm: add ksm stats to /proc/pid/smaps
  mm/hwpoison: rename hwp_walk* to hwpoison_walk*
  mm: memory-failure: add PageOffline() check
2023-09-05 10:56:27 -07:00
Andreas Gruenbacher
0b93bac227 gfs2: Remove LM_FLAG_PRIORITY flag
The last user of this flag was removed in commit b77b4a4815 ("gfs2:
Rework freeze / thaw logic").

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-05 15:58:16 +02:00
Stefan Roesch
8b47933544 proc/ksm: add ksm stats to /proc/pid/smaps
With madvise and prctl KSM can be enabled for different VMA's.  Once it is
enabled we can query how effective KSM is overall.  However we cannot
easily query if an individual VMA benefits from KSM.

This commit adds a KSM section to the /prod/<pid>/smaps file.  It reports
how many of the pages are KSM pages.  Note that KSM-placed zeropages are
not included, only actual KSM pages.

Here is a typical output:

7f420a000000-7f421a000000 rw-p 00000000 00:00 0
Size:             262144 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:               51212 kB
Pss:                8276 kB
Shared_Clean:        172 kB
Shared_Dirty:      42996 kB
Private_Clean:       196 kB
Private_Dirty:      7848 kB
Referenced:        15388 kB
Anonymous:         51212 kB
KSM:               41376 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:             202016 kB
SwapPss:            3882 kB
Locked:                0 kB
THPeligible:    0
ProtectionKey:         0
ksm_state:          0
ksm_skip_base:      0
ksm_skip_count:     0
VmFlags: rd wr mr mw me nr mg anon

This information also helps with the following workflow:
- First enable KSM for all the VMA's of a process with prctl.
- Then analyze with the above smaps report which VMA's benefit the most
- Change the application (if possible) to add the corresponding madvise
calls for the VMA's that benefit the most

[shr@devkernel.io: v5]
  Link: https://lkml.kernel.org/r/20230823170107.1457915-1-shr@devkernel.io
Link: https://lkml.kernel.org/r/20230822180539.1424843-1-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-02 15:17:33 -07:00
Linus Torvalds
f35d170615 NFSD 6.6 Release Notes
I'm thrilled to announce that the Linux in-kernel NFS server now
 offers NFSv4 write delegations. A write delegation enables a client
 to cache data and metadata for a single file more aggressively,
 reducing network round trips and server workload. Many thanks to Dai
 Ngo for contributing this facility, and to Jeff Layton and Neil
 Brown for reviewing and testing it.
 
 This release also sees the removal of all support for DES- and
 triple-DES-based Kerberos encryption types in the kernel's SunRPC
 implementation. These encryption types have been deprecated by the
 Internet community for years and are considered insecure. This
 change affects both the in-kernel NFS client and server.
 
 The server's UDP and TCP socket transports have now fully adopted
 David Howells' new bio_vec iterator so that no more than one
 sendmsg() call is needed to transmit each RPC message. In
 particular, this helps kTLS optimize record boundaries when sending
 RPC-with-TLS replies, and it takes the server a baby step closer to
 handling file I/O via folios.
 
 We've begun work on overhauling the SunRPC thread scheduler to
 remove a costly linked-list walk when looking for an idle RPC
 service thread to wake. The pre-requisites are included in this
 release. Thanks to Neil Brown for his ongoing work on this
 improvement.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmTwoa0ACgkQM2qzM29m
 f5cZvw/8CmFVNC27aMrJEhRRhwwrXbLzUkWh9GCYkG98PHYiLxLTvZ6qELXAax/a
 UjSgIDSRcWl4z8M/tyBQtgsw7NADr+7XWqEoXR7HZ5pEEC/KNGM0oQWQ92ojjKYy
 JmHdB02uaDJfcd9ioFTU13cw7q2BQfoe2xLI8yqis2vcVSu92AM7aIw+cvJIpwQB
 inA3TIIsYTV/gPByXSfEtvmYACadoFiMvfvYwaWhjFS9MdSzFmcVG0Dp3EFIig29
 odmWEofcz6uIvUWvUswWEGdoSu7uOKIztSuAI4PlTwaofUaSKG6e5kmtpr3cLERD
 Uhg2lm5JgqkXBd7QHObNimJ4DtQzFwHmhA08qo8rd/zba75mn/Hr5IF0q3Rxs99J
 SRYHcAeP8afKn5Ge0yzoTgCNcqhfz8KLRfoCQX49mljr+muNxld4nMklD2KdUwJi
 XEB512/q3E4nUgopXZiSJYQYAq1CfdR5WpGipZ9X0XK9HZBDF/qhXGtk1YQuNWyj
 ZxJS3bfBza4oVIvP5/ehjCIQwOvqkcrC5zZGDIgDvw9Q6L3L1wqmVntsdCLCLRcJ
 jB4IOsj+DECfJ6w2vP2SZ3GeMtFnyuTQjhUTkjPuAKTBBiKo4Tj0o/agpfDYbWZy
 1l3a2yH5jqJgkm4MaVh3YHRJGc0ub0ccpIrs3QQ4jvjMLQ/3Gcs=
 =XGHs
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "I'm thrilled to announce that the Linux in-kernel NFS server now
  offers NFSv4 write delegations. A write delegation enables a client to
  cache data and metadata for a single file more aggressively, reducing
  network round trips and server workload. Many thanks to Dai Ngo for
  contributing this facility, and to Jeff Layton and Neil Brown for
  reviewing and testing it.

  This release also sees the removal of all support for DES- and
  triple-DES-based Kerberos encryption types in the kernel's SunRPC
  implementation. These encryption types have been deprecated by the
  Internet community for years and are considered insecure. This change
  affects both the in-kernel NFS client and server.

  The server's UDP and TCP socket transports have now fully adopted
  David Howells' new bio_vec iterator so that no more than one sendmsg()
  call is needed to transmit each RPC message. In particular, this helps
  kTLS optimize record boundaries when sending RPC-with-TLS replies, and
  it takes the server a baby step closer to handling file I/O via
  folios.

  We've begun work on overhauling the SunRPC thread scheduler to remove
  a costly linked-list walk when looking for an idle RPC service thread
  to wake. The pre-requisites are included in this release. Thanks to
  Neil Brown for his ongoing work on this improvement"

* tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (56 commits)
  Documentation: Add missing documentation for EXPORT_OP flags
  SUNRPC: Remove unused declaration rpc_modcount()
  SUNRPC: Remove unused declarations
  NFSD: da_addr_body field missing in some GETDEVICEINFO replies
  SUNRPC: Remove return value of svc_pool_wake_idle_thread()
  SUNRPC: make rqst_should_sleep() idempotent()
  SUNRPC: Clean up svc_set_num_threads
  SUNRPC: Count ingress RPC messages per svc_pool
  SUNRPC: Deduplicate thread wake-up code
  SUNRPC: Move trace_svc_xprt_enqueue
  SUNRPC: Add enum svc_auth_status
  SUNRPC: change svc_xprt::xpt_flags bits to enum
  SUNRPC: change svc_rqst::rq_flags bits to enum
  SUNRPC: change svc_pool::sp_flags bits to enum
  SUNRPC: change cache_head.flags bits to enum
  SUNRPC: remove timeout arg from svc_recv()
  SUNRPC: change svc_recv() to return void.
  SUNRPC: call svc_process() from svc_recv().
  nfsd: separate nfsd_last_thread() from nfsd_put()
  nfsd: Simplify code around svc_exit_thread() call in nfsd()
  ...
2023-08-31 15:32:18 -07:00
Linus Torvalds
df57721f9a Add x86 shadow stack support
Convert IBT selftest to asm to fix objtool warning
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmTv1QQACgkQaDWVMHDJ
 krAUwhAAn6TOwHJK8BSkHeiQhON1nrlP3c5cv0AyZ2NP8RYDrZrSZvhpYBJ6wgKC
 Cx5CGq5nn9twYsYS3KsktLKDfR3lRdsQ7K9qtyFtYiaeaVKo+7gEKl/K+klwai8/
 gninQWHk0zmSCja8Vi77q52WOMkQKapT8+vaON9EVDO8dVEi+CvhAIfPwMafuiwO
 Rk4X86SzoZu9FP79LcCg9XyGC/XbM2OG9eNUTSCKT40qTTKm5y4gix687NvAlaHR
 ko5MTsdl0Wfp6Qk0ohT74LnoA2c1g/FluvZIM33ci/2rFpkf9Hw7ip3lUXqn6CPx
 rKiZ+pVRc0xikVWkraMfIGMJfUd2rhelp8OyoozD7DB7UZw40Q4RW4N5tgq9Fhe9
 MQs3p1v9N8xHdRKl365UcOczUxNAmv4u0nV5gY/4FMC6VjldCl2V9fmqYXyzFS4/
 Ogg4FSd7c2JyGFKPs+5uXyi+RY2qOX4+nzHOoKD7SY616IYqtgKoz5usxETLwZ6s
 VtJOmJL0h//z0A7tBliB0zd+SQ5UQQBDC2XouQH2fNX2isJMn0UDmWJGjaHgK6Hh
 8jVp6LNqf+CEQS387UxckOyj7fu438hDky1Ggaw4YqowEOhQeqLVO4++x+HITrbp
 AupXfbJw9h9cMN63Yc0gVxXQ9IMZ+M7UxLtZ3Cd8/PVztNy/clA=
 =3UUm
 -----END PGP SIGNATURE-----

Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 shadow stack support from Dave Hansen:
 "This is the long awaited x86 shadow stack support, part of Intel's
  Control-flow Enforcement Technology (CET).

  CET consists of two related security features: shadow stacks and
  indirect branch tracking. This series implements just the shadow stack
  part of this feature, and just for userspace.

  The main use case for shadow stack is providing protection against
  return oriented programming attacks. It works by maintaining a
  secondary (shadow) stack using a special memory type that has
  protections against modification. When executing a CALL instruction,
  the processor pushes the return address to both the normal stack and
  to the special permission shadow stack. Upon RET, the processor pops
  the shadow stack copy and compares it to the normal stack copy.

  For more information, refer to the links below for the earlier
  versions of this patch set"

Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/
Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/

* tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
  x86/shstk: Change order of __user in type
  x86/ibt: Convert IBT selftest to asm
  x86/shstk: Don't retry vm_munmap() on -EINTR
  x86/kbuild: Fix Documentation/ reference
  x86/shstk: Move arch detail comment out of core mm
  x86/shstk: Add ARCH_SHSTK_STATUS
  x86/shstk: Add ARCH_SHSTK_UNLOCK
  x86: Add PTRACE interface for shadow stack
  selftests/x86: Add shadow stack test
  x86/cpufeatures: Enable CET CR4 bit for shadow stack
  x86/shstk: Wire in shadow stack interface
  x86: Expose thread features in /proc/$PID/status
  x86/shstk: Support WRSS for userspace
  x86/shstk: Introduce map_shadow_stack syscall
  x86/shstk: Check that signal frame is shadow stack mem
  x86/shstk: Check that SSP is aligned on sigreturn
  x86/shstk: Handle signals for shadow stack
  x86/shstk: Introduce routines modifying shstk
  x86/shstk: Handle thread shadow stack
  x86/shstk: Add user-mode shadow stack support
  ...
2023-08-31 12:20:12 -07:00
Linus Torvalds
cd99b9eb4b Documentation work keeps chugging along; stuff for 6.6 includes:
- Work from Carlos Bilbao to integrate rustdoc output into the generated
   HTML documentation.  This took some work to figure out how to do it
   without slowing the docs build and without creating people who don't have
   Rust installed, but Carlos got there.
 
 - Move the loongarch and mips architecture documentation under
   Documentation/arch/.
 
 - Some more maintainer documentation from Jakub
 
 ...plus the usual assortment of updates, translations, and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmTvqNkPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YgIgH/3drfLtlFtzLqDOzrzDXS8yGnE3pPdxw796b
 /ZFzAK16wYKaKevYoIz8bVGGKaE1sEUW0mhlq4KGdfZuxLG8YnWS8URyCW4FDU2E
 6qNL+8oJ8LZfID46f9Q8ZgfEz7yF/mhCqPk7MEswYtwbscs2ZTGCTGYB/5BHlBuT
 LR+M89uLmHgr8S1o24v30OgiX+VvQFyu0xoxIhbiqUZvBd/XdfX2pgYd9BGzMj5q
 C2ZP+V14g36c5pV0EO9TwhCXOF/WVrp7DbjbfWAsqBSLxvpXPydH2q1DUzGeQtP1
 exujrBD1O8q3pPdaNA5R+h6cWlHmUZug9mE4BRLp9ErGrozwJsQ=
 =C3Uv
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.6' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "Documentation work keeps chugging along; this includes:

   - Work from Carlos Bilbao to integrate rustdoc output into the
     generated HTML documentation. This took some work to figure out how
     to do it without slowing the docs build and without creating people
     who don't have Rust installed, but Carlos got there

   - Move the loongarch and mips architecture documentation under
     Documentation/arch/

   - Some more maintainer documentation from Jakub

  ... plus the usual assortment of updates, translations, and fixes"

* tag 'docs-6.6' of git://git.lwn.net/linux: (56 commits)
  Docu: genericirq.rst: fix irq-example
  input: docs: pxrc: remove reference to phoenix-sim
  Documentation: serial-console: Fix literal block marker
  docs/mm: remove references to hmm_mirror ops and clean typos
  docs/zh_CN: correct regi_chg(),regi_add() to region_chg(),region_add()
  Documentation: Fix typos
  Documentation/ABI: Fix typos
  scripts: kernel-doc: fix macro handling in enums
  scripts: kernel-doc: parse DEFINE_DMA_UNMAP_[ADDR|LEN]
  Documentation: riscv: Update boot image header since EFI stub is supported
  Documentation: riscv: Add early boot document
  Documentation: arm: Add bootargs to the table of added DT parameters
  docs: kernel-parameters: Refer to the correct bitmap function
  doc: update params of memhp_default_state=
  docs: Add book to process/kernel-docs.rst
  docs: sparse: fix invalid link addresses
  docs: vfs: clean up after the iterate() removal
  docs: Add a section on surveys to the researcher guidelines
  docs: move mips under arch
  docs: move loongarch under arch
  ...
2023-08-30 20:05:42 -07:00
Linus Torvalds
53ea7f624f New code for 6.6:
* Chandan Babu will be taking over as the XFS release manager.  He has
    reviewed all the patches that are in this branch, though I'm signing
    the branch one last time since I'm still technically maintainer. :P
  * Create a maintainer entry profile for XFS in which we lay out the
    various roles that I have played for many years.  Aside from release
    manager, the remaining roles are as yet unfilled.
  * Start merging online repair -- we now have in-memory pageable memory
    for staging btrees, a bunch of pending fixes, and we've started the
    process of refactoring the scrub support code to support more of
    repair.  In particular, reaping of old blocks from damaged structures.
  * Scrub the realtime summary file.
  * Fix a bug where scrub's quota iteration only ever returned the root
    dquot.  Oooops.
  * Fix some typos.
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZOQE2AAKCRBKO3ySh0YR
 pvmZAQDe+KceaVx6Dv2f9ihckeS2dILSpDTo1bh9BeXnt005VwD/ceHTaJxEl8lp
 u/dixFDkRgp9RYtoTAK2WNiUxYetsAc=
 =oZN6
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.6-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Chandan Babu:

 - Chandan Babu will be taking over as the XFS release manager. He has
   reviewed all the patches that are in this branch, though I'm signing
   the branch one last time since I'm still technically maintainer. :P

 - Create a maintainer entry profile for XFS in which we lay out the
   various roles that I have played for many years.  Aside from release
   manager, the remaining roles are as yet unfilled.

 - Start merging online repair -- we now have in-memory pageable memory
   for staging btrees, a bunch of pending fixes, and we've started the
   process of refactoring the scrub support code to support more of
   repair.  In particular, reaping of old blocks from damaged structures.

 - Scrub the realtime summary file.

 - Fix a bug where scrub's quota iteration only ever returned the root
   dquot.  Oooops.

 - Fix some typos.

[ Pull request from Chandan Babu, but signed tag and description from
  Darrick Wong, thus the first person singular above is Darrick, not
  Chandan ]

* tag 'xfs-6.6-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (37 commits)
  fs/xfs: Fix typos in comments
  xfs: fix dqiterate thinko
  xfs: don't check reflink iflag state when checking cow fork
  xfs: simplify returns in xchk_bmap
  xfs: rewrite xchk_inode_is_allocated to work properly
  xfs: hide xfs_inode_is_allocated in scrub common code
  xfs: fix agf_fllast when repairing an empty AGFL
  xfs: allow userspace to rebuild metadata structures
  xfs: clear pagf_agflreset when repairing the AGFL
  xfs: allow the user to cancel repairs before we start writing
  xfs: don't complain about unfixed metadata when repairs were injected
  xfs: implement online scrubbing of rtsummary info
  xfs: always rescan allegedly healthy per-ag metadata after repair
  xfs: move the realtime summary file scrubber to a separate source file
  xfs: wrap ilock/iunlock operations on sc->ip
  xfs: get our own reference to inodes that we want to scrub
  xfs: track usage statistics of online fsck
  xfs: improve xfarray quicksort pivot
  xfs: create scaffolding for creating debugfs entries
  xfs: cache pages used for xfarray quicksort convergence
  ...
2023-08-30 12:34:12 -07:00
Linus Torvalds
63580f669d overlayfs update for 6.6
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmTu0QoACgkQEVvVyTe/
 1WpbzBAAjIZXzhn8KldDpG0muw9JKaSOxM45uhZE1s/2uKsVCyp4k3lubTbxxYO1
 S9rUjhF2gSJFOfuSOK/XXEKXyu4MGT7iy7pKswu0k8+AHDDRBksPXJKA/AkhLPUr
 vX1pU6aWw2OSn1xdhIgY+F4DveyzYQL/CEoUzFyRPxSB0G/yjktRAjdZ2HL4cAvN
 eVXPyTj0bd4LVj1ITla4uj8DbgivrqmRJbZ9bKnSRE8GXWBriJhV//M2Q3QRno+W
 04TtAvyh+klQeqZFVOQ0reZUFZzYBBZZTmqoFiUzTny7oljWl5F0+JfJOHhRGknG
 LYZCia34+T6TZPhOnZzT/szTDoXVvNJhEf+vBQCqhaCugqJc/2uJdw9CW8ZcDvA9
 ZNOMxEbXE4VgGjJ0HM6MoDMUoIEUiNWEnXWEaKyCAfOPqgYwPy+QeDO4JtBPQpRn
 fwZx7Xpc1FLpTc9feHxzox9o81S8rPRMycUBg2c3KZB6TFnYNDxWIIo365naMCzz
 A8IDVGf+gd+S4NaZvh9FUijciIslYfyFgqwQERZmJnpDk1d1NyeUC7Nn7EkmUpyp
 guRaC+rUcqYP4CpuSHTCPle94qHqiAkbsKSJWebZ2M1j9fjZ+okPw0k83Nih79vu
 vRhs70Ah51v1lpBb0mlDjsV3vKm3Apv8nMJKZvVuC+Cw6Qiob5s=
 =F4Hi
 -----END PGP SIGNATURE-----

Merge tag 'ovl-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs

Pull overlayfs updates from Amir Goldstein:

 - add verification feature needed by composefs (Alexander Larsson)

 - improve integration of overlayfs and fanotify (Amir Goldstein)

 - fortify some overlayfs code (Andrea Righi)

* tag 'ovl-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: validate superblock in OVL_FS()
  ovl: make consistent use of OVL_FS()
  ovl: Kconfig: introduce CONFIG_OVERLAY_FS_DEBUG
  ovl: auto generate uuid for new overlay filesystems
  ovl: store persistent uuid/fsid with uuid=on
  ovl: add support for unique fsid per instance
  ovl: support encoding non-decodable file handles
  ovl: Handle verity during copy-up
  ovl: Validate verity xattr when resolving lowerdata
  ovl: Add versioned header for overlay.metacopy xattr
  ovl: Add framework for verity support
2023-08-30 11:54:09 -07:00
Chuck Lever
b38a6023da Documentation: Add missing documentation for EXPORT_OP flags
The commits that introduced these flags neglected to update the
Documentation/filesystems/nfs/exporting.rst file.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-29 17:45:22 -04:00
Linus Torvalds
b96a3e9142 - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list")
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
   reduces the special-case code for handling hugetlb pages in GUP.  It
   also speeds up GUP handling of transparent hugepages.
 
 - Peng Zhang provides some maple tree speedups ("Optimize the fast path
   of mas_store()").
 
 - Sergey Senozhatsky has improved te performance of zsmalloc during
   compaction (zsmalloc: small compaction improvements").
 
 - Domenico Cerasuolo has developed additional selftest code for zswap
   ("selftests: cgroup: add zswap test program").
 
 - xu xin has doe some work on KSM's handling of zero pages.  These
   changes are mainly to enable the user to better understand the
   effectiveness of KSM's treatment of zero pages ("ksm: support tracking
   KSM-placed zero-pages").
 
 - Jeff Xu has fixes the behaviour of memfd's
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
 
 - David Howells has fixed an fscache optimization ("mm, netfs, fscache:
   Stop read optimisation when folio removed from pagecache").
 
 - Axel Rasmussen has given userfaultfd the ability to simulate memory
   poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD").
 
 - Miaohe Lin has contributed some routine maintenance work on the
   memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
   check").
 
 - Peng Zhang has contributed some maintenance work on the maple tree
   code ("Improve the validation for maple tree and some cleanup").
 
 - Hugh Dickins has optimized the collapsing of shmem or file pages into
   THPs ("mm: free retracted page table by RCU").
 
 - Jiaqi Yan has a patch series which permits us to use the healthy
   subpages within a hardware poisoned huge page for general purposes
   ("Improve hugetlbfs read on HWPOISON hugepages").
 
 - Kemeng Shi has done some maintenance work on the pagetable-check code
   ("Remove unused parameters in page_table_check").
 
 - More folioification work from Matthew Wilcox ("More filesystem folio
   conversions for 6.6"), ("Followup folio conversions for zswap").  And
   from ZhangPeng ("Convert several functions in page_io.c to use a
   folio").
 
 - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
 
 - Baoquan He has converted some architectures to use the GENERIC_IOREMAP
   ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take
   GENERIC_IOREMAP way").
 
 - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
   batched/deferred tlb shootdown during page reclamation/migration").
 
 - Better maple tree lockdep checking from Liam Howlett ("More strict
   maple tree lockdep").  Liam also developed some efficiency improvements
   ("Reduce preallocations for maple tree").
 
 - Cleanup and optimization to the secondary IOMMU TLB invalidation, from
   Alistair Popple ("Invalidate secondary IOMMU TLB on permission
   upgrade").
 
 - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
   for arm64").
 
 - Kemeng Shi provides some maintenance work on the compaction code ("Two
   minor cleanups for compaction").
 
 - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most
   file-backed faults under the VMA lock").
 
 - Aneesh Kumar contributes code to use the vmemmap optimization for DAX
   on ppc64, under some circumstances ("Add support for DAX vmemmap
   optimization for ppc64").
 
 - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
   data in page_ext"), ("minor cleanups to page_ext header").
 
 - Some zswap cleanups from Johannes Weiner ("mm: zswap: three
   cleanups").
 
 - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
 
 - VMA handling cleanups from Kefeng Wang ("mm: convert to
   vma_is_initial_heap/stack()").
 
 - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
   implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
   address ranges and DAMON monitoring targets").
 
 - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
 
 - Liam Howlett has improved the maple tree node replacement code
   ("maple_tree: Change replacement strategy").
 
 - ZhangPeng has a general code cleanup - use the K() macro more widely
   ("cleanup with helper macro K()").
 
 - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap
   on memory feature on ppc64").
 
 - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
   in page_alloc"), ("Two minor cleanups for get pageblock migratetype").
 
 - Vishal Moola introduces a memory descriptor for page table tracking,
   "struct ptdesc" ("Split ptdesc from struct page").
 
 - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
   for vm.memfd_noexec").
 
 - MM include file rationalization from Hugh Dickins ("arch: include
   asm/cacheflush.h in asm/hugetlb.h").
 
 - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
   output").
 
 - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
   object_cache instead of kmemleak_initialized").
 
 - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
   and _folio_order").
 
 - A VMA locking scalability improvement from Suren Baghdasaryan
   ("Per-VMA lock support for swap and userfaults").
 
 - pagetable handling cleanups from Matthew Wilcox ("New page table range
   API").
 
 - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
   using page->private on tail pages for THP_SWAP + cleanups").
 
 - Cleanups and speedups to the hugetlb fault handling from Matthew
   Wilcox ("Change calling convention for ->huge_fault").
 
 - Matthew Wilcox has also done some maintenance work on the MM subsystem
   documentation ("Improve mm documentation").
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO1JUQAKCRDdBJ7gKXxA
 jrMwAP47r/fS8vAVT3zp/7fXmxaJYTK27CTAM881Gw1SDhFM/wEAv8o84mDenCg6
 Nfio7afS1ncD+hPYT8947UnLxTgn+ww=
 =Afws
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Some swap cleanups from Ma Wupeng ("fix WARN_ON in
   add_to_avail_list")

 - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
   reduces the special-case code for handling hugetlb pages in GUP. It
   also speeds up GUP handling of transparent hugepages.

 - Peng Zhang provides some maple tree speedups ("Optimize the fast path
   of mas_store()").

 - Sergey Senozhatsky has improved te performance of zsmalloc during
   compaction (zsmalloc: small compaction improvements").

 - Domenico Cerasuolo has developed additional selftest code for zswap
   ("selftests: cgroup: add zswap test program").

 - xu xin has doe some work on KSM's handling of zero pages. These
   changes are mainly to enable the user to better understand the
   effectiveness of KSM's treatment of zero pages ("ksm: support
   tracking KSM-placed zero-pages").

 - Jeff Xu has fixes the behaviour of memfd's
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").

 - David Howells has fixed an fscache optimization ("mm, netfs, fscache:
   Stop read optimisation when folio removed from pagecache").

 - Axel Rasmussen has given userfaultfd the ability to simulate memory
   poisoning ("add UFFDIO_POISON to simulate memory poisoning with
   UFFD").

 - Miaohe Lin has contributed some routine maintenance work on the
   memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
   check").

 - Peng Zhang has contributed some maintenance work on the maple tree
   code ("Improve the validation for maple tree and some cleanup").

 - Hugh Dickins has optimized the collapsing of shmem or file pages into
   THPs ("mm: free retracted page table by RCU").

 - Jiaqi Yan has a patch series which permits us to use the healthy
   subpages within a hardware poisoned huge page for general purposes
   ("Improve hugetlbfs read on HWPOISON hugepages").

 - Kemeng Shi has done some maintenance work on the pagetable-check code
   ("Remove unused parameters in page_table_check").

 - More folioification work from Matthew Wilcox ("More filesystem folio
   conversions for 6.6"), ("Followup folio conversions for zswap"). And
   from ZhangPeng ("Convert several functions in page_io.c to use a
   folio").

 - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").

 - Baoquan He has converted some architectures to use the
   GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
   architectures to take GENERIC_IOREMAP way").

 - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
   batched/deferred tlb shootdown during page reclamation/migration").

 - Better maple tree lockdep checking from Liam Howlett ("More strict
   maple tree lockdep"). Liam also developed some efficiency
   improvements ("Reduce preallocations for maple tree").

 - Cleanup and optimization to the secondary IOMMU TLB invalidation,
   from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
   upgrade").

 - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
   for arm64").

 - Kemeng Shi provides some maintenance work on the compaction code
   ("Two minor cleanups for compaction").

 - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
   most file-backed faults under the VMA lock").

 - Aneesh Kumar contributes code to use the vmemmap optimization for DAX
   on ppc64, under some circumstances ("Add support for DAX vmemmap
   optimization for ppc64").

 - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
   data in page_ext"), ("minor cleanups to page_ext header").

 - Some zswap cleanups from Johannes Weiner ("mm: zswap: three
   cleanups").

 - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").

 - VMA handling cleanups from Kefeng Wang ("mm: convert to
   vma_is_initial_heap/stack()").

 - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
   implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
   address ranges and DAMON monitoring targets").

 - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").

 - Liam Howlett has improved the maple tree node replacement code
   ("maple_tree: Change replacement strategy").

 - ZhangPeng has a general code cleanup - use the K() macro more widely
   ("cleanup with helper macro K()").

 - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
   memmap on memory feature on ppc64").

 - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
   in page_alloc"), ("Two minor cleanups for get pageblock
   migratetype").

 - Vishal Moola introduces a memory descriptor for page table tracking,
   "struct ptdesc" ("Split ptdesc from struct page").

 - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
   for vm.memfd_noexec").

 - MM include file rationalization from Hugh Dickins ("arch: include
   asm/cacheflush.h in asm/hugetlb.h").

 - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
   output").

 - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
   object_cache instead of kmemleak_initialized").

 - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
   and _folio_order").

 - A VMA locking scalability improvement from Suren Baghdasaryan
   ("Per-VMA lock support for swap and userfaults").

 - pagetable handling cleanups from Matthew Wilcox ("New page table
   range API").

 - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
   using page->private on tail pages for THP_SWAP + cleanups").

 - Cleanups and speedups to the hugetlb fault handling from Matthew
   Wilcox ("Change calling convention for ->huge_fault").

 - Matthew Wilcox has also done some maintenance work on the MM
   subsystem documentation ("Improve mm documentation").

* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
  maple_tree: shrink struct maple_tree
  maple_tree: clean up mas_wr_append()
  secretmem: convert page_is_secretmem() to folio_is_secretmem()
  nios2: fix flush_dcache_page() for usage from irq context
  hugetlb: add documentation for vma_kernel_pagesize()
  mm: add orphaned kernel-doc to the rst files.
  mm: fix clean_record_shared_mapping_range kernel-doc
  mm: fix get_mctgt_type() kernel-doc
  mm: fix kernel-doc warning from tlb_flush_rmaps()
  mm: remove enum page_entry_size
  mm: allow ->huge_fault() to be called without the mmap_lock held
  mm: move PMD_ORDER to pgtable.h
  mm: remove checks for pte_index
  memcg: remove duplication detection for mem_cgroup_uncharge_swap
  mm/huge_memory: work on folio->swap instead of page->private when splitting folio
  mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
  mm/swap: use dedicated entry for swap in folio
  mm/swap: stop using page->private on tail pages for THP_SWAP
  selftests/mm: fix WARNING comparing pointer to 0
  selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
  ...
2023-08-29 14:25:26 -07:00
Linus Torvalds
cc0a38d0f6 fscrypt update for 6.6
Just a small documentation improvement.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZOwUaRQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOKyGLAQC3nfSZ/sNNyUwPV46yBWzIDTuKXuCD
 iyd6L1N99pwMiwD5AT0PAylP7xQew7sMBimLvFJLoATP0OPApGR241yH4wo=
 =CYAT
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux

Pull fscrypt update from Eric Biggers:
 "Just a small documentation improvement"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
  fscrypt: improve the "Encryption modes and usage" section
2023-08-28 12:15:00 -07:00
Linus Torvalds
6016fc9162 New code for 6.6:
* Make large writes to the page cache fill sparse parts of the cache
    with large folios, then use large memcpy calls for the large folio.
  * Track the per-block dirty state of each large folio so that a
    buffered write to a single byte on a large folio does not result in a
    (potentially) multi-megabyte writeback IO.
  * Allow some directio completions to be performed in the initiating
    task's context instead of punting through a workqueue.  This will
    reduce latency for some io_uring requests.
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZM0Z1AAKCRBKO3ySh0YR
 pp7BAQCzkKejCM0185tNIH/faHjzidSisNQkJ5HoB4Opq9U66AEA6IPuAdlPlM/J
 FPW1oPq33Yn7AV4wXjUNFfDLzVb/Fgg=
 =dFBU
 -----END PGP SIGNATURE-----

Merge tag 'iomap-6.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap updates from Darrick Wong:
 "We've got some big changes for this release -- I'm very happy to be
  landing willy's work to enable large folios for the page cache for
  general read and write IOs when the fs can make contiguous space
  allocations, and Ritesh's work to track sub-folio dirty state to
  eliminate the write amplification problems inherent in using large
  folios.

  As a bonus, io_uring can now process write completions in the caller's
  context instead of bouncing through a workqueue, which should reduce
  io latency dramatically. IOWs, XFS should see a nice performance bump
  for both IO paths.

  Summary:

   - Make large writes to the page cache fill sparse parts of the cache
     with large folios, then use large memcpy calls for the large folio.

   - Track the per-block dirty state of each large folio so that a
     buffered write to a single byte on a large folio does not result in
     a (potentially) multi-megabyte writeback IO.

   - Allow some directio completions to be performed in the initiating
     task's context instead of punting through a workqueue. This will
     reduce latency for some io_uring requests"

* tag 'iomap-6.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (26 commits)
  iomap: support IOCB_DIO_CALLER_COMP
  io_uring/rw: add write support for IOCB_DIO_CALLER_COMP
  fs: add IOCB flags related to passing back dio completions
  iomap: add IOMAP_DIO_INLINE_COMP
  iomap: only set iocb->private for polled bio
  iomap: treat a write through cache the same as FUA
  iomap: use an unsigned type for IOMAP_DIO_* defines
  iomap: cleanup up iomap_dio_bio_end_io()
  iomap: Add per-block dirty state tracking to improve performance
  iomap: Allocate ifs in ->write_begin() early
  iomap: Refactor iomap_write_delalloc_punch() function out
  iomap: Use iomap_punch_t typedef
  iomap: Fix possible overflow condition in iomap_write_delalloc_scan
  iomap: Add some uptodate state handling helpers for ifs state bitmap
  iomap: Drop ifs argument from iomap_set_range_uptodate()
  iomap: Rename iomap_page to iomap_folio_state and others
  iomap: Copy larger chunks from userspace
  iomap: Create large folios in the buffered write path
  filemap: Allow __filemap_get_folio to allocate large folios
  filemap: Add fgf_t typedef
  ...
2023-08-28 11:59:52 -07:00
Linus Torvalds
511fb5bafe v6.6-vfs.super
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXpbgAKCRCRxhvAZXjc
 oi8PAQCtXelGZHmTcmevsO8p4Qz7hFpkonZ/TnxKf+RdnlNgPgD+NWi+LoRBpaAj
 xk4z8SqJaTTP4WXrG5JZ6o7EQkUL8gE=
 =2e9I
 -----END PGP SIGNATURE-----

Merge tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull superblock updates from Christian Brauner:
 "This contains the super rework that was ready for this cycle. The
  first part changes the order of how we open block devices and allocate
  superblocks, contains various cleanups, simplifications, and a new
  mechanism to wait on superblock state changes.

  This unblocks work to ultimately limit the number of writers to a
  block device. Jan has already scheduled follow-up work that will be
  ready for v6.7 and allows us to restrict the number of writers to a
  given block device. That series builds on this work right here.

  The second part contains filesystem freezing updates.

  Overview:

  The generic superblock changes are rougly organized as follows
  (ignoring additional minor cleanups):

   (1) Removal of the bd_super member from struct block_device.

       This was a very odd back pointer to struct super_block with
       unclear rules. For all relevant places we have other means to get
       the same information so just get rid of this.

   (2) Simplify rules for superblock cleanup.

       Roughly, everything that is allocated during fs_context
       initialization and that's stored in fs_context->s_fs_info needs
       to be cleaned up by the fs_context->free() implementation before
       the superblock allocation function has been called successfully.

       After sget_fc() returned fs_context->s_fs_info has been
       transferred to sb->s_fs_info at which point sb->kill_sb() if
       fully responsible for cleanup. Adhering to these rules means that
       cleanup of sb->s_fs_info in fill_super() is to be avoided as it's
       brittle and inconsistent.

       Cleanup shouldn't be duplicated between sb->put_super() as
       sb->put_super() is only called if sb->s_root has been set aka
       when the filesystem has been successfully born (SB_BORN). That
       complexity should be avoided.

       This also means that block devices are to be closed in
       sb->kill_sb() instead of sb->put_super(). More details in the
       lower section.

   (3) Make it possible to lookup or create a superblock before opening
       block devices

       There's a subtle dependency on (2) as some filesystems did rely
       on fill_super() to be called in order to correctly clean up
       sb->s_fs_info. All these filesystems have been fixed.

   (4) Switch most filesystem to follow the same logic as the generic
       mount code now does as outlined in (3).

   (5) Use the superblock as the holder of the block device. We can now
       easily go back from block device to owning superblock.

   (6) Export and extend the generic fs_holder_ops and use them as
       holder ops everywhere and remove the filesystem specific holder
       ops.

   (7) Call from the block layer up into the filesystem layer when the
       block device is removed, allowing to shut down the filesystem
       without risk of deadlocks.

   (8) Get rid of get_super().

       We can now easily go back from the block device to owning
       superblock and can call up from the block layer into the
       filesystem layer when the device is removed. So no need to wade
       through all registered superblock to find the owning superblock
       anymore"

Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/

* tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits)
  super: use higher-level helper for {freeze,thaw}
  super: wait until we passed kill super
  super: wait for nascent superblocks
  super: make locking naming consistent
  super: use locking helpers
  fs: simplify invalidate_inodes
  fs: remove get_super
  block: call into the file system for ioctl BLKFLSBUF
  block: call into the file system for bdev_mark_dead
  block: consolidate __invalidate_device and fsync_bdev
  block: drop the "busy inodes on changed media" log message
  dasd: also call __invalidate_device when setting the device offline
  amiflop: don't call fsync_bdev in FDFMTBEG
  floppy: call disk_force_media_change when changing the format
  block: simplify the disk_force_media_change interface
  nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl
  xfs use fs_holder_ops for the log and RT devices
  xfs: drop s_umount over opening the log and RT devices
  ext4: use fs_holder_ops for the log device
  ext4: drop s_umount over opening the log device
  ...
2023-08-28 11:04:18 -07:00
Linus Torvalds
de16588a77 v6.6-vfs.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXTxQAKCRCRxhvAZXjc
 okaVAP94WAlItvDRt/z2Wtzf0+RqPZeTXEdGTxua8+RxqCyYIQD+OO5nRfKQPHlV
 AqqGJMKItQMSMIYgB5ftqVhNWZfnHgM=
 =pSEW
 -----END PGP SIGNATURE-----

Merge tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains the usual miscellaneous features, cleanups, and fixes
  for vfs and individual filesystems.

  Features:

   - Block mode changes on symlinks and rectify our broken semantics

   - Report file modifications via fsnotify() for splice

   - Allow specifying an explicit timeout for the "rootwait" kernel
     command line option. This allows to timeout and reboot instead of
     always waiting indefinitely for the root device to show up

   - Use synchronous fput for the close system call

  Cleanups:

   - Get rid of open-coded lockdep workarounds for async io submitters
     and replace it all with a single consolidated helper

   - Simplify epoll allocation helper

   - Convert simple_write_begin and simple_write_end to use a folio

   - Convert page_cache_pipe_buf_confirm() to use a folio

   - Simplify __range_close to avoid pointless locking

   - Disable per-cpu buffer head cache for isolated cpus

   - Port ecryptfs to kmap_local_page() api

   - Remove redundant initialization of pointer buf in pipe code

   - Unexport the d_genocide() function which is only used within core
     vfs

   - Replace printk(KERN_ERR) and WARN_ON() with WARN()

  Fixes:

   - Fix various kernel-doc issues

   - Fix refcount underflow for eventfds when used as EFD_SEMAPHORE

   - Fix a mainly theoretical issue in devpts

   - Check the return value of __getblk() in reiserfs

   - Fix a racy assert in i_readcount_dec

   - Fix integer conversion issues in various functions

   - Fix LSM security context handling during automounts that prevented
     NFS superblock sharing"

* tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
  cachefiles: use kiocb_{start,end}_write() helpers
  ovl: use kiocb_{start,end}_write() helpers
  aio: use kiocb_{start,end}_write() helpers
  io_uring: use kiocb_{start,end}_write() helpers
  fs: create kiocb_{start,end}_write() helpers
  fs: add kerneldoc to file_{start,end}_write() helpers
  io_uring: rename kiocb_end_write() local helper
  splice: Convert page_cache_pipe_buf_confirm() to use a folio
  libfs: Convert simple_write_begin and simple_write_end to use a folio
  fs/dcache: Replace printk and WARN_ON by WARN
  fs/pipe: remove redundant initialization of pointer buf
  fs: Fix kernel-doc warnings
  devpts: Fix kernel-doc warnings
  doc: idmappings: fix an error and rephrase a paragraph
  init: Add support for rootwait timeout parameter
  vfs: fix up the assert in i_readcount_dec
  fs: Fix one kernel-doc comment
  docs: filesystems: idmappings: clarify from where idmappings are taken
  fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs
  vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing
  ...
2023-08-28 10:17:14 -07:00
Linus Torvalds
ecd7db2047 v6.6-vfs.tmpfs
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXTkgAKCRCRxhvAZXjc
 ouZsAPwNBHB2aPKtzWURuKx5RX02vXTzHX+A/LpuDz5WBFe8zQD+NlaBa4j0MBtS
 rVYM+CjOXnjnsLc8W0euMnfYNvViKgQ=
 =L2+2
 -----END PGP SIGNATURE-----

Merge tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull libfs and tmpfs updates from Christian Brauner:
 "This cycle saw a lot of work for tmpfs that required changes to the
  vfs layer. Andrew, Hugh, and I decided to take tmpfs through vfs this
  cycle. Things will go back to mm next cycle.

  Features
  ========

   - By far the biggest work is the quota support for tmpfs. New tmpfs
     quota infrastructure is added to support it and a new QFMT_SHMEM
     uapi option is exposed.

     This offers user and group quotas to tmpfs (project quotas will be
     added later). Similar to other filesystems tmpfs quota are not
     supported within user namespaces yet.

   - Add support for user xattrs. While tmpfs already supports security
     xattrs (security.*) and POSIX ACLs for a long time it lacked
     support for user xattrs (user.*). With this pull request tmpfs will
     be able to support a limited number of user xattrs.

     This is accompanied by a fix (see below) to limit persistent simple
     xattr allocations.

   - Add support for stable directory offsets. Currently tmpfs relies on
     the libfs provided cursor-based mechanism for readdir. This causes
     issues when a tmpfs filesystem is exported via NFS.

     NFS clients do not open directories. Instead, each server-side
     readdir operation opens the directory, reads it, and then closes
     it. Since the cursor state for that directory is associated with
     the opened file it is discarded after each readdir operation. Such
     directory offsets are not just cached by NFS clients but also
     various userspace libraries based on these clients.

     As it stands there is no way to invalidate the caches when
     directory offsets have changed and the whole application depends on
     unchanging directory offsets.

     At LSFMM we discussed how to solve this problem and decided to
     support stable directory offsets. libfs now allows filesystems like
     tmpfs to use an xarrary to map a directory offset to a dentry. This
     mechanism is currently only used by tmpfs but can be supported by
     others as well.

  Fixes
  =====

   - Change persistent simple xattrs allocations in libfs from
     GFP_KERNEL to GPF_KERNEL_ACCOUNT so they're subject to memory
     cgroup limits. Since this is a change to libfs it affects both
     tmpfs and kernfs.

   - Correctly verify {g,u}id mount options.

     A new filesystem context is created via fsopen() which records the
     namespace that becomes the owning namespace of the superblock when
     fsconfig(FSCONFIG_CMD_CREATE) is called for filesystems that are
     mountable in namespaces. However, fsconfig() calls can occur in a
     namespace different from the namespace where fsopen() has been
     called.

     Currently, when fsconfig() is called to set {g,u}id mount options
     the requested {g,u}id is mapped into a k{g,u}id according to the
     namespace where fsconfig() was called from. The resulting k{g,u}id
     is not guaranteed to be resolvable in the namespace of the
     filesystem (the one that fsopen() was called in).

     This means it's possible for an unprivileged user to create files
     owned by any group in a tmpfs mount since it's possible to set the
     setid bits on the tmpfs directory.

     The contract for {g,u}id mount options and {g,u}id values in
     general set from userspace has always been that they are translated
     according to the caller's idmapping. In so far, tmpfs has been
     doing the correct thing. But since tmpfs is mountable in
     unprivileged contexts it is also necessary to verify that the
     resulting {k,g}uid is representable in the namespace of the
     superblock to avoid such bugs.

     The new mount api's cross-namespace delegation abilities are
     already widely used. Having talked to a bunch of userspace this is
     the most faithful solution with minimal regression risks"

* tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs
  mm: invalidation check mapping before folio_contains
  tmpfs: trivial support for direct IO
  tmpfs,xattr: enable limited user extended attributes
  tmpfs: track free_ispace instead of free_inodes
  xattr: simple_xattr_set() return old_xattr to be freed
  tmpfs: verify {g,u}id mount options correctly
  shmem: move spinlock into shmem_recalc_inode() to fix quota support
  libfs: Remove parent dentry locking in offset_iterate_dir()
  libfs: Add a lock class for the offset map's xa_lock
  shmem: stable directory offsets
  shmem: Refactor shmem_symlink()
  libfs: Add directory operations for stable offsets
  shmem: fix quota lock nesting in huge hole handling
  shmem: Add default quota limit mount options
  shmem: quota support
  shmem: prepare shmem quota infrastructure
  quota: Check presence of quota operation structures instead of ->quota_read and ->quota_write callbacks
  shmem: make shmem_get_inode() return ERR_PTR instead of NULL
  shmem: make shmem_inode_acct_block() return error
2023-08-28 09:55:25 -07:00
Matthew Wilcox (Oracle)
40d49a3c9e mm: allow ->huge_fault() to be called without the mmap_lock held
Remove the checks for the VMA lock being held, allowing the page fault
path to call into the filesystem instead of retrying with the mmap_lock
held.  This will improve scalability for DAX page faults.  Also update the
documentation to match (and fix some other changes that have happened
recently).

Link: https://lkml.kernel.org/r/20230818202335.2739663-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:20:29 -07:00
Yin Fengwei
3bd786f76d mm: convert do_set_pte() to set_pte_range()
set_pte_range() allows to setup page table entries for a specific
range.  It takes advantage of batched rmap update for large folio.
It now takes care of calling update_mmu_cache_range().

Link: https://lkml.kernel.org/r/20230802151406.3735276-37-willy@infradead.org
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:20:26 -07:00
Luís Henriques
230bd8b98d ceph: update documentation regarding snapshot naming limitations
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-08-24 11:24:36 +02:00
Bjorn Helgaas
d56b699d76 Documentation: Fix typos
Fix typos in Documentation.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20230814212822.193684-4-helgaas@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-08-18 11:29:03 -06:00
Jonathan Corbet
99b319d30a docs: vfs: clean up after the iterate() removal
Commit 3e32715496 ("vfs: get rid of old '->iterate' directory operation")
removed the iterate() file_operations member, but neglected to clean up the
associated documentation.  Get rid of the leftovers.

Link: https://lore.kernel.org/r/874jl945bv.fsf@meer.lwn.net
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-08-18 11:03:52 -06:00
GONG, Ruiqi
b93ec212bc doc: idmappings: fix an error and rephrase a paragraph
If the modified paragraph is referring to the idmapping mentioned in the
previous paragraph (i.e. `u0:k10000:r10000`), then it is `u0` that the
upper idmapset starts with, not `u1000`.

Fix this error and rephrase this paragraph a bit to make this reference
more explicit.

Reported-by: Wang Lei <wanglei249@huawei.com>
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Message-Id: <20230816033210.914262-1-gongruiqi@huaweicloud.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-16 09:35:11 +02:00
Alexander Mikhalitsyn
d220efa20b docs: filesystems: idmappings: clarify from where idmappings are taken
Let's clarify from where we take idmapping of each type:
- caller
- filesystem
- mount

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Message-Id: <20230625182047.26854-1-aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-15 08:32:45 +02:00
Amir Goldstein
cbb44f0935 ovl: auto generate uuid for new overlay filesystems
Add a new mount option uuid=auto, which is the default.

If a persistent UUID xattr is found it is used.

Otherwise, an existing ovelrayfs with copied up subdirs in upper dir
that was never mounted with uuid=on retains the null UUID.

A new overlayfs with no copied up subdirs, generates the persistent UUID
on first mount.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-08-12 19:02:50 +03:00
Amir Goldstein
d9544c1b0d ovl: store persistent uuid/fsid with uuid=on
With uuid=on, store a persistent uuid in xattr on the upper dir to
give the overlayfs instance a persistent identifier.

This also makes f_fsid persistent and more reliable for reporting
fid info in fanotify events.

uuid=on is not supported on non-upper overlayfs or with upper fs
that does not support xattrs.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-08-12 19:02:50 +03:00
Amir Goldstein
b0504bfe1b ovl: add support for unique fsid per instance
The legacy behavior of ovl_statfs() reports the f_fsid filled by
underlying upper fs. This fsid is not unique among overlayfs instances
on the same upper fs.

With mount option uuid=on, generate a non-persistent uuid per overlayfs
instance and use it as the seed for f_fsid, similar to tmpfs.

This is useful for reporting fanotify events with fid info from different
instances of overlayfs over the same upper fs.

The old behavior of null uuid and upper fs fsid is retained with the
mount option uuid=null, which is the default.

The mount option uuid=off that disables uuid checks in underlying layers
also retains the legacy behavior.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-08-12 19:02:50 +03:00
Alexander Larsson
ae8cba4033 ovl: Add framework for verity support
This adds the scaffolding (docs, config, mount options) for supporting
the new digest field in the metacopy xattr. This contains a fs-verity
digest that need to match the fs-verity digest of the lowerdata
file. The mount option "verity" specifies how this xattr is handled.

If you enable verity ("verity=on") all existing xattrs are validated
before use, and during metacopy we generate verity xattr in the upper
metacopy file (if the source file has verity enabled). This means
later accesses can guarantee that the same data is used.

Additionally you can use "verity=require". In this mode all metacopy
files must have a valid verity xattr. For this to work metadata
copy-up must be able to create a verity xattr (so that later accesses
are validated). Therefore, in this mode, if the lower data file
doesn't have fs-verity enabled we fall back to a full copy rather than
a metacopy.

Actual implementation follows in a separate commit.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-08-12 19:02:38 +03:00
Darrick J. Wong
19e13b0a6d docs: add maintainer entry profile for XFS
Create a new document to list what I think are (within the scope of XFS)
our shared goals and community roles.  Since I will be stepping down
shortly, I feel it's important to write down somewhere all the hats that
I have been wearing for the past six years.

Also, document important extra details about how to contribute to XFS.

Cc: corbet@lwn.net
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
2023-08-10 07:47:53 -07:00
Hugh Dickins
2daf18a788
tmpfs,xattr: enable limited user extended attributes
Enable "user." extended attributes on tmpfs, limiting them by tracking
the space they occupy, and deducting that space from the limited ispace
(unless tmpfs mounted with nr_inodes=0 to leave that ispace unlimited).

tmpfs inodes and simple xattrs are both unswappable, and have to be in
lowmem on a 32-bit highmem kernel: so the ispace limit is appropriate
for xattrs, without any need for a further mount option.

Add simple_xattr_space() to give approximate but deterministic estimate
of the space taken up by each xattr: with simple_xattrs_free() outputting
the space freed if required (but kernfs and even some tmpfs usages do not
require that, so don't waste time on strlen'ing if not needed).

Security and trusted xattrs were already supported: for consistency and
simplicity, account them from the same pool; though there's a small risk
that a tmpfs with enough space before would now be considered too small.

When extended attributes are used, "df -i" does show more IUsed and less
IFree than can be explained by the inodes: document that (manpage later).

xfstests tests/generic which were not run on tmpfs before but now pass:
020 037 062 070 077 097 103 117 337 377 454 486 523 533 611 618 728
with no new failures.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Message-Id: <2e63b26e-df46-5baa-c7d6-f9a8dd3282c5@google.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 12:06:04 +02:00
Chuck Lever
6faddda69f libfs: Add directory operations for stable offsets
Create a vector of directory operations in fs/libfs.c that handles
directory seeks and readdir via stable offsets instead of the
current cursor-based mechanism.

For the moment these are unused.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Message-Id: <168814732984.530310.11190772066786107220.stgit@manet.1015granger.net>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09 09:15:40 +02:00
Lukas Czerner
de4c0e7ca8 shmem: Add default quota limit mount options
Allow system administrator to set default global quota limits at tmpfs
mount time.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230725144510.253763-7-cem@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09 09:15:40 +02:00
Carlos Maiolino
e09764cff4 shmem: quota support
Now the basic infra-structure is in place, enable quota support for tmpfs.

This offers user and group quotas to tmpfs (project quotas will be added
later). Also, as other filesystems, the tmpfs quota is not supported
within user namespaces yet, so idmapping is not translated.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230725144510.253763-6-cem@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09 09:15:39 +02:00
Linus Torvalds
3e32715496
vfs: get rid of old '->iterate' directory operation
All users now just use '->iterate_shared()', which only takes the
directory inode lock for reading.

Filesystems that never got convered to shared mode now instead use a
wrapper that drops the lock, re-takes it in write mode, calls the old
function, and then downgrades the lock back to read mode.

This way the VFS layer and other callers no longer need to care about
filesystems that never got converted to the modern era.

The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
ntfs, ocfs2, overlayfs, and vboxsf.

Honestly, several of them look like they really could just iterate their
directories in shared mode and skip the wrapper entirely, but the point
of this change is to not change semantics or fix filesystems that
haven't been fixed in the last 7+ years, but to finally get rid of the
dual iterators.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-06 15:08:35 +02:00
Hugh Dickins
253e5df8b8 tmpfs: fix Documentation of noswap and huge mount options
The noswap mount option is surely not one of the three options for sizing:
move its description down.

The huge= mount option does not accept numeric values: those are just in
an internal enum.  Delete those numbers, and follow the manpage text more
closely (but there's not yet any fadvise() or fcntl() which applies here).

/sys/kernel/mm/transparent_hugepage/shmem_enabled is hard to describe, and
barely relevant to mounting a tmpfs: just refer to transhuge.rst (while
still using the words deny and force, to help as informal reminders).

[rdunlap@infradead.org: fixup Docs table for huge mount options]
  Link: https://lkml.kernel.org/r/20230725052333.26857-1-rdunlap@infradead.org
Link: https://lkml.kernel.org/r/986cb0bf-9780-354-9bb-4bf57aadbab@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: d0f5a85442 ("shmem: update documentation")
Fixes: 2c6efe9cf2 ("shmem: add support to ignore swap")
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> 
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27 13:07:03 -07:00
Matthew Wilcox (Oracle)
32b29cc9db doc: Correct the description of ->release_folio
The filesystem ->release_folio method is called under more circumstances
now than when the documentation was written.  The second sentence
describing the interpretation of the return value is the wrong polarity
(false indicates failure, not success).  And the third sentence is also
wrong (the kernel calls try_to_free_buffers() instead).

So replace the entire paragraph with a detailed description of what the
state of the folio may be, the meaning of the gfp parameter, why the
method is being called and what the filesystem is expected to do.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-07-24 18:04:30 -04:00
Lukas Bulwahn
626c820526 afs: Documentation: correct reference to CONFIG_AFS_FS
Commit 0795e7c031 ("[AFS]: Update the AFS fs documentation.") adds a new
section listing the build configuration options that need to be enabled for
the AFS file system.

The documentation refers to CONFIG_AFS, but the option is called
CONFIG_AFS_FS, since the beginning of Linux's git history.

Refer to the config option with the correct name.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20230720094301.9888-1-lukas.bulwahn@gmail.com
2023-07-21 13:46:02 -06:00
Darrick J. Wong
880b957785 fs: distinguish between user initiated freeze and kernel initiated freeze
Userspace can freeze a filesystem using the FIFREEZE ioctl or by
suspending the block device; this state persists until userspace thaws
the filesystem with the FITHAW ioctl or resuming the block device.
Since commit 18e9e5104f ("Introduce freeze_super and thaw_super for
the fsfreeze ioctl") we only allow the first freeze command to succeed.

The kernel may decide that it is necessary to freeze a filesystem for
its own internal purposes, such as suspends in progress, filesystem fsck
activities, or quiescing a device prior to removal.  Userspace thaw
commands must never break a kernel freeze, and kernel thaw commands
shouldn't undo userspace's freeze command.

Introduce a couple of freeze holder flags and wire it into the
sb_writers state.  One kernel and one userspace freeze are allowed to
coexist at the same time; the filesystem will not thaw until both are
lifted.

I wonder if the f2fs/gfs2 code should be using a kernel freeze here, but
for now we'll use FREEZE_HOLDER_USERSPACE to preserve existing
behaviors.

Cc: mcgrof@kernel.org
Cc: jack@suse.cz
Cc: hch@infradead.org
Cc: ruansy.fnst@fujitsu.com
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
2023-07-17 09:00:09 -07:00
Eric Biggers
324718dddd fscrypt: improve the "Encryption modes and usage" section
As the number of supported encryption modes has grown, the part of the
"Encryption modes and usage" section that describes the supported
encryption modes has gotten a bit messy.  It presents useful
information, but it's a bit lacking in high-level context.

Rework the section to hopefully be much more useful.

Link: https://lore.kernel.org/r/20230630064811.22569-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-07-11 22:56:24 -07:00
Yu-cheng Yu
54007f8182 mm: Introduce VM_SHADOW_STACK for shadow stack memory
New hardware extensions implement support for shadow stack memory, such
as x86 Control-flow Enforcement Technology (CET). Add a new VM flag to
identify these areas, for example, to be used to properly indicate shadow
stack PTEs to the hardware.

Shadow stack VMA creation will be tightly controlled and limited to
anonymous memory to make the implementation simpler and since that is all
that is required. The solution will rely on pte_mkwrite() to create the
shadow stack PTEs, so it will not be required for vm_get_page_prot() to
learn how to create shadow stack memory. For this reason document that
VM_SHADOW_STACK should not be mixed with VM_SHARED.

Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Mark Brown <broonie@kernel.org>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/all/20230613001108.3040476-15-rick.p.edgecombe%40intel.com
2023-07-11 14:12:19 -07:00
Linus Torvalds
946c6b59c5 16 hotfixes. Six are cc:stable and the remainder address post-6.4 issues.
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZKmgXAAKCRDdBJ7gKXxA
 joqDAP0V520Jy0cyJrRMvaQRFMqtVeDOdTpAue7ZOQHSi/LZnAD9EEAxDpYF/V4x
 PO27ixXQ4Glm2iYgH7bDX7J73WiA3wg=
 =JsYW
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull hotfixes from Andrew Morton:
 "16 hotfixes. Six are cc:stable and the remainder address post-6.4
  issues"

The merge undoes the disabling of the CONFIG_PER_VMA_LOCK feature, since
it was all hopefully fixed in mainline.

* tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  lib: dhry: fix sleeping allocations inside non-preemptable section
  kasan, slub: fix HW_TAGS zeroing with slub_debug
  kasan: fix type cast in memory_is_poisoned_n
  mailmap: add entries for Heiko Stuebner
  mailmap: update manpage link
  bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
  MAINTAINERS: add linux-next info
  mailmap: add Markus Schneider-Pargmann
  writeback: account the number of pages written back
  mm: call arch_swap_restore() from do_swap_page()
  squashfs: fix cache race with migration
  mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
  docs: update ocfs2-devel mailing list address
  MAINTAINERS: update ocfs2-devel mailing list address
  mm: disable CONFIG_PER_VMA_LOCK until its fixed
  fork: lock VMAs of the parent process when forking
2023-07-08 14:30:25 -07:00
Anthony Iliopoulos
5a569db68c docs: update ocfs2-devel mailing list address
The ocfs2-devel mailing list has been migrated to the kernel.org
infrastructure, update all related documentation pointers to reflect the
change.

Link: https://lkml.kernel.org/r/20230628013437.47030-3-ailiop@suse.com
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:29 -07:00
Linus Torvalds
73a3fcdaa7 f2fs update for 6.5-rc1
In this cycle, we've mainly investigated the zoned block device support along
 with patches such as correcting write pointers between f2fs and storage, adding
 asynchronous zone reset flow, and managing the number of open zones. Other than
 them, f2fs adds another mount option, "errors=x" to specify how to handle when
 it detects an unexpected behavior at runtime.
 
 Enhancement:
  - support errors=remount-ro|continue|panic mountoption
  - enforce some inode flag policies
  - allow .tmp compression given extensions
  - add some ioctls to manage the f2fs compression
  - improve looped node chain flow
  - avoid issuing small-sized discard commands during checkpoint
  - implement an asynchronous zone reset
 
 Bug fix:
  - fix deadlock in xattr and inode page lock
  - fix and add sanity check in some error paths
  - fix to avoid NULL pointer dereference f2fs_write_end_io() along with put_super
  - set proper flags to quota files
  - fix potential deadlock due to unpaired node_write lock use
  - fix over-estimating free section during FG GC
  - fix the wrong condition to determine atomic context
 
 As usual, also there are a number of patches having code refactoring and minor
 clean-ups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmSlracACgkQQBSofoJI
 UNJETA//RhFkVGbKlZ7cAuB6xwaLPmRi+aUSn3/rBbdq6CjBOslClEYwW981Fe19
 q6VSIX7Zt5RKZg2+meHLhE28rGknLaZ4wFC/F4BDjoykN119g+KXkypjf+OVbatK
 zLImOzXVKjAWJevpYCENcUQY/xI2kNtg3csp6kq9GQoZCD2US/wzzI0QF6xCQ1q3
 WZpHHy2fi6Lry2xGEbDLKlxg9e5nDqhOSp6S/taF+w+RyUMlTcoIU2fIWT0iZ2kI
 taFe+PWHHkJg2oJcgZ+hx5ZACGb1BWdHg8n1N0MK/IiNbA6CWe+iSM9T0TtctM5V
 Gkz4233bFR976O27Bu2znqil+AH3ECZ/F2HbRbh5vTFJIhUMwNQ1GJTgDqE5Vf8h
 R4W/ejbBSLM/g4TK5boyWrrKpAAVmhFhSj+OBboIlFFKchP4RKaqL7Az+m0tGSD3
 0uCVAFtzmctBmgJ9ko+dxwwFwbbGiG6MDYeGIODMdqFQMQrcGqPoOrcWbj2hPoaW
 LRz/OzA8N0fQUfvCH6E31Ypd6cBFrG++FgtFA4lsv10KNsJUOx4MbEWxF5HobV3t
 axpcRsOOb95hAmqenY3sWyiR+IcIEAYlzx6D6QEbNPe47fuLxr9jx52Ollw6RDfj
 RvkaxzAIDeESMWWNftzJ8r8Wt6RzoBv5JjBkFIMbCz28V3v9R44=
 =TAG3
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this cycle, we've mainly investigated the zoned block device
  support along with patches such as correcting write pointers between
  f2fs and storage, adding asynchronous zone reset flow, and managing
  the number of open zones.

  Other than them, f2fs adds another mount option, "errors=x" to specify
  how to handle when it detects an unexpected behavior at runtime.

  Enhancements:
   - support 'errors=remount-ro|continue|panic' mount option
   - enforce some inode flag policies
   - allow .tmp compression given extensions
   - add some ioctls to manage the f2fs compression
   - improve looped node chain flow
   - avoid issuing small-sized discard commands during checkpoint
   - implement an asynchronous zone reset

  Bug fixes:
   - fix deadlock in xattr and inode page lock
   - fix and add sanity check in some error paths
   - fix to avoid NULL pointer dereference f2fs_write_end_io() along
     with put_super
   - set proper flags to quota files
   - fix potential deadlock due to unpaired node_write lock use
   - fix over-estimating free section during FG GC
   - fix the wrong condition to determine atomic context

  As usual, also there are a number of patches with code refactoring and
  minor clean-ups"

* tag 'f2fs-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (46 commits)
  f2fs: fix to do sanity check on direct node in truncate_dnode()
  f2fs: only set release for file that has compressed data
  f2fs: fix compile warning in f2fs_destroy_node_manager()
  f2fs: fix error path handling in truncate_dnode()
  f2fs: fix deadlock in i_xattr_sem and inode page lock
  f2fs: remove unneeded page uptodate check/set
  f2fs: update mtime and ctime in move file range method
  f2fs: compress tmp files given extension
  f2fs: refactor struct f2fs_attr macro
  f2fs: convert to use sbi directly
  f2fs: remove redundant assignment to variable err
  f2fs: do not issue small discard commands during checkpoint
  f2fs: check zone write pointer points to the end of zone
  f2fs: add f2fs_ioc_get_compress_blocks
  f2fs: cleanup MIN_INLINE_XATTR_SIZE
  f2fs: add helper to check compression level
  f2fs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
  f2fs: do more sanity check on inode
  f2fs: compress: fix to check validity of i_compress_flag field
  f2fs: add sanity compress level check for compressed file
  ...
2023-07-05 14:14:37 -07:00
Linus Torvalds
18c9901d74 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmScTY8ACgkQnJ2qBz9k
 QNnfwwgAhZtow1klDLH6qCnWMufB6AwT7VfAHyA3fvyTYMjUKb0sGHkpuh8hqVOb
 Lzb4YB+jSWV8XnMFn/4gFJQU/nAv8bMPavghMGpr5VNjQi7WkxYF/GB6O1I5NOHK
 EnJjDExgdxXDJZORaaXLVJWrtzJuDFgdiSeIwJECFa0MdTHNgPy3XOl+PPxnYQ/V
 xyHyP5ImGgd5O4iy3PFDQBGgOXIMrBX8IMce+qLQNYIvjSIUgmdnIkoUCvsQiisp
 LyKI2LxqAqnpA4h4Ow6hOZDw2VlPT0vDwFVUfFIZMIqs5YgaSbWa1Z6cs37MigAn
 fgUyRVx2y8A2Lwla7rwLaUEToRVADw==
 =ZdcG
 -----END PGP SIGNATURE-----

Merge tag 'fsnotify_for_v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify updates from Jan Kara:

 - Support for fanotify events returning file handles for filesystems
   not exportable via NFS

 - Improved error handling exportfs functions

 - Add missing FS_OPEN events when unusual open helpers are used

* tag 'fsnotify_for_v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: move fsnotify_open() hook into do_dentry_open()
  exportfs: check for error return value from exportfs_encode_*()
  fanotify: support reporting non-decodeable file handles
  exportfs: allow exporting non-decodeable file handles to userspace
  exportfs: add explicit flag to request non-decodeable file handles
  exportfs: change connectable argument to bit flags
2023-06-29 13:31:44 -07:00
Linus Torvalds
53ea167b21 Various cleanups and bug fixes in ext4's extent status tree,
journalling, and block allocator subsystems.  Also improve performance
 for parallel DIO overwrites.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmSaIWAACgkQ8vlZVpUN
 gaODEAf9GLk68DvU9iOhgJ1p/lMIqtbY0vvB1aeiQg7Z99mk/Vc//R5qQvtO2oN5
 9G4OMSGKoUO0x9OlvDIw6za1BsE1pGHyBLmei7PO1JpHop6b6hKj+WQVPWb43v15
 TI0vIkWzwJI2eIxsTqvpMkgwZ3aNL9c52xFyjwk/6lAsw4y2wxEls/NZhhE2tAXF
 w/RFmI9RC/AZy1JX3VeruzeiSvAq+JAnsW8iNIoN5nBvWU7yXLA3b4mcoWWrCQ5E
 sKqOkhTeobhYsAie6dxGhri/JrL1HwPOpJ8SWWmrlLWXoMVx1rXxW3OnxIAEl9sz
 05n7Z+6LvI6aEk+rnjCqt4Z1cpIIEA==
 =cAq/
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Various cleanups and bug fixes in ext4's extent status tree,
  journalling, and block allocator subsystems.

  Also improve performance for parallel DIO overwrites"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (55 commits)
  ext4: avoid updating the superblock on a r/o mount if not needed
  jbd2: skip reading super block if it has been verified
  ext4: fix to check return value of freeze_bdev() in ext4_shutdown()
  ext4: refactoring to use the unified helper ext4_quotas_off()
  ext4: turn quotas off if mount failed after enabling quotas
  ext4: update doc about journal superblock description
  ext4: add journal cycled recording support
  jbd2: continue to record log between each mount
  jbd2: remove j_format_version
  jbd2: factor out journal initialization from journal_get_superblock()
  jbd2: switch to check format version in superblock directly
  jbd2: remove unused feature macros
  ext4: ext4_put_super: Remove redundant checking for 'sbi->s_journal_bdev'
  ext4: Fix reusing stale buffer heads from last failed mounting
  ext4: allow concurrent unaligned dio overwrites
  ext4: clean up mballoc criteria comments
  ext4: make ext4_zeroout_es() return void
  ext4: make ext4_es_insert_extent() return void
  ext4: make ext4_es_insert_delayed_block() return void
  ext4: make ext4_es_remove_extent() return void
  ...
2023-06-29 13:18:36 -07:00
Linus Torvalds
be3c213150 overlayfs update for 6.5
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmSTFh8ACgkQEVvVyTe/
 1Wpy5g/+NTL00ptBHHnEZJG7rRu+PzvjB+gzI8YpMT4kiGkJBNKZIxvgURvisDJ/
 coOXySq0yk0y901r2Rb+hP8YSYcSo8nrLV+jJwlZcmFsudtbSbOdUUJiAUjqGZk2
 YxiCl9NrAZJMVQj6AfL1LE0L00NmIxXruGiCXWEnbHjYvmy/LIsLOg9b+vlkPPZM
 xqxQq//hVZel5Lh6rmz9svJIqFF/e3zobX7NBV0MiFGHX926u2TV0ImAnjmhuxdV
 mZ8zKCQ4kGTU++PjOCyKPS+xF9eg6mlCVW9uq/QIN83muhwaUlOimJEBunbNxBQz
 if/OWxSMZv5UFXqYbF6YsHQrWupCV2uEcs8ypcGNsPX3AeX9i8JTGdVWMPax+4HO
 IMZ7Ltxf8Udihsn4s9n0ZMgSDbMTcPRFMNrHXtWU6FuEEjbP5SuD1KREJrh58myL
 I99/0tOICAx3ZRueZNmqz4BJ9zev+86PtPNDqHbZy2qrwBn7C6ly3AFWCCv32mza
 ehINgUv/TtzUBKqkbhznrjinrv5upV3CV9xg4L8ohhuVeoa8h3kGf8H5XSfnhY56
 x1DNWAZK+yZpLf/I39hrWwSME0T1D3GEuI2NlPmAKpqZSgwYdhSAcfebfQhmwD3Q
 B3qY3Dh74dsN9zW9oygcXXtHHizzsomOc/qXkXvYP4MXkQi0+h0=
 =KPxq
 -----END PGP SIGNATURE-----

Merge tag 'ovl-update-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs

Pull overlayfs update from Amir Goldstein:

 - fix two NULL pointer deref bugs (Zhihao Cheng)

 - add support for "data-only" lower layers destined to be used by
   composefs

 - port overlayfs to the new mount api (Christian Brauner)

* tag 'ovl-update-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: (26 commits)
  ovl: add Amir as co-maintainer
  ovl: reserve ability to reconfigure mount options with new mount api
  ovl: modify layer parameter parsing
  ovl: port to new mount api
  ovl: factor out ovl_parse_options() helper
  ovl: store enum redirect_mode in config instead of a string
  ovl: pass ovl_fs to xino helpers
  ovl: clarify ovl_get_root() semantics
  ovl: negate the ofs->share_whiteout boolean
  ovl: check type and offset of struct vfsmount in ovl_entry
  ovl: implement lazy lookup of lowerdata in data-only layers
  ovl: prepare for lazy lookup of lowerdata inode
  ovl: prepare to store lowerdata redirect for lazy lowerdata lookup
  ovl: implement lookup in data-only layers
  ovl: introduce data-only lower layers
  ovl: remove unneeded goto instructions
  ovl: deduplicate lowerdata and lowerstack[]
  ovl: deduplicate lowerpath and lowerstack[]
  ovl: move ovl_entry into ovl_inode
  ovl: factor out ovl_free_entry() and ovl_stack_*() helpers
  ...
2023-06-29 13:01:27 -07:00
Linus Torvalds
3a8a670eee Networking changes for 6.5.
Core
 ----
 
  - Rework the sendpage & splice implementations. Instead of feeding
    data into sockets page by page extend sendmsg handlers to support
    taking a reference on the data, controlled by a new flag called
    MSG_SPLICE_PAGES. Rework the handling of unexpected-end-of-file
    to invoke an additional callback instead of trying to predict what
    the right combination of MORE/NOTLAST flags is.
    Remove the MSG_SENDPAGE_NOTLAST flag completely.
 
  - Implement SCM_PIDFD, a new type of CMSG type analogous to
    SCM_CREDENTIALS, but it contains pidfd instead of plain pid.
 
  - Enable socket busy polling with CONFIG_RT.
 
  - Improve reliability and efficiency of reporting for ref_tracker.
 
  - Auto-generate a user space C library for various Netlink families.
 
 Protocols
 ---------
 
  - Allow TCP to shrink the advertised window when necessary, prevent
    sk_rcvbuf auto-tuning from growing the window all the way up to
    tcp_rmem[2].
 
  - Use per-VMA locking for "page-flipping" TCP receive zerocopy.
 
  - Prepare TCP for device-to-device data transfers, by making sure
    that payloads are always attached to skbs as page frags.
 
  - Make the backoff time for the first N TCP SYN retransmissions
    linear. Exponential backoff is unnecessarily conservative.
 
  - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO).
 
  - Avoid waking up applications using TLS sockets until we have
    a full record.
 
  - Allow using kernel memory for protocol ioctl callbacks, paving
    the way to issuing ioctls over io_uring.
 
  - Add nolocalbypass option to VxLAN, forcing packets to be fully
    encapsulated even if they are destined for a local IP address.
 
  - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure
    in-kernel ECMP implementation (e.g. Open vSwitch) select the same
    link for all packets. Support L4 symmetric hashing in Open vSwitch.
 
  - PPPoE: make number of hash bits configurable.
 
  - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
    (ipconfig).
 
  - Add layer 2 miss indication and filtering, allowing higher layers
    (e.g. ACL filters) to make forwarding decisions based on whether
    packet matched forwarding state in lower devices (bridge).
 
  - Support matching on Connectivity Fault Management (CFM) packets.
 
  - Hide the "link becomes ready" IPv6 messages by demoting their
    printk level to debug.
 
  - HSR: don't enable promiscuous mode if device offloads the proto.
 
  - Support active scanning in IEEE 802.15.4.
 
  - Continue work on Multi-Link Operation for WiFi 7.
 
 BPF
 ---
 
  - Add precision propagation for subprogs and callbacks. This allows
    maintaining verification efficiency when subprograms are used,
    or in fact passing the verifier at all for complex programs,
    especially those using open-coded iterators.
 
  - Improve BPF's {g,s}setsockopt() length handling. Previously BPF
    assumed the length is always equal to the amount of written data.
    But some protos allow passing a NULL buffer to discover what
    the output buffer *should* be, without writing anything.
 
  - Accept dynptr memory as memory arguments passed to helpers.
 
  - Add routing table ID to bpf_fib_lookup BPF helper.
 
  - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands.
 
  - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
    maps as read-only).
 
  - Show target_{obj,btf}_id in tracing link fdinfo.
 
  - Addition of several new kfuncs (most of the names are self-explanatory):
    - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
      bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
      and bpf_dynptr_clone().
    - bpf_task_under_cgroup()
    - bpf_sock_destroy() - force closing sockets
    - bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs
 
 Netfilter
 ---------
 
  - Relax set/map validation checks in nf_tables. Allow checking
    presence of an entry in a map without using the value.
 
  - Increase ip_vs_conn_tab_bits range for 64BIT builds.
 
  - Allow updating size of a set.
 
  - Improve NAT tuple selection when connection is closing.
 
 Driver API
 ----------
 
  - Integrate netdev with LED subsystem, to allow configuring HW
    "offloaded" blinking of LEDs based on link state and activity
    (i.e. packets coming in and out).
 
  - Support configuring rate selection pins of SFP modules.
 
  - Factor Clause 73 auto-negotiation code out of the drivers, provide
    common helper routines.
 
  - Add more fool-proof helpers for managing lifetime of MDIO devices
    associated with the PCS layer.
 
  - Allow drivers to report advanced statistics related to Time Aware
    scheduler offload (taprio).
 
  - Allow opting out of VF statistics in link dump, to allow more VFs
    to fit into the message.
 
  - Split devlink instance and devlink port operations.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - Synopsys EMAC4 IP support (stmmac)
    - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
    - Marvell 88E6250 7 port switches
    - Microchip LAN8650/1 Rev.B0 PHYs
    - MediaTek MT7981/MT7988 built-in 1GE PHY driver
 
  - WiFi:
    - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
    - Realtek RTL8723DS (SDIO variant)
    - Realtek RTL8851BE
 
  - CAN:
    - Fintek F81604
 
 Drivers
 -------
 
  - Ethernet NICs:
    - Intel (100G, ice):
      - support dynamic interrupt allocation
      - use meta data match instead of VF MAC addr on slow-path
    - nVidia/Mellanox:
      - extend link aggregation to handle 4, rather than just 2 ports
      - spawn sub-functions without any features by default
    - OcteonTX2:
      - support HTB (Tx scheduling/QoS) offload
      - make RSS hash generation configurable
      - support selecting Rx queue using TC filters
    - Wangxun (ngbe/txgbe):
      - add basic Tx/Rx packet offloads
      - add phylink support (SFP/PCS control)
    - Freescale/NXP (enetc):
      - report TAPRIO packet statistics
    - Solarflare/AMD:
      - support matching on IP ToS and UDP source port of outer header
      - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
      - add devlink dev info support for EF10
 
  - Virtual NICs:
    - Microsoft vNIC:
      - size the Rx indirection table based on requested configuration
      - support VLAN tagging
    - Amazon vNIC:
      - try to reuse Rx buffers if not fully consumed, useful for ARM
        servers running with 16kB pages
    - Google vNIC:
      - support TCP segmentation of >64kB frames
 
  - Ethernet embedded switches:
    - Marvell (mv88e6xxx):
      - enable USXGMII (88E6191X)
    - Microchip:
     - lan966x: add support for Egress Stage 0 ACL engine
     - lan966x: support mapping packet priority to internal switch
       priority (based on PCP or DSCP)
 
  - Ethernet PHYs:
    - Broadcom PHYs:
      - support for Wake-on-LAN for BCM54210E/B50212E
      - report LPI counter
    - Microsemi PHYs: support RGMII delay configuration (VSC85xx)
    - Micrel PHYs: receive timestamp in the frame (LAN8841)
    - Realtek PHYs: support optional external PHY clock
    - Altera TSE PCS: merge the driver into Lynx PCS which it is
      a variant of
 
  - CAN: Kvaser PCIEcan:
    - support packet timestamping
 
  - WiFi:
    - Intel (iwlwifi):
      - major update for new firmware and Multi-Link Operation (MLO)
      - configuration rework to drop test devices and split
        the different families
      - support for segmented PNVM images and power tables
      - new vendor entries for PPAG (platform antenna gain) feature
    - Qualcomm 802.11ax (ath11k):
      - Multiple Basic Service Set Identifier (MBSSID) and
        Enhanced MBSSID Advertisement (EMA) support in AP mode
      - support factory test mode
    - RealTek (rtw89):
      - add RSSI based antenna diversity
      - support U-NII-4 channels on 5 GHz band
    - RealTek (rtl8xxxu):
      - AP mode support for 8188f
      - support USB RX aggregation for the newer chips
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmSbJM4ACgkQMUZtbf5S
 IrtoDhAAhEim1+LBIKf4lhPcVdZ2p/TkpnwTz5jsTwSeRBAxTwuNJ2fQhFXg13E3
 MnRq6QaEp8G4/tA/gynLvQop+FEZEnv+horP0zf/XLcC8euU7UrKdrpt/4xxdP07
 IL/fFWsoUGNO+L9LNaHwBo8g7nHvOkPscHEBHc2Xrvzab56TJk6vPySfLqcpKlNZ
 CHWDwTpgRqNZzSKiSpoMVd9OVMKUXcPYHpDmfEJ5l+e8vTXmZzOLHrSELHU5nP5f
 mHV7gxkDCTshoGcaed7UTiOvgu1p6E5EchDJxiLaSUbgsd8SZ3u4oXwRxgj33RK/
 fB2+UaLrRt/DdlHvT/Ph8e8Ygu77yIXMjT49jsfur/zVA0HEA2dFb7V6QlsYRmQp
 J25pnrdXmE15llgqsC0/UOW5J1laTjII+T2T70UOAqQl4LWYAQDG4WwsAqTzU0KY
 dueydDouTp9XC2WYrRUEQxJUzxaOaazskDUHc5c8oHp/zVBT+djdgtvVR9+gi6+7
 yy4elI77FlEEqL0ItdU/lSWINayAlPLsIHkMyhSGKX0XDpKjeycPqkNx4UterXB/
 JKIR5RBWllRft+igIngIkKX0tJGMU0whngiw7d1WLw25wgu4sB53hiWWoSba14hv
 tXMxwZs5iGaPcT38oRVMZz8I1kJM4Dz3SyI7twVvi4RUut64EG4=
 =9i4I
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking changes from Jakub Kicinski:
 "WiFi 7 and sendpage changes are the biggest pieces of work for this
  release. The latter will definitely require fixes but I think that we
  got it to a reasonable point.

  Core:

   - Rework the sendpage & splice implementations

     Instead of feeding data into sockets page by page extend sendmsg
     handlers to support taking a reference on the data, controlled by a
     new flag called MSG_SPLICE_PAGES

     Rework the handling of unexpected-end-of-file to invoke an
     additional callback instead of trying to predict what the right
     combination of MORE/NOTLAST flags is

     Remove the MSG_SENDPAGE_NOTLAST flag completely

   - Implement SCM_PIDFD, a new type of CMSG type analogous to
     SCM_CREDENTIALS, but it contains pidfd instead of plain pid

   - Enable socket busy polling with CONFIG_RT

   - Improve reliability and efficiency of reporting for ref_tracker

   - Auto-generate a user space C library for various Netlink families

  Protocols:

   - Allow TCP to shrink the advertised window when necessary, prevent
     sk_rcvbuf auto-tuning from growing the window all the way up to
     tcp_rmem[2]

   - Use per-VMA locking for "page-flipping" TCP receive zerocopy

   - Prepare TCP for device-to-device data transfers, by making sure
     that payloads are always attached to skbs as page frags

   - Make the backoff time for the first N TCP SYN retransmissions
     linear. Exponential backoff is unnecessarily conservative

   - Create a new MPTCP getsockopt to retrieve all info
     (MPTCP_FULL_INFO)

   - Avoid waking up applications using TLS sockets until we have a full
     record

   - Allow using kernel memory for protocol ioctl callbacks, paving the
     way to issuing ioctls over io_uring

   - Add nolocalbypass option to VxLAN, forcing packets to be fully
     encapsulated even if they are destined for a local IP address

   - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure
     in-kernel ECMP implementation (e.g. Open vSwitch) select the same
     link for all packets. Support L4 symmetric hashing in Open vSwitch

   - PPPoE: make number of hash bits configurable

   - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client
     (ipconfig)

   - Add layer 2 miss indication and filtering, allowing higher layers
     (e.g. ACL filters) to make forwarding decisions based on whether
     packet matched forwarding state in lower devices (bridge)

   - Support matching on Connectivity Fault Management (CFM) packets

   - Hide the "link becomes ready" IPv6 messages by demoting their
     printk level to debug

   - HSR: don't enable promiscuous mode if device offloads the proto

   - Support active scanning in IEEE 802.15.4

   - Continue work on Multi-Link Operation for WiFi 7

  BPF:

   - Add precision propagation for subprogs and callbacks. This allows
     maintaining verification efficiency when subprograms are used, or
     in fact passing the verifier at all for complex programs,
     especially those using open-coded iterators

   - Improve BPF's {g,s}setsockopt() length handling. Previously BPF
     assumed the length is always equal to the amount of written data.
     But some protos allow passing a NULL buffer to discover what the
     output buffer *should* be, without writing anything

   - Accept dynptr memory as memory arguments passed to helpers

   - Add routing table ID to bpf_fib_lookup BPF helper

   - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands

   - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark
     maps as read-only)

   - Show target_{obj,btf}_id in tracing link fdinfo

   - Addition of several new kfuncs (most of the names are
     self-explanatory):
      - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(),
        bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size()
        and bpf_dynptr_clone().
      - bpf_task_under_cgroup()
      - bpf_sock_destroy() - force closing sockets
      - bpf_cpumask_first_and(), rework bpf_cpumask_any*() kfuncs

  Netfilter:

   - Relax set/map validation checks in nf_tables. Allow checking
     presence of an entry in a map without using the value

   - Increase ip_vs_conn_tab_bits range for 64BIT builds

   - Allow updating size of a set

   - Improve NAT tuple selection when connection is closing

  Driver API:

   - Integrate netdev with LED subsystem, to allow configuring HW
     "offloaded" blinking of LEDs based on link state and activity
     (i.e. packets coming in and out)

   - Support configuring rate selection pins of SFP modules

   - Factor Clause 73 auto-negotiation code out of the drivers, provide
     common helper routines

   - Add more fool-proof helpers for managing lifetime of MDIO devices
     associated with the PCS layer

   - Allow drivers to report advanced statistics related to Time Aware
     scheduler offload (taprio)

   - Allow opting out of VF statistics in link dump, to allow more VFs
     to fit into the message

   - Split devlink instance and devlink port operations

  New hardware / drivers:

   - Ethernet:
      - Synopsys EMAC4 IP support (stmmac)
      - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches
      - Marvell 88E6250 7 port switches
      - Microchip LAN8650/1 Rev.B0 PHYs
      - MediaTek MT7981/MT7988 built-in 1GE PHY driver

   - WiFi:
      - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps
      - Realtek RTL8723DS (SDIO variant)
      - Realtek RTL8851BE

   - CAN:
      - Fintek F81604

  Drivers:

   - Ethernet NICs:
      - Intel (100G, ice):
         - support dynamic interrupt allocation
         - use meta data match instead of VF MAC addr on slow-path
      - nVidia/Mellanox:
         - extend link aggregation to handle 4, rather than just 2 ports
         - spawn sub-functions without any features by default
      - OcteonTX2:
         - support HTB (Tx scheduling/QoS) offload
         - make RSS hash generation configurable
         - support selecting Rx queue using TC filters
      - Wangxun (ngbe/txgbe):
         - add basic Tx/Rx packet offloads
         - add phylink support (SFP/PCS control)
      - Freescale/NXP (enetc):
         - report TAPRIO packet statistics
      - Solarflare/AMD:
         - support matching on IP ToS and UDP source port of outer
           header
         - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6
         - add devlink dev info support for EF10

   - Virtual NICs:
      - Microsoft vNIC:
         - size the Rx indirection table based on requested
           configuration
         - support VLAN tagging
      - Amazon vNIC:
         - try to reuse Rx buffers if not fully consumed, useful for ARM
           servers running with 16kB pages
      - Google vNIC:
         - support TCP segmentation of >64kB frames

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - enable USXGMII (88E6191X)
      - Microchip:
         - lan966x: add support for Egress Stage 0 ACL engine
         - lan966x: support mapping packet priority to internal switch
           priority (based on PCP or DSCP)

   - Ethernet PHYs:
      - Broadcom PHYs:
         - support for Wake-on-LAN for BCM54210E/B50212E
         - report LPI counter
      - Microsemi PHYs: support RGMII delay configuration (VSC85xx)
      - Micrel PHYs: receive timestamp in the frame (LAN8841)
      - Realtek PHYs: support optional external PHY clock
      - Altera TSE PCS: merge the driver into Lynx PCS which it is a
        variant of

   - CAN: Kvaser PCIEcan:
      - support packet timestamping

   - WiFi:
      - Intel (iwlwifi):
         - major update for new firmware and Multi-Link Operation (MLO)
         - configuration rework to drop test devices and split the
           different families
         - support for segmented PNVM images and power tables
         - new vendor entries for PPAG (platform antenna gain) feature
      - Qualcomm 802.11ax (ath11k):
         - Multiple Basic Service Set Identifier (MBSSID) and Enhanced
           MBSSID Advertisement (EMA) support in AP mode
         - support factory test mode
      - RealTek (rtw89):
         - add RSSI based antenna diversity
         - support U-NII-4 channels on 5 GHz band
      - RealTek (rtl8xxxu):
         - AP mode support for 8188f
         - support USB RX aggregation for the newer chips"

* tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits)
  net: scm: introduce and use scm_recv_unix helper
  af_unix: Skip SCM_PIDFD if scm->pid is NULL.
  net: lan743x: Simplify comparison
  netlink: Add __sock_i_ino() for __netlink_diag_dump().
  net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses
  Revert "af_unix: Call scm_recv() only after scm_set_cred()."
  phylink: ReST-ify the phylink_pcs_neg_mode() kdoc
  libceph: Partially revert changes to support MSG_SPLICE_PAGES
  net: phy: mscc: fix packet loss due to RGMII delays
  net: mana: use vmalloc_array and vcalloc
  net: enetc: use vmalloc_array and vcalloc
  ionic: use vmalloc_array and vcalloc
  pds_core: use vmalloc_array and vcalloc
  gve: use vmalloc_array and vcalloc
  octeon_ep: use vmalloc_array and vcalloc
  net: usb: qmi_wwan: add u-blox 0x1312 composition
  perf trace: fix MSG_SPLICE_PAGES build error
  ipvlan: Fix return value of ipvlan_queue_xmit()
  netfilter: nf_tables: fix underflow in chain reference counter
  netfilter: nf_tables: unbind non-anonymous set if rule construction fails
  ...
2023-06-28 16:43:10 -07:00
Linus Torvalds
582c161cf3 hardening updates for v6.5-rc1
- Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko)
 
 - Convert strreplace() to return string start (Andy Shevchenko)
 
 - Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook)
 
 - Add missing function prototypes seen with W=1 (Arnd Bergmann)
 
 - Fix strscpy() kerndoc typo (Arne Welzel)
 
 - Replace strlcpy() with strscpy() across many subsystems which were
   either Acked by respective maintainers or were trivial changes that
   went ignored for multiple weeks (Azeem Shaikh)
 
 - Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers)
 
 - Add KUnit tests for strcat()-family
 
 - Enable KUnit tests of FORTIFY wrappers under UML
 
 - Add more complete FORTIFY protections for strlcat()
 
 - Add missed disabling of FORTIFY for all arch purgatories.
 
 - Enable -fstrict-flex-arrays=3 globally
 
 - Tightening UBSAN_BOUNDS when using GCC
 
 - Improve checkpatch to check for strcpy, strncpy, and fake flex arrays
 
 - Improve use of const variables in FORTIFY
 
 - Add requested struct_size_t() helper for types not pointers
 
 - Add __counted_by macro for annotating flexible array size members
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmSbftQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJj0MD/9X9jzJzCmsAU+yNldeoAzC84Sk
 GVU3RBxGcTNysL1gZXynkIgigw7DWc4htMGeSABHHwQRVP65JCH1Kw/VqIkyumbx
 9LdX6IklMJb4pRT4PVU3azebV4eNmSjlur2UxMeW54Czm91/6I8RHbJOyAPnOUmo
 2oomGdP/hpEHtKR7hgy8Axc6w5ySwQixh2V5sVZG3VbvCS5WKTmTXbs6puuRT5hz
 iHt7v+7VtEg/Qf1W7J2oxfoghvVBsaRrSLrExWT/oZYh1ZxM7DsCAAoG/IsDgHGA
 9LBXiRECgAFThbHVxLvvKZQMXdVk0i8iXLX43XMKC0wTA+NTyH7wlcQQ4RWNMuo8
 sfA9Qm9gMArXaf64aymr3Uwn20Zan0391HdlbhOJZAE6v3PPJbleUnM58AzD2d3r
 5Lz6AIFBxDImy+3f9iDWgacCT5/PkeiXTHzk9QnKhJyKKtRA58XJxj4q2+rPnGJP
 n4haXqoxD5FJbxdXiGKk31RS0U5HBug7wkOcUrTqDHUbc/QNU2b7dxTKUx+zYtCU
 uV5emPzpF4H4z+91WpO47n9gkMAfwV0lt9S2dwS8pxsgqctbmIan+Jgip7rsqZ2G
 OgLXBsb43eEs+6WgO8tVt/ZHYj9ivGMdrcNcsIfikzNs/xweUJ53k2xSEn2xEa5J
 cwANDmkL6QQK7yfeeg==
 =s0j1
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "There are three areas of note:

  A bunch of strlcpy()->strscpy() conversions ended up living in my tree
  since they were either Acked by maintainers for me to carry, or got
  ignored for multiple weeks (and were trivial changes).

  The compiler option '-fstrict-flex-arrays=3' has been enabled
  globally, and has been in -next for the entire devel cycle. This
  changes compiler diagnostics (though mainly just -Warray-bounds which
  is disabled) and potential UBSAN_BOUNDS and FORTIFY _warning_
  coverage. In other words, there are no new restrictions, just
  potentially new warnings. Any new FORTIFY warnings we've seen have
  been fixed (usually in their respective subsystem trees). For more
  details, see commit df8fc4e934.

  The under-development compiler attribute __counted_by has been added
  so that we can start annotating flexible array members with their
  associated structure member that tracks the count of flexible array
  elements at run-time. It is possible (likely?) that the exact syntax
  of the attribute will change before it is finalized, but GCC and Clang
  are working together to sort it out. Any changes can be made to the
  macro while we continue to add annotations.

  As an example of that last case, I have a treewide commit waiting with
  such annotations found via Coccinelle:

    https://git.kernel.org/linus/adc5b3cb48a049563dc673f348eab7b6beba8a9b

  Also see commit dd06e72e68 for more details.

  Summary:

   - Fix KMSAN vs FORTIFY in strlcpy/strlcat (Alexander Potapenko)

   - Convert strreplace() to return string start (Andy Shevchenko)

   - Flexible array conversions (Arnd Bergmann, Wyes Karny, Kees Cook)

   - Add missing function prototypes seen with W=1 (Arnd Bergmann)

   - Fix strscpy() kerndoc typo (Arne Welzel)

   - Replace strlcpy() with strscpy() across many subsystems which were
     either Acked by respective maintainers or were trivial changes that
     went ignored for multiple weeks (Azeem Shaikh)

   - Remove unneeded cc-option test for UBSAN_TRAP (Nick Desaulniers)

   - Add KUnit tests for strcat()-family

   - Enable KUnit tests of FORTIFY wrappers under UML

   - Add more complete FORTIFY protections for strlcat()

   - Add missed disabling of FORTIFY for all arch purgatories.

   - Enable -fstrict-flex-arrays=3 globally

   - Tightening UBSAN_BOUNDS when using GCC

   - Improve checkpatch to check for strcpy, strncpy, and fake flex
     arrays

   - Improve use of const variables in FORTIFY

   - Add requested struct_size_t() helper for types not pointers

   - Add __counted_by macro for annotating flexible array size members"

* tag 'hardening-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (54 commits)
  netfilter: ipset: Replace strlcpy with strscpy
  uml: Replace strlcpy with strscpy
  um: Use HOST_DIR for mrproper
  kallsyms: Replace all non-returning strlcpy with strscpy
  sh: Replace all non-returning strlcpy with strscpy
  of/flattree: Replace all non-returning strlcpy with strscpy
  sparc64: Replace all non-returning strlcpy with strscpy
  Hexagon: Replace all non-returning strlcpy with strscpy
  kobject: Use return value of strreplace()
  lib/string_helpers: Change returned value of the strreplace()
  jbd2: Avoid printing outside the boundary of the buffer
  checkpatch: Check for 0-length and 1-element arrays
  riscv/purgatory: Do not use fortified string functions
  s390/purgatory: Do not use fortified string functions
  x86/purgatory: Do not use fortified string functions
  acpi: Replace struct acpi_table_slit 1-element array with flex-array
  clocksource: Replace all non-returning strlcpy with strscpy
  string: use __builtin_memcpy() in strlcpy/strlcat
  staging: most: Replace all non-returning strlcpy with strscpy
  drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
  ...
2023-06-27 21:24:18 -07:00
Zhang Yi
bd5a594b5b ext4: update doc about journal superblock description
Update the description in doc about the s_head parameter in the journal
superblock.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20230322013353.1843306-4-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26 19:35:13 -04:00
Linus Torvalds
74774e243c fsverity updates for 6.5
Several updates for fs/verity/:
 
 - Do all hashing with the shash API instead of with the ahash API.  This
   simplifies the code and reduces API overhead.  It should also make
   things slightly easier for XFS's upcoming support for fsverity.  It
   does drop fsverity's support for off-CPU hash accelerators, but that
   support was incomplete and not known to be used.
 
 - Update and export fsverity_get_digest() so that it's ready for
   overlayfs's upcoming support for fsverity checking of lowerdata.
 
 - Improve the documentation for builtin signature support.
 
 - Fix a bug in the large folio support.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZJjsWRQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK0IsAQCZ9ZP2U0DqLKV025LzcU9epUdS30xJ
 U7WOs8gP63pH4QEAqbU1O6bVhEzdFWGzq5gdzdLePWjOyHrGCVcR2u+fgw4=
 =ptAR
 -----END PGP SIGNATURE-----

Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux

Pull fsverity updates from Eric Biggers:
 "Several updates for fs/verity/:

   - Do all hashing with the shash API instead of with the ahash API.

     This simplifies the code and reduces API overhead. It should also
     make things slightly easier for XFS's upcoming support for
     fsverity. It does drop fsverity's support for off-CPU hash
     accelerators, but that support was incomplete and not known to be
     used

   - Update and export fsverity_get_digest() so that it's ready for
     overlayfs's upcoming support for fsverity checking of lowerdata

   - Improve the documentation for builtin signature support

   - Fix a bug in the large folio support"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
  fsverity: improve documentation for builtin signature support
  fsverity: rework fsverity_get_digest() again
  fsverity: simplify error handling in verify_data_block()
  fsverity: don't use bio_first_page_all() in fsverity_verify_bio()
  fsverity: constify fsverity_hash_alg
  fsverity: use shash API instead of ahash API
2023-06-26 10:56:13 -07:00
Linus Torvalds
2eedfa9e27 v6.5/vfs.rename.locking
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZJU4NwAKCRCRxhvAZXjc
 ordqAP0RmZEkUA5p98pK+0FEFIsS2S8qChh6YHQHP+hF606FGgEAivb3UPRm9p58
 kRb5yK0/oXDUxGv7A+x+SIMVbcRyLgw=
 =pi6N
 -----END PGP SIGNATURE-----

Merge tag 'v6.5/vfs.rename.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs rename locking updates from Christian Brauner:
 "This contains the work from Jan to fix problems with cross-directory
  renames originally reported in [1].

  To quickly sum it up some filesystems (so far we know at least about
  ext4, udf, f2fs, ocfs2, likely also reiserfs, gfs2 and others) need to
  lock the directory when it is being renamed into another directory.

  This is because we need to update the parent pointer in the directory
  in that case and if that races with other operations on the directory,
  in particular a conversion from one directory format into another, bad
  things can happen.

  So far we've done the locking in the filesystem code but recently
  Darrick pointed out in [2] that the RENAME_EXCHANGE case was missing.
  That one is particularly nasty because RENAME_EXCHANGE can arbitrarily
  mix regular files and directories and proper lock ordering is not
  achievable in the filesystems alone.

  This patch set adds locking into vfs_rename() so that not only parent
  directories but also moved inodes, regardless of whether they are
  directories or not, are locked when calling into the filesystem.

  This means establishing a locking order for unrelated directories. New
  helpers are added for this purpose and our documentation is updated to
  cover this in detail.

  The locking is now actually easier to follow as we now always lock
  source and target. We've always locked the target independent of
  whether it was a directory or file and we've always locked source if
  it was a regular file. The exact details for why this came about can
  be found in [3] and [4]"

Link: https://lore.kernel.org/all/20230117123735.un7wbamlbdihninm@quack3 [1]
Link: https://lore.kernel.org/all/20230517045836.GA11594@frogsfrogsfrogs [2]
Link: https://lore.kernel.org/all/20230526-schrebergarten-vortag-9cd89694517e@brauner [3]
Link: https://lore.kernel.org/all/20230530-seenotrettung-allrad-44f4b00139d4@brauner [4]

* tag 'v6.5/vfs.rename.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: Restrict lock_two_nondirectories() to non-directory inodes
  fs: Lock moved directories
  fs: Establish locking order for unrelated directories
  Revert "f2fs: fix potential corruption when moving a directory"
  Revert "udf: Protect rename against modification of moved directory"
  ext4: Remove ext4 locking of moved directory
2023-06-26 10:01:26 -07:00
David Howells
dc97391e66 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)
Remove ->sendpage() and ->sendpage_locked().  sendmsg() with
MSG_SPLICE_PAGES should be used instead.  This allows multiple pages and
multipage folios to be passed through.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-afs@lists.infradead.org
cc: mptcp@lists.linux.dev
cc: rds-devel@oss.oracle.com
cc: tipc-discussion@lists.sourceforge.net
cc: virtualization@lists.linux-foundation.org
Link: https://lore.kernel.org/r/20230623225513.2732256-16-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-24 15:50:13 -07:00
Eric Biggers
672d6ef4c7 fsverity: improve documentation for builtin signature support
fsverity builtin signatures (CONFIG_FS_VERITY_BUILTIN_SIGNATURES) aren't
the only way to do signatures with fsverity, and they have some major
limitations.  Yet, more users have tried to use them, e.g. recently by
https://github.com/ostreedev/ostree/pull/2640.  In most cases this seems
to be because users aren't sufficiently familiar with the limitations of
this feature and what the alternatives are.

Therefore, make some updates to the documentation to try to clarify the
properties of this feature and nudge users in the right direction.

Note that the Integrity Policy Enforcement (IPE) LSM, which is not yet
upstream, is planned to use the builtin signatures.  (This differs from
IMA, which uses its own signature mechanism.)  For that reason, my
earlier patch "fsverity: mark builtin signatures as deprecated"
(https://lore.kernel.org/r/20221208033548.122704-1-ebiggers@kernel.org),
which marked builtin signatures as "deprecated", was controversial.

This patch therefore stops short of marking the feature as deprecated.
I've also revised the language to focus on better explaining the feature
and what its alternatives are.

Link: https://lore.kernel.org/r/20230620041937.5809-1-ebiggers@kernel.org
Reviewed-by: Colin Walters <walters@verbum.org>
Reviewed-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-06-20 22:47:55 -07:00
Amir Goldstein
af5f2396b6 ovl: store enum redirect_mode in config instead of a string
Do all the logic to set the mode during mount options parsing and
do not keep the option string around.

Use a constant_table to translate from enum redirect mode to string
in preperation for new mount api option parsing.

The mount option "off" is translated to either "follow" or "nofollow",
depending on the "redirect_always_follow" build/module config, so
in effect, there are only three possible redirect modes.

This results in a minor change to the string that is displayed
in show_options() - when redirect_dir is enabled by default and the user
mounts with the option "redirect_dir=off", instead of displaying the mode
"redirect_dir=off" in show_options(), the displayed mode will be either
"redirect_dir=follow" or "redirect_dir=nofollow", depending on the value
of "redirect_always_follow" build/module config.

The displayed mode reflects the effective mode, so mounting overlayfs
again with the dispalyed redirect_dir option will result with the same
effective and displayed mode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-06-19 14:02:01 +03:00
Amir Goldstein
37ebf056d6 ovl: introduce data-only lower layers
Introduce the format lowerdir=lower1:lower2::lowerdata1::lowerdata2
where the lower layers on the right of the :: separators are not merged
into the overlayfs merge dirs.

Data-only lower layers are only allowed at the bottom of the stack.

The files in those layers are only meant to be accessible via absolute
redirect from metacopy files in lower layers.  Following changes will
implement lookup in the data layers.

This feature was requested for composefs ostree use case, where the
lower data layer should only be accessiable via absolute redirects
from metacopy inodes.

The lower data layers are not required to a have a unique uuid or any
uuid at all, because they are never used to compose the overlayfs inode
st_ino/st_dev.

Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2023-06-19 14:01:13 +03:00
Jan Kara
28eceeda13
fs: Lock moved directories
When a directory is moved to a different directory, some filesystems
(udf, ext4, ocfs2, f2fs, and likely gfs2, reiserfs, and others) need to
update their pointer to the parent and this must not race with other
operations on the directory. Lock the directories when they are moved.
Although not all filesystems need this locking, we perform it in
vfs_rename() because getting the lock ordering right is really difficult
and we don't want to expose these locking details to filesystems.

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230601105830.13168-5-jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-06-02 15:00:18 +02:00
Arnd Bergmann
e910c8e3aa autofs: use flexible array in ioctl structure
Commit df8fc4e934 ("kbuild: Enable -fstrict-flex-arrays=3") introduced a warning
for the autofs_dev_ioctl structure:

In function 'check_name',
    inlined from 'validate_dev_ioctl' at fs/autofs/dev-ioctl.c:131:9,
    inlined from '_autofs_dev_ioctl' at fs/autofs/dev-ioctl.c:624:8:
fs/autofs/dev-ioctl.c:33:14: error: 'strchr' reading 1 or more bytes from a region of size 0 [-Werror=stringop-overread]
   33 |         if (!strchr(name, '/'))
      |              ^~~~~~~~~~~~~~~~~
In file included from include/linux/auto_dev-ioctl.h:10,
                 from fs/autofs/autofs_i.h:10,
                 from fs/autofs/dev-ioctl.c:14:
include/uapi/linux/auto_dev-ioctl.h: In function '_autofs_dev_ioctl':
include/uapi/linux/auto_dev-ioctl.h:112:14: note: source object 'path' of size 0
  112 |         char path[0];
      |              ^~~~

This is easily fixed by changing the gnu 0-length array into a c99
flexible array. Since this is a uapi structure, we have to be careful
about possible regressions but this one should be fine as they are
equivalent here. While it would break building with ancient gcc versions
that predate c99, it helps building with --std=c99 and -Wpedantic builds
in user space, as well as non-gnu compilers. This means we probably
also want it fixed in stable kernels.

Cc: stable@vger.kernel.org
Cc: Kees Cook <keescook@chromium.org>
Cc: Gustavo A. R. Silva" <gustavoars@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230523081944.581710-1-arnd@kernel.org
2023-05-30 16:42:00 -07:00
Steve French
ab6cacf833 smb3: move Documentation/filesystems/cifs to Documentation/filesystems/smb
Documentation/filesystems/cifs contains both server and client information
so its pathname is misleading.  In addition, the directory fs/smb
now contains both server and client, so move Documentation/filesystems/cifs
to Documentation/filesystems/smb

Suggested-by: Namjae Jeon <linkinjeon@kernel.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-24 16:29:21 -05:00
Steve French
bf8a352d49 cifs: correct references in Documentation to old fs/cifs path
The fs/cifs directory has moved to fs/smb/client, correct mentions
of this in Documentation and comments.

Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-24 16:29:21 -05:00
Amir Goldstein
304e9c83e8 exportfs: add explicit flag to request non-decodeable file handles
So far, all callers of exportfs_encode_inode_fh(), except for fsnotify's
show_mark_fhandle(), check that filesystem can decode file handles, but
we would like to add more callers that do not require a file handle that
can be decoded.

Introduce a flag to explicitly request a file handle that may not to be
decoded later and a wrapper exportfs_encode_fid() that sets this flag
and convert show_mark_fhandle() to use the new wrapper.

This will be used to allow adding fanotify support to filesystems that
do not support NFS export.

Acked-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230502124817.3070545-3-amir73il@gmail.com>
2023-05-22 18:08:37 +02:00
Randy Dunlap
bd415b5c95 Documentation/filesystems: ramfs-rootfs-initramfs: use :Author:
Use the :Author: markup instead of making it a chapter heading.
This cleans up the table of contents for this file.

Fixes: 7f46a240b0 ("[PATCH] ramfs, rootfs, and initramfs docs")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Rob Landley <rob@landley.net>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230508055928.3548-1-rdunlap@infradead.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-05-16 12:55:35 -06:00
Randy Dunlap
7b38ecfdf9 Documentation/filesystems: sharedsubtree: add section headings
Several of the sections are missing underlines. This makes the
generated contents have missing entries, so add the underlines.

Fixes: 16c01b20ae ("doc/filesystems: more mount cleanups")
Fixes: 9cfcceea8f ("[PATCH] Complete description of shared subtrees.")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230508055938.6550-1-rdunlap@infradead.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-05-16 12:50:05 -06:00
Chao Yu
b62e71be21 f2fs: support errors=remount-ro|continue|panic mountoption
This patch supports errors=remount-ro|continue|panic mount option
for f2fs.

f2fs behaves as below in three different modes:
mode			continue	remount-ro	panic
access ops		normal		noraml		N/A
syscall errors		-EIO		-EROFS		N/A
mount option		rw		ro		N/A
pending dir write	keep		keep		N/A
pending non-dir write	drop		keep		N/A
pending node write	drop		keep		N/A
pending meta write	keep		keep		N/A

By default it uses "continue" mode.

[Yangtao helps to clean up function's name]
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-05-08 11:18:04 -07:00
Linus Torvalds
8e15605be8 9p patches for 6.4 merge window
This pull request includes a number of patches that didn't quite make
 the cut last merge window while we addressed some outstanding issues
 and review comments.  It includes some new caching modes for those that
 only want readahead caches and reworks how we do writeback caching so we
 are not keeping extra references around which both causes performance problems
 and uses lots of additional resources on the server.
 
 It also includes a new flag to force disabling of xattrs which can also
 cause major performance issues, particularly if the underlying filesystem
 on the server doesn't support them.
 
 Finally it adds a couple of additional mount options to better support
 directio and enabling caches when the server doesn't support qid.version.
 
 There was one late-breaking bug report that has also been included as its
 own patch where I forgot to propagate an embarassing bit-logic fix to the
 various variations of open.  Since that was only added to for-next a week
 ago, if you would like to not include it, I can include it in the first
 round of fixes for -rc2.
 
 Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElpbw0ZalkJikytFRiP/V+0pf/5gFAmRS/LEACgkQiP/V+0pf
 /5i84RAAqFtTYPYxg0VMLvjix0BCCE7CYR+vr7UsoGFeuVRyFsKh7G6fhmRXUhpG
 2kf+1CK+xzQd9PEKiQnVmGhib9SCdeWqb9EUpCgLZEmrSci0qUenzjf3Jg5Qgwhx
 u6xu9tipzHeHMFBD6n0f+j2fZEsDAv5IzgL14F9YfJQueVsL+HUOifLncruWnUEn
 rJAf7omhE1x+FfWsNaB22AksvYUXtQfoG3MgCPWql/XKlo/6xeW/8/eprutIO06M
 LUp3ie4RbSRZiE63SfPhxFMJCZ7g+R1JSKe1J8i9bbbwzYsVO8dovwdMCw0tP7ZQ
 QjVq+Ng4x0Rn5Bmekj18ua7zRsbl96DaoqcnYWKUOHkRVJleTG9t31MeZQaRC+gh
 F3321XkqDBHMXPVVLVQ1Gb1Pxt8dk9iFwmTMxktTrM4n5Zwv8ldMDKQgOhDIv9WA
 dDTjZ7mYpk2f9atxA0oys5oebQlT3D0CM/3p6n/PsXXpk30tat99kgcKlpN8SrL+
 Fs0UkXTQuu0Vin8zaMarQ2TW2UGQVM8qRD2gTooRnOItdqItx17czaE2ALOdIuVs
 LCbDxOXNPP/YbzakNUxh5ldI3Z3eahbilVfGa8ILfvprKnH7MaMmSo0rkP4XHj1h
 JDUiN7JOCOW4dvfaXa4knSIjs62Y9oTS0MWrO9Ajq8bg4ku6U7c=
 =qXIe
 -----END PGP SIGNATURE-----

Merge tag '9p-6.4-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull 9p updates from Eric Van Hensbergen:
 "This includes a number of patches that didn't quite make the cut last
  merge window while we addressed some outstanding issues and review
  comments. It includes some new caching modes for those that only want
  readahead caches and reworks how we do writeback caching so we are not
  keeping extra references around which both causes performance problems
  and uses lots of additional resources on the server.

  It also includes a new flag to force disabling of xattrs which can
  also cause major performance issues, particularly if the underlying
  filesystem on the server doesn't support them.

  Finally it adds a couple of additional mount options to better support
  directio and enabling caches when the server doesn't support
  qid.version.

  There was one late-breaking bug report that has also been included as
  its own patch where I forgot to propagate an embarassing bit-logic fix
  to the various variations of open"

* tag '9p-6.4-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: Fix bit operation logic error
  fs/9p: Rework cache modes and add new options to Documentation
  fs/9p: remove writeback fid and fix per-file modes
  fs/9p: Add new mount modes
  9p: Add additional debug flags and open modes
  fs/9p: allow disable of xattr support on mount
  fs/9p: Remove unnecessary superblock flags
  fs/9p: Consolidate file operations and add readahead and writeback
2023-05-04 14:37:53 -07:00
Linus Torvalds
1e098dec61 driver ntfs3 for linux 6.4
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEh0DEKNP0I9IjwfWEqbAzH4MkB7YFAmRLZc0ACgkQqbAzH4Mk
 B7YWkA/+JHsryYjInVGmYfQ4oCmq1F2cEDgKdTeths+IZ+2vbk03hUbvmLBg8YR3
 VvYZ2YqNAs9HuZxP6mnVZB9YOMIFEdV3Pyto68l9jwip4irpjVrABtxwn6udTAmx
 RXVDDLRxN53UyoWgRoOfAEJUXfmOvw7laj8T9PrsKxXKhlOKmK8mSKtq8xTt+ECo
 8cJ0Nw2d0TVY6+Ou6CCfH8CzHtQjInHnKSt/KynJ+OJHxLmHRGlbDJbyjuJU0OBr
 grXngKPmqvHTiO3Zs14gRv5tFXNMGdvRcomy/XnztD/nIC2jEODI2uUCSedEh5vf
 IpzjKwQF1qRseGHwr2U0THPHOso4IP2T79WRQEuLY8DUGJClIGGmcdfeBqNnjbey
 1GVUis/leBvQb1CMAkX5HjGXO9i6xEfuTxwHSlk+Wu8euEDQ8OWudyeQmOYiHnqS
 vKZxXY88DDjATSokSUSSb8IW63z+JEPmovsmLKLpWusvWikIKhIEnjRZUG5XhgS3
 Ux66Owt9yguKgVAPX66b9PGQx4wy3GAR6FRxG1BpDX0XooGbzGRTAn213sjkBKiV
 JLswbQTJ5LOXv8bS1srNMPhQFeqFcEgrDbC+WxmOpzTvutimYmzUTDrOPq92Xw0f
 oi/i5kCoPFmOfoRsLhIl+lft1XLZ5iYFvj0faDJ5AWUOjb73Z3s=
 =xlIh
 -----END PGP SIGNATURE-----

Merge tag 'ntfs3_for_6.4' of https://github.com/Paragon-Software-Group/linux-ntfs3

Pull ntfs3 updates from Konstantin Komarov:
 "New code:

   - add missed "nocase" in ntfs_show_options

   - extend information on failures/errors

   - small optimizations

  Fixes:

   - some logic errors

   - some dead code was removed

   - code is refactored and reformatted according to the new version of
     clang-format

  Code removal:

   - 'noacsrules' option.

     Currently, this option does not work properly, and its use leads to
     unstable results. If we figure out how to implement it without
     errors, we will add it later

   - writepage"

* tag 'ntfs3_for_6.4' of https://github.com/Paragon-Software-Group/linux-ntfs3: (30 commits)
  fs/ntfs3: Fix root inode checking
  fs/ntfs3: Print details about mount fails
  fs/ntfs3: Add missed "nocase" in ntfs_show_options
  fs/ntfs3: Code formatting and refactoring
  fs/ntfs3: Changed ntfs_get_acl() to use dentry
  fs/ntfs3: Remove field sbi->used.bitmap.set_tail
  fs/ntfs3: Undo critial modificatins to keep directory consistency
  fs/ntfs3: Undo endian changes
  fs/ntfs3: Optimization in ntfs_set_state()
  fs/ntfs3: Fix ntfs_create_inode()
  fs/ntfs3: Remove noacsrules
  fs/ntfs3: Use bh_read to simplify code
  fs/ntfs3: Fix a possible null-pointer dereference in ni_clear()
  fs/ntfs3: Refactoring of various minor issues
  fs/ntfs3: Restore overflow checking for attr size in mi_enum_attr
  fs/ntfs3: Check for extremely large size of $AttrDef
  fs/ntfs3: Improved checking of attribute's name length
  fs/ntfs3: Add null pointer checks
  fs/ntfs3: fix spelling mistake "attibute" -> "attribute"
  fs/ntfs3: Add length check in indx_get_root
  ...
2023-04-29 10:52:37 -07:00
Linus Torvalds
56c455b38d xfs: New code for 6.4
o Added detailed design documentation for the upcoming online repair feature
 o major update to online scrub to complete the reverse mapping cross-referencing
   infrastructure enabling us to fully validate allocated metadata against owner
   records. This is the last piece of scrub infrastructure needed before we can
   start merging online repair functionality.
 o Fixes for the ascii-ci hashing issues
 o deprecation of the ascii-ci functionality
 o on-disk format verification bug fixes
 o various random bug fixes for syzbot and other bug reports
 
 Signed-off-by: Dave Chinner <david@fromorbit.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEmJOoJ8GffZYWSjj/regpR/R1+h0FAmRMTZ8UHGRhdmlkQGZy
 b21vcmJpdC5jb20ACgkQregpR/R1+h3XtA//bZYjsYRU3hzyGLKee++5t/zbiqZB
 KWw8zuPEdEsSAPphK4DQYO7XPWetgFh8iBU39M8TM0+g5YacjzBLGADjQiEv7naN
 IxSoElQQzZbvMcUPOnuRaoopn0v7pbWIDRo3hKWaRDKCrnMGOnTvDFuC/VX0RAbn
 GzPimbuvaYJPXTnWTwsKeAuVYP4HLdTh2R1gUMjyY80Ed08hxhCzrXSvjEtuxOOy
 tDk50wJUhgx7UTgFBsXug1wXLCYwDFvAUjpsBKnmq+vSl0MpI3TdCetmSQbuvAeu
 gvkRyBMOcqcY5rlozcKPpyXwy7I0ftXOY4xpUSW8H9tAx0oVImkC69DsAjotQV0r
 r6vEtcw7LgmaS9kbA6G2Z4JfKEHuf2d/6OI4onZh25b5SWq7+qFBPo67AoFg8UQf
 bKSf3QQNVLTyWqpRf8Z3XOEBygYGsDUuxrm2AA5Aar4t4T3y5oAKFKkf4ZAlAYxH
 KViQsq0qVcoQ4k4txZgU7XQrftKyu2csqxqtKDozH7FutxscchZEwvjdQ6jnS2+L
 2Qlf6On8edfEkPKzF7/1cgxUXCXuTqakFVetChXjZ1/ZFt9LUYphvESdaolJ8Aqz
 lwEy5UrbC2oMrBDT7qESLWs3U66mPhRaaFfuLUJRyhHN3Y0tVVA2mgNzyD6oBQVy
 ffIbZ3+1QEPOaOQ=
 =lBJy
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.4-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Dave Chinner:
 "This consists mainly of online scrub functionality and the design
  documentation for the upcoming online repair functionality built on
  top of the scrub code:

   - Added detailed design documentation for the upcoming online repair
     feature

   - major update to online scrub to complete the reverse mapping
     cross-referencing infrastructure enabling us to fully validate
     allocated metadata against owner records. This is the last piece of
     scrub infrastructure needed before we can start merging online
     repair functionality.

   - Fixes for the ascii-ci hashing issues

   - deprecation of the ascii-ci functionality

   - on-disk format verification bug fixes

   - various random bug fixes for syzbot and other bug reports"

* tag 'xfs-6.4-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (107 commits)
  xfs: fix livelock in delayed allocation at ENOSPC
  xfs: Extend table marker on deprecated mount options table
  xfs: fix duplicate includes
  xfs: fix BUG_ON in xfs_getbmap()
  xfs: verify buffer contents when we skip log replay
  xfs: _{attr,data}_map_shared should take ILOCK_EXCL until iread_extents is completely done
  xfs: remove WARN when dquot cache insertion fails
  xfs: don't consider future format versions valid
  xfs: deprecate the ascii-ci feature
  xfs: test the ascii case-insensitive hash
  xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation
  xfs: cross-reference rmap records with refcount btrees
  xfs: cross-reference rmap records with inode btrees
  xfs: cross-reference rmap records with free space btrees
  xfs: cross-reference rmap records with ag btrees
  xfs: introduce bitmap type for AG blocks
  xfs: convert xbitmap to interval tree
  xfs: drop the _safe behavior from the xbitmap foreach macro
  xfs: don't load local xattr values during scrub
  xfs: remove the for_each_xbitmap_ helpers
  ...
2023-04-29 10:44:27 -07:00
Linus Torvalds
33afd4b763 Mainly singleton patches all over the place. Series of note are:
- updates to scripts/gdb from Glenn Washburn
 
 - kexec cleanups from Bjorn Helgaas
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr+6wAKCRDdBJ7gKXxA
 jn4NAP4u/hj/kR2dxYehcVLuQqJspCRZZBZlAReFJyHNQO6voAEAk0NN9rtG2+/E
 r0G29CJhK+YL0W6mOs8O1yo9J1rZnAM=
 =2CUV
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-04-27-16-01' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "Mainly singleton patches all over the place.

  Series of note are:

   - updates to scripts/gdb from Glenn Washburn

   - kexec cleanups from Bjorn Helgaas"

* tag 'mm-nonmm-stable-2023-04-27-16-01' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (50 commits)
  mailmap: add entries for Paul Mackerras
  libgcc: add forward declarations for generic library routines
  mailmap: add entry for Oleksandr
  ocfs2: reduce ioctl stack usage
  fs/proc: add Kthread flag to /proc/$pid/status
  ia64: fix an addr to taddr in huge_pte_offset()
  checkpatch: introduce proper bindings license check
  epoll: rename global epmutex
  scripts/gdb: add GDB convenience functions $lx_dentry_name() and $lx_i_dentry()
  scripts/gdb: create linux/vfs.py for VFS related GDB helpers
  uapi/linux/const.h: prefer ISO-friendly __typeof__
  delayacct: track delays from IRQ/SOFTIRQ
  scripts/gdb: timerlist: convert int chunks to str
  scripts/gdb: print interrupts
  scripts/gdb: raise error with reduced debugging information
  scripts/gdb: add a Radix Tree Parser
  lib/rbtree: use '+' instead of '|' for setting color.
  proc/stat: remove arch_idle_time()
  checkpatch: check for misuse of the link tags
  checkpatch: allow Closes tags with links
  ...
2023-04-27 19:57:00 -07:00
Linus Torvalds
7fa8a8ee94 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
 
 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
 
 - zsmalloc performance improvements from Sergey Senozhatsky.
 
 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.
 
 - VFS rationalizations from Christoph Hellwig:
 
   - removal of most of the callers of write_one_page().
 
   - make __filemap_get_folio()'s return value more useful
 
 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing.  Use `mount -o noswap'.
 
 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.
 
 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).
 
 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.
 
 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather
   than exclusive, and has fixed a bunch of errors which were caused by its
   unintuitive meaning.
 
 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.
 
 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.
 
 - Cleanups to do_fault_around() by Lorenzo Stoakes.
 
 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.
 
 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.
 
 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.
 
 - Lorenzo has also contributed further cleanups of vma_merge().
 
 - Chaitanya Prakash provides some fixes to the mmap selftesting code.
 
 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.
 
 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.
 
 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.
 
 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.
 
 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.
 
 - Yosry Ahmed has improved the performance of memcg statistics flushing.
 
 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.
 
 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.
 
 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.
 
 - Pankaj Raghav has made page_endio() unneeded and has removed it.
 
 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.
 
 - Yosry Ahmed has fixed an issue around memcg's page recalim accounting.
 
 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.
 
 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.
 
 - Peter Xu fixes a few issues with uffd-wp at fork() time.
 
 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
 jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
 eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
 =J+Dj
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
   switching from a user process to a kernel thread.

 - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
   Raghav.

 - zsmalloc performance improvements from Sergey Senozhatsky.

 - Yue Zhao has found and fixed some data race issues around the
   alteration of memcg userspace tunables.

 - VFS rationalizations from Christoph Hellwig:
     - removal of most of the callers of write_one_page()
     - make __filemap_get_folio()'s return value more useful

 - Luis Chamberlain has changed tmpfs so it no longer requires swap
   backing. Use `mount -o noswap'.

 - Qi Zheng has made the slab shrinkers operate locklessly, providing
   some scalability benefits.

 - Keith Busch has improved dmapool's performance, making part of its
   operations O(1) rather than O(n).

 - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
   permitting userspace to wr-protect anon memory unpopulated ptes.

 - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
   rather than exclusive, and has fixed a bunch of errors which were
   caused by its unintuitive meaning.

 - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
   which causes minor faults to install a write-protected pte.

 - Vlastimil Babka has done some maintenance work on vma_merge():
   cleanups to the kernel code and improvements to our userspace test
   harness.

 - Cleanups to do_fault_around() by Lorenzo Stoakes.

 - Mike Rapoport has moved a lot of initialization code out of various
   mm/ files and into mm/mm_init.c.

 - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
   DRM, but DRM doesn't use it any more.

 - Lorenzo has also coverted read_kcore() and vread() to use iterators
   and has thereby removed the use of bounce buffers in some cases.

 - Lorenzo has also contributed further cleanups of vma_merge().

 - Chaitanya Prakash provides some fixes to the mmap selftesting code.

 - Matthew Wilcox changes xfs and afs so they no longer take sleeping
   locks in ->map_page(), a step towards RCUification of pagefaults.

 - Suren Baghdasaryan has improved mmap_lock scalability by switching to
   per-VMA locking.

 - Frederic Weisbecker has reworked the percpu cache draining so that it
   no longer causes latency glitches on cpu isolated workloads.

 - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
   logic.

 - Liu Shixin has changed zswap's initialization so we no longer waste a
   chunk of memory if zswap is not being used.

 - Yosry Ahmed has improved the performance of memcg statistics
   flushing.

 - David Stevens has fixed several issues involving khugepaged,
   userfaultfd and shmem.

 - Christoph Hellwig has provided some cleanup work to zram's IO-related
   code paths.

 - David Hildenbrand has fixed up some issues in the selftest code's
   testing of our pte state changing.

 - Pankaj Raghav has made page_endio() unneeded and has removed it.

 - Peter Xu contributed some rationalizations of the userfaultfd
   selftests.

 - Yosry Ahmed has fixed an issue around memcg's page recalim
   accounting.

 - Chaitanya Prakash has fixed some arm-related issues in the
   selftests/mm code.

 - Longlong Xia has improved the way in which KSM handles hwpoisoned
   pages.

 - Peter Xu fixes a few issues with uffd-wp at fork() time.

 - Stefan Roesch has changed KSM so that it may now be used on a
   per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...
2023-04-27 19:42:02 -07:00
Linus Torvalds
556eb8b791 Driver core changes for 6.4-rc1
Here is the large set of driver core changes for 6.4-rc1.
 
 Once again, a busy development cycle, with lots of changes happening in
 the driver core in the quest to be able to move "struct bus" and "struct
 class" into read-only memory, a task now complete with these changes.
 
 This will make the future rust interactions with the driver core more
 "provably correct" as well as providing more obvious lifetime rules for
 all busses and classes in the kernel.
 
 The changes required for this did touch many individual classes and
 busses as many callbacks were changed to take const * parameters
 instead.  All of these changes have been submitted to the various
 subsystem maintainers, giving them plenty of time to review, and most of
 them actually did so.
 
 Other than those changes, included in here are a small set of other
 things:
   - kobject logging improvements
   - cacheinfo improvements and updates
   - obligatory fw_devlink updates and fixes
   - documentation updates
   - device property cleanups and const * changes
   - firwmare loader dependency fixes.
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
 LEGadNS38k5fs+73UaxV
 =7K4B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.4-rc1.

  Once again, a busy development cycle, with lots of changes happening
  in the driver core in the quest to be able to move "struct bus" and
  "struct class" into read-only memory, a task now complete with these
  changes.

  This will make the future rust interactions with the driver core more
  "provably correct" as well as providing more obvious lifetime rules
  for all busses and classes in the kernel.

  The changes required for this did touch many individual classes and
  busses as many callbacks were changed to take const * parameters
  instead. All of these changes have been submitted to the various
  subsystem maintainers, giving them plenty of time to review, and most
  of them actually did so.

  Other than those changes, included in here are a small set of other
  things:

   - kobject logging improvements

   - cacheinfo improvements and updates

   - obligatory fw_devlink updates and fixes

   - documentation updates

   - device property cleanups and const * changes

   - firwmare loader dependency fixes.

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
  device property: make device_property functions take const device *
  driver core: update comments in device_rename()
  driver core: Don't require dynamic_debug for initcall_debug probe timing
  firmware_loader: rework crypto dependencies
  firmware_loader: Strip off \n from customized path
  zram: fix up permission for the hot_add sysfs file
  cacheinfo: Add use_arch[|_cache]_info field/function
  arch_topology: Remove early cacheinfo error message if -ENOENT
  cacheinfo: Check cache properties are present in DT
  cacheinfo: Check sib_leaf in cache_leaves_are_shared()
  cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
  cacheinfo: Add arm64 early level initializer implementation
  cacheinfo: Add arch specific early level initializer
  tty: make tty_class a static const structure
  driver core: class: remove struct class_interface * from callbacks
  driver core: class: mark the struct class in struct class_interface constant
  driver core: class: make class_register() take a const *
  driver core: class: mark class_release() as taking a const *
  driver core: remove incorrect comment for device_create*
  MIPS: vpe-cmp: remove module owner pointer from struct class usage.
  ...
2023-04-27 11:53:57 -07:00
Linus Torvalds
5c7ecada25 f2fs update for 6.4-rc1
In this round, we've mainly modified to support non-power-of-two zone size,
 which is not required for f2fs by design. In order to avoid arch dependency,
 we refactored the messy rb_entry structure shared across different extent_cache.
 In addition to the improvement, we've also fixed several subtle bugs and
 error cases.
 
 Enhancement:
 - support non-power-of-two zone size for zoned device
 - remove sharing the rb_entry structure in extent cache
 - refactor f2fs_gc to call checkpoint in urgent condition
 - support iopoll
 
 Bug fix:
 - fix potential corruption when moving a directory
 - fix to avoid use-after-free for cached IPU bio
 - fix the folio private usage
 - avoid kernel warnings or panics in the cp_error case
 - fix to recover quota data correctly
 - fix some bugs in atomic operations
 - fix system crash due to lack of free space in LFS
 - fix null pointer panic in tracepoint in __replace_atomic_write_block
 - fix iostat lock protection
 - fix scheduling while atomic in decompression path
 - preserve direct write semantics when buffering is forced
 - fix to call f2fs_wait_on_page_writeback() in f2fs_write_raw_pages()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmRIHMMACgkQQBSofoJI
 UNI3Mw//eQvxUXaWtCjTJQtXPotaah6ZcvnMMtfl6Cf0Z8Sq4L9q4yQMA16MXbLU
 zz3cexKXIHTzqWfqFLunaj6cmH/THAY3L3fTkFhE+dx1H2IaFprGLW3H8hW/58tr
 j9365RPVY2d/3agB1KikTj6FQ5OTGibkZagjsC28VmQ30VLIm+4jnHdIoX92UP+k
 87JQ/fbG2XAiHX/ifcVuMXY3++db9jaZahsmhdJ1LNTZzztO241RzrNoBsLcSwSZ
 DkPgJXARQzFNDRfveRXSbV3ygR9C62pNITtSGC86ZRLyoAmko9se+nMEFH7YEkUy
 Rhf0Qzq2Gy6ThiVo8ZjuLvNycF0oj3OefX1PQLT6vzkv3Sv4Yij48bN1HqPdYsKH
 3hPZd2V7A3o2LCJPPPNjZ/6nuKhrX+kU33FjUrxiYqz7Lt74j70vVEHQ7vSCGkrQ
 YpQYVXFr1hdejdemCpwgdvcEegNlV0GfqCG5KL1f7jJiGHfvxZnOEJ3x9dCQFTIE
 xVoWTzw9pbmBkTudrFNVRlX2RSQYSvgLFwUhQ3WE0qNu0mUMP+4E+50iKHYraJ7R
 W1TajZ+ttUJAnZ076vGGEOxabefEdtReOtdstohcJlDaGm5sI9I9CXQRvY4ZSymW
 l7ZHY/b+/IzP+/fLEX7DgTnWip37H14FImvjYRGpSEzc6sXiOUU=
 =qHTl
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs update from Jaegeuk Kim:
 "In this round, we've mainly modified to support non-power-of-two zone
  size, which is not required for f2fs by design. In order to avoid arch
  dependency, we refactored the messy rb_entry structure shared across
  different extent_cache. In addition to the improvement, we've also
  fixed several subtle bugs and error cases.

  Enhancements:
   - support non-power-of-two zone size for zoned device
   - remove sharing the rb_entry structure in extent cache
   - refactor f2fs_gc to call checkpoint in urgent condition
   - support iopoll

  Bug fixes:
   - fix potential corruption when moving a directory
   - fix to avoid use-after-free for cached IPU bio
   - fix the folio private usage
   - avoid kernel warnings or panics in the cp_error case
   - fix to recover quota data correctly
   - fix some bugs in atomic operations
   - fix system crash due to lack of free space in LFS
   - fix null pointer panic in tracepoint in __replace_atomic_write_block
   - fix iostat lock protection
   - fix scheduling while atomic in decompression path
   - preserve direct write semantics when buffering is forced
   - fix to call f2fs_wait_on_page_writeback() in f2fs_write_raw_pages()"

* tag 'f2fs-for-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits)
  f2fs: remove unnessary comment in __may_age_extent_tree
  f2fs: allocate node blocks for atomic write block replacement
  f2fs: use cow inode data when updating atomic write
  f2fs: remove power-of-two limitation of zoned device
  f2fs: allocate trace path buffer from names_cache
  f2fs: add has_enough_free_secs()
  f2fs: relax sanity check if checkpoint is corrupted
  f2fs: refactor f2fs_gc to call checkpoint in urgent condition
  f2fs: remove folio_detach_private() in .invalidate_folio and .release_folio
  f2fs: remove bulk remove_proc_entry() and unnecessary kobject_del()
  f2fs: support iopoll method
  f2fs: remove batched_trim_sections node description
  f2fs: fix to check return value of inc_valid_block_count()
  f2fs: fix to check return value of f2fs_do_truncate_blocks()
  f2fs: fix passing relative address when discard zones
  f2fs: fix potential corruption when moving a directory
  f2fs: add radix_tree_preload_end in error case
  f2fs: fix to recover quota data correctly
  f2fs: fix to check readonly condition correctly
  docs: f2fs: Correct instruction to disable checkpoint
  ...
2023-04-26 09:42:10 -07:00
Linus Torvalds
61d325dcbc Changes since last update:
- Add sub-page block size support for uncompressed files;
 
  - Support flattened block device for multi-blob images to be attached
    into virtual machines (including cloud servers) and bare metals;
 
  - Support long xattr name prefixes to optimize images with common
    xattr namespaces (e.g. files with overlayfs xattrs) use cases;
 
  - Various minor cleanups & fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCZETCNREceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBJCMAP9VkAPycbbqa6qWUASdyh/HGyuLJTHSfmsJ
 zO4y6hBgOwD9GXg55sY8ycvcOx9ayaUt5V5f9zhs4wdGcoPhj5fWzgA=
 =nUva
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs updates from Gao Xiang:
 "In this cycle, sub-page block support for uncompressed files is
  available. It's mainly used to enable original signing ('golden')
  4k-block images on arm64 with 16/64k pages. In addition, end users
  could also use this feature to build a manifest to directly refer to
  golden tar data.

  Besides, long xattr name prefix support is also introduced in this
  cycle to avoid too many xattrs with the same prefix (e.g. overlayfs
  xattrs). It's useful for erofs + overlayfs combination (like Composefs
  model): the image size is reduced by ~14% and runtime performance is
  also slightly improved.

  Others are random fixes and cleanups as usual.

  Summary:

   - Add sub-page block size support for uncompressed files

   - Support flattened block device for multi-blob images to be attached
     into virtual machines (including cloud servers) and bare metals

   - Support long xattr name prefixes to optimize images with common
     xattr namespaces (e.g. files with overlayfs xattrs) use cases

   - Various minor cleanups & fixes"

* tag 'erofs-for-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: cleanup i_format-related stuffs
  erofs: sunset erofs_dbg()
  erofs: fix potential overflow calculating xattr_isize
  erofs: get rid of z_erofs_fill_inode()
  erofs: enable long extended attribute name prefixes
  erofs: handle long xattr name prefixes properly
  erofs: add helpers to load long xattr name prefixes
  erofs: introduce on-disk format for long xattr name prefixes
  erofs: move packed inode out of the compression part
  erofs: keep meta inode into erofs_buf
  erofs: initialize packed inode after root inode is assigned
  erofs: stop parsing non-compact HEAD index if clusterofs is invalid
  erofs: don't warn ztailpacking feature anymore
  erofs: simplify erofs_xattr_generic_get()
  erofs: rename init_inode_xattrs with erofs_ prefix
  erofs: move several xattr helpers into xattr.c
  erofs: tidy up EROFS on-disk naming
  erofs: support flattened block device for multi-blob images
  erofs: set block size to the on-disk block size
  erofs: avoid hardcoded blocksize for subpage block support
2023-04-24 14:25:39 -07:00
Linus Torvalds
e2eff52ce5 v6.4/vfs.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZEJxiQAKCRCRxhvAZXjc
 oj4jAP4pt/SuHdZQzv9k2j+UA4hpDmwt7UUWhLxtycQrt9oEoQD/U9hCwnYY9Ksj
 5TSZiK/OGyoslmF1zfAC+1R2crgr3A0=
 =PpBS
 -----END PGP SIGNATURE-----

Merge tag 'v6.4/vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains a pile of various smaller fixes. Most of them aren't
  very interesting so this just highlights things worth mentioning:

   - Various filesystems contained the same little helper to convert
     from the mode of a dentry to the DT_* type of that dentry.

     They have now all been switched to rely on the generic
     fs_umode_to_dtype() helper. All custom helpers are removed (Jeff)

   - Fsnotify now reports ACCESS and MODIFY events for splice
     (Chung-Chiang Cheng)

   - After converting timerfd a long time ago to rely on
     wait_event_interruptible_*() apis, convert eventfd as well. This
     removes the complex open-coded wait code (Wen Yang)

   - Simplify sysctl registration for devpts, avoiding the declaration
     of two tables. Instead, just use a prefixed path with
     register_sysctl() (Luis)

   - The setattr_should_drop_sgid() helper is now exported so NFS can
     use it. By switching NFS to this helper an NFS setgid inheritance
     bug is fixed (me)"

* tag 'v6.4/vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: hfsplus: remove WARN_ON() from hfsplus_cat_{read,write}_inode()
  pnode: pass mountpoint directly
  eventfd: use wait_event_interruptible_locked_irq() helper
  splice: report related fsnotify events
  fs: consolidate duplicate dt_type helpers
  nfs: use vfs setgid helper
  Update relatime comments to include equality
  fs/buffer: Remove redundant assignment to err
  fs_context: drop the unused lsm_flags member
  fs/namespace: fnic: Switch to use %ptTd
  Documentation: update idmappings.rst
  devpts: simplify two-level sysctl registration for pty_kern_table
  eventpoll: align comment with nested epoll limitation
2023-04-24 13:39:58 -07:00
Linus Torvalds
c23f28975a Commit volume in documentation is relatively low this time, but there is
still a fair amount going on, including:
 
 - Reorganizing the architecture-specific documentation under
   Documentation/arch.  This makes the structure match the source directory
   and helps to clean up the mess that is the top-level Documentation
   directory a bit.  This work creates the new directory and moves x86 and
   most of the less-active architectures there.  The current plan is to move
   the rest of the architectures in 6.5, with the patches going through the
   appropriate subsystem trees.
 
 - Some more Spanish translations and maintenance of the Italian
   translation.
 
 - A new "Kernel contribution maturity model" document from Ted.
 
 - A new tutorial on quickly building a trimmed kernel from Thorsten.
 
 Plus the usual set of updates and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmRGze0PHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5Y/VsH/RyWqinorRVFZmHqRJMRhR0j7hE2pAgK5prE
 dGXYVtHHNQ+25thNaqhZTOLYFbSX6ii2NG7sLRXmyOTGIZrhUCFFXCHkuq4ZUypR
 gJpMUiKQVT4dhln3gIZ0k09NSr60gz8UTcq895N9UFpUdY1SCDhbCcLc4uXTRajq
 NrdgFaHWRkPb+gBRbXOExYm75DmCC6Ny5AyGo2rXfItV//ETjWIJVQpJhlxKrpMZ
 3LgpdYSLhEFFnFGnXJ+EAPJ7gXDi2Tg5DuPbkvJyFOTouF3j4h8lSS9l+refMljN
 xNRessv+boge/JAQidS6u8F2m2ESSqSxisv/0irgtKIMJwXaoX4=
 =1//8
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.4' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "Commit volume in documentation is relatively low this time, but there
  is still a fair amount going on, including:

   - Reorganize the architecture-specific documentation under
     Documentation/arch

     This makes the structure match the source directory and helps to
     clean up the mess that is the top-level Documentation directory a
     bit. This work creates the new directory and moves x86 and most of
     the less-active architectures there.

     The current plan is to move the rest of the architectures in 6.5,
     with the patches going through the appropriate subsystem trees.

   - Some more Spanish translations and maintenance of the Italian
     translation

   - A new "Kernel contribution maturity model" document from Ted

   - A new tutorial on quickly building a trimmed kernel from Thorsten

  Plus the usual set of updates and fixes"

* tag 'docs-6.4' of git://git.lwn.net/linux: (47 commits)
  media: Adjust column width for pdfdocs
  media: Fix building pdfdocs
  docs: clk: add documentation to log which clocks have been disabled
  docs: trace: Fix typo in ftrace.rst
  Documentation/process: always CC responsible lists
  docs: kmemleak: adjust to config renaming
  ELF: document some de-facto PT_* ABI quirks
  Documentation: arm: remove stih415/stih416 related entries
  docs: turn off "smart quotes" in the HTML build
  Documentation: firmware: Clarify firmware path usage
  docs/mm: Physical Memory: Fix grammar
  Documentation: Add document for false sharing
  dma-api-howto: typo fix
  docs: move m68k architecture documentation under Documentation/arch/
  docs: move parisc documentation under Documentation/arch/
  docs: move ia64 architecture docs under Documentation/arch/
  docs: Move arc architecture docs under Documentation/arch/
  docs: move nios2 documentation under Documentation/arch/
  docs: move openrisc documentation under Documentation/arch/
  docs: move superh documentation under Documentation/arch/
  ...
2023-04-24 12:35:49 -07:00
Chunguang Wu
522dc4e5f5 fs/proc: add Kthread flag to /proc/$pid/status
The command `ps -ef ` and `top -c` mark kernel thread by '[' and ']', but
sometimes the result is not correct.  The task->flags in /proc/$pid/stat
is good, but we need remember the value of PF_KTHREAD is 0x00200000 and
convert dec to hex.  If we have no binary program and shell script which
read /proc/$pid/stat, we can know it directly by `cat /proc/$pid/status`.

Link: https://lkml.kernel.org/r/20230416052404.2920-1-fullspring2018@gmail.com
Signed-off-by: Chunguang Wu <fullspring2018@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-21 14:54:34 -07:00
Jingbo Xu
d3c4bdcc75 erofs: set block size to the on-disk block size
Set the block size to that specified in on-disk superblock.

Also remove the hard constraint of PAGE_SIZE block size for the
uncompressed device backend.  This constraint is temporarily remained
for compressed device and fscache backend, as there is more work needed
to handle the condition where the block size is not equal to PAGE_SIZE.

It is worth noting that the on-disk block size is read prior to
erofs_superblock_csum_verify(), as the read block size is needed in the
latter.

Besides, later we are going to make erofs refer to tar data blobs (which
is 512-byte aligned) for OCI containers, where the block size is 512
bytes.  In this case, the 512-byte block size may not be adequate for a
directory to contain enough dirents.  To fix this, we are also going to
introduce directory block size independent on the block size.

Due to we have already supported block size smaller than PAGE_SIZE now,
disable all these images with such separated directory block size until
we supported this feature later.

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230313135309.75269-3-jefflexu@linux.alibaba.com
[ Gao Xiang: update documentation. ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17 01:15:45 +08:00
Wang Han
562abda401 docs: f2fs: Correct instruction to disable checkpoint
This should be 'disable' rather than 'disabled'.

Reported-by: LoveSy <shana@zju.edu.cn>
Signed-off-by: Wang Han <wanghan1995315@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-12 20:00:36 -07:00
Darrick J. Wong
03786f0afb xfs: document future directions of online fsck
Add the seventh and final chapter of the online fsck documentation,
where we talk about future functionality that can tie in with the
functionality provided by the online fsck patchset.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:52 -07:00
Darrick J. Wong
af051dfb81 xfs: document the userspace fsck driver program
Add the sixth chapter of the online fsck design documentation, where
we discuss the details of the data structures and algorithms used by the
driver program xfs_scrub.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:51 -07:00
Darrick J. Wong
a26aa25247 xfs: document directory tree repairs
Directory tree repairs are the least complete part of online fsck, due
to the lack of directory parent pointers.  However, even without that
feature, we can still make some corrections to the directory tree -- we
can salvage as many directory entries as we can from a damaged
directory, and we can reattach orphaned inodes to the lost+found, just
as xfs_repair does now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:51 -07:00
Darrick J. Wong
2f754f7fb9 xfs: document metadata file repair
File-based metadata (such as xattrs and directories) can be extremely
large.  To reduce the memory requirements and maximize code reuse, it is
very convenient to create a temporary file, use the regular dir/attr
code to store salvaged information, and then atomically swap the extents
between the file being repaired and the temporary file.  Record the high
level concepts behind how temporary files and atomic content swapping
should work, and then present some case studies of what the actual
repair functions do.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:50 -07:00
Darrick J. Wong
a0d856eede xfs: document full filesystem scans for online fsck
Certain parts of the online fsck code need to scan every file in the
entire filesystem.  It is not acceptable to block the entire filesystem
while this happens, which means that we need to be clever in allowing
scans to coordinate with ongoing filesystem updates.  We also need to
hook the filesystem so that regular updates propagate to the staging
records.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:50 -07:00
Darrick J. Wong
d697887193 xfs: document online file metadata repair code
Add to the fifth chapter of the online fsck design documentation, where
we discuss the details of the data structures and algorithms used by the
kernel to repair file metadata.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:49 -07:00
Darrick J. Wong
7fb8ccffd3 xfs: document btree bulk loading
Add a discussion of the btree bulk loading code, which makes it easy to
take an in-memory recordset and write it out to disk in an efficient
manner.  This also enables atomic switchover from the old to the new
structure with minimal potential for leaking the old blocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:49 -07:00
Darrick J. Wong
5f658dad23 xfs: document pageable kernel memory
Add a discussion of pageable kernel memory, since online fsck needs
quite a bit more memory than most other parts of the filesystem to stage
records and other information.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:48 -07:00
Darrick J. Wong
bae43864c0 xfs: document how online fsck deals with eventual consistency
Writes to an XFS filesystem employ an eventual consistency update model
to break up complex multistep metadata updates into small chained
transactions.  This is generally good for performance and scalability
because XFS doesn't need to prepare for enormous transactions, but it
also means that online fsck must be careful not to attempt a fsck action
unless it can be shown that there are no other threads processing a
transaction chain.  This part of the design documentation covers the
thinking behind the consistency model and how scrub deals with it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:48 -07:00
Darrick J. Wong
e5edad5262 xfs: document the filesystem metadata checking strategy
Begin the fifth chapter of the online fsck design documentation, where
we discuss the details of the data structures and algorithms used by the
kernel to examine filesystem metadata and cross-reference it around the
filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:47 -07:00
Darrick J. Wong
4f7f646970 xfs: document the user interface for online fsck
Start the fourth chapter of the online fsck design documentation, which
discusses the user interface and the background scrubbing service.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:47 -07:00
Darrick J. Wong
9a30b5b521 xfs: document the testing plan for online fsck
Start the third chapter of the online fsck design documentation.  This
covers the testing plan to make sure that both online and offline fsck
can detect arbitrary problems and correct them without making things
worse.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:46 -07:00
Darrick J. Wong
88757e04c9 xfs: document the general theory underlying online fsck design
Start the second chapter of the online fsck design documentation.
This covers the general theory underlying how online fsck works.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:45 -07:00
Darrick J. Wong
a8f6c2e54d xfs: document the motivation for online fsck design
Start the first chapter of the online fsck design documentation.
This covers the motivations for creating this in the first place.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11 18:59:45 -07:00
Eric Van Hensbergen
4eb3117888
fs/9p: Rework cache modes and add new options to Documentation
Switch cache modes to a bit-mask and use legacy
cache names as shortcuts.  Update documentation to
include information on both shortcuts and bitmasks.

This patch also fixes missing guards related to fscache.

Update the documentation for new mount flags
and cache modes.

Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2023-04-09 21:41:21 +00:00
Matthew Wilcox (Oracle)
58ef47ef7d mm: hold the RCU read lock over calls to ->map_pages
Prevent filesystems from doing things which sleep in their map_pages
method.  This is in preparation for a pagefault path protected only by
RCU.

Link: https://lkml.kernel.org/r/20230327174515.1811532-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 19:43:00 -07:00
Tomas Mudrunka
bd23024b97 mm/memtest: add results of early memtest to /proc/meminfo
Currently the memtest results were only presented in dmesg.

When running a large fleet of devices without ECC RAM it's currently not
easy to do bulk monitoring for memory corruption.  You have to parse
dmesg, but that's a ring buffer so the error might disappear after some
time.  In general I do not consider dmesg to be a great API to query RAM
status.

In several companies I've seen such errors remain undetected and cause
issues for way too long.  So I think it makes sense to provide a
monitoring API, so that we can safely detect and act upon them.

This adds /proc/meminfo entry which can be easily used by scripts.

Link: https://lkml.kernel.org/r/20230321103430.7130-1-tomas.mudrunka@gmail.com
Signed-off-by: Tomas Mudrunka <tomas.mudrunka@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05 19:42:55 -07:00
Greg Kroah-Hartman
cd8fe5b6db Merge 6.3-rc5 into driver-core-next
We need the fixes in here for testing, as well as the driver core
changes for documentation updates to build on.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-03 09:33:30 +02:00
Luis Chamberlain
2c6efe9cf2 shmem: add support to ignore swap
In doing experimentations with shmem having the option to avoid swap
becomes a useful mechanism.  One of the *raves* about brd over shmem is
you can avoid swap, but that's not really a good reason to use brd if we
can instead use shmem.  Using brd has its own good reasons to exist, but
just because "tmpfs" doesn't let you do that is not a great reason to
avoid it if we can easily add support for it.

I don't add support for reconfiguring incompatible options, but if we
really wanted to we can add support for that.

To avoid swap we use mapping_set_unevictable() upon inode creation, and
put a WARN_ON_ONCE() stop-gap on writepages() for reclaim.

Link: https://lkml.kernel.org/r/20230309230545.2930737-7-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Adam Manzanares <a.manzanares@samsung.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pankaj Raghav <p.raghav@samsung.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 16:20:15 -07:00
Luis Chamberlain
d0f5a85442 shmem: update documentation
Update the docs to reflect a bit better why some folks prefer tmpfs over
ramfs and clarify a bit more about the difference between brd ramdisks.

While at it, add THP docs for tmpfs, both the mount options and the sysfs
file.

Link: https://lkml.kernel.org/r/20230309230545.2930737-6-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Adam Manzanares <a.manzanares@samsung.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pankaj Raghav <p.raghav@samsung.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-28 16:20:15 -07:00