Commit graph

74873 commits

Author SHA1 Message Date
Linus Torvalds
c2b19fd753 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmHz0I4ACgkQnJ2qBz9k
 QNkvkAf/fD0swq13rFWiMDo+Azj9BwdjTeK7kEhGBCw9rL3BchH+OTMgPmcdplTJ
 uzXVyWZ98hsCBY4Aclu3Grcy1Cj9A3+JW9FqrWbwWy8t583DuaV31UL99AXb/4sH
 xYAmMm9V9gO5ckIAffHzHcEyN+dHMj89zl8vAtxGnlwR6gZLE1wjB3AmWBlctJpE
 bWQWCpuJsW+7o7ky5hIQ2hQWLxb5OejdYtLu6Su4tM86xfMwi3pB0tzilJ6jorAH
 eH5LBt5pVDI+gYQY1WYoZTy4eQKM0lhVW3WplhbNJwkJUupNS5xCXFfwB8bUppwc
 +hyGn5G3GJZ2Ya+xALO3GfS8fX9ESQ==
 =dQRa
 -----END PGP SIGNATURE-----

Merge tag 'fs_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull udf and quota fixes from Jan Kara:
 "Fixes for crashes in UDF when inode expansion fails and one quota
  cleanup"

* tag 'fs_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: cleanup double word in comment
  udf: Restore i_lenAlloc when inode expansion fails
  udf: Fix NULL ptr deref when converting from inline format
2022-01-28 17:19:49 +02:00
Dai Ngo
ab451ea952 nfsd: nfsd4_setclientid_confirm mistakenly expires confirmed client.
From RFC 7530 Section 16.34.5:

o  The server has not recorded an unconfirmed { v, x, c, *, * } and
   has recorded a confirmed { v, x, c, *, s }.  If the principals of
   the record and of SETCLIENTID_CONFIRM do not match, the server
   returns NFS4ERR_CLID_INUSE without removing any relevant leased
   client state, and without changing recorded callback and
   callback_ident values for client { x }.

The current code intends to do what the spec describes above but
it forgot to set 'old' to NULL resulting to the confirmed client
to be expired.

Fixes: 2b63482185 ("nfsd: fix clid_inuse on mount with security change")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Bruce Fields <bfields@fieldses.org>
2022-01-28 09:04:00 -05:00
Usama Arif
f6133fbd37 io_uring: remove unused argument from io_rsrc_node_alloc
io_ring_ctx is not used in the function.

Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Link: https://lore.kernel.org/r/20220127140444.4016585-1-usama.arif@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-27 10:18:53 -07:00
J. Bruce Fields
d19a7af73b lockd: fix failure to cleanup client locks
In my testing, we're sometimes hitting the request->fl_flags & FL_EXISTS
case in posix_lock_inode, presumably just by random luck since we're not
actually initializing fl_flags here.

This probably didn't matter before commit 7f024fcd5c ("Keep read and
write fds with each nlm_file") since we wouldn't previously unlock
unless we knew there were locks.

But now it causes lockd to give up on removing more locks.

We could just initialize fl_flags, but really it seems dubious to be
calling vfs_lock_file with random values in some of the fields.

Fixes: 7f024fcd5c ("Keep read and write fds with each nlm_file")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[ cel: fixed checkpatch.pl nit ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-27 10:45:25 -05:00
Jeff Layton
4584a768f2 ceph: set pool_ns in new inode layout for async creates
Dan reported that he was unable to write to files that had been
asynchronously created when the client's OSD caps are restricted to a
particular namespace.

The issue is that the layout for the new inode is only partially being
filled. Ensure that we populate the pool_ns_data and pool_ns_len in the
iinfo before calling ceph_fill_inode.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/54013
Fixes: 9a8d03ca2e ("ceph: attempt to do async create when possible")
Reported-by: Dan van der Ster <dan@vanderster.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-26 20:17:50 +01:00
Jeff Layton
932a9b5870 ceph: properly put ceph_string reference after async create attempt
The reference acquired by try_prep_async_create is currently leaked.
Ensure we put it.

Cc: stable@vger.kernel.org
Fixes: 9a8d03ca2e ("ceph: attempt to do async create when possible")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-26 20:17:50 +01:00
Xiubo Li
89d43d0551 ceph: put the requests/sessions when it fails to alloc memory
When failing to allocate the sessions memory we should make sure
the req1 and req2 and the sessions get put. And also in case the
max_sessions decreased so when kreallocate the new memory some
sessions maybe missed being put.

And if the max_sessions is 0 krealloc will return ZERO_SIZE_PTR,
which will lead to a distinct access fault.

URL: https://tracker.ceph.com/issues/53819
Fixes: e1a4541ec0 ("ceph: flush the mdlog before waiting on unsafe reqs")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-26 20:17:50 +01:00
Dave Chinner
ebb7fb1557 xfs, iomap: limit individual ioend chain lengths in writeback
Trond Myklebust reported soft lockups in XFS IO completion such as
this:

 watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [kworker/12:1:3106]
 CPU: 12 PID: 3106 Comm: kworker/12:1 Not tainted 4.18.0-305.10.2.el8_4.x86_64 #1
 Workqueue: xfs-conv/md127 xfs_end_io [xfs]
 RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
 Call Trace:
  wake_up_page_bit+0x8a/0x110
  iomap_finish_ioend+0xd7/0x1c0
  iomap_finish_ioends+0x7f/0xb0
  xfs_end_ioend+0x6b/0x100 [xfs]
  xfs_end_io+0xb9/0xe0 [xfs]
  process_one_work+0x1a7/0x360
  worker_thread+0x1fa/0x390
  kthread+0x116/0x130
  ret_from_fork+0x35/0x40

Ioends are processed as an atomic completion unit when all the
chained bios in the ioend have completed their IO. Logically
contiguous ioends can also be merged and completed as a single,
larger unit.  Both of these things can be problematic as both the
bio chains per ioend and the size of the merged ioends processed as
a single completion are both unbound.

If we have a large sequential dirty region in the page cache,
write_cache_pages() will keep feeding us sequential pages and we
will keep mapping them into ioends and bios until we get a dirty
page at a non-sequential file offset. These large sequential runs
can will result in bio and ioend chaining to optimise the io
patterns. The pages iunder writeback are pinned within these chains
until the submission chaining is broken, allowing the entire chain
to be completed. This can result in huge chains being processed
in IO completion context.

We get deep bio chaining if we have large contiguous physical
extents. We will keep adding pages to the current bio until it is
full, then we'll chain a new bio to keep adding pages for writeback.
Hence we can build bio chains that map millions of pages and tens of
gigabytes of RAM if the page cache contains big enough contiguous
dirty file regions. This long bio chain pins those pages until the
final bio in the chain completes and the ioend can iterate all the
chained bios and complete them.

OTOH, if we have a physically fragmented file, we end up submitting
one ioend per physical fragment that each have a small bio or bio
chain attached to them. We do not chain these at IO submission time,
but instead we chain them at completion time based on file
offset via iomap_ioend_try_merge(). Hence we can end up with unbound
ioend chains being built via completion merging.

XFS can then do COW remapping or unwritten extent conversion on that
merged chain, which involves walking an extent fragment at a time
and running a transaction to modify the physical extent information.
IOWs, we merge all the discontiguous ioends together into a
contiguous file range, only to then process them individually as
discontiguous extents.

This extent manipulation is computationally expensive and can run in
a tight loop, so merging logically contiguous but physically
discontigous ioends gains us nothing except for hiding the fact the
fact we broke the ioends up into individual physical extents at
submission and then need to loop over those individual physical
extents at completion.

Hence we need to have mechanisms to limit ioend sizes and
to break up completion processing of large merged ioend chains:

1. bio chains per ioend need to be bound in length. Pure overwrites
go straight to iomap_finish_ioend() in softirq context with the
exact bio chain attached to the ioend by submission. Hence the only
way to prevent long holdoffs here is to bound ioend submission
sizes because we can't reschedule in softirq context.

2. iomap_finish_ioends() has to handle unbound merged ioend chains
correctly. This relies on any one call to iomap_finish_ioend() being
bound in runtime so that cond_resched() can be issued regularly as
the long ioend chain is processed. i.e. this relies on mechanism #1
to limit individual ioend sizes to work correctly.

3. filesystems have to loop over the merged ioends to process
physical extent manipulations. This means they can loop internally,
and so we break merging at physical extent boundaries so the
filesystem can easily insert reschedule points between individual
extent manipulations.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reported-and-tested-by: Trond Myklebust <trondmy@hammerspace.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-01-26 09:19:20 -08:00
Linus Torvalds
0280e3c58f NFS Client Updates for Linux 5.17
- New Features:
   - Basic handling for case insensitive filesystems
   - Initial support for fs_locations and server trunking
 
 - Bugfixes and Cleanups:
   - Cleanups to how the "struct cred *" is handled for the nfs_access_entry
   - Ensure the server has an up to date ctimes before hardlinking or renaming
   - Update 'blocks used' after writeback, fallocate, and clone
   - nfs_atomic_open() fixes
   - Improvements to sunrpc tracing
   - Various null check & indenting related cleanups
   - Some improvements to the sunrpc sysfs code
     - Use default_groups in kobj_type
     - Fix some potential races and reference leaks
   - A few tracepoint cleanups in xprtrdma
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmHodsIACgkQ18tUv7Cl
 QOt3xQ//c9JPmMJZZoZtaD5UrHg28iyxaJpOUUpwC/jxQhLOETCf+nU1cELYgLq5
 4W06NBYEmjDJ/tihUvcGMKLvbCtQR9Zl9HepFKDTLTQpGmRFD4enwSmMNvW/AV+h
 I7PoN6J1DX/TZ5InOHH9asyoC2MjwrNHMn3bbQVT0qy+i3T76zJiBF79eWTnPR48
 kKPnF1I0p4LKGJy+y+y/z2mdCsz7tzFkhssxVhot0nafxXzbUOp1H9aiwxroRiUC
 ljbBA0TX8FWkGpGFt3y2QK2fMD7ovDpRhLFYiJClmeERXJVH5mXL9O5XfN5AL0xe
 W/QqT5lbWfeHLkpm2j87yTyaHASC7hGKsAyPD0zWLDcNZws61l1Sy4BHymSE5ZVh
 zt7sJjBnOWAtntyUGBg78G2vhBsd63GzrtcqAOlrngwA5ohJ8f32qvBQGyw4MQu9
 75CjRcO8K8mnf9BJ6I1vYPycjkUh9RSFfNdnUEAI9ZwiTEC/hfEvH/omvEtZsNol
 jBgv2SItTkdMZlEppEL4gxuaYT2wiZf2C6Gco215iPAqLC6dudoroN6yoLk/LRd0
 OWZLl5XTr3j6m5QDm22k5CG080vl6XiAxmAFaFSLza6Q34Jmuluc0gLAZZxvqXk9
 Ay7dQt9PQQk6mXD5Hreb0E5N9zcm2LkfvWpyGJ7mTV7sSHjA2DU=
 =wcVT
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New Features:

   - Basic handling for case insensitive filesystems

   - Initial support for fs_locations and server trunking

  Bugfixes and Cleanups:

   - Cleanups to how the "struct cred *" is handled for the
     nfs_access_entry

   - Ensure the server has an up to date ctimes before hardlinking or
     renaming

   - Update 'blocks used' after writeback, fallocate, and clone

   - nfs_atomic_open() fixes

   - Improvements to sunrpc tracing

   - Various null check & indenting related cleanups

   - Some improvements to the sunrpc sysfs code:
      - Use default_groups in kobj_type
      - Fix some potential races and reference leaks

   - A few tracepoint cleanups in xprtrdma"

[ This should have gone in during the merge window, but didn't. The
  original pull request - sent during the merge window - had gotten
  marked as spam and discarded due missing DKIM headers in the email
  from Anna.   - Linus ]

* tag 'nfs-for-5.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (35 commits)
  SUNRPC: Don't dereference xprt->snd_task if it's a cookie
  xprtrdma: Remove definitions of RPCDBG_FACILITY
  xprtrdma: Remove final dprintk call sites from xprtrdma
  sunrpc: Fix potential race conditions in rpc_sysfs_xprt_state_change()
  net/sunrpc: fix reference count leaks in rpc_sysfs_xprt_state_change
  NFSv4.1 test and add 4.1 trunking transport
  SUNRPC allow for unspecified transport time in rpc_clnt_add_xprt
  NFSv4 handle port presence in fs_location server string
  NFSv4 expose nfs_parse_server_name function
  NFSv4.1 query for fs_location attr on a new file system
  NFSv4 store server support for fs_location attribute
  NFSv4 remove zero number of fs_locations entries error check
  NFSv4: nfs_atomic_open() can race when looking up a non-regular file
  NFSv4: Handle case where the lookup of a directory fails
  NFSv42: Fallocate and clone should also request 'blocks used'
  NFSv4: Allow writebacks to request 'blocks used'
  SUNRPC: use default_groups in kobj_type
  NFS: use default_groups in kobj_type
  NFS: Fix the verifier for case sensitive filesystem in nfs_atomic_open()
  NFS: Add a helper to remove case-insensitive aliases
  ...
2022-01-25 20:16:03 +02:00
Linus Torvalds
49d766f3a0 for-5.17-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmHwDeMACgkQxWXV+ddt
 WDtdMQ//QFqkIB34zW5N3uX1xBFht/G/bCPNdGiK5YerjJZj1f6Rmsytbb6qlWHg
 NlB/XEPeQaQVrSfF37svnvATgySPaqePsufrT2XYu3x2w8muPTl460wmzdMt5h47
 rGB+ct4JdLBH4KJgqe2Bilrqg+FJmL3XT5k0aU3driy4Gb+bcDGeEyVmTWcnNRIg
 DzfUlNwTKhAhZDl8D3B9X2vV8TZDBtrRLquI94eYvooF3LYDL+kExLUW8WDmmAfy
 mjnANs8c+EtcVAzN7tW+O1UqdYYJ8Yo4ngk1nVVRdRvA2BDp9ixgWi/m/3jZ3JmJ
 jySV1zsZJB3ZGp/hIuEvtGY7jheDtbTnfgtI+vwjVdr208acs+XhfDckuOZBZIUY
 7Zk+Qif/narxFAoAvkgkH5QDNSSReKqaHgzohfnzSQqrfO0bh6fw1FnBOm4iXT7C
 cXvReD4m36g46SdTsxnvttpXizXIFe4JPOkpRkBzxIQFaMTA4Is43W0lYC24Ppxj
 A0UVevh3HPhOYzABynuy0EnknZeylb6P+WpGG6Ge+sVrVquQiwR01n4HeoaJO3qe
 re46uUGwO8Q30blYY50HBSJp0bpcciPZRVMJaspcAT9KD0fJ1s/csd2lQyP4ewn6
 A0zg6eabc0PD3LwdlHqp//jTNft/BL4RVZ2c3uM+mgXnGeekcoQ=
 =EysX
 -----END PGP SIGNATURE-----

Merge tag 'for-5.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "Several fixes for defragmentation that got broken in 5.16 after
  refactoring and added subpage support. The observed bugs are excessive
  IO or uninterruptible ioctl.

  All stable material"

* tag 'for-5.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: update writeback index when starting defrag
  btrfs: add back missing dirty page rate limiting to defrag
  btrfs: fix deadlock when reserving space during defrag
  btrfs: defrag: properly update range->start for autodefrag
  btrfs: defrag: fix wrong number of defragged sectors
  btrfs: allow defrag to be interruptible
  btrfs: fix too long loop when defragging a 1 byte file
2022-01-25 18:29:10 +02:00
Filipe Manana
27cdfde181 btrfs: update writeback index when starting defrag
When starting a defrag, we should update the writeback index of the
inode's mapping in case it currently has a value beyond the start of the
range we are defragging. This can help performance and often result in
getting less extents after writeback - for e.g., if the current value
of the writeback index sits somewhere in the middle of a range that
gets dirty by the defrag, then after writeback we can get two smaller
extents instead of a single, larger extent.

We used to have this before the refactoring in 5.16, but it was removed
without any reason to do so. Originally it was added in kernel 3.1, by
commit 2a0f7f5769 ("Btrfs: fix recursive auto-defrag"), in order to
fix a loop with autodefrag resulting in dirtying and writing pages over
and over, but some testing on current code did not show that happening,
at least with the test described in that commit.

So add back the behaviour, as at the very least it is a nice to have
optimization.

Fixes: 7b508037d4 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
CC: stable@vger.kernel.org # 5.16
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-24 18:16:28 +01:00
Filipe Manana
3c9d31c715 btrfs: add back missing dirty page rate limiting to defrag
A defrag operation can dirty a lot of pages, specially if operating on
the entire file or a large file range. Any task dirtying pages should
periodically call balance_dirty_pages_ratelimited(), as stated in that
function's comments, otherwise they can leave too many dirty pages in
the system. This is what we did before the refactoring in 5.16, and
it should have remained, just like in the buffered write path and
relocation. So restore that behaviour.

Fixes: 7b508037d4 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-24 18:10:56 +01:00
Filipe Manana
0cb5950f3f btrfs: fix deadlock when reserving space during defrag
When defragging we can end up collecting a range for defrag that has
already pages under delalloc (dirty), as long as the respective extent
map for their range is not mapped to a hole, a prealloc extent or
the extent map is from an old generation.

Most of the time that is harmless from a functional perspective at
least, however it can result in a deadlock:

1) At defrag_collect_targets() we find an extent map that meets all
   requirements but there's delalloc for the range it covers, and we add
   its range to list of ranges to defrag;

2) The defrag_collect_targets() function is called at defrag_one_range(),
   after it locked a range that overlaps the range of the extent map;

3) At defrag_one_range(), while the range is still locked, we call
   defrag_one_locked_target() for the range associated to the extent
   map we collected at step 1);

4) Then finally at defrag_one_locked_target() we do a call to
   btrfs_delalloc_reserve_space(), which will reserve data and metadata
   space. If the space reservations can not be satisfied right away, the
   flusher might be kicked in and start flushing delalloc and wait for
   the respective ordered extents to complete. If this happens we will
   deadlock, because both flushing delalloc and finishing an ordered
   extent, requires locking the range in the inode's io tree, which was
   already locked at defrag_collect_targets().

So fix this by skipping extent maps for which there's already delalloc.

Fixes: eb793cf857 ("btrfs: defrag: introduce helper to collect target file extents")
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-24 18:10:52 +01:00
Gao Xiang
7865827c43 erofs: avoid unnecessary z_erofs_decompressqueue_work() declaration
Just code rearrange. No logic changes.

Link: https://lore.kernel.org/r/20220121091412.86086-1-hsiangkao@linux.alibaba.com
Reviewed-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-01-24 22:36:53 +08:00
Gao Xiang
e33f42b20b erofs: fix fsdax partition offset handling
After seeking time on testing today upstream fsdax, I found it
actually doesn't work well as below:

[  186.492983] ------------[ cut here ]------------
[  186.493629] WARNING: CPU: 1 PID: 205 at fs/iomap/iter.c:33 iomap_iter+0x2f6/0x310

The problem is that m_dax_part_off should be applied to physical
addresses and very sorry about that I didn't catch this eariler.

Anyway, let's fix it up now. Also, I need to find a way to set up
a standalone testcase to look after this later.

Link: https://lore.kernel.org/r/20220113051845.244461-1-hsiangkao@linux.alibaba.com
Fixes: de20511477 ("fsdax: shift partition offset handling into the file systems")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-01-24 22:36:27 +08:00
Jan Kara
ea8569194b udf: Restore i_lenAlloc when inode expansion fails
When we fail to expand inode from inline format to a normal format, we
restore inode to contain the original inline formatting but we forgot to
set i_lenAlloc back. The mismatch between i_lenAlloc and i_size was then
causing further problems such as warnings and lost data down the line.

Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
CC: stable@vger.kernel.org
Fixes: 7e49b6f248 ("udf: Convert UDF to new truncate calling sequence")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-01-24 14:45:02 +01:00
Jan Kara
7fc3b7c298 udf: Fix NULL ptr deref when converting from inline format
udf_expand_file_adinicb() calls directly ->writepage to write data
expanded into a page. This however misses to setup inode for writeback
properly and so we can crash on inode->i_wb dereference when submitting
page for IO like:

  BUG: kernel NULL pointer dereference, address: 0000000000000158
  #PF: supervisor read access in kernel mode
...
  <TASK>
  __folio_start_writeback+0x2ac/0x350
  __block_write_full_page+0x37d/0x490
  udf_expand_file_adinicb+0x255/0x400 [udf]
  udf_file_write_iter+0xbe/0x1b0 [udf]
  new_sync_write+0x125/0x1c0
  vfs_write+0x28e/0x400

Fix the problem by marking the page dirty and going through the standard
writeback path to write the page. Strictly speaking we would not even
have to write the page but we want to catch e.g. ENOSPC errors early.

Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
CC: stable@vger.kernel.org
Fixes: 52ebea749a ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-01-24 14:45:02 +01:00
Amir Goldstein
29044dae2e fsnotify: fix fsnotify hooks in pseudo filesystems
Commit 49246466a9 ("fsnotify: move fsnotify_nameremove() hook out of
d_delete()") moved the fsnotify delete hook before d_delete() so fsnotify
will have access to a positive dentry.

This allowed a race where opening the deleted file via cached dentry
is now possible after receiving the IN_DELETE event.

To fix the regression in pseudo filesystems, convert d_delete() calls
to d_drop() (see commit 46c46f8df9 ("devpts_pty_kill(): don't bother
with d_delete()") and move the fsnotify hook after d_drop().

Add a missing fsnotify_unlink() hook in nfsdfs that was found during
the audit of fsnotify hooks in pseudo filesystems.

Note that the fsnotify hooks in simple_recursive_removal() follow
d_invalidate(), so they require no change.

Link: https://lore.kernel.org/r/20220120215305.282577-2-amir73il@gmail.com
Reported-by: Ivan Delalande <colona@arista.com>
Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/
Fixes: 49246466a9 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-01-24 14:17:02 +01:00
Amir Goldstein
a37d9a17f0 fsnotify: invalidate dcache before IN_DELETE event
Apparently, there are some applications that use IN_DELETE event as an
invalidation mechanism and expect that if they try to open a file with
the name reported with the delete event, that it should not contain the
content of the deleted file.

Commit 49246466a9 ("fsnotify: move fsnotify_nameremove() hook out of
d_delete()") moved the fsnotify delete hook before d_delete() so fsnotify
will have access to a positive dentry.

This allowed a race where opening the deleted file via cached dentry
is now possible after receiving the IN_DELETE event.

To fix the regression, create a new hook fsnotify_delete() that takes
the unlinked inode as an argument and use a helper d_delete_notify() to
pin the inode, so we can pass it to fsnotify_delete() after d_delete().

Backporting hint: this regression is from v5.3. Although patch will
apply with only trivial conflicts to v5.4 and v5.10, it won't build,
because fsnotify_delete() implementation is different in each of those
versions (see fsnotify_link()).

A follow up patch will fix the fsnotify_unlink/rmdir() calls in pseudo
filesystem that do not need to call d_delete().

Link: https://lore.kernel.org/r/20220120215305.282577-1-amir73il@gmail.com
Reported-by: Ivan Delalande <colona@arista.com>
Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/
Fixes: 49246466a9 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-01-24 14:16:46 +01:00
Namjae Jeon
9ca8581e79 ksmbd: fix SMB 3.11 posix extension mount failure
cifs client set 4 to DataLength of create_posix context, which mean
Mode variable of create_posix context is only available. So buffer
validation of ksmbd should check only the size of Mode except for
the size of Reserved variable.

Fixes: 8f77150c15 ("ksmbd: add buffer validation for SMB2_CREATE_CONTEXT")
Cc: stable@vger.kernel.org # v5.15+
Reported-by: Steve French <smfrench@gmail.com>
Tested-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-23 18:19:04 -06:00
Dylan Yudaken
b36a205004 io_uring: fix bug in slow unregistering of nodes
In some cases io_rsrc_ref_quiesce will call io_rsrc_node_switch_start,
and then immediately flush the delayed work queue &ctx->rsrc_put_work.

However the percpu_ref_put does not immediately destroy the node, it
will be called asynchronously via RCU. That ends up with
io_rsrc_node_ref_zero only being called after rsrc_put_work has been
flushed, and so the process ends up sleeping for 1 second unnecessarily.

This patch executes the put code immediately if we are busy
quiescing.

Fixes: 4a38aed2a0 ("io_uring: batch reap of dead file registrations")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220121123856.3557884-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-23 09:12:53 -07:00
Linus Torvalds
3689f9f8b0 bitmap patches for 5.17-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQHJBAABCgAzFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmHi+xgVHHl1cnkubm9y
 b3ZAZ21haWwuY29tAAoJELFEgP06H77IxdoMAMf3E+L51Ys/4iAiyJQNVoT3aIBC
 A8ZVOB9he1OA3o3wBNIRKmICHk+ovnfCWcXTr9fG/Ade2wJz88NAsGPQ1Phywb+s
 iGlpySllFN72RT9ZqtJhLEzgoHHOL0CzTW07TN9GJy4gQA2h2G9CTP+OmsQdnVqE
 m9Fn3PSlJ5lhzePlKfnln8rGZFgrriJakfEFPC79n/7an4+2Hvkb5rWigo7KQc4Z
 9YNqYUcHWZFUgq80adxEb9LlbMXdD+Z/8fCjOrAatuwVkD4RDt6iKD0mFGjHXGL7
 MZ9KRS8AfZXawmetk3jjtsV+/QkeS+Deuu7k0FoO0Th2QV7BGSDhsLXAS5By/MOC
 nfSyHhnXHzCsBMyVNrJHmNhEZoN29+tRwI84JX9lWcf/OLANcCofnP6f2UIX7tZY
 CAZAgVELp+0YQXdybrfzTQ8BT3TinjS/aZtCrYijRendI1GwUXcyl69vdOKqAHuk
 5jy8k/xHyp+ZWu6v+PyAAAEGowY++qhL0fmszA==
 =RKW4
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux

Pull bitmap updates from Yury Norov:

 - introduce for_each_set_bitrange()

 - use find_first_*_bit() instead of find_next_*_bit() where possible

 - unify for_each_bit() macros

* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
  vsprintf: rework bitmap_list_string
  lib: bitmap: add performance test for bitmap_print_to_pagebuf
  bitmap: unify find_bit operations
  mm/percpu: micro-optimize pcpu_is_populated()
  Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
  find: micro-optimize for_each_{set,clear}_bit()
  include/linux: move for_each_bit() macros from bitops.h to find.h
  cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
  tools: sync tools/bitmap with mother linux
  all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
  cpumask: use find_first_and_bit()
  lib: add find_first_and_bit()
  arch: remove GENERIC_FIND_FIRST_BIT entirely
  include: move find.h from asm_generic to linux
  bitops: move find_bit_*_le functions from le.h to find.h
  bitops: protect find_first_{,zero}_bit properly
2022-01-23 06:20:44 +02:00
Linus Torvalds
1c52283265 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:
 "This is the post-linux-next queue. Material which was based on or
  dependent upon material which was in -next.

  69 patches.

  Subsystems affected by this patch series: mm (migration and zsmalloc),
  sysctl, proc, and lib"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (69 commits)
  mm: hide the FRONTSWAP Kconfig symbol
  frontswap: remove support for multiple ops
  mm: mark swap_lock and swap_active_head static
  frontswap: simplify frontswap_register_ops
  frontswap: remove frontswap_test
  mm: simplify try_to_unuse
  frontswap: remove the frontswap exports
  frontswap: simplify frontswap_init
  frontswap: remove frontswap_curr_pages
  frontswap: remove frontswap_shrink
  frontswap: remove frontswap_tmem_exclusive_gets
  frontswap: remove frontswap_writethrough
  mm: remove cleancache
  lib/stackdepot: always do filter_irq_stacks() in stack_depot_save()
  lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()
  proc: remove PDE_DATA() completely
  fs: proc: store PDE()->data into inode->i_private
  zsmalloc: replace get_cpu_var with local_lock
  zsmalloc: replace per zpage lock with pool->migrate_lock
  locking/rwlocks: introduce write_lock_nested
  ...
2022-01-22 11:28:23 +02:00
Linus Torvalds
8205ae327e 18 cifs/smb3 fixes, half are related to multichannel and reconnect
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmHrkScACgkQiiy9cAdy
 T1EY6Av8C2by0N3vBK/hhjq+ocKhoLWzlziyOlVX6Hzrqa2BH5I9UBUrGmI6rXRI
 SP9QiyjO8WlVGajOYMAgQpjysKJjRnJTkGGvCb8iV4VfQPq8yYfuoLRaihO20WI4
 VaRAwKAZ6RjNrOtbYQ6KPx2OXDE0OUZ6Y8QuDfgn6jXzsMfjTCmM4BGntUNweiUU
 Yn7sBBIh0XWy1D3Fl3Dm47t6d1iW1/I9WIym/2vHWqxI8QkWv05z9Dk5Sijrl8v/
 pa9eQ05Ym8bRTI7dU9be8uySVTO21sdPitNZ4L4vNqH4KY0L3pKqUL4E9QzVcv4O
 wnO8Q7wtBIOEgREuRF041d14vJsRiMUKjxUVG5PubhCE7pSesHHhgh0DN4ymDmKi
 cBYkpYIcPLopfOvhE4CEPjULE3DI91KNM8x8ePB+fm/ZbnPHrXyhOcUCz3gzKYGf
 jsYHVVAofWEU1HRPQjTPtwxuof/3u1ZgkzIj+gqzbVOWIQO1JAsdBE2TQQKTadL9
 Y7aKEfci
 =/mdT
 -----END PGP SIGNATURE-----

Merge tag '5.17-rc-part2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:

 - multichannel fixes, addressing additional reconnect and DFS scenarios

 - reenabling fscache support (indexing rewrite, metadata caching e.g.)

 - send additional version information during NTLMSSP negotiate to
   improve debugging

 - fix for a mount race

 - DFS fixes

 - fix for a memory leak for stable

* tag '5.17-rc-part2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module number
  smb3: send NTLMSSP version information
  cifs: Support fscache indexing rewrite
  cifs: cifs_ses_mark_for_reconnect should also update reconnect bits
  cifs: update tcpStatus during negotiate and sess setup
  cifs: make status checks in version independent callers
  cifs: remove repeated state change in dfs tree connect
  cifs: fix the cifs_reconnect path for DFS
  cifs: remove unused variable ses_selected
  cifs: protect all accesses to chan_* with chan_lock
  cifs: fix the connection state transitions with multichannel
  cifs: check reconnects for channels of active tcons too
  smb3: add new defines from protocol specification
  cifs: serialize all mount attempts
  cifs: quirk for STATUS_OBJECT_NAME_INVALID returned for non-ASCII dfs refs
  cifs: alloc_path_with_tree_prefix: do not append sep. if the path is empty
  cifs: clean up an inconsistent indenting
  cifs: free ntlmsspblob allocated in negotiate
2022-01-22 11:12:26 +02:00
Linus Torvalds
1cb69c8044 New code for 5.17:
- Minor cleanup of ioctl32 cruft
 - Clean up open coded inodegc workqueue function calls
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHpnxUACgkQ+H93GTRK
 tOtpig/9GzzlPSpbjGMU3BW+dfqvYMLV9YcB3bkZLXPJIb/D2UPXzOQd2SvM9ScJ
 XwM857EjrmV7XifEezbBvla33N9ToNv8/IVdmctcc2as4c6VzDXmkiyze7wD+2D/
 i9UJEOjEtaTvqioOMQJpfaATI7y/9N45UFfcjkbmc4lTP0xOvd1Owx9+ujrCYZ9e
 UccsC7ovUzHxxuK1uFhRMI0up1f5urdFA0wFKvEGsfhIzBthU6BDtYUjuzhC/vLV
 DxqPlQgkRqQfzB9loFQOiNNSC2kVF+SKPXZS7k/HiQIALDOKtFNnSanXG5oPB5+d
 DnuSCvJ+RfejmEoDtMydQu0KXN5fq9g7FodH3tAMPPNn8N1l+qJuC//+6VHZjvIB
 GhfHyWkpg/mzSAAqU0sqeGSvzG1byawYNUzueIIcEWLP1qg4O3QkXW7VJ+yuGb70
 gsO7CQjnW20CmJqeftNsBaBor114UjUJLBQsXR8uowKHq/L1MhNh7Ir6lTfQdti+
 S099EVjrrFV0ZVRou1ZyQ5ZWsdGNtz0p0hEEVEPG2ERHOKfbVWLwbL38s8JY7C56
 hBVwQRwZScdIb/SYTWxPXbBJNR/BXUyxrCikk66isQT0kwFCHXE67YaXbFklxybs
 uzKzVYEBPtYhvDeFTa53GEotGw/LDtZD07E58yowlJUqmJtMLM0=
 =EIqW
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.17-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "One of the patches removes some dead code from xfs_ioctl32.h and the
  other fixes broken workqueue flushing in the inode garbage collector.

   - Minor cleanup of ioctl32 cruft

   - Clean up open coded inodegc workqueue function calls"

* tag 'xfs-5.17-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: flush inodegc workqueue tasks before cancel
  xfs: remove unused xfs_ioctl32.h declarations
2022-01-22 11:04:27 +02:00
Linus Torvalds
7fd350f6ff fscache fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmHrJ4UACgkQ+7dXa6fL
 C2taFg/+K0nKY3TDKc7JVxe9BEUa/uUvNba5j16qKWJKnjCrjqU0RWJSUnVOxcwH
 eBwxjaoZTmtGqS3snvKMYNIGLf7FAEeRY3a1JO81sdoCA2X7P/4HtfDO+DdaCFpB
 tYr9+Hfjy+2baMLEimTWp2SYsfVKD/J4gHeqdRF12QMQJ01m+X9fcV7TqpkyPo5C
 L35Gi1RVDXzwn81qOV9yrmkOBhPK4hiREmCmD0BGEGT/+FYlYVBOv92n31CQMl1Q
 ef8eMNiDYgXQ0iWoUYIJqMBs88M7DwAmeaA/qi85gzfGM0Z5pBE3SNy02F/81b8b
 PHgmzDfj42AuAQ9ZOuwREwtvQrpGsCsw5+60klzZ6PiFtO+q/PtS/ODTdi4foDDd
 ura6RM3wqlE9P4Vi7gj9NYhdUlp5p0YYPlJOGr6CepYS63O3DgwXyoVMPUE5s5fF
 Boc3Ef7fKy25Wc1s/ZVSbvvgh9KYDJOKqxs4ELeNSW8GD7uA2d0XWkhjzZIsVdKZ
 kd7VEYr2/j6gpPR63yw2MVOfymJN1WYuu6ittuL40hSYW8hbBg/m8hVE1E7b7NlP
 +u10xvCwN9LBK+757I733wV3hB4rYa6pg9KBjHDXyCLQ1uCKVE0QaILTSjqC/mCM
 OTPFmi50y1oa/9BxooEg5+ShXOH95EE4jTBCaL1lv7JYfYNUbYA=
 =CgMV
 -----END PGP SIGNATURE-----

Merge tag 'fscache-fixes-20220121' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull more fscache updates from David Howells:
 "A set of fixes and minor updates for the fscache rewrite:

   - Fix mishandling of volume collisions (the wait condition is
     inverted and so it was only waiting if the volume collision was
     already resolved).

   - Fix miscalculation of whether there's space available in
     cachefiles.

   - Make sure a default cache name is set on a cache if the user hasn't
     set one by the time they bind the cache.

   - Adjust the way the backing inode is presented in tracepoints, add a
     tracepoint for mkdir and trace directory lookup.

   - Add a tracepoint for failure to set the active file mark.

   - Add an explanation of the checks made on the backing filesystem.

   - Check that the backing filesystem supports tmpfile.

   - Document how the page-release cancellation of the read-skip
     optimisation works.

  And I've included a change for netfslib:

   - Make ops->init_rreq() optional"

* tag 'fscache-fixes-20220121' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  netfs: Make ops->init_rreq() optional
  fscache: Add a comment explaining how page-release optimisation works
  cachefiles: Check that the backing filesystem supports tmpfiles
  cachefiles: Explain checks in a comment
  cachefiles: Trace active-mark failure
  cachefiles: Make some tracepoint adjustments
  cachefiles: set default tag name if it's unspecified
  cachefiles: Calculate the blockshift in terms of bytes, not pages
  fscache: Fix the volume collision wait condition
2022-01-22 10:59:32 +02:00
Christoph Hellwig
0a4ee51818 mm: remove cleancache
Patch series "remove Xen tmem leftovers".

Since the removal of the Xen tmem driver in 2019, the cleancache hooks
are entirely unused, as are large parts of frontswap.  This series
against linux-next (with the folio changes included) removes
cleancaches, and cuts down frontswap to the bits actually used by zswap.

This patch (of 13):

The cleancache subsystem is unused since the removal of Xen tmem driver
in commit 814bbf49dc ("xen: remove tmem driver").

[akpm@linux-foundation.org: remove now-unreachable code]

Link: https://lkml.kernel.org/r/20211224062246.1258487-1-hch@lst.de
Link: https://lkml.kernel.org/r/20211224062246.1258487-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:38 +02:00
Muchun Song
359745d783 proc: remove PDE_DATA() completely
Remove PDE_DATA() completely and replace it with pde_data().

[akpm@linux-foundation.org: fix naming clash in drivers/nubus/proc.c]
[akpm@linux-foundation.org: now fix it properly]

Link: https://lkml.kernel.org/r/20211124081956.87711-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:37 +02:00
Muchun Song
6dfbbae14a fs: proc: store PDE()->data into inode->i_private
PDE_DATA(inode) is introduced to get user private data and hide the
layout of struct proc_dir_entry.  The inode->i_private is used to do the
same thing as well.  Save a copy of user private data to inode->
i_private when proc inode is allocated.  This means the user also can
get their private data by inode->i_private.

Introduce pde_data() to wrap inode->i_private so that we can remove
PDE_DATA() from fs/proc/generic.c and make PTE_DATE() as a wrapper of
pde_data().  It will be easier if we decide to remove PDE_DATE() in the
future.

Link: https://lkml.kernel.org/r/20211124081956.87711-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:37 +02:00
Xiaoming Ni
f0bc21b268 fs/coredump: move coredump sysctls into its own file
This moves the fs/coredump.c respective sysctls to its own file.

Link: https://lkml.kernel.org/r/20211129211943.640266-6-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Stephen Kitt <steve@sk2.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:36 +02:00
Luis Chamberlain
d8c0418aac kernel/sysctl.c: rename sysctl_init() to sysctl_init_bases()
Rename sysctl_init() to sysctl_init_bases() so to reflect exactly what
this is doing.

Link: https://lkml.kernel.org/r/20211129211943.640266-4-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:36 +02:00
Luis Chamberlain
ab171b952c fs: move namespace sysctls and declare fs base directory
This moves the namespace sysctls to its own file as part of the
kernel/sysctl.c spring cleaning

Since we have now removed all sysctls for "fs", we now have to declare
it on the filesystem code, we do that using the new helper, which
reduces boiler plate code.

We rename init_fs_shared_sysctls() to init_fs_sysctls() to reflect that
now fs/sysctls.c is taking on the burden of being the first to register
the base directory as well.

Lastly, since init code will load in the order in which we link it we
have to move the sysctl code to be linked in early, so that its early
init routine runs prior to other fs code.  This way, other filesystem
code can register their own sysctls using the helpers after this:

  * register_sysctl_init()
  * register_sysctl()

Link: https://lkml.kernel.org/r/20211129211943.640266-3-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:36 +02:00
Luis Chamberlain
51cb8dfc5a sysctl: add and use base directory declarer and registration helper
Patch series "sysctl: add and use base directory declarer and
registration helper".

In this patch series we start addressing base directories, and so we
start with the "fs" sysctls.  The end goal is we end up completely
moving all "fs" sysctl knobs out from kernel/sysctl.

This patch (of 6):

Add a set of helpers which can be used to declare and register base
directory sysctls on their own.  We do this so we can later move each of
the base sysctl directories like "fs", "kernel", etc, to their own
respective files instead of shoving the declarations and registrations
all on kernel/sysctl.c.  The lazy approach has caught up and with this,
we just end up extending the list of base directories / sysctls on one
file and this makes maintenance difficult due to merge conflicts from
many developers.

The declarations are used first by kernel/sysctl.c for registration its
own base which over time we'll try to clean up.  It will be used in the
next patch to demonstrate how to cleanly deal with base sysctl
directories.

[mcgrof@kernel.org: null-terminate the ctl_table arrays]
  Link: https://lkml.kernel.org/r/YafJY3rXDYnjK/gs@bombadil.infradead.org

Link: https://lkml.kernel.org/r/20211129211943.640266-1-mcgrof@kernel.org
Link: https://lkml.kernel.org/r/20211129211943.640266-2-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:36 +02:00
Luis Chamberlain
1998f19324 fs: move pipe sysctls to is own file
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

So move the pipe sysctls to its own file.

Link: https://lkml.kernel.org/r/20211129205548.605569-10-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:36 +02:00
Luis Chamberlain
66ad398634 fs: move fs/exec.c sysctls into its own file
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

So move the fs/exec.c respective sysctls to its own file.

Since checkpatch complains about style issues with the old code, this
move also fixes a few of those minor style issues:

  * Use pr_warn() instead of prink(WARNING
  * New empty lines are wanted at the beginning of routines

Link: https://lkml.kernel.org/r/20211129205548.605569-9-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:36 +02:00
Luis Chamberlain
9c011be132 fs: move namei sysctls to its own file
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

So move namei's own sysctl knobs to its own file.

Other than the move we also avoid initializing two static variables to 0
as this is not needed:

  * sysctl_protected_symlinks
  * sysctl_protected_hardlinks

Link: https://lkml.kernel.org/r/20211129205548.605569-8-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:36 +02:00
Luis Chamberlain
dd81faa883 fs: move locking sysctls where they are used
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

The locking fs sysctls are only used on fs/locks.c, so move them there.

Link: https://lkml.kernel.org/r/20211129205548.605569-7-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:36 +02:00
Luis Chamberlain
d1d8ac9edf fs: move shared sysctls to fs/sysctls.c
To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

So move sysctls which are shared between filesystems into a common file
outside of kernel/sysctl.c.

Link: https://lkml.kernel.org/r/20211129205548.605569-6-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:36 +02:00
Luis Chamberlain
54771613e8 sysctl: move maxolduid as a sysctl specific const
The maxolduid value is only shared for sysctl purposes for use on a max
range.  Just stuff this into our shared const array.

[akpm@linux-foundation.org: fix sysctl_vals[], per Mickaël]

Link: https://lkml.kernel.org/r/20211129205548.605569-5-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:36 +02:00
Luis Chamberlain
c8c0c239d5 fs: move dcache sysctls to its own file
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

So move the dcache sysctl clutter out of kernel/sysctl.c.  This is a
small one-off entry, perhaps later we can simplify this representation,
but for now we use the helpers we have.  We won't know how we can
simplify this further untl we're fully done with the cleanup.

[arnd@arndb.de: avoid unused-function warning]
  Link: https://lkml.kernel.org/r/20211203190123.874239-2-arnd@kernel.org

Link: https://lkml.kernel.org/r/20211129205548.605569-4-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:36 +02:00
Luis Chamberlain
204d5a24e1 fs: move fs stat sysctls to file_table.c
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

We can create the sysctl dynamically on early init for fs stat to help
with this clutter.  This dusts off the fs stat syctls knobs and puts
them into where they are declared.

Link: https://lkml.kernel.org/r/20211129205548.605569-3-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:36 +02:00
Luis Chamberlain
1d67fe5850 fs: move inode sysctls to its own file
Patch series "sysctl: 4th set of kernel/sysctl cleanups".

This is slimming down the fs uses of kernel/sysctl.c to the point that
the next step is to just get rid of the fs base directory for it and
move that elsehwere, so that next patch series starts dealing with that
to demo how we can end up cleaning up a full base directory from
kernel/sysctl.c, one at a time.

This patch (of 9):

kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

So move the inode sysctls to its own file.  Since we are no longer using
this outside of fs/ remove the extern declaration of its respective proc
helper.

We use early_initcall() as it is the earliest we can use.

[arnd@arndb.de: avoid unused-variable warning]
  Link: https://lkml.kernel.org/r/20211203190123.874239-1-arnd@kernel.org

Link: https://lkml.kernel.org/r/20211129205548.605569-1-mcgrof@kernel.org
Link: https://lkml.kernel.org/r/20211129205548.605569-2-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:35 +02:00
Luis Chamberlain
b1f2aff888 sysctl: share unsigned long const values
Provide a way to share unsigned long values.  This will allow others to
not have to re-invent these values.

Link: https://lkml.kernel.org/r/20211124231435.1445213-9-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Qing Wang <wangqing@vivo.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:35 +02:00
Luis Chamberlain
3ba442d533 fs: move binfmt_misc sysctl to its own file
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

This moves the binfmt_misc sysctl to its own file to help remove clutter
from kernel/sysctl.c.

Link: https://lkml.kernel.org/r/20211124231435.1445213-5-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Qing Wang <wangqing@vivo.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:35 +02:00
Luis Chamberlain
ee9efac48a sysctl: add helper to register a sysctl mount point
The way to create a subdirectory on top of sysctl_mount_point is a bit
obscure, and *why* we do that even so more.  Provide a helper which
makes it clear why we do this.

[akpm@linux-foundation.org: export register_sysctl_mount_point() to
modules]

Link: https://lkml.kernel.org/r/20211124231435.1445213-4-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Qing Wang <wangqing@vivo.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:35 +02:00
Xiaoming Ni
a8f5de894f eventpoll: simplify sysctl declaration with register_sysctl()
The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

So move the epoll_table sysctl to fs/eventpoll.c and use
register_sysctl().

Link: https://lkml.kernel.org/r/20211123202422.819032-9-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:35 +02:00
Xiaoming Ni
7b9ad122b5 inotify: simplify subdirectory registration with register_sysctl()
There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under.  Simplify this by using
register_sysctl() and specifying the directory path directly.

Move inotify_user sysctl to inotify_user.c while at it to remove clutter
from kernel/sysctl.c.

[mcgrof@kernel.org: remember to register fanotify_table]
  Link: https://lkml.kernel.org/r/YZ5A6iWLb0h3N3RC@bombadil.infradead.org
[mcgrof@kernel.org: update commit log to reflect new path we decided to take]

Link: https://lkml.kernel.org/r/20211123202422.819032-7-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:35 +02:00
Luis Chamberlain
c42ff46f97 ocfs2: simplify subdirectory registration with register_sysctl()
There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under.  Simplify this by using
register_sysctl() and specifying the directory path directly.

// pycocci sysctl-subdir-register-sysctl-simplify.cocci PATH

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
	{
		.procname = E1,
		.maxlen = 0,
		.mode = 0555,
		.child = sysctls,
	},
	{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
	{
		.procname = E2,
		.maxlen = 0,
		.mode = 0555,
		.child = subdir,
	},
	{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
-	{
-		.procname = E1,
-		.maxlen = 0,
-		.mode = 0555,
-		.child = sysctls,
-	},
-	{ }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
-	{
-		.procname = E2,
-		.maxlen = 0,
-		.mode = 0555,
-		.child = subdir,
-	},
-	{ }
-};

@initialize:python@
@@

def make_my_fresh_expression(s1, s2):
  return '"' + s1.strip('"') + "/" + s2.strip('"') + '"'

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) };
@@

header =
-register_sysctl_table(base);
+register_sysctl(E3, sysctls);

Generated-by: Coccinelle SmPL
Link: https://lkml.kernel.org/r/20211123202422.819032-5-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:35 +02:00
Xiaoming Ni
49a4de7571 dnotify: move dnotify sysctl to dnotify.c
The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

So move dnotify sysctls to dnotify.c and use the new
register_sysctl_init() to register the sysctl interface.

[mcgrof@kernel.org: adjust the commit log to justify the move]

Link: https://lkml.kernel.org/r/20211123202347.818157-10-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:34 +02:00
Xiaoming Ni
86b12b6c5d aio: move aio sysctl to aio.c
The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

Move aio sysctl to aio.c and use the new register_sysctl_init() to
register the sysctl interface for aio.

[mcgrof@kernel.org: adjust commit log to justify the move]

Link: https://lkml.kernel.org/r/20211123202347.818157-9-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:34 +02:00
Xiaoming Ni
78e36f3b0d sysctl: move some boundary constants from sysctl.c to sysctl_vals
sysctl has helpers which let us specify boundary values for a min or max
int value.  Since these are used for a boundary check only they don't
change, so move these variables to sysctl_vals to avoid adding duplicate
variables.  This will help with our cleanup of kernel/sysctl.c.

[akpm@linux-foundation.org: update it for "mm/pagealloc: sysctl: change watermark_scale_factor max limit to 30%"]
[mcgrof@kernel.org: major rebase]

Link: https://lkml.kernel.org/r/20211123202347.818157-3-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:34 +02:00
Xiaoming Ni
3ddd9a808c sysctl: add a new register_sysctl_init() interface
Patch series "sysctl: first set of kernel/sysctl cleanups", v2.

Finally had time to respin the series of the work we had started last
year on cleaning up the kernel/sysct.c kitchen sink.  People keeps
stuffing their sysctls in that file and this creates a maintenance
burden.  So this effort is aimed at placing sysctls where they actually
belong.

I'm going to split patches up into series as there is quite a bit of
work.

This first set adds register_sysctl_init() for uses of registerting a
sysctl on the init path, adds const where missing to a few places,
generalizes common values so to be more easy to share, and starts the
move of a few kernel/sysctl.c out where they belong.

The majority of rework on v2 in this first patch set is 0-day fixes.
Eric Biederman's feedback is later addressed in subsequent patch sets.

I'll only post the first two patch sets for now.  We can address the
rest once the first two patch sets get completely reviewed / Acked.

This patch (of 9):

The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong.  The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

Today though folks heavily rely on tables on kernel/sysctl.c so they can
easily just extend this table with their needed sysctls.  In order to
help users move their sysctls out we need to provide a helper which can
be used during code initialization.

We special-case the initialization use of register_sysctl() since it
*is* safe to fail, given all that sysctls do is provide a dynamic
interface to query or modify at runtime an existing variable.  So the
use case of register_sysctl() on init should *not* stop if the sysctls
don't end up getting registered.  It would be counter productive to stop
boot if a simple sysctl registration failed.

Provide a helper for init then, and document the recommended init levels
to use for callers of this routine.  We will later use this in
subsequent patches to start slimming down kernel/sysctl.c tables and
moving sysctl registration to the code which actually needs these
sysctls.

[mcgrof@kernel.org: major commit log and documentation rephrasing also moved to fs/proc/proc_sysctl.c                  ]

Link: https://lkml.kernel.org/r/20211123202347.818157-1-mcgrof@kernel.org
Link: https://lkml.kernel.org/r/20211123202347.818157-2-mcgrof@kernel.org
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:34 +02:00
Jeffle Xu
cef0223191 netfs: Make ops->init_rreq() optional
Make the ops->init_rreq() callback optional.  This isn't required for the
erofs changes I'm implementing to do on-demand read through fscache[1].
Further, ceph has an empty init_rreq method that can then be removed and
it's marked optional in the documentation.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/20211227125444.21187-1-jefflexu@linux.alibaba.com/ [1]
Link: https://lore.kernel.org/r/20211228124419.103020-1-jefflexu@linux.alibaba.com
Link: https://lore.kernel.org/r/164251410387.3435901.2504600788262093313.stgit@warthog.procyon.org.uk/ # v1
2022-01-21 21:36:28 +00:00
David Howells
6633213139 cachefiles: Check that the backing filesystem supports tmpfiles
Add a check that the backing filesystem supports the creation of
tmpfiles[1].

Suggested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/568749bd7cc02908ecf6f3d6a611b6f9cf5c4afd.camel@kernel.org/ [1]
Link: https://lore.kernel.org/r/164251406558.3435901.1249023136670058162.stgit@warthog.procyon.org.uk/ # v1
2022-01-21 21:36:28 +00:00
David Howells
14b9d0902d cachefiles: Explain checks in a comment
Add a comment to explain the checks that cachefiles is making of the
backing filesystem[1].

Suggested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/568749bd7cc02908ecf6f3d6a611b6f9cf5c4afd.camel@kernel.org/ [1]
Link: https://lore.kernel.org/r/164251405621.3435901.771439791811515914.stgit@warthog.procyon.org.uk/ # v1
2022-01-21 21:36:28 +00:00
David Howells
b64a331498 cachefiles: Trace active-mark failure
Add a tracepoint to log failure to apply an active mark to a file in
addition to tracing successfully setting and unsetting the mark.

Also include the backing file inode number in the message logged to dmesg.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/164251404666.3435901.17331742792401482190.stgit@warthog.procyon.org.uk/ # v1
2022-01-21 21:36:28 +00:00
David Howells
8c39b8bc82 cachefiles: Make some tracepoint adjustments
Make some adjustments to tracepoints to make the tracing a bit more
followable:

 (1) Standardise on displaying the backing inode number as "B=<hex>" with
     no leading zeros.

 (2) Make the cachefiles_lookup tracepoint log the directory inode number
     as well as the looked-up inode number.

 (3) Add a cachefiles_lookup tracepoint into cachefiles_get_directory() to
     log directory lookup.

 (4) Add a new cachefiles_mkdir tracepoint and use that to log a successful
     mkdir from cachefiles_get_directory().

 (5) Make the cachefiles_unlink and cachefiles_rename tracepoints log the
     inode number of the affected file/dir rather than dentry struct
     pointers.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/164251403694.3435901.9797725381831316715.stgit@warthog.procyon.org.uk/ # v1
2022-01-21 21:36:28 +00:00
Jeffle Xu
c7ca731557 cachefiles: set default tag name if it's unspecified
fscache_acquire_cache() requires a non-empty name, while 'tag <name>'
command is optional for cachefilesd.

Thus set default tag name if it's unspecified to avoid the regression of
cachefilesd. The logic is the same with that before rewritten.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/164251399914.3435901.4761991152407411408.stgit@warthog.procyon.org.uk/ # v1
2022-01-21 21:36:28 +00:00
David Howells
5638b067d3 cachefiles: Calculate the blockshift in terms of bytes, not pages
Cachefiles keeps track of how much space is available on the backing
filesystem and refuses new writes permission to start if there isn't enough
(we especially don't want ENOSPC happening).  It also tracks the amount of
data pending in DIO writes (cache->b_writing) and reduces the amount of
free space available by this amount before deciding if it can set up a new
write.

However, the old fscache I/O API was very much page-granularity dependent
and, as such, cachefiles's cache->bshift was meant to be a multiplier to
get from PAGE_SIZE to block size (ie. a blocksize of 512 would give a shift
of 3 for a 4KiB page) - and this was incorrectly being used to turn the
number of bytes in a DIO write into a number of blocks, leading to a
massive over estimation of the amount of data in flight.

Fix this by changing cache->bshift to be a multiplier from bytes to
blocksize and deal with quantities of blocks, not quantities of pages.

Fix also the rounding in the calculation in cachefiles_write() which needs
a "- 1" inserting.

Fixes: 047487c947 ("cachefiles: Implement the I/O routines")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/164251398954.3435901.7138806620218474123.stgit@warthog.procyon.org.uk/ # v1
2022-01-21 21:36:28 +00:00
David Howells
80a00ab834 fscache: Fix the volume collision wait condition
The condition that the waits in fscache_wait_on_volume_collision() are
waiting until are inverted.  This suddenly started happening on the
upstream kernel with something like the following appearing in dmesg when
running xfstests:

	CacheFiles: cachefiles: Inode already in use: Iafs,example.com,100055

Fix them by inverting the conditions.

Fixes: 62ab633523 ("fscache: Implement volume registration")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/164251398010.3435901.943876048104930939.stgit@warthog.procyon.org.uk/ # v1
2022-01-21 21:36:28 +00:00
Linus Torvalds
f3a78227ee io_uring-5.17-2022-01-21
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmHqtf8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpoIbD/9pfiURokRzb/UncQoS+zX0mMQxToYWluMN
 xyyAHLRqAyylzdZykJFnxfGKMN4z/cfBKJugISEd+O+88wjNLUALUBzD56C6vbVq
 h+gPL+zBRNu/Bx5bfEYEI+zhUr7AE1gtlg+x/hzZy1x/Q2b2Fyuely+FaHY/ChUy
 +HUnLVP74Gj2H7XY6a1yEuEQRdRRBn0NjWVU0v8g8zNLjO+XUKJFE5se1EAjIcBy
 IzcMvbDbnfGnHQRp7saq3TaY8+OaADSch0QYS1pDyCRUcqMTBnONTgyOkwxNQA+z
 58KHM7n9QDp3QzRPE4TbvjHuX1J9fbw9wekIQIm6eAQfGRa6IvoMbUaI1btAsFLU
 zsIRUWNPv2w7tzG4OMY438bISnUhnGKnNG4ZnWCx7oa+GF8SgOJ7D8EOGzqimh8n
 YBFfoLTkKfcVcrb+G/cfa5qWK35qv1nGTVk1A9V37eOi4YRBOFnbfmzMAb0CC5bw
 8zpFyRS45VJLjnS+FUdvpqZeSuP0sZlKx6KH0hUa2Nd/7xSmsVkNuexCrttTWuLX
 sfHcXj3rMG7YBXRtxHnMH7rDsdph4h5CPPqqMXHD8ho4YYxvFNktxu8IJgJRV6+J
 cDH937ADsYNM8MBAgPZ3i9Kg8mVttnpHLMRhCeVpPKzZtr2tLoUh0mXLHb87docu
 L12jvdCS0A==
 =bFY0
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.17-2022-01-21' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

 - Fix the io_uring POLLFREE handling, similarly to how it was done for
   aio (Pavel)

 - Remove (now) unused function (Jiapeng)

 - Small series fixing an issue with work cancelations. A window exists
   where work isn't locatable in the pending list, and isn't active in a
   worker yet either. (me)

* tag 'io_uring-5.17-2022-01-21' of git://git.kernel.dk/linux-block:
  io-wq: delete dead lock shuffling code
  io_uring: perform poll removal even if async work removal is successful
  io-wq: add intermediate work step between pending list and active work
  io-wq: perform both unstarted and started work cancelations in one go
  io-wq: invoke work cancelation with wqe->lock held
  io-wq: make io_worker lock a raw spinlock
  io-wq: remove useless 'work' argument to __io_worker_busy()
  io_uring: fix UAF due to missing POLLFREE handling
  io_uring: Remove unused function req_ref_put
2022-01-21 16:07:21 +02:00
Linus Torvalds
31d949782e Withdraw the XFS_IOC_ALLOCSP* and XFS_IOC_FREESP* ioctl definitions.
Remove the header definitions for these ioctls.  The just-removed
 implementation has allowed callers to read stale disk contents for more
 than **21 years** and nobody noticed or complained, which implies a lack
 of users aside from exploit programs.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHlp28ACgkQ+H93GTRK
 tOt77Q/+KP7XtwaloGbkBfMzk9JhVh4wISagQ8BxZ5Jb8cHg9ekWJz585he9siNY
 mfcuaLh5onWiBRsfqo4RXg03X4E3c/U+Q3oqe405O3TZkK2LOZMjkPE1ijDdTej6
 V9WhCTCRGuJA01sKgFuuwFJxJRjIROE29FPgRQP9EvLtEITIxLbHMFZRYKpYh143
 EhNwzQQwwPg/L4/m1qmWfC3L+Z7qTm6EQhOnpyzxlKxyX7qXIjBHi6WvErcDeJ3F
 pS7v1fwcZxctrh7PrKwhXSrkbMmd5J3p5qI/MrCtGEKWNXk+rv6AC0n4gcXvpJ/v
 wL0OTyik9pwA8V0XPQcuWvQXmrm8vR2XvMok6gXkHB1jCfzYAwJsrHQbop4pyCFe
 U3HU46x0g7UFXY7jUjztD8YNIT1+B+ducetCCGAhI97HiQrSsqSvgvPFZNle7Cef
 Oheab4iIs1zUblNrVzyGCQmK42ankypxPbfrrtvLi7SFrLRAGXeWeqDf1RXJnt5b
 xrOqCe1hgXR4RJrkTPWiiQindLlhDuywfa+Q1Y5fYZatsTtgceE/HIOg80x4pPgR
 4Ip7hW9lsjoDckpu0bC0bvYiqhrYM1eztpUToYdy7FeOkQKkPHO9xm/m1tHbqzmi
 bF3hkBo6bLByXiY/ZXzrQGrErJ6OTdNVpsR1vYjoaycrQt6wznI=
 =hq3i
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.17-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull more xfs irix ioctl housecleaning from Darrick Wong:
 "Withdraw the XFS_IOC_ALLOCSP* and XFS_IOC_FREESP* ioctl definitions.

  This is the third and final of a series of small pull requests that
  perform some long overdue housecleaning of XFS ioctls. This time,
  we're withdrawing all variants of the ALLOCSP and FREESP ioctls from
  XFS' userspace API. This might be a little premature since we've only
  just removed the functionality, but as I pointed out in the last pull
  request, nobody (including fstests) noticed that it was broken for 20
  years.

  In response to the patch, we received a single comment from someone
  who stated that they 'augment' the ioctl for their own purposes, but
  otherwise acquiesced to the withdrawal. I still want to try to clobber
  these old ioctl definitions in 5.17.

  So remove the header definitions for these ioctls. The just-removed
  implementation has allowed callers to read stale disk contents for
  more than **21 years** and nobody noticed or complained, which implies
  a lack of users aside from exploit programs"

* tag 'xfs-5.17-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: remove the XFS_IOC_{ALLOC,FREE}SP* definitions
2022-01-21 08:51:48 +02:00
Linus Torvalds
d701a8ccac Remove the XFS_IOC_ALLOCSP* and XFS_IOC_FREESP* ioctl families.
Linux has always used fallocate as the space management system call,
 whereas these Irix legacy ioctls only ever worked on XFS, and have been
 the cause of recent stale data disclosure vulnerabilities.  As
 equivalent functionality is available elsewhere, remove the code.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHlplQACgkQ+H93GTRK
 tOtAcRAAg11WggF9ycNLwnczUs4NmTV1cwhz8+eTuwr2yul3gl/mrO3MyjMmkrnm
 1rXjwg28GKtps04Ugh+8TTL+QkDn6Uteco27OZbmUf00a0MoC7JG4VkEQVtXjcaK
 zvevfutTH7Vnl49m+YBrLtonrTqmND46quKoPKsv0a5nlbXHSNMouUkayWXDSyOl
 8tRcNWLy76L+XCxEU21cD1NBw3Vr0mCiId4xTcbNFw3TUVAGoZgghzC2d/gHFiwN
 1PM7G51TKUNm3dybH0mt/jLF/fLsVxFnznnlW4bb/XzMuU4geqd0r1AQuIdbwZa9
 uB+PkFWwN5frTEFELYTamAa4LlAe2oQ0hmSGLfC/zEtPcOv4h6qHNgRsN9wfG+H9
 oYUeRY+2zHcD7jYJsaZZt5WCIDVncOlJMclRdpbpujkJzJX9ZjAi++PTgDxdMjFa
 egwDAvOdgijgtz8erN0gglJrqJzQQp6ByNtht5rZjHz7LkrWYtt57TOoS986pW7X
 /MwBLjT/4Xig/XaFVrmMohF3VPrG/eH/DpTnHotzQzZRYQWbKZwCgin6+kKC8cV8
 Y+eE1jKZunL4Ms/GmrxencNzsDSJtkKyR5LkHCqgH8YUPJM3vYDcleZY+UgEKq0a
 z0fw3MZvxM2jsUIk7+J8uQ8esSqUb5hNXkUJsUraUtG3Z6ZeaOg=
 =2QZ3
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.17-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs irix ioctl housecleaning from Darrick Wong:
 "Remove the XFS_IOC_ALLOCSP* and XFS_IOC_FREESP* ioctl families.

  This is the second of a series of small pull requests that perform
  some long overdue housecleaning of XFS ioctls. This time, we're
  vacating the implementation of all variants of the ALLOCSP and FREESP
  ioctls, which are holdovers from EFS in Irix, circa 1993. Roughly
  equivalent functionality have been available for both ioctls since
  2.6.25 (April 2008):

   - XFS_IOC_FREESP ftruncates a file.

   - XFS_IOC_ALLOCSP is the equivalent of fallocate.

  As noted in the fix patch for CVE 2021-4155, the ALLOCSP ioctl has
  been serving up stale disk blocks since 2000, and in 21 years
  **nobody** noticed. On those grounds I think it's safe to vacate the
  implementation.

  Note that we lose the ability to preallocate and truncate relative to
  the current file position, but as nobody's ever implemented that for
  the VFS, I conclude that it's not in high demand.

  Linux has always used fallocate as the space management system call,
  whereas these Irix legacy ioctls only ever worked on XFS, and have
  been the cause of recent stale data disclosure vulnerabilities. As
  equivalent functionality is available elsewhere, remove the code"

* tag 'xfs-5.17-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: kill the XFS_IOC_{ALLOC,FREE}SP* ioctls
2022-01-21 08:47:25 +02:00
Linus Torvalds
12a8fb20f1 Withdraw the ioctl definition for the FSSETDM ioctl.
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHlpbEACgkQ+H93GTRK
 tOs8MBAAjOyUswNqVMYgMB6eww7oT8j54jGdTmo+d+N2hIoX4OAV6GdCI6te/7DE
 yz6TtzbQ9fs9lDYsdRsMjfwr18R8N1m0D23cpFSJ6qq6e2NJi0mqVwmC8mPRNAJH
 68yL0p6O4Tk5jj7kae+zsv50nsSANXclfoHjsZQ0DGOuGagmPkNVqT82KIC4n3dn
 aw66xhuaiLLJE+4boLe4NexRVBbyOuHQ2uB7xUnKwc9tHvjAf8EFCIhUV1wp0qZf
 CfA2wg8+Jzwrqz/gVRKUZOjz7LeIY6E2qCBrA+DATv2dcv7QhmvGHaQ9OkrvIE72
 CbvI92IhvOcKFzpfMrRYGhOh7KE6SkxLGqsAXgnjPoFQkCDudgCaExBO96RMMd6u
 cX43mXWZbUl++Sh2GhPD/xkiskLRZFjiHJbKBX/5nwjU2BzTHQY/7Jy07fIkR4c4
 IrkKgiXfSJT4j/KeAMkBpZ7THMjRMSUgwliSWHL0QWUz5Bou8WRnHUl8CMsu9vDJ
 fYeekXDQYuAX+UrcsDlbA0UukigOLSIZiQTAEgSbIkd/+Zb6U6e0IF7pTTZJ9uFs
 bndLFYqZtEAySDrMCBM+W8VYmR48EDxfN8xsdS1kbZIqEdNhmkEMj9tMf0rs+FRi
 lo1vMi08O7VcuyiyNrKs0e1d1Gkd2jwmwIskSQweslP5BbfPzRE=
 =d+fT
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.17-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs ioctl housecleaning from Darrick Wong:
 "This is the first of a series of small pull requests that perform some
  long overdue housecleaning of XFS ioctls. This first pull request
  removes the FSSETDM ioctl, which was used to set DMAPI event
  attributes on XFS files. The DMAPI support has never been merged
  upstream and the implementation of FSSETDM itself was removed two
  years ago, so let's withdraw it completely.

   - Withdraw the ioctl definition for the FSSETDM ioctl"

* tag 'xfs-5.17-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: remove the XFS_IOC_FSSETDM definitions
2022-01-21 08:44:07 +02:00
Christoph Hellwig
5298d4bfe8 unicode: clean up the Kconfig symbol confusion
Turn the CONFIG_UNICODE symbol into a tristate that generates some always
built in code and remove the confusing CONFIG_UNICODE_UTF8_DATA symbol.

Note that a lot of the IS_ENABLED() checks could be turned from cpp
statements into normal ifs, but this change is intended to be fairly
mechanic, so that should be cleaned up later.

Fixes: 2b3d047870 ("unicode: Add utf8-data module")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
2022-01-20 19:57:24 -05:00
Yang Li
217663f101 fanotify: remove variable set but not used
The code that uses the pointer info has been removed in 7326e382c2
("fanotify: report old and/or new parent+name in FAN_RENAME event").
and fanotify_event_info() doesn't change 'event', so the declaration and
assignment of info can be removed.

Eliminate the following clang warning:
fs/notify/fanotify/fanotify_user.c:161:24: warning: variable ‘info’ set
but not used

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-01-20 13:57:22 +01:00
Linus Torvalds
64f29d8856 The highlight is the new mount "device" string syntax implemented
by Venky Shankar.  It solves some long-standing issues with using
 different auth entities and/or mounting different CephFS filesystems
 from the same cluster, remounting and also misleading /proc/mounts
 contents.  The existing syntax of course remains to be maintained.
 
 On top of that, there is a couple of fixes for edge cases in quota
 and a new mount option for turning on unbuffered I/O mode globally
 instead of on a per-file basis with ioctl(CEPH_IOC_SYNCIO).
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmHpP5ATHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi0TgB/480i2lPHgA3ujJNqo5Q6z+W0vtTA2+
 Wx+4rAUgIESJVunbFxvecPbzyUXTe7wWFI11TCVHPpf6GyIIDTD+uHd3kKWtLsfL
 Zkk1/2PN9Q5Dh29R+N8rP9NaP8tIaTQjyiO3iqmRZlo+k0Z/lYtWUb+fUP05XlVY
 ML/ktW543tkKeYwl3SWdW5MqAAOVGDbTt+L51CraDhVoiUac5ptkP+cmDmIqsnGa
 ZHVqpwugxgndEIyuBHDLBps+5/LrEaL10xDhGcMtP9hwGYhyNr6Yj+azfGtHWwOi
 jdVsdHDiecUBVtGyZ351Y4pCMOmP0uJif6MOUZFXYYSSeUBUhH8UjgEi
 =jcte
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.17-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "The highlight is the new mount "device" string syntax implemented by
  Venky Shankar. It solves some long-standing issues with using
  different auth entities and/or mounting different CephFS filesystems
  from the same cluster, remounting and also misleading /proc/mounts
  contents. The existing syntax of course remains to be maintained.

  On top of that, there is a couple of fixes for edge cases in quota and
  a new mount option for turning on unbuffered I/O mode globally instead
  of on a per-file basis with ioctl(CEPH_IOC_SYNCIO)"

* tag 'ceph-for-5.17-rc1' of git://github.com/ceph/ceph-client:
  ceph: move CEPH_SUPER_MAGIC definition to magic.h
  ceph: remove redundant Lsx caps check
  ceph: add new "nopagecache" option
  ceph: don't check for quotas on MDS stray dirs
  ceph: drop send metrics debug message
  rbd: make const pointer spaces a static const array
  ceph: Fix incorrect statfs report for small quota
  ceph: mount syntax module parameter
  doc: document new CephFS mount device syntax
  ceph: record updated mon_addr on remount
  ceph: new device mount syntax
  libceph: rename parse_fsid() to ceph_parse_fsid() and export
  libceph: generalize addr/ip parsing based on delimiter
2022-01-20 13:46:20 +02:00
Linus Torvalds
67ed868d23 23 ksmbd server fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmHo9UgACgkQiiy9cAdy
 T1FG4Av9Gx2v8bmcjZgnbOLcDmhPZPIR2DaBOZ+79RfSy0FgeFo76Mp9YuiAmqH3
 8DqinvWax0xzGiOSMQxODWK1c71TT7oRm5XqloxtLp6Upy61q7IB6f6gXKy566v1
 /KB2XkUZQgfXr9inCOflzQDBVhl9UqPWmWV++VDpZkv4HAGOppcu7hloXyraG4cc
 C4DzljiWGwWs875qcC8M533z3gEwpUB8o7M9hseLbEH8earUHzlLIp++HCuKfHJP
 W7U13nPgZXQ4I1SxDdzt3ZXcwZKRKvbG7MdK4KnftkeKRfobz92IxWDxj7VrM/iN
 5mxLExYjaZGSuYkquKiQy2RtaXudv3U51stxo2jWLvoRc9c8ogT5luwIQ31apRm+
 PRiJew2XdeWnV9IYMJOpRTcLE7eVscU3oP2kW8kL8uoe1CwbabOnz5jpQ6ioA08t
 8GgVvDWY7o2/bquDmiIsnaWnVnOWw4ms4vQTjecJgltpAHDQf9RwrSVZYAu+J+JY
 sWW5IGCT
 =WqhR
 -----END PGP SIGNATURE-----

Merge tag '5.17-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull ksmbd server fixes from Steve French:

 - authentication fix

 - RDMA (smbdirect) fixes (including fix for a memory corruption, and
   some performance improvements)

 - multiple improvements for multichannel

 - misc fixes, including crediting (flow control) improvements

 - cleanup fixes, including some kernel doc fixes

* tag '5.17-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd: (23 commits)
  ksmbd: fix guest connection failure with nautilus
  ksmbd: uninitialized variable in create_socket()
  ksmbd: smbd: fix missing client's memory region invalidation
  ksmbd: add smb-direct shutdown
  ksmbd: smbd: change the default maximum read/write, receive size
  ksmbd: smbd: create MR pool
  ksmbd: add reserved room in ipc request/response
  ksmbd: smbd: call rdma_accept() under CM handler
  ksmbd: limits exceeding the maximum allowable outstanding requests
  ksmbd: move credit charge deduction under processing request
  ksmbd: add support for smb2 max credit parameter
  ksmbd: set 445 port to smbdirect port by default
  ksmbd: register ksmbd ib client with ib_register_client()
  ksmbd: Fix smb2_get_name() kernel-doc comment
  ksmbd: Delete an invalid argument description in smb2_populate_readdir_entry()
  ksmbd: Fix smb2_set_info_file() kernel-doc comment
  ksmbd: Fix buffer_check_err() kernel-doc comment
  ksmbd: fix multi session connection failure
  ksmbd: set both ipv4 and ipv6 in FSCTL_QUERY_NETWORK_INTERFACE_INFO
  ksmbd: set RSS capable in FSCTL_QUERY_NETWORK_INTERFACE_INFO
  ...
2022-01-20 13:39:13 +02:00
Linus Torvalds
f4484d138b Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "55 patches.

  Subsystems affected by this patch series: percpu, procfs, sysctl,
  misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2,
  hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits)
  lib: remove redundant assignment to variable ret
  ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
  kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
  lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
  btrfs: use generic Kconfig option for 256kB page size limit
  arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
  configs: introduce debug.config for CI-like setup
  delayacct: track delays from memory compact
  Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
  delayacct: cleanup flags in struct task_delay_info and functions use it
  delayacct: fix incomplete disable operation when switch enable to disable
  delayacct: support swapin delay accounting for swapping without blkio
  panic: remove oops_id
  panic: use error_report_end tracepoint on warnings
  fs/adfs: remove unneeded variable make code cleaner
  FAT: use io_schedule_timeout() instead of congestion_wait()
  hfsplus: use struct_group_attr() for memcpy() region
  nilfs2: remove redundant pointer sbufs
  fs/binfmt_elf: use PT_LOAD p_align values for static PIE
  const_structs.checkpatch: add frequently used ops structs
  ...
2022-01-20 10:41:01 +02:00
Nathan Chancellor
e900909599 btrfs: use generic Kconfig option for 256kB page size limit
Use the newly introduced CONFIG_PAGE_SIZE_LESS_THAN_256KB to describe
the dependency introduced by commit b05fbcc36b ("btrfs: disable build
on platforms having page size 256K").

Link: https://lkml.kernel.org/r/20211129230141.228085-3-nathan@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
Minghao Chi
25d2e88632 fs/adfs: remove unneeded variable make code cleaner
Return value directly instead of taking this in a variable.

Link: https://lkml.kernel.org/r/20211210023211.424609-1-chi.minghao@zte.com.cn
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cm>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:55 +02:00
NeilBrown
9bb56d5925 FAT: use io_schedule_timeout() instead of congestion_wait()
congestion_wait() in this context is just a sleep - block devices do not
support congestion signalling any more.

The goal for this wait, which was introduced in commit ae78bf9c4f
("[PATCH] add -o flush for fat") is to wait for any recently written
data to get to storage.  We currently have no direct mechanism to do
this, so a simple wait that behaves identically to the current
congestion_wait() is the best we can do.

This is a step towards removing congestion_wait()

Link: https://lkml.kernel.org/r/163936544519.22433.13400436295732112065@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:54 +02:00
Kees Cook
e35fa567a0 hfsplus: use struct_group_attr() for memcpy() region
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Add struct_group() to mark the "info" region (containing struct DInfo
and struct DXInfo structs) in struct hfsplus_cat_folder and struct
hfsplus_cat_file that are written into directly, so the compiler can
correctly reason about the expected size of the writes.

"pahole" shows no size nor member offset changes to struct
hfsplus_cat_folder nor struct hfsplus_cat_file.  "objdump -d" shows no
object code changes.

Link: https://lkml.kernel.org/r/20211119192851.1046717-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:54 +02:00
Colin Ian King
e1ce8a97be nilfs2: remove redundant pointer sbufs
Pointer sbufs is being assigned a value but it's not being used later
on.  The pointer is redundant and can be removed.  Cleans up scan-build
static analysis warning:

  fs/nilfs2/page.c:203:8: warning: Although the value stored to 'sbufs'
    is used in the enclosing expression, the value is never actually read
    from 'sbufs' [deadcode.DeadStores]
        sbh = sbufs = page_buffers(src);

Link: https://lkml.kernel.org/r/20211211180955.550380-1-colin.i.king@gmail.com
Link: https://lkml.kernel.org/r/1640712476-15136-1-git-send-email-konishi.ryusuke@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:54 +02:00
H.J. Lu
9630f0d60f fs/binfmt_elf: use PT_LOAD p_align values for static PIE
Extend commit ce81bb256a ("fs/binfmt_elf: use PT_LOAD p_align values
for suitable start address") which fixed PIE binaries built with
-Wl,-z,max-page-size=0x200000, to cover static PIE binaries.  This
fixes:

    https://bugzilla.kernel.org/show_bug.cgi?id=215275

Tested by verifying static PIE binaries with -Wl,-z,max-page-size=0x200000 loading.

Link: https://lkml.kernel.org/r/20211209174052.370537-1-hjl.tools@gmail.com
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:54 +02:00
Yafang Shao
d6986ce24f kthread: dynamically allocate memory to store kthread's full name
When I was implementing a new per-cpu kthread cfs_migration, I found the
comm of it "cfs_migration/%u" is truncated due to the limitation of
TASK_COMM_LEN.  For example, the comm of the percpu thread on CPU10~19
all have the same name "cfs_migration/1", which will confuse the user.
This issue is not critical, because we can get the corresponding CPU
from the task's Cpus_allowed.  But for kthreads corresponding to other
hardware devices, it is not easy to get the detailed device info from
task comm, for example,

    jbd2/nvme0n1p2-
    xfs-reclaim/sdf

Currently there are so many truncated kthreads:

    rcu_tasks_kthre
    rcu_tasks_rude_
    rcu_tasks_trace
    poll_mpt3sas0_s
    ext4-rsv-conver
    xfs-reclaim/sd{a, b, c, ...}
    xfs-blockgc/sd{a, b, c, ...}
    xfs-inodegc/sd{a, b, c, ...}
    audit_send_repl
    ecryptfs-kthrea
    vfio-irqfd-clea
    jbd2/nvme0n1p2-
    ...

We can shorten these names to work around this problem, but it may be
not applied to all of the truncated kthreads.  Take 'jbd2/nvme0n1p2-'
for example, it is a nice name, and it is not a good idea to shorten it.

One possible way to fix this issue is extending the task comm size, but
as task->comm is used in lots of places, that may cause some potential
buffer overflows.  Another more conservative approach is introducing a
new pointer to store kthread's full name if it is truncated, which won't
introduce too much overhead as it is in the non-critical path.  Finally
we make a dicision to use the second approach.  See also the discussions
in this thread:
https://lore.kernel.org/lkml/20211101060419.4682-1-laoar.shao@gmail.com/

After this change, the full name of these truncated kthreads will be
displayed via /proc/[pid]/comm:

    rcu_tasks_kthread
    rcu_tasks_rude_kthread
    rcu_tasks_trace_kthread
    poll_mpt3sas0_statu
    ext4-rsv-conversion
    xfs-reclaim/sdf1
    xfs-blockgc/sdf1
    xfs-inodegc/sdf1
    audit_send_reply
    ecryptfs-kthread
    vfio-irqfd-cleanup
    jbd2/nvme0n1p2-8

Link: https://lkml.kernel.org/r/20211120112850.46047-1-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:53 +02:00
Yafang Shao
95af469c4f fs/binfmt_elf: replace open-coded string copy with get_task_comm
It is better to use get_task_comm() instead of the open coded string
copy as we do in other places.

struct elf_prpsinfo is used to dump the task information in userspace
coredump or kernel vmcore.  Below is the verification of vmcore,

  crash> ps
     PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
        0      0   0  ffffffff9d21a940  RU   0.0       0      0  [swapper/0]
  >     0      0   1  ffffa09e40f85e80  RU   0.0       0      0  [swapper/1]
  >     0      0   2  ffffa09e40f81f80  RU   0.0       0      0  [swapper/2]
  >     0      0   3  ffffa09e40f83f00  RU   0.0       0      0  [swapper/3]
  >     0      0   4  ffffa09e40f80000  RU   0.0       0      0  [swapper/4]
  >     0      0   5  ffffa09e40f89f80  RU   0.0       0      0  [swapper/5]
        0      0   6  ffffa09e40f8bf00  RU   0.0       0      0  [swapper/6]
  >     0      0   7  ffffa09e40f88000  RU   0.0       0      0  [swapper/7]
  >     0      0   8  ffffa09e40f8de80  RU   0.0       0      0  [swapper/8]
  >     0      0   9  ffffa09e40f95e80  RU   0.0       0      0  [swapper/9]
  >     0      0  10  ffffa09e40f91f80  RU   0.0       0      0  [swapper/10]
  >     0      0  11  ffffa09e40f93f00  RU   0.0       0      0  [swapper/11]
  >     0      0  12  ffffa09e40f90000  RU   0.0       0      0  [swapper/12]
  >     0      0  13  ffffa09e40f9bf00  RU   0.0       0      0  [swapper/13]
  >     0      0  14  ffffa09e40f98000  RU   0.0       0      0  [swapper/14]
  >     0      0  15  ffffa09e40f9de80  RU   0.0       0      0  [swapper/15]

It works well as expected.

Some comments are added to explain why we use the hard-coded 16.

Link: https://lkml.kernel.org/r/20211120112738.45980-5-laoar.shao@gmail.com
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:53 +02:00
Yafang Shao
503471ac36 fs/exec: replace strncpy with strscpy_pad in __get_task_comm
If the dest buffer size is smaller than sizeof(tsk->comm), the buffer
will be without null ternimator, that may cause problem.  Using
strscpy_pad() instead of strncpy() in __get_task_comm() can make the
string always nul ternimated and zero padded.

Link: https://lkml.kernel.org/r/20211120112738.45980-3-laoar.shao@gmail.com
Suggested-by: Kees Cook <keescook@chromium.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:53 +02:00
Yafang Shao
06c5088aee fs/exec: replace strlcpy with strscpy_pad in __set_task_comm
Patch series "task comm cleanups", v2.

This patchset is part of the patchset "extend task comm from 16 to
24"[1].  Now we have different opinion that dynamically allocates memory
to store kthread's long name into a separate pointer, so I decide to
take the useful cleanups apart from the original patchset and send it
separately[2].

These useful cleanups can make the usage around task comm less
error-prone.  Furthermore, it will be useful if we want to extend task
comm in the future.

[1]. https://lore.kernel.org/lkml/20211101060419.4682-1-laoar.shao@gmail.com/
[2]. https://lore.kernel.org/lkml/CALOAHbAx55AUo3bm8ZepZSZnw7A08cvKPdPyNTf=E_tPqmw5hw@mail.gmail.com/

This patch (of 7):

strlcpy() can trigger out-of-bound reads on the source string[1], we'd
better use strscpy() instead.  To make it be robust against full tsk->comm
copies that got noticed in other places, we should make sure it's zero
padded.

[1] https://github.com/KSPP/linux/issues/89

Link: https://lkml.kernel.org/r/20211120112738.45980-1-laoar.shao@gmail.com
Link: https://lkml.kernel.org/r/20211120112738.45980-2-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:53 +02:00
luo penghao
7080cead5d sysctl: remove redundant ret assignment
Subsequent if judgments will assign new values to ret, so the statement
here should be deleted

The clang_analyzer complains as follows:

  fs/proc/proc_sysctl.c:
  Value stored to 'ret' is never read

Link: https://lkml.kernel.org/r/20211230063622.586360-1-luo.penghao@zte.com.cn
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:52 +02:00
Geert Uytterhoeven
153ee1c41a sysctl: fix duplicate path separator in printed entries
sysctl_print_dir() always terminates the printed path name with a slash,
so printing a slash before the file part causes a duplicate like in

    sysctl duplicate entry: /kernel//perf_user_access

Fix this by dropping the extra slash.

Link: https://lkml.kernel.org/r/e3054d605dc56f83971e4b6d2f5fa63a978720ad.1641551872.git.geert+renesas@glider.be
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:52 +02:00
Qi Zheng
51a1873440 proc: convert the return type of proc_fd_access_allowed() to be boolean
Convert return type of proc_fd_access_allowed() and the 'allowed' in it
to be boolean since the return type of ptrace_may_access() is boolean.

Link: https://lkml.kernel.org/r/20211219024404.29779-1-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:52 +02:00
David Hildenbrand
25bc5b0de9 proc/vmcore: don't fake reading zeroes on surprise vmcore_cb unregistration
In commit cc5f2704c9 ("proc/vmcore: convert oldmem_pfn_is_ram callback
to more generic vmcore callbacks"), we added detection of surprise
vmcore_cb unregistration after the vmcore was already opened.  Once
detected, we warn the user and simulate reading zeroes from that point
on when accessing the vmcore.

The basic reason was that unexpected unregistration, for example, by
manually unbinding a driver from a device after opening the vmcore, is
not supported and could result in reading oldmem the vmcore_cb would
have actually prohibited while registered.  However, something like that
can similarly be trigger by a user that's really looking for trouble
simply by unbinding the relevant driver before opening the vmcore -- or
by disallowing loading the driver in the first place.  So it's actually
of limited help.

Currently, unregistration can only be triggered via virtio-mem when
manually unbinding the driver from the device inside the VM; there is no
way to trigger it from the hypervisor, as hypervisors don't allow for
unplugging virtio-mem devices -- ripping out system RAM from a VM
without coordination with the guest is usually not a good idea.

The important part is that unbinding the driver and unregistering the
vmcore_cb while concurrently reading the vmcore won't crash the system,
and that is handled by the rwsem.

To make the mechanism more future proof, let's remove the "read zero"
part, but leave the warning in place.  For example, we could have a
future driver (like virtio-balloon) that will contact the hypervisor to
figure out if we already populated a page for a given PFN.
Hotunplugging such a device and consequently unregistering the vmcore_cb
could be triggered from the hypervisor without harming the system even
while kdump is running.  In that case, we don't want to silently end up
with a vmcore that contains wrong data, because the user inside the VM
might be unaware of the hypervisor action and might easily miss the
warning in the log.

Link: https://lkml.kernel.org/r/20211111192243.22002-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:52 +02:00
Steve French
51620150ca cifs: update internal module number
To 2.35

Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19 23:14:34 -06:00
Steve French
52d005337b smb3: send NTLMSSP version information
For improved debugging it can be helpful to send version information
as other clients do during NTLMSSP negotiation. See protocol document
MS-NLMP section 2.2.1.1

Set the major and minor versions based on the kernel version, and the
BuildNumber based on the internal cifs.ko module version number,
and following the recommendation in the protocol documentation
(MS-NLMP section 2.2.10) we set the NTLMRevisionCurrent field to 15.

Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19 23:14:15 -06:00
Brian Foster
6191cf3ad5 xfs: flush inodegc workqueue tasks before cancel
The xfs_inodegc_stop() helper performs a high level flush of pending
work on the percpu queues and then runs a cancel_work_sync() on each
of the percpu work tasks to ensure all work has completed before
returning.  While cancel_work_sync() waits for wq tasks to complete,
it does not guarantee work tasks have started. This means that the
_stop() helper can queue and instantly cancel a wq task without
having completed the associated work. This can be observed by
tracepoint inspection of a simple "rm -f <file>; fsfreeze -f <mnt>"
test:

	xfs_destroy_inode: ... ino 0x83 ...
	xfs_inode_set_need_inactive: ... ino 0x83 ...
	xfs_inodegc_stop: ...
	...
	xfs_inodegc_start: ...
	xfs_inodegc_worker: ...
	xfs_inode_inactivating: ... ino 0x83 ...

The first few lines show that the inode is removed and need inactive
state set, but the inactivation work has not completed before the
inodegc mechanism stops. The inactivation doesn't actually occur
until the fs is unfrozen and the gc mechanism starts back up. Note
that this test requires fsfreeze to reproduce because xfs_freeze
indirectly invokes xfs_fs_statfs(), which calls xfs_inodegc_flush().

When this occurs, the workqueue try_to_grab_pending() logic first
tries to steal the pending bit, which does not succeed because the
bit has been set by queue_work_on(). Subsequently, it checks for
association of a pool workqueue from the work item under the pool
lock. This association is set at the point a work item is queued and
cleared when dequeued for processing. If the association exists, the
work item is removed from the queue and cancel_work_sync() returns
true. If the pwq association is cleared, the remove attempt assumes
the task is busy and retries (eventually returning false to the
caller after waiting for the work task to complete).

To avoid this race, we can flush each work item explicitly before
cancel. However, since the _queue_all() already schedules each
underlying work item, the workqueue level helpers are sufficient to
achieve the same ordering effect. E.g., the inodegc enabled flag
prevents scheduling any further work in the _stop() case. Use the
drain_workqueue() helper in this particular case to make the intent
a bit more self explanatory.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-01-19 14:58:26 -08:00
Jens Axboe
73031f761c io-wq: delete dead lock shuffling code
We used to have more code around the work loop, but now the goto and
lock juggling just makes it less readable than it should. Get rid of it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-19 13:11:58 -07:00
Qu Wenruo
c080b4144b btrfs: defrag: properly update range->start for autodefrag
[BUG]
After commit 7b508037d4 ("btrfs: defrag: use defrag_one_cluster() to
implement btrfs_defrag_file()") autodefrag no longer properly re-defrag
the file from previously finished location.

[CAUSE]
The recent refactoring of defrag only focuses on defrag ioctl subpage
support, doesn't take autodefrag into consideration.

There are two problems involved which prevents autodefrag to restart its
scan:

- No range.start update
  Previously when one defrag target is found, range->start will be
  updated to indicate where next search should start from.

  But now btrfs_defrag_file() doesn't update it anymore, making all
  autodefrag to rescan from file offset 0.

  This would also make autodefrag to mark the same range dirty again and
  again, causing extra IO.

- No proper quick exit for defrag_one_cluster()
  Currently if we reached or exceed @max_sectors limit, we just exit
  defrag_one_cluster(), and let next defrag_one_cluster() call to do a
  quick exit.
  This makes @cur increase, thus no way to properly know which range is
  defragged and which range is skipped.

[FIX]
The fix involves two modifications:

- Update range->start to next cluster start
  This is a little different from the old behavior.
  Previously range->start is updated to the next defrag target.

  But in the end, the behavior should still be pretty much the same,
  as now we skip to next defrag target inside btrfs_defrag_file().

  Thus if auto-defrag determines to re-scan, then we still do the skip,
  just at a different timing.

- Make defrag_one_cluster() to return >0 to indicate a quick exit
  So that btrfs_defrag_file() can also do a quick exit, without
  increasing @cur to the range end, and re-use @cur to update
  @range->start.

- Add comment for btrfs_defrag_file() to mention the range->start update
  Currently only autodefrag utilize this behavior, as defrag ioctl won't
  set @max_to_defrag parameter, thus unless interrupted it will always
  try to defrag the whole range.

Reported-by: Filipe Manana <fdmanana@suse.com>
Fixes: 7b508037d4 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
Link: https://lore.kernel.org/linux-btrfs/0a269612-e43f-da22-c5bc-b34b1b56ebe8@mailbox.org/
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-19 18:25:56 +01:00
Qu Wenruo
484167da77 btrfs: defrag: fix wrong number of defragged sectors
[BUG]
There are users using autodefrag mount option reporting obvious increase
in IO:

> If I compare the write average (in total, I don't have it per process)
> when taking idle periods on the same machine:
>     Linux 5.16:
>         without autodefrag: ~ 10KiB/s
>         with autodefrag: between 1 and 2MiB/s.
>
>     Linux 5.15:
>         with autodefrag:~ 10KiB/s (around the same as without
> autodefrag on 5.16)

[CAUSE]
When autodefrag mount option is enabled, btrfs_defrag_file() will be
called with @max_sectors = BTRFS_DEFRAG_BATCH (1024) to limit how many
sectors we can defrag in one try.

And then use the number of sectors defragged to determine if we need to
re-defrag.

But commit b18c3ab234 ("btrfs: defrag: introduce helper to defrag one
cluster") uses wrong unit to increase @sectors_defragged, which should
be in unit of sector, not byte.

This means, if we have defragged any sector, then @sectors_defragged
will be >= sectorsize (normally 4096), which is larger than
BTRFS_DEFRAG_BATCH.

This makes the @max_sectors check in defrag_one_cluster() to underflow,
rendering the whole @max_sectors check useless.

Thus causing way more IO for autodefrag mount options, as now there is
no limit on how many sectors can really be defragged.

[FIX]
Fix the problems by:

- Use sector as unit when increasing @sectors_defragged

- Include @sectors_defragged > @max_sectors case to break the loop

- Add extra comment on the return value of btrfs_defrag_file()

Reported-by: Anthony Ruhier <aruhier@mailbox.org>
Fixes: b18c3ab234 ("btrfs: defrag: introduce helper to defrag one cluster")
Link: https://lore.kernel.org/linux-btrfs/0a269612-e43f-da22-c5bc-b34b1b56ebe8@mailbox.org/
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-19 18:25:54 +01:00
David Howells
70431bfd82 cifs: Support fscache indexing rewrite
Change the cifs filesystem to take account of the changes to fscache's
indexing rewrite and reenable caching in cifs.

The following changes have been made:

 (1) The fscache_netfs struct is no more, and there's no need to register
     the filesystem as a whole.

 (2) The session cookie is now an fscache_volume cookie, allocated with
     fscache_acquire_volume().  That takes three parameters: a string
     representing the "volume" in the index, a string naming the cache to
     use (or NULL) and a u64 that conveys coherency metadata for the
     volume.

     For cifs, I've made it render the volume name string as:

	"cifs,<ipaddress>,<sharename>"

     where the sharename has '/' characters replaced with ';'.

     This probably needs rethinking a bit as the total name could exceed
     the maximum filename component length.

     Further, the coherency data is currently just set to 0.  It needs
     something else doing with it - I wonder if it would suffice simply to
     sum the resource_id, vol_create_time and vol_serial_number or maybe
     hash them.

 (3) The fscache_cookie_def is no more and needed information is passed
     directly to fscache_acquire_cookie().  The cache no longer calls back
     into the filesystem, but rather metadata changes are indicated at
     other times.

     fscache_acquire_cookie() is passed the same keying and coherency
     information as before.

 (4) The functions to set/reset cookies are removed and
     fscache_use_cookie() and fscache_unuse_cookie() are used instead.

     fscache_use_cookie() is passed a flag to indicate if the cookie is
     opened for writing.  fscache_unuse_cookie() is passed updates for the
     metadata if we changed it (ie. if the file was opened for writing).

     These are called when the file is opened or closed.

 (5) cifs_setattr_*() are made to call fscache_resize() to change the size
     of the cache object.

 (6) The functions to read and write data are stubbed out pending a
     conversion to use netfslib.

Changes
=======
ver #8:
 - Abstract cache invalidation into a helper function.
 - Fix some checkpatch warnings[3].

ver #7:
 - Removed the accidentally added-back call to get the super cookie in
   cifs_root_iget().
 - Fixed the right call to cifs_fscache_get_super_cookie() to take account
   of the "-o fsc" mount flag.

ver #6:
 - Moved the change of gfpflags_allow_blocking() to current_is_kswapd() for
   cifs here.
 - Fixed one of the error paths in cifs_atomic_open() to jump around the
   call to use the cookie.
 - Fixed an additional successful return in the middle of cifs_open() to
   use the cookie on the way out.
 - Only get a volume cookie (and thus inode cookies) when "-o fsc" is
   supplied to mount.

ver #5:
 - Fixed a couple of bits of cookie handling[2]:
   - The cookie should be released in cifs_evict_inode(), not
     cifsFileInfo_put_final().  The cookie needs to persist beyond file
     closure so that writepages will be able to write to it.
   - fscache_use_cookie() needs to be called in cifs_atomic_open() as it is
     for cifs_open().

ver #4:
 - Fixed the use of sizeof with memset.
 - tcon->vol_create_time is __le64 so doesn't need cpu_to_le64().

ver #3:
 - Canonicalise the cifs coherency data to make the cache portable.
 - Set volume coherency data.

ver #2:
 - Use gfpflags_allow_blocking() rather than using flag directly.
 - Upgraded to -rc4 to allow for upstream changes[1].
 - fscache_acquire_volume() now returns errors.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
cc: Steve French <smfrench@gmail.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23b55d673d7527b093cd97b7c217c82e70cd1af0 [1]
Link: https://lore.kernel.org/r/3419813.1641592362@warthog.procyon.org.uk/ [2]
Link: https://lore.kernel.org/r/CAH2r5muTanw9pJqzAHd01d9A8keeChkzGsCEH6=0rHutVLAF-A@mail.gmail.com/ [3]
Link: https://lore.kernel.org/r/163819671009.215744.11230627184193298714.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906982979.143852.10672081929614953210.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967187187.1823006.247415138444991444.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021579335.640689.2681324337038770579.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/3462849.1641593783@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/1318953.1642024578@warthog.procyon.org.uk/ # v6
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19 11:21:08 -06:00
Filipe Manana
b767c2fc78 btrfs: allow defrag to be interruptible
During defrag, at btrfs_defrag_file(), we have this loop that iterates
over a file range in steps no larger than 256K subranges. If the range
is too long, there's no way to interrupt it. So make the loop check in
each iteration if there's signal pending, and if there is, break and
return -AGAIN to userspace.

Before kernel 5.16, we used to allow defrag to be cancelled through a
signal, but that was lost with commit 7b508037d4 ("btrfs: defrag:
use defrag_one_cluster() to implement btrfs_defrag_file()").

This change adds back the possibility to cancel a defrag with a signal
and keeps the same semantics, returning -EAGAIN to user space (and not
the usually more expected -EINTR).

This is also motivated by a recent bug on 5.16 where defragging a 1 byte
file resulted in iterating from file range 0 to (u64)-1, as hitting the
bug triggered a too long loop, basically requiring one to reboot the
machine, as it was not possible to cancel defrag.

Fixes: 7b508037d4 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-19 18:16:38 +01:00
Filipe Manana
6b34cd8e17 btrfs: fix too long loop when defragging a 1 byte file
When attempting to defrag a file with a single byte, we can end up in a
too long loop, which is nearly infinite because at btrfs_defrag_file()
we end up with the variable last_byte assigned with a value of
18446744073709551615 (which is (u64)-1). The problem comes from the fact
we end up doing:

    last_byte = round_up(last_byte, fs_info->sectorsize) - 1;

So if last_byte was assigned 0, which is i_size - 1, we underflow and
end up with the value 18446744073709551615.

This is trivial to reproduce and the following script triggers it:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdj
  MNT=/mnt/sdj

  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  echo -n "X" > $MNT/foobar

  btrfs filesystem defragment $MNT/foobar

  umount $MNT

So fix this by not decrementing last_byte by 1 before doing the sector
size round up. Also, to make it easier to follow, make the round up right
after computing last_byte.

Reported-by: Anthony Ruhier <aruhier@mailbox.org>
Fixes: 7b508037d4 ("btrfs: defrag: use defrag_one_cluster() to implement btrfs_defrag_file()")
Link: https://lore.kernel.org/linux-btrfs/0a269612-e43f-da22-c5bc-b34b1b56ebe8@mailbox.org/
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-19 18:16:34 +01:00
Shyam Prasad N
ba978e8325 cifs: cifs_ses_mark_for_reconnect should also update reconnect bits
Recent restructuring of cifs_reconnect introduced a helper func
named cifs_ses_mark_for_reconnect, which updates the state of tcp
session for all the channels of a session for reconnect.

However, this does not update the session state and chans_need_reconnect
bitmask. This change fixes that.

Also, cifs_mark_tcp_sess_for_reconnect should mark set the bitmask
for all channels when the whole session is marked for reconnect.
Fixed that here too.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19 11:10:58 -06:00
Shyam Prasad N
47de760655 cifs: update tcpStatus during negotiate and sess setup
Till the end of SMB session setup, update tcpStatus and
avoid updating session status field. There was a typo in
cifs_setup_session, which caused ses->status to be updated
instead. This was causing issues during reconnect.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19 11:10:55 -06:00
Shyam Prasad N
c1604da708 cifs: make status checks in version independent callers
The status of tcp session, smb session and tcon have the
same flow, irrespective of the SMB version used. Hence
these status checks and updates should happen in the
version independent callers of these commands.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19 11:10:55 -06:00
Shyam Prasad N
ece0767641 cifs: remove repeated state change in dfs tree connect
cifs_tree_connect checks and sets the tidStatus for the tcon.
cifs_tree_connect also calls a dfs specific tree connect
function, which also does similar checks. This should
not happen. Removing it with this change.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19 11:10:55 -06:00
Shyam Prasad N
e154cb7b0a cifs: fix the cifs_reconnect path for DFS
Recently, the cifs_reconnect code was refactored into
two branches for regular vs dfs codepath. Some of my
recent changes were missing in the dfs path, namely the
code to enable periodic DNS query, and a missing lock.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19 11:10:55 -06:00
Muhammad Usama Anjum
8a409cda97 cifs: remove unused variable ses_selected
ses_selected is being declared and set at several places. It is not
being used. Remove it.

Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19 11:10:54 -06:00
Shyam Prasad N
88b024f556 cifs: protect all accesses to chan_* with chan_lock
A spin lock called chan_lock was introduced recently.
But not all accesses were protected. Doing that with
this change.

To make sure that a channel is not freed when in use,
we need to introduce a ref count. But today, we don't
ever free channels.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19 11:10:54 -06:00
Shyam Prasad N
a05885ce13 cifs: fix the connection state transitions with multichannel
Recent changes to multichannel required some adjustments in
the way connection states transitioned during/after reconnect.

Also some minor fixes:
1. A pending switch of GlobalMid_Lock to cifs_tcp_ses_lock
2. Relocations of the code that logs reconnect
3. Changed some code in allocate_mid to suit the new scheme

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19 11:10:54 -06:00
Shyam Prasad N
3663c9045f cifs: check reconnects for channels of active tcons too
With the new multichannel logic, when a channel needs reconnection,
the tree connect and other channels can still be active.
This fix will handle cases of checking for channel reconnect,
when the tcon does not need reconnect.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-19 11:10:54 -06:00
Linus Torvalds
1d1df41c5a f2fs-for-5.17-rc1
In this round, we've tried to address some performance issues in f2fs_checkpoint
 and direct IO flows. Also, there was a work to enhance the page cache management
 used for compression. Other than them, we've done typical work including sysfs,
 code clean-ups, tracepoint, sanity check, in addition to bug fixes on corner
 cases.
 
 Enhancement:
  - use iomap for direct IO
  - try to avoid lock contention to improve f2fs_ckpt speed
  - avoid unnecessary memory allocation in compression flow
  - POSIX_FADV_DONTNEED drops the page cache containing compression pages
  - add some sysfs entries (gc_urgent_high_remaining, pending_discard)
 
 Bug fix:
  - try not to expose unwritten blocks to user by DIO
    : this was added to avoid merge conflict; another patch is coming to address
      other missing case.
  - relax minor error condition for file pinning feature used in Android OTA
  - fix potential deadlock case in compression flow
  - should not truncate any block on pinned file
 
 In addition, we've done some code clean-ups and tracepoint/sanity check
 improvement.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmHnY0sACgkQQBSofoJI
 UNIOkg//UmjCSSG63/YZM/lQQQe4kK/tT6QTT8W/VQtzWL9vXcL7bcaxzwX3LQbR
 Gb47Zmsw9bzVJt6GQ2VRbODE1py/KPNMl5SDXJXHo6fOZ/dOnHve32gLwcLEzhPd
 casB0TbwQJ6bpEsJiZ5ho741mURxUrSCHAAX6QIQVXh8ofm9qAqlWu74OLI6UHiV
 MM84XmXcHtGUZG5SCTWfSCJhJM6Az/3A83ws9KVeu86dlE7IrigphU2nI2vdCKiO
 trR3CiLC/364fiM+9ssLS3X2wKFPD/unEU7ljBv5UaG36jsVfW+tisjTKldzpiKK
 44cNgDv1FEDxC0g3FKUhEGezAhxT8AJZB0in0zn8+5scarKGJtFCy9XhCGMVaeP+
 usxvHVy8Ga1I7sMV6oHEBcGiPJWkmurzq1XXobtj6oL/JxN4gqUJeHTcod89hQHA
 lx9kZs7MLKm2au+T3gZf5xyx35YCie8sY/N1qoPy8tU9Q7FJ54NdqqAc9JEZ6mSk
 k9ybMaa/srHG/EI/XYPw0DrobHg6P5+bYtmsRvw2vP/nsNsD3ZI/EwBBEll2ITxC
 V5Dn7MljYWI/5kB41Hl5xz6X65WeIN7koRyTXw5mp9tkNrLugqII5hzhwhSlcqJ1
 3k9TAN3RbVpWHBcyryDyLbm/+dcbwIJ4v/eJEMIDk8F2SrBGOZs=
 =LCJH
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've tried to address some performance issues in
  f2fs_checkpoint and direct IO flows. Also, there was a work to enhance
  the page cache management used for compression. Other than them, we've
  done typical work including sysfs, code clean-ups, tracepoint, sanity
  check, in addition to bug fixes on corner cases.

  Enhancements:
   - use iomap for direct IO
   - try to avoid lock contention to improve f2fs_ckpt speed
   - avoid unnecessary memory allocation in compression flow
   - POSIX_FADV_DONTNEED drops the page cache containing compression
     pages
   - add some sysfs entries (gc_urgent_high_remaining, pending_discard)

  Bug fixes:
   - try not to expose unwritten blocks to user by DIO (this was added
     to avoid merge conflict; another patch is coming to address other
     missing case)
   - relax minor error condition for file pinning feature used in
     Android OTA
   - fix potential deadlock case in compression flow
   - should not truncate any block on pinned file

  In addition, we've done some code clean-ups and tracepoint/sanity
  check improvement"

* tag 'f2fs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (29 commits)
  f2fs: do not allow partial truncation on pinned file
  f2fs: remove redunant invalidate compress pages
  f2fs: Simplify bool conversion
  f2fs: don't drop compressed page cache in .{invalidate,release}page
  f2fs: fix to reserve space for IO align feature
  f2fs: fix to check available space of CP area correctly in update_ckpt_flags()
  f2fs: support fault injection to f2fs_trylock_op()
  f2fs: clean up __find_inline_xattr() with __find_xattr()
  f2fs: fix to do sanity check on last xattr entry in __f2fs_setxattr()
  f2fs: do not bother checkpoint by f2fs_get_node_info
  f2fs: avoid down_write on nat_tree_lock during checkpoint
  f2fs: compress: fix potential deadlock of compress file
  f2fs: avoid EINVAL by SBI_NEED_FSCK when pinning a file
  f2fs: add gc_urgent_high_remaining sysfs node
  f2fs: fix to do sanity check in is_alive()
  f2fs: fix to avoid panic in is_alive() if metadata is inconsistent
  f2fs: fix to do sanity check on inode type during garbage collection
  f2fs: avoid duplicate call of mark_inode_dirty
  f2fs: show number of pending discard commands
  f2fs: support POSIX_FADV_DONTNEED drop compressed page cache
  ...
2022-01-19 11:50:20 +02:00
Jens Axboe
ccbf726171 io_uring: perform poll removal even if async work removal is successful
An active work can have poll armed, hence it's not enough to just do
the async work removal and return the value if it's different from "not
found". Rather than make poll removal special, just fall through to do
the remaining type lookups and removals.

Reported-by: Florian Fischer <florian.fl.fischer@fau.de>
Link: https://lore.kernel.org/io-uring/20220118151337.fac6cthvbnu7icoc@pasture/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-18 19:28:43 -07:00
Jens Axboe
361aee450c io-wq: add intermediate work step between pending list and active work
We have a gap where a worker removes an item from the work list and to
when it gets added as the workers active work. In this state, the work
item cannot be found by cancelations. This is a small window, but it does
exist.

Add a temporary pointer to a work item that isn't on the pending work
list anymore, but also not the active work. This is needed as we need
to drop the wqe lock in between grabbing the work item and marking it
as active, to ensure that signal based cancelations are properly
ordered.

Reported-by: Florian Fischer <florian.fl.fischer@fau.de>
Link: https://lore.kernel.org/io-uring/20220118151337.fac6cthvbnu7icoc@pasture/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-18 19:28:12 -07:00
Jens Axboe
efdf518459 io-wq: perform both unstarted and started work cancelations in one go
Rather than split these into two separate lookups and matches, combine
them into one loop. This will become important when we can guarantee
that we don't have a window where a pending work item isn't discoverable
in either state.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-18 19:27:59 -07:00
Jens Axboe
36e4c58bf0 io-wq: invoke work cancelation with wqe->lock held
io_wqe_cancel_pending_work() grabs it internally, grab it upfront
instead. For the running work cancelation, grab the lock around it as
well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-18 19:27:59 -07:00
Jens Axboe
081b582046 io-wq: make io_worker lock a raw spinlock
In preparation to nesting it under the wqe lock (which is raw due to
being acquired from the scheduler side), change the io_worker lock from
a normal spinlock to a raw spinlock.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-18 19:27:59 -07:00
Jens Axboe
ea6e7ceeda io-wq: remove useless 'work' argument to __io_worker_busy()
We don't use 'work' anymore in the busy logic, remove the dead argument.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-18 19:10:11 -07:00
Namjae Jeon
ac090d9c90 ksmbd: fix guest connection failure with nautilus
MS-SMB2 describe session sign like the following.
Session.SigningRequired MUST be set to TRUE under the following conditions:
 - If the SMB2_NEGOTIATE_SIGNING_REQUIRED bit is set in the SecurityMode
   field of the client request.
 - If the SMB2_SESSION_FLAG_IS_GUEST bit is not set in the SessionFlags
   field and Session.IsAnonymous is FALSE and either Connection.ShouldSign
   or global RequireMessageSigning is TRUE.

When trying guest account connection using nautilus, The login failure
happened on session setup. ksmbd does not allow this connection
when the user is a guest and the connection sign is set. Just do not set
session sign instead of error response as described in the specification.
And this change improves the guest connection in Nautilus.

Fixes: e2f34481b2 ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-18 16:53:20 -06:00
Dan Carpenter
b207602fb0 ksmbd: uninitialized variable in create_socket()
The "ksmbd_socket" variable is not initialized on this error path.

Cc: stable@vger.kernel.org
Fixes: 0626e6641f ("cifsd: add server handler for central processing and tranport layers")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-18 16:53:14 -06:00
Hyunchul Lee
2fd5dcb1c8 ksmbd: smbd: fix missing client's memory region invalidation
if the Channel of a SMB2 WRITE request is
SMB2_CHANNEL_RDMA_V1_INVALIDTE, a client
does not invalidate its memory regions but
ksmbd must do it by sending a SMB2 WRITE response
with IB_WR_SEND_WITH_INV.

But if errors occur while processing a SMB2
READ/WRITE request, ksmbd sends a response
with IB_WR_SEND. So a client could use memory
regions already in use.

Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-18 16:53:08 -06:00
Steve French
e4e2787bef smb3: add new defines from protocol specification
In the October updates to MS-SMB2 two additional FSCTLs
were described.  Add the missing defines for these,
as well as fix a typo in an earlier define.

Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-18 16:50:47 -06:00
J. Bruce Fields
6e7f90d163 lockd: fix server crash on reboot of client holding lock
I thought I was iterating over the array when actually the iteration is
over the values contained in the array?

Ugh, keep it simple.

Symptoms were a null deference in vfs_lock_file() when an NFSv3 client
that previously held a lock came back up and sent a notify.

Reported-by: Jonathan Woithe <jwoithe@just42.net>
Fixes: 7f024fcd5c ("Keep read and write fds with each nlm_file")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-18 17:07:48 -05:00
Darrick J. Wong
a8e422af69 xfs: remove unused xfs_ioctl32.h declarations
Remove these unused ia32 compat declarations; all the bits involved have
either been withdrawn or hoisted to the VFS.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2022-01-18 10:18:36 -08:00
Linus Torvalds
3bf6a9e36e virtio,vdpa,qemu_fw_cfg: features, cleanups, fixes
partial support for < MAX_ORDER - 1 granularity for virtio-mem
 driver_override for vdpa
 sysfs ABI documentation for vdpa
 multiqueue config support for mlx5 vdpa
 
 Misc fixes, cleanups.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmHiDHkPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpVT4H/3Veixt3uYPOmuLU2tSx+8X+sFTtik81hyiE
 okz5fRJrxxA8SqS76FnmO10FS4hlPOGNk0Z5WVhr0yihwFvPLvpCM/xi2Lmrz9I7
 pB0sXOIocEL1xApsxukR9K1Twpb2hfYsflbJYUVlRfhS5G0izKJNZp5I7OPrzd80
 vVNNDWKW2iLDlfqsavumI4Kvm4nsFuCHG03jzMtcIa7YTXYV3DORD4ZGFFVUOIQN
 t5F74TznwHOeYgJeg7TzjFjfPWmXjLetvx10QX1A1uOvwppWW/QY6My0UafTXNXj
 VB3gOwJPf+gxXAXl/4bafq4NzM0xys6cpcPpjvhmU+erY4UuyAU=
 =Y1eO
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "virtio,vdpa,qemu_fw_cfg: features, cleanups, and fixes.

   - partial support for < MAX_ORDER - 1 granularity for virtio-mem

   - driver_override for vdpa

   - sysfs ABI documentation for vdpa

   - multiqueue config support for mlx5 vdpa

   - and misc fixes, cleanups"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (42 commits)
  vdpa/mlx5: Fix tracking of current number of VQs
  vdpa/mlx5: Fix is_index_valid() to refer to features
  vdpa: Protect vdpa reset with cf_mutex
  vdpa: Avoid taking cf_mutex lock on get status
  vdpa/vdpa_sim_net: Report max device capabilities
  vdpa: Use BIT_ULL for bit operations
  vdpa/vdpa_sim: Configure max supported virtqueues
  vdpa/mlx5: Report max device capabilities
  vdpa: Support reporting max device capabilities
  vdpa/mlx5: Restore cur_num_vqs in case of failure in change_num_qps()
  vdpa: Add support for returning device configuration information
  vdpa/mlx5: Support configuring max data virtqueue
  vdpa/mlx5: Fix config_attr_mask assignment
  vdpa: Allow to configure max data virtqueues
  vdpa: Read device configuration only if FEATURES_OK
  vdpa: Sync calls set/get config/status with cf_mutex
  vdpa/mlx5: Distribute RX virtqueues in RQT object
  vdpa: Provide interface to read driver features
  vdpa: clean up get_config_size ret value handling
  virtio_ring: mark ring unused on error
  ...
2022-01-18 10:05:48 +02:00
Jamie Hill-Daniel
722d94847d vfs: fs_context: fix up param length parsing in legacy_parse_param
The "PAGE_SIZE - 2 - size" calculation in legacy_parse_param() is an
unsigned type so a large value of "size" results in a high positive
value instead of a negative value as expected.  Fix this by getting rid
of the subtraction.

Signed-off-by: Jamie Hill-Daniel <jamie@hill-daniel.co.uk>
Signed-off-by: William Liu <willsroot@protonmail.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Tested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-18 09:23:19 +02:00
Ronnie Sahlberg
5455b9ecaf cifs: serialize all mount attempts
RHBZ: 2008434

Some servers, such as Windows2016 have a very low number of concurrent mounts that
they allow from each client.
This can be a problem if you have a more than a handful (==3 in this case)
of cifs entries in your fstab and cause a number of the mounts there to randomly fail.

Add a global mutex and use it to serialize all mount attempts.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-18 00:10:03 -06:00
Linus Torvalds
f0033681f0 orangefs: two fixes
Fix the size of a memory allocation in orangefs_bufmap_alloc()
   Christophe JAILLET
 
   use default_groups in kobj_type
   Greg KH
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEIGSFVdO6eop9nER2z0QOqevODb4FAmHlgOQACgkQz0QOqevO
 Db75Ng/+Lruql+eMqjMrmwRf8Rm1tErTjyll1RgynB608xd3A8mzRxk0U9+EQDCJ
 U1fa0N1AJatSDh0X8PLke0lMR1YxGFNs+gmIswz4b0B0tpSuw+oJhUTrYSaeE5ud
 boD+EwCIv7G3bTUqT8eC3sRFGZycIfQUMKS7Ml76OKxQXSd+9BaP4e7s87/kQItI
 oRxBx/AzndLjhQWfPhCGVqYMzetto/pSAQaUfspH/pIAaDYPmuTCBvNf9crMjQ1k
 TI7Pf0CMTGFdki0p5MkhK5co90X28aI+QHaycCtWMvwKxcASU1mKaJ5IqEzholkY
 Q89cEyMeb8RkQjeOC8L1jG1GvkrBJNzdm30NVo4m7uQBp1ExXdvUnQRQ+FnJ7nEP
 SAX5trxIrQuNKq1/hLqxngIQ3/duLxtnaFd9ZwmixTvQUgtxLTQL9Bued1WmcNTR
 Z3VePxyIgqKKsdzzx0uBAd501/T77aB5TwSiVaZQNEDc44oZhr6rricrYP2tjnq4
 3XLsAzb5djPUj3kEhbc4HKPatl67r7LsFumOMjZGnnj2nEsvQLH9o53snWlyfBYS
 40CrulcEiIjD5dgt/xAojhPT20CyqCYH9mJyZ5NfvRlcR30vNT9/okeoFnUyKTB4
 JBhHT/bu89CGD5F1M6G6wHGXNv/szxPDzW08/dUJA+BrZrHOYJQ=
 =OM/4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.17-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs fixes from Mike Marshall:
 "Two fixes:

   - Fix the size of a memory allocation in orangefs_bufmap_alloc()
     (Christophe JAILLET)

   - Use default_groups in kobj_type (Greg KH)"

* tag 'for-linus-5.17-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: Fix the size of a memory allocation in orangefs_bufmap_alloc()
  orangefs: use default_groups in kobj_type
2022-01-18 06:26:50 +02:00
Eugene Korenevsky
a2809d0e16 cifs: quirk for STATUS_OBJECT_NAME_INVALID returned for non-ASCII dfs refs
Windows SMB server responds with STATUS_OBJECT_NAME_INVALID code to
SMB2 QUERY_INFO request for "\<server>\<dfsname>\<linkpath>" DFS reference,
where <dfsname> contains non-ASCII unicode symbols.

Check such DFS reference and emulate -EREMOTE if it is actual.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215440
Signed-off-by: Eugene Korenevsky <ekorenevsky@astralinux.ru>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-17 13:28:25 -06:00
Eugene Korenevsky
7eacba3b00 cifs: alloc_path_with_tree_prefix: do not append sep. if the path is empty
alloc_path_with_tree_prefix() concatenates tree prefix and the path.
Windows CIFS client does not add separator after the tree prefix if the path
is empty. Let's do the same.

This fixes mounting DFS namespaces with names containing non-ASCII symbols.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215440
Signed-off-by: Eugene Korenevsky <ekorenevsky@astralinux.ru>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-17 13:28:05 -06:00
Yang Li
74ce6135ae cifs: clean up an inconsistent indenting
Eliminate the follow smatch warning:
fs/cifs/sess.c:1581 sess_auth_rawntlmssp_authenticate() warn:
inconsistent indenting

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-17 12:00:37 -06:00
Shyam Prasad N
e3548aaf41 cifs: free ntlmsspblob allocated in negotiate
One of my previous fixes:
cifs: send workstation name during ntlmssp session setup
...changed the prototype of build_ntlmssp_negotiate_blob
from being allocated by the caller to being allocated within
the function. The caller needs to free this object too.
While SMB2 version of the caller did it, I forgot to free
for the SMB1 version. Fixing that here.

Fixes: 49bd49f983 ("cifs: send workstation name during ntlmssp session setup")
Cc: stable@vger.kernel.org # 5.16
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-17 11:56:19 -06:00
Darrick J. Wong
b3bb9413e7 xfs: remove the XFS_IOC_{ALLOC,FREE}SP* definitions
Now that we've made these ioctls defunct, move them from xfs_fs.h to
xfs_ioctl.c, which effectively removes them from the publicly supported
ioctl interfaces for XFS.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2022-01-17 09:17:11 -08:00
Darrick J. Wong
4d1b97f9ce xfs: kill the XFS_IOC_{ALLOC,FREE}SP* ioctls
According to the glibc compat header for Irix 4, these ioctls originated
in April 1991 as a (somewhat clunky) way to preallocate space at the end
of a file on an EFS filesystem.  XFS, which was released in Irix 5.3 in
December 1993, picked up these ioctls to maintain compatibility and they
were ported to Linux in the early 2000s.

Recently it was pointed out to me they still lurk in the kernel, even
though the Linux fallocate syscall supplanted the functionality a long
time ago.  fstests doesn't seem to include any real functional or stress
tests for these ioctls, which means that the code quality is ... very
questionable.  Most notably, it was a stale disk block exposure vector
for 21 years and nobody noticed or complained.  As mature programmers
say, "If you're not testing it, it's broken."

Given all that, let's withdraw these ioctls from the XFS userspace API.
Normally we'd set a long deprecation process, but I estimate that there
aren't any real users, so let's trigger a warning in dmesg and return
-ENOTTY.

See: CVE-2021-4155

Augments: 983d8e60f5 ("xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-01-17 09:16:41 -08:00
Darrick J. Wong
9dec0368b9 xfs: remove the XFS_IOC_FSSETDM definitions
Remove the definitions for these ioctls, since the functionality (and,
weirdly, the 32-bit compat ioctl definitions) were removed from the
kernel in November 2019.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-01-17 09:16:40 -08:00
Linus Torvalds
0c947b893d 13 cifs/smb3 fixes, mostly multichannel, reconnect relate
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmHk2A0ACgkQiiy9cAdy
 T1GjmgwAhJA5Z8nr28Q/DrKdUkdKeHbPQUsroQlRVzO7gswnVQe2EKc/cfBj2bC2
 a/uD8G3Kkgv3n6UCjbj5FkfAQRIcuQaZfxPEYMrlk6quBmSeqSTts/YOGICfCvw8
 4Ra/pULtuF3Y27/3z98owbMxi/OScVP7vOxGVSQOgBJazTX0Lgtnt+UK2gKMYqjD
 rGDaFScf8eq6J4Py2E2Ritn06v4Zk3/G1C+66SwJePnHdPfNh9ym+OKjiz8OkhB2
 o968pt9QpBb6hizZlr/uO402lcxQkDVGmUKjHNA8xeCZvOvlBAlJFJY7FdfswlRv
 62nxWhXg8hZBda1qJQS+7PLLK/2ovEFApcYk3ZZu67lCPHtk0cloxGkNmaJ7qcu1
 /IDKFwPEiN3PqwE50hIHx9X3VaSdoOrL41MJVc1EzubM6kmgQHsq6FczKC8zWStH
 ZioulKkc6r+/bJWCkllMIFoo7hb/M67i8QLF9VSaepgxz1A1oN23N78X6rShzcjE
 56qh83F+
 =Kauh
 -----END PGP SIGNATURE-----

Merge tag '5.17-rc-part1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs updates from Steve French:

 - multichannel patches mostly related to improving reconnect behavior

 - minor cleanup patches

* tag '5.17-rc-part1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix FILE_BOTH_DIRECTORY_INFO definition
  cifs: move superblock magic defitions to magic.h
  cifs: Fix smb311_update_preauth_hash() kernel-doc comment
  cifs: avoid race during socket reconnect between send and recv
  cifs: maintain a state machine for tcp/smb/tcon sessions
  cifs: fix hang on cifs_get_next_mid()
  cifs: take cifs_tcp_ses_lock for status checks
  cifs: reconnect only the connection and not smb session where possible
  cifs: add WARN_ON for when chan_count goes below minimum
  cifs: adjust DebugData to use chans_need_reconnect for conn status
  cifs: use the chans_need_reconnect bitmap for reconnect status
  cifs: track individual channel status using chans_need_reconnect
  cifs: remove redundant assignment to pointer p
2022-01-17 09:53:21 +02:00
NeilBrown
a6097180d8 devtmpfs regression fix: reconfigure on each mount
Prior to Linux v5.4 devtmpfs used mount_single() which treats the given
mount options as "remount" options, so it updates the configuration of
the single super_block on each mount.

Since that was changed, the mount options used for devtmpfs are ignored.
This is a regression which affect systemd - which mounts devtmpfs with
"-o mode=755,size=4m,nr_inodes=1m".

This patch restores the "remount" effect by calling reconfigure_single()

Fixes: d401727ea0 ("devtmpfs: don't mix {ramfs,shmem}_fill_super() with mount_single()")
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-17 09:40:29 +02:00
Linus Torvalds
98f2345773 unicode: fix .gitignore for generated utfdata file
Commit 2b3d047870 ("unicode: Add utf8-data module") changed the
generated utf8data file from 'utf8data.h' to 'utf8data.c', but didn't
change the comments or the .gitignore to match.

The comments should be updated too, but at least they don't cause any
visible breakage.  But the gitignore file needs changing to avoid git
complaining about untracked files.

Fixes: 2b3d047870 ("unicode: Add utf8-data module")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-17 07:26:43 +02:00
Linus Torvalds
35ce8ae9ae Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull signal/exit/ptrace updates from Eric Biederman:
 "This set of changes deletes some dead code, makes a lot of cleanups
  which hopefully make the code easier to follow, and fixes bugs found
  along the way.

  The end-game which I have not yet reached yet is for fatal signals
  that generate coredumps to be short-circuit deliverable from
  complete_signal, for force_siginfo_to_task not to require changing
  userspace configured signal delivery state, and for the ptrace stops
  to always happen in locations where we can guarantee on all
  architectures that the all of the registers are saved and available on
  the stack.

  Removal of profile_task_ext, profile_munmap, and profile_handoff_task
  are the big successes for dead code removal this round.

  A bunch of small bug fixes are included, as most of the issues
  reported were small enough that they would not affect bisection so I
  simply added the fixes and did not fold the fixes into the changes
  they were fixing.

  There was a bug that broke coredumps piped to systemd-coredump. I
  dropped the change that caused that bug and replaced it entirely with
  something much more restrained. Unfortunately that required some
  rebasing.

  Some successes after this set of changes: There are few enough calls
  to do_exit to audit in a reasonable amount of time. The lifetime of
  struct kthread now matches the lifetime of struct task, and the
  pointer to struct kthread is no longer stored in set_child_tid. The
  flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is
  removed. Issues where task->exit_code was examined with
  signal->group_exit_code should been examined were fixed.

  There are several loosely related changes included because I am
  cleaning up and if I don't include them they will probably get lost.

  The original postings of these changes can be found at:
     https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.org
     https://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.org
     https://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org

  I trimmed back the last set of changes to only the obviously correct
  once. Simply because there was less time for review than I had hoped"

* 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits)
  ptrace/m68k: Stop open coding ptrace_report_syscall
  ptrace: Remove unused regs argument from ptrace_report_syscall
  ptrace: Remove second setting of PT_SEIZED in ptrace_attach
  taskstats: Cleanup the use of task->exit_code
  exit: Use the correct exit_code in /proc/<pid>/stat
  exit: Fix the exit_code for wait_task_zombie
  exit: Coredumps reach do_group_exit
  exit: Remove profile_handoff_task
  exit: Remove profile_task_exit & profile_munmap
  signal: clean up kernel-doc comments
  signal: Remove the helper signal_group_exit
  signal: Rename group_exit_task group_exec_task
  coredump: Stop setting signal->group_exit_task
  signal: Remove SIGNAL_GROUP_COREDUMP
  signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process
  signal: Make coredump handling explicit in complete_signal
  signal: Have prepare_signal detect coredumps using signal->core_state
  signal: Have the oom killer detect coredumps using signal->core_state
  exit: Move force_uaccess back into do_exit
  exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit
  ...
2022-01-17 05:49:30 +02:00
Linus Torvalds
6661224e66 unicode patches for 5.17
This includes patches from Christoph Hellwig to split the large data
 tables of the unicode subsystem into a loadable module, which allow
 users to not have them around if case-insensitive filesystems are not to
 be used.  It also includes minor code fixes to unicode and its users,
 from the same author.
 
 There is a trivial conflict in the function encoding_show in
 fs/f2fs/sysfs.c reported by linux-next between commit
 
 84eab2a899 ("f2fs: replace snprintf in show functions with sysfs_emit")
 
 and commit a440943e68 ("unicode: remove the charset field from struct
 unicode_map").  from my tree.
 
 All the patches here have been on linux-next releases for the past
 months.
 
 Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8jAUPq50yNjPBCi4QEuZqsMcppQFAmHeLp0ACgkQQEuZqsMc
 ppRWdhAAstuibIlhUj1Vae070P92oaxM/Azz3IgyVFWensJyQV1PvbtFQDhyKM4w
 M3tQ45eK49vVHn+JpLHbiAdZV66rD/sMSsruCVIf/8KNVDisOBQtFar5yxVr0Ion
 AOMoG6/Xrk8BZlZH62fhtJGtu/EFmeFoGVdC81NdTSroe9G+26we3IULwHSE1lNH
 XMJFCgU6otuLDOna16U7kL77Tu7GXRJcQe1+2nRJ+u6Agxy2xTo/s4FHuxzRK0/e
 GsgO1scY6unWM23O6z+qJYazng2Zt3EOZtSGqU4TsvZwjUi2UtAYW1/vAQGc/q3Y
 hGxPYGgKC1VrXLfIcuyng7j0vFPtADbdHMbsJPoyy+Nz4znDJ81IAKAHMO1in3C8
 CHKjW+6InmXNye/uwdRt8Tx49jxUHmWUbQRT5FwMDpzC7MAL+DVdPpVVQgpLVM/H
 gW3YpBEk5qQvVdh8DWZVW3rT3SnMX/v0+u+76FsMHKYNJMNrCnP6vXpCPQl/Gyut
 ycgK7qVF3o/bgNBf072H3ZBZajTv7ePvacP4Wth7m9I2ykk+p4IjQLpTC5rJK0By
 VC1xS4im2VqiIWE9eE5y9cXU1oa/AfOcOF+7FZcxT13IL6hKTtd4+H4yKgdcNsyk
 7RjpGgjp+SU51/EilhEqMFgEe07CURxwGwhApizBSiTIOgZS96U=
 =4q9x
 -----END PGP SIGNATURE-----

Merge tag 'unicode-for-next-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode

Pull unicode updates from Gabriel Krisman Bertazi:
 "This includes patches from Christoph Hellwig to split the large data
  tables of the unicode subsystem into a loadable module, which allow
  users to not have them around if case-insensitive filesystems are not
  to be used. It also includes minor code fixes to unicode and its
  users, from the same author.

  All the patches here have been on linux-next releases for the past
  months"

* tag 'unicode-for-next-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  unicode: only export internal symbols for the selftests
  unicode: Add utf8-data module
  unicode: cache the normalization tables in struct unicode_map
  unicode: move utf8cursor to utf8-selftest.c
  unicode: simplify utf8len
  unicode: remove the unused utf8{,n}age{min,max} functions
  unicode: pass a UNICODE_AGE() tripple to utf8_load
  unicode: mark the version field in struct unicode_map unsigned
  unicode: remove the charset field from struct unicode_map
  f2fs: simplify f2fs_sb_read_encoding
  ext4: simplify ext4_sb_read_encoding
2022-01-17 05:40:02 +02:00
Linus Torvalds
4d66020dce Tracing updates for 5.17:
New:
 
 - The Real Time Linux Analysis (RTLA) tool is added to the tools directory.
 
 - Can safely filter on user space pointers with: field.ustring ~ "match-string"
 
 - eprobes can now be filtered like any other event.
 
 - trace_marker(_raw) now uses stream_open() to allow multiple threads to safely
   write to it. Note, this could possibly break existing user space, but we will
   not know until we hear about it, and then can revert the change if need be.
 
 - New field in events to display when bottom halfs are disabled.
 
 - Sorting of the ftrace functions are now done at compile time instead of
   at bootup.
 
 Infrastructure changes to support future efforts:
 
 - Added __rel_loc type for trace events. Similar to __data_loc but the offset
   to the dynamic data is based off of the location of the descriptor and not
   the beginning of the event. Needed for user defined events.
 
 - Some simplification of event trigger code.
 
 - Make synthetic events process its callback better to not hinder other
   event callbacks that are registered. Needed for user defined events.
 
 And other small fixes and clean ups.
 -
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYeGvcxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qrZtAP9ICjJxX54MTErElhhUL/NFLV7wqhJi
 OIAgmp6jGVRqPAD+JxQtBnGH+3XMd71ioQkTfQ1rp+jBz2ERBj2DmELUAg0=
 =zmda
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "New:

   - The Real Time Linux Analysis (RTLA) tool is added to the tools
     directory.

   - Can safely filter on user space pointers with: field.ustring ~
     "match-string"

   - eprobes can now be filtered like any other event.

   - trace_marker(_raw) now uses stream_open() to allow multiple threads
     to safely write to it. Note, this could possibly break existing
     user space, but we will not know until we hear about it, and then
     can revert the change if need be.

   - New field in events to display when bottom halfs are disabled.

   - Sorting of the ftrace functions are now done at compile time
     instead of at bootup.

  Infrastructure changes to support future efforts:

   - Added __rel_loc type for trace events. Similar to __data_loc but
     the offset to the dynamic data is based off of the location of the
     descriptor and not the beginning of the event. Needed for user
     defined events.

   - Some simplification of event trigger code.

   - Make synthetic events process its callback better to not hinder
     other event callbacks that are registered. Needed for user defined
     events.

  And other small fixes and cleanups"

* tag 'trace-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (50 commits)
  tracing: Add ustring operation to filtering string pointers
  rtla: Add rtla timerlat hist documentation
  rtla: Add rtla timerlat top documentation
  rtla: Add rtla timerlat documentation
  rtla: Add rtla osnoise hist documentation
  rtla: Add rtla osnoise top documentation
  rtla: Add rtla osnoise man page
  rtla: Add Documentation
  rtla/timerlat: Add timerlat hist mode
  rtla: Add timerlat tool and timelart top mode
  rtla/osnoise: Add the hist mode
  rtla/osnoise: Add osnoise top mode
  rtla: Add osnoise tool
  rtla: Helper functions for rtla
  rtla: Real-Time Linux Analysis tool
  tracing/osnoise: Properly unhook events if start_per_cpu_kthreads() fails
  tracing: Remove duplicate warnings when calling trace_create_file()
  tracing/kprobes: 'nmissed' not showed correctly for kretprobe
  tracing: Add test for user space strings when filtering on string pointers
  tracing: Have syscall trace events use trace_event_buffer_lock_reserve()
  ...
2022-01-16 10:15:32 +02:00
Linus Torvalds
88db845808 Description for this pull request:
- Fix ->i_blocks truncation issue that still exists elsewhere.
  - 4 cleanups & typos fixes.
  - Move super block magic number to magic.h
  - Fix missing REQ_SYNC in exfat_update_bhs().
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmHiZ68WHGxpbmtpbmpl
 b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCNUhD/48wj/ce6+GNeyKeadfcD/wMFJF
 DNs564zTHcjecUg7+tzk82b4n7NM3X//hCaEqEwmX1WjcZcXbtSNqUY7DTMTadgM
 Ad10JQheOfVRHqSAchNK3ph86Mo5u1c29+4tWS9tv4UaZeAgyCXph+w8d5/M0BBj
 Rhxoi2Sb2cIVJAu6eMWbKGy4aQVN8a/nlf36ZCKDNCz8IwLyNtUtQ8myZLDhf7BA
 df4guextxvSyjURqFy/2At+/+faP/nXddJTKOIkmV3VGM9b7URjwADVsEjtouZIo
 TAH+flRle5tMFFguFLoXt+pVLuiK3AMtgV/1JNaAQGTotD0VpB66LkFGqsUjBtew
 D8XzoESafeMp3HmWA2eSAbHQ3n1ulzyhuJ5RJpGfhicBFlFlwld4uoGm1JUOA05T
 or84sPN32rCVMxeMquykXHQ0HzGXg7coZHjGG9YjYV1FsyGCIO3aa5IpQ9D2g0WZ
 wvRruXXwFur19mnGXYWfwB67HUXYB8iiAHWKXQM2msMklpLT1+rY+M8RBd/Yz4bJ
 WNrU/U4NCGMjlB4Qmq/ZGazubDOXO4xNt/28gkNm+S2EDGp0w6HOIP/BqHO9879k
 Xx0EqgLSPfGgdMtH0wigfEZ9UkI3W1+Z0gInGs2LVtSPq7rS7HzbKd9CNci2d41Y
 2mCoqCZ0eH14+CXS5A==
 =/DUv
 -----END PGP SIGNATURE-----

Merge tag 'exfat-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat updates from Namjae Jeon:

 - Fix ->i_blocks truncation issue that still exists elsewhere.

 - Four cleanups & typos fixes.

 - Move super block magic number to magic.h

 - Fix missing REQ_SYNC in exfat_update_bhs().

* tag 'exfat-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: fix missing REQ_SYNC in exfat_update_bhs()
  exfat: remove argument 'sector' from exfat_get_dentry()
  exfat: move super block magic number to magic.h
  exfat: fix i_blocks for files truncated over 4 GiB
  exfat: reuse exfat_inode_info variable instead of calling EXFAT_I()
  exfat: make exfat_find_location() static
  exfat: fix typos in comments
  exfat: simplify is_valid_cluster()
2022-01-16 07:54:11 +02:00
Linus Torvalds
175398a097 Highlights:
- Bruce steps down as NFSD maintainer
 - Prepare for dynamic nfsd thread management
 - More work on supporting re-exporting NFS mounts
 - One fs/locks patch on behalf of Jeff Layton
 
 Notable bug fixes:
 - Fix zero-length NFSv3 WRITEs
 - Fix directory cinfo on FS's that do not support iversion
 - Fix WRITE verifiers for stable writes
 - Fix crash on COPY_NOTIFY with a special state ID
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmHcWOMACgkQM2qzM29m
 f5dh0Q/+MjEL0IK551FdChx9Es1JqKRggv9KwJkLIoa1bw/PMSwP2pnKz6eL0Yun
 mdhE9AZQgyFH1IAGdqjeLZKIYRin6bvAdDrnlqQ9SvTviPLWniSUI6AuyUqK6Zyk
 wMcXpyOze0fhpxkYmz8/g7i66w967tmLh5MRvV1dkpOYAe99rYwGhvj+9ZeEWfNI
 TgmptntMG6YEb+xY0E73otXZHMr2DL67ZYvOUYWemJA1uxcX4joaWBg8sx74dB6k
 DUB4BFuoURk6viDD1QYh3qPU3dz9RCJNMz/cWd8+2t7BdaujTSXRIcaFslrQnKfL
 Rm+O7pi5W+XohFDjeuMZ1g0c1ot/aoZSaAz00LoCVhejJ/sK9NiPAN1+LyY91Lja
 cUBMVPNfW7ClIpiZcORP/chNmVn2qlaL2nxzSY/Uegnd5pIIeVD0pFVgx4+NlEat
 mbrrQBcMpBRM0B+RzHS6AusqHrGdSEcwqWoVXWdxsBigJQT/AxWmii3U88k0Z54i
 ooMWLaQ9EBBmygV01JN/OBySW2M/dvbfz3eFROvAVqsIP9JWP3FlUOlRDl8GcjXA
 azi9fTysBom7WtL6NPcxDJbJ2t9hYr2YaztTpdo9YCHOuQbSQT6IWR5PAa3zvwMu
 Bfz6Y8Hoo/KZHCqmkPGYM+x1ENCyDPv788E+erdnw1PFP5F3Pbo=
 =/kX3
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Bruce has announced he is leaving Red Hat at the end of the month and
  is stepping back from his role as NFSD co-maintainer. As a result,
  this includes a patch removing him from the MAINTAINERS file.

  There is one patch in here that Jeff Layton was carrying in the locks
  tree. Since he had only one for this cycle, he asked us to send it to
  you via the nfsd tree.

  There continues to be 0-day reports from Robert Morris @MIT. This time
  we include a fix for a crash in the COPY_NOTIFY operation.

  Highlights:
   - Bruce steps down as NFSD maintainer
   - Prepare for dynamic nfsd thread management
   - More work on supporting re-exporting NFS mounts
   - One fs/locks patch on behalf of Jeff Layton

  Notable bug fixes:
   - Fix zero-length NFSv3 WRITEs
   - Fix directory cinfo on FS's that do not support iversion
   - Fix WRITE verifiers for stable writes
   - Fix crash on COPY_NOTIFY with a special state ID"

* tag 'nfsd-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (51 commits)
  SUNRPC: Fix sockaddr handling in svcsock_accept_class trace points
  SUNRPC: Fix sockaddr handling in the svc_xprt_create_error trace point
  fs/locks: fix fcntl_getlk64/fcntl_setlk64 stub prototypes
  nfsd: fix crash on COPY_NOTIFY with special stateid
  MAINTAINERS: remove bfields
  NFSD: Move fill_pre_wcc() and fill_post_wcc()
  Revert "nfsd: skip some unnecessary stats in the v4 case"
  NFSD: Trace boot verifier resets
  NFSD: Rename boot verifier functions
  NFSD: Clean up the nfsd_net::nfssvc_boot field
  NFSD: Write verifier might go backwards
  nfsd: Add a tracepoint for errors in nfsd4_clone_file_range()
  NFSD: De-duplicate net_generic(nf->nf_net, nfsd_net_id)
  NFSD: De-duplicate net_generic(SVC_NET(rqstp), nfsd_net_id)
  NFSD: Clean up nfsd_vfs_write()
  nfsd: Replace use of rwsem with errseq_t
  NFSD: Fix verifier returned in stable WRITEs
  nfsd: Retry once in nfsd_open on an -EOPENSTALE return
  nfsd: Add errno mapping for EREMOTEIO
  nfsd: map EBADF
  ...
2022-01-16 07:42:58 +02:00
Linus Torvalds
49ad227d54 9p-for-5.17-rc1: fixes, split 9p_net_fd, new reviewer
- fix possible uninitialized memory usage for setattr
 - fix fscache reading hole in a file just after it's been grown
 - split net/9p/trans_fd.c in its own module like other transports
   that module defaults to 9P_NET and is autoloaded if required so
   users should not be impacted
 - add Christian Schoenebeck to 9p reviewers
 - some more trivial cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmHgt18ACgkQq06b7GqY
 5nAdUA//ZHFTIcnbTiVkqbw0YztxedTOhdGCWcYsszux0pNQ/ZCjbP5NxESWiFxs
 raYWcvE98Xvq4fbs8b+m7YKBJzWF+Km/v1MKfgvZZWFLZR3MfWiL8iojlsaUgG9U
 rBmEUyaTWw2COIvFN7EqnwT5mwzNCli5d3AzaAmgffWsHEi/+EVU35YX70ySUkjW
 nVf08oX3dBB425oArXOZApOZSVRsUr5YDSuQGiFHBL+hvPTrvPumu/AXbjLbTbmf
 NuUxu1Akw8/jhWNATLHzjCdzJzBSfF0zRs+oH9Qt8MKkUrlTfjxc/2MbQYL1p1IN
 84XxiG02ebw+Mx05cydP9/Ll7gOo2p6ORGXMHcoPIAMy6zFiCKo3TRdKDgmYDauC
 K8c3v+osNwl2GFn1+2XoQIWnqXb7bJ1debkFtWVEGOPd/Yr6CYCZyfiZamoPJXUO
 TjkoLAJXJGKlV66ZCuVeAsySdUxdtwbj2WxTliDUWtxvQ+m1jt06S3f/TbA7a9bB
 8aBhx7FKzQ/UL8zhm4+3WLPlWkoSUXhFSZZUknvhuaHFV/S1xglXox5OAQJrJMy+
 qOzmOjCg14T1TC2WkJlEsedLUH8ACU+XueAPT6uqSdA4rvQEVzk32zTm/GfFu3Q3
 0RhcGlW5kAAYBTBLXvmKsjyW2OmPCSycgbWCu3E8A/1gubbRE40=
 =o2Lc
 -----END PGP SIGNATURE-----

Merge tag '9p-for-5.17-rc1' of git://github.com/martinetd/linux

Pull 9p updates from Dominique Martinet:
 "Fixes, split 9p_net_fd, and new reviewer:

   - fix possible uninitialized memory usage for setattr

   - fix fscache reading hole in a file just after it's been grown

   - split net/9p/trans_fd.c in its own module like other transports.

     The new transport module defaults to 9P_NET and is autoloaded if
     required so users should not be impacted

   - add Christian Schoenebeck to 9p reviewers

   - some more trivial cleanup"

* tag '9p-for-5.17-rc1' of git://github.com/martinetd/linux:
  9p: fix enodata when reading growing file
  net/9p: show error message if user 'msize' cannot be satisfied
  MAINTAINERS: 9p: add Christian Schoenebeck as reviewer
  9p: only copy valid iattrs in 9P2000.L setattr implementation
  9p: Use BUG_ON instead of if condition followed by BUG.
  net/p9: load default transports
  9p/xen: autoload when xenbus service is available
  9p/trans_fd: split into dedicated module
  fs: 9p: remove unneeded variable
  9p/trans_virtio: Fix typo in the comment for p9_virtio_create()
2022-01-16 07:36:49 +02:00
Linus Torvalds
f56caedaf9 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "146 patches.

  Subsystems affected by this patch series: kthread, ia64, scripts,
  ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
  dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
  memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
  userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
  ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
  damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
  mm/damon: hide kernel pointer from tracepoint event
  mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
  mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
  mm/damon/dbgfs: remove an unnecessary variable
  mm/damon: move the implementation of damon_insert_region to damon.h
  mm/damon: add access checking for hugetlb pages
  Docs/admin-guide/mm/damon/usage: update for schemes statistics
  mm/damon/dbgfs: support all DAMOS stats
  Docs/admin-guide/mm/damon/reclaim: document statistics parameters
  mm/damon/reclaim: provide reclamation statistics
  mm/damon/schemes: account how many times quota limit has exceeded
  mm/damon/schemes: account scheme actions that successfully applied
  mm/damon: remove a mistakenly added comment for a future feature
  Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
  Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
  Docs/admin-guide/mm/damon/usage: remove redundant information
  Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
  mm/damon: convert macro functions to static inline functions
  mm/damon: modify damon_rand() macro to static inline function
  mm/damon: move damon_rand() definition into damon.h
  ...
2022-01-15 20:37:06 +02:00
Yury Norov
b5c7e7ec7d all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
find_first{,_zero}_bit is a more effective analogue of 'next' version if
start == 0. This patch replaces 'next' with 'first' where things look
trivial.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2022-01-15 08:47:31 -08:00
Eugene Korenevsky
9bbf8662a2 cifs: fix FILE_BOTH_DIRECTORY_INFO definition
The size of FILE_BOTH_DIRECTORY_INFO.ShortName must be 24 bytes, not 12
(see MS-FSCC documentation).

Signed-off-by: Eugene Korenevsky <ekorenevsky@astralinux.ru>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-15 10:08:47 -06:00
Jeff Layton
dea2903719 cifs: move superblock magic defitions to magic.h
Help userland apps to identify cifs and smb2 mounts.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-15 10:08:44 -06:00
Yang Li
3ac5f2f257 cifs: Fix smb311_update_preauth_hash() kernel-doc comment
Add the description of @server in smb311_update_preauth_hash()
kernel-doc comment to remove warning found by running scripts/kernel-doc,
which is caused by using 'make W=1'.
fs/cifs/smb2misc.c:856: warning: Function parameter or member 'server'
not described in 'smb311_update_preauth_hash'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-15 10:08:33 -06:00
Sean Christopherson
d6aba4c8e2 hugetlbfs: fix off-by-one error in hugetlb_vmdelete_list()
Pass "end - 1" instead of "end" when walking the interval tree in
hugetlb_vmdelete_list() to fix an inclusive vs.  exclusive bug.  The two
callers that pass a non-zero "end" treat it as exclusive, whereas the
interval tree iterator expects an inclusive "last".  E.g.  punching a
hole in a file that precisely matches the size of a single hugepage,
with a vma starting right on the boundary, will result in
unmap_hugepage_range() being called twice, with the second call having
start==end.

The off-by-one error doesn't cause functional problems as
__unmap_hugepage_range() turns into a massive nop due to
short-circuiting its for-loop on "address < end".  But, the mmu_notifier
invocations to invalid_range_{start,end}() are passed a bogus zero-sized
range, which may be unexpected behavior for secondary MMUs.

The bug was exposed by commit ed922739c9 ("KVM: Use interval tree to
do fast hva lookup in memslots"), currently queued in the KVM tree for
5.17, which added a WARN to detect ranges with start==end.

Link: https://lkml.kernel.org/r/20211228234257.1926057-1-seanjc@google.com
Fixes: 1bfad99ab4 ("hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reported-by: syzbot+4e697fe80a31aa7efe21@syzkaller.appspotmail.com
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:30 +02:00
NeilBrown
4034247a0d mm: introduce memalloc_retry_wait()
Various places in the kernel - largely in filesystems - respond to a
memory allocation failure by looping around and re-trying.  Some of
these cannot conveniently use __GFP_NOFAIL, for reasons such as:

 - a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on
 - a need to check for the process being signalled between failures
 - the possibility that other recovery actions could be performed
 - the allocation is quite deep in support code, and passing down an
   extra flag to say if __GFP_NOFAIL is wanted would be clumsy.

Many of these currently use congestion_wait() which (in almost all
cases) simply waits the given timeout - congestion isn't tracked for
most devices.

It isn't clear what the best delay is for loops, but it is clear that
the various filesystems shouldn't be responsible for choosing a timeout.

This patch introduces memalloc_retry_wait() with takes on that
responsibility.  Code that wants to retry a memory allocation can call
this function passing the GFP flags that were used.  It will wait
however is appropriate.

For now, it only considers __GFP_NORETRY and whatever
gfpflags_allow_blocking() tests.  If blocking is allowed without
__GFP_NORETRY, then alloc_page either made some reclaim progress, or
waited for a while, before failing.  So there is no need for much
further waiting.  memalloc_retry_wait() will wait until the current
jiffie ends.  If this condition is not met, then alloc_page() won't have
waited much if at all.  In that case memalloc_retry_wait() waits about
200ms.  This is the delay that most current loops uses.

linux/sched/mm.h needs to be included in some files now,
but linux/backing-dev.h does not.

Link: https://lkml.kernel.org/r/163754371968.13692.1277530886009912421@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:29 +02:00
Arnd Bergmann
17fca131ce mm: move anon_vma declarations to linux/mm_inline.h
The patch to add anonymous vma names causes a build failure in some
configurations:

  include/linux/mm_types.h: In function 'is_same_vma_anon_name':
  include/linux/mm_types.h:924:37: error: implicit declaration of function 'strcmp' [-Werror=implicit-function-declaration]
    924 |         return name && vma_name && !strcmp(name, vma_name);
        |                                     ^~~~~~
  include/linux/mm_types.h:22:1: note: 'strcmp' is defined in header '<string.h>'; did you forget to '#include <string.h>'?

This should not really be part of linux/mm_types.h in the first place,
as that header is meant to only contain structure defintions and need a
minimum set of indirect includes itself.

While the header clearly includes more than it should at this point,
let's not make it worse by including string.h as well, which would pull
in the expensive (compile-speed wise) fortify-string logic.

Move the new functions into a separate header that only needs to be
included in a couple of locations.

Link: https://lkml.kernel.org/r/20211207125710.2503446-1-arnd@kernel.org
Fixes: "mm: add a field to store names for private anonymous memory"
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Colin Cross <ccross@google.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:27 +02:00
Colin Cross
9a10064f56 mm: add a field to store names for private anonymous memory
In many userspace applications, and especially in VM based applications
like Android uses heavily, there are multiple different allocators in
use.  At a minimum there is libc malloc and the stack, and in many cases
there are libc malloc, the stack, direct syscalls to mmap anonymous
memory, and multiple VM heaps (one for small objects, one for big
objects, etc.).  Each of these layers usually has its own tools to
inspect its usage; malloc by compiling a debug version, the VM through
heap inspection tools, and for direct syscalls there is usually no way
to track them.

On Android we heavily use a set of tools that use an extended version of
the logic covered in Documentation/vm/pagemap.txt to walk all pages
mapped in userspace and slice their usage by process, shared (COW) vs.
unique mappings, backing, etc.  This can account for real physical
memory usage even in cases like fork without exec (which Android uses
heavily to share as many private COW pages as possible between
processes), Kernel SamePage Merging, and clean zero pages.  It produces
a measurement of the pages that only exist in that process (USS, for
unique), and a measurement of the physical memory usage of that process
with the cost of shared pages being evenly split between processes that
share them (PSS).

If all anonymous memory is indistinguishable then figuring out the real
physical memory usage (PSS) of each heap requires either a pagemap
walking tool that can understand the heap debugging of every layer, or
for every layer's heap debugging tools to implement the pagemap walking
logic, in which case it is hard to get a consistent view of memory
across the whole system.

Tracking the information in userspace leads to all sorts of problems.
It either needs to be stored inside the process, which means every
process has to have an API to export its current heap information upon
request, or it has to be stored externally in a filesystem that somebody
needs to clean up on crashes.  It needs to be readable while the process
is still running, so it has to have some sort of synchronization with
every layer of userspace.  Efficiently tracking the ranges requires
reimplementing something like the kernel vma trees, and linking to it
from every layer of userspace.  It requires more memory, more syscalls,
more runtime cost, and more complexity to separately track regions that
the kernel is already tracking.

This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
userspace-provided name for anonymous vmas.  The names of named
anonymous vmas are shown in /proc/pid/maps and /proc/pid/smaps as
[anon:<name>].

Userspace can set the name for a region of memory by calling

   prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name)

Setting the name to NULL clears it.  The name length limit is 80 bytes
including NUL-terminator and is checked to contain only printable ascii
characters (including space), except '[',']','\','$' and '`'.

Ascii strings are being used to have a descriptive identifiers for vmas,
which can be understood by the users reading /proc/pid/maps or
/proc/pid/smaps.  Names can be standardized for a given system and they
can include some variable parts such as the name of the allocator or a
library, tid of the thread using it, etc.

The name is stored in a pointer in the shared union in vm_area_struct
that points to a null terminated string.  Anonymous vmas with the same
name (equivalent strings) and are otherwise mergeable will be merged.
The name pointers are not shared between vmas even if they contain the
same name.  The name pointer is stored in a union with fields that are
only used on file-backed mappings, so it does not increase memory usage.

CONFIG_ANON_VMA_NAME kernel configuration is introduced to enable this
feature.  It keeps the feature disabled by default to prevent any
additional memory overhead and to avoid confusing procfs parsers on
systems which are not ready to support named anonymous vmas.

The patch is based on the original patch developed by Colin Cross, more
specifically on its latest version [1] posted upstream by Sumit Semwal.
It used a userspace pointer to store vma names.  In that design, name
pointers could be shared between vmas.  However during the last
upstreaming attempt, Kees Cook raised concerns [2] about this approach
and suggested to copy the name into kernel memory space, perform
validity checks [3] and store as a string referenced from
vm_area_struct.

One big concern is about fork() performance which would need to strdup
anonymous vma names.  Dave Hansen suggested experimenting with
worst-case scenario of forking a process with 64k vmas having longest
possible names [4].  I ran this experiment on an ARM64 Android device
and recorded a worst-case regression of almost 40% when forking such a
process.

This regression is addressed in the followup patch which replaces the
pointer to a name with a refcounted structure that allows sharing the
name pointer between vmas of the same name.  Instead of duplicating the
string during fork() or when splitting a vma it increments the refcount.

[1] https://lore.kernel.org/linux-mm/20200901161459.11772-4-sumit.semwal@linaro.org/
[2] https://lore.kernel.org/linux-mm/202009031031.D32EF57ED@keescook/
[3] https://lore.kernel.org/linux-mm/202009031022.3834F692@keescook/
[4] https://lore.kernel.org/linux-mm/5d0358ab-8c47-2f5f-8e43-23b89d6a8e95@intel.com/

Changes for prctl(2) manual page (in the options section):

PR_SET_VMA
	Sets an attribute specified in arg2 for virtual memory areas
	starting from the address specified in arg3 and spanning the
	size specified	in arg4. arg5 specifies the value of the attribute
	to be set. Note that assigning an attribute to a virtual memory
	area might prevent it from being merged with adjacent virtual
	memory areas due to the difference in that attribute's value.

	Currently, arg2 must be one of:

	PR_SET_VMA_ANON_NAME
		Set a name for anonymous virtual memory areas. arg5 should
		be a pointer to a null-terminated string containing the
		name. The name length including null byte cannot exceed
		80 bytes. If arg5 is NULL, the name of the appropriate
		anonymous virtual memory areas will be reset. The name
		can contain only printable ascii characters (including
                space), except '[',']','\','$' and '`'.

                This feature is available only if the kernel is built with
                the CONFIG_ANON_VMA_NAME option enabled.

[surenb@google.com: docs: proc.rst: /proc/PID/maps: fix malformed table]
  Link: https://lkml.kernel.org/r/20211123185928.2513763-1-surenb@google.com
[surenb: rebased over v5.15-rc6, replaced userpointer with a kernel copy,
 added input sanitization and CONFIG_ANON_VMA_NAME config. The bulk of the
 work here was done by Colin Cross, therefore, with his permission, keeping
 him as the author]

Link: https://lkml.kernel.org/r/20211019215511.3771969-2-surenb@google.com
Signed-off-by: Colin Cross <ccross@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Glauber <jan.glauber@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Landley <rob@landley.net>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Shaohua Li <shli@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:27 +02:00
Matthew Wilcox (Oracle)
3e9d80a891 mm,fs: split dump_mapping() out from dump_page()
dump_mapping() is a big chunk of dump_page(), and it'd be handy to be
able to call it when we don't have a struct page.  Split it out and move
it to fs/inode.c.  Take the opportunity to simplify some of the debug
messages a little.

Link: https://lkml.kernel.org/r/20211121121056.2870061-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:26 +02:00
Amit Daniel Kachhap
a12cf8b32c fs/ioctl: remove unnecessary __user annotation
__user annotations are used by the checker (e.g sparse) to mark user
pointers.  However here __user is applied to a struct directly, without a
pointer being directly involved.

Although the presence of __user does not cause sparse to emit a warning,
__user should be removed for consistency with other uses of offsetof().

Note: No functional changes intended.

Link: https://lkml.kernel.org/r/20211122101256.7875-1-amit.kachhap@arm.com
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Cc: Kevin Brodsky <Kevin.Brodsky@arm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:25 +02:00
Colin Ian King
9a25d05150 ocfs2: remove redundant assignment to variable free_space
The variable 'free_space' is being initialized with a value that is not
read, it is being re-assigned later in the two paths of an if statement.
The early initialization is redundant and can be removed.

Link: https://lkml.kernel.org/r/20220112230411.1090761-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:25 +02:00
Greg Kroah-Hartman
d141b39b39 ocfs2: cluster: use default_groups in kobj_type
There are currently two ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field.

Move the ocfs2 cluster sysfs code to use default_groups field which has
been the preferred way since aa30f47cf6 ("kobject: Add support for
default attribute groups to kobj_type") so that we can soon get rid of
the obsolete default_attrs field.

Link: https://lkml.kernel.org/r/20220106102028.3345634-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:24 +02:00
Colin Ian King
f018844f83 ocfs2: remove redundant assignment to pointer root_bh
The variable 'root_bh' is being initialized with a value that is not
read, it is being re-assigned later on closer to its use.  The early
initialization is redundant and can be removed.

Link: https://lkml.kernel.org/r/20211228013719.620923-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:24 +02:00
Greg Kroah-Hartman
59430cc114 ocfs2: use default_groups in kobj_type
There are currently two ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field.

Move the ocfs2 code to use default_groups field which has been the
preferred way since aa30f47cf6 ("kobject: Add support for default
attribute groups to kobj_type") so that we can soon get rid of the
obsolete default_attrs field.

Link: https://lkml.kernel.org/r/20211228144517.391660-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:24 +02:00
Joseph Qi
e07bf00c40 ocfs2: clearly handle ocfs2_grab_pages_for_write() return value
ocfs2_grab_pages_for_write() may return -EAGAIN if write context type is
mmap and it could not lock the target page.  In this case, we exit with
no error and no target page.  And then trigger the caller page_mkwrite()
to retry.

Since there are other caller types, e.g.  buffer and direct io, make the
return value handling more clear.

Link: https://lkml.kernel.org/r/20211206065051.103353-1-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:24 +02:00
Zhang Mingyu
783cc68d61 ocfs2: use BUG_ON instead of if condition followed by BUG.
This issue was detected with the help of Coccinelle.

Link: https://lkml.kernel.org/r/20211105014424.75372-1-zhang.mingyu@zte.com.cn
Signed-off-by: Zhang Mingyu <zhang.mingyu@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:24 +02:00
Zheng Liang
9eec1d8971 squashfs: provide backing_dev_info in order to disable read-ahead
Commit c1f6925e10 ("mm: put readahead pages in cache earlier") causes
the read performance of squashfs to deteriorate.Through testing, we find
that the performance will be back by closing the readahead of squashfs.

So we want to learn the way of ubifs, provides backing_dev_info and
disable read-ahead

We tested the following data by fio.
squashfs image blocksize=128K
test command:

  fio --name basic --bs=? --filename="/mnt/test_file" --rw=? --iodepth=1 --ioengine=psync --runtime=200 --time_based

  turn on squashfs readahead in 5.10 kernel
  bs(k)      read/randread           MB/s
  4            randread              271
  128          randread              231
  1024         randread              246
  4            read                  310
  128          read                  245
  1024         read                  247

  turn off squashfs readahead in 5.10 kernel
  bs(k)      read/randread           MB/s
  4            randread              293
  128          randread              330
  1024         randread              363
  4            read                  338
  128          read                  360
  1024         read                  365

  turn on squashfs readahead and revert the
  commit c1f6925e1091("mm: put readahead
  pages in cache earlier") in 5.10 kernel
  bs(k)      read/randread           MB/s
  4           randread               289
  128         randread               306
  1024        randread               335
  4           read                   337
  128         read                   336
  1024        read                   338

Link: https://lkml.kernel.org/r/20211116113141.1391026-1-zhengliang6@huawei.com
Signed-off-by: Zheng Liang <zhengliang6@huawei.com>
Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Hou Tao <houtao1@huawei.com>
Cc: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:24 +02:00
Yang Li
7e0af97853 fs/ntfs/attrib.c: fix one kernel-doc comment
The comments for the file should not be in kernel-doc format:

/**
 * attrib.c - NTFS attribute operations.  Part of the Linux-NTFS

as it causes it to be incorrectly identified for function
ntfs_map_runlist_nolock(), causing some warnings found by running
scripts/kernel-doc.:

  fs/ntfs/attrib.c:25: warning: Incorrect use of kernel-doc format:  * ntfs_map_runlist_nolock - map (a part of) a runlist of an ntfs inode
  fs/ntfs/attrib.c:71: warning: Function parameter or member 'ni' not described in 'ntfs_map_runlist_nolock'
  fs/ntfs/attrib.c:71: warning: Function parameter or member 'vcn' not described in 'ntfs_map_runlist_nolock'
  fs/ntfs/attrib.c:71: warning: Function parameter or member 'ctx' not described in 'ntfs_map_runlist_nolock'
  fs/ntfs/attrib.c:71: warning: expecting prototype for attrib.c - NTFS attribute operations.  Part of the Linux(). Prototype was for ntfs_map_runlist_nolock() instead

Link: https://lkml.kernel.org/r/20220106015145.67067-1-yang.lee@linux.alibaba.com
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:24 +02:00
Linus Torvalds
a33f5c380c New code for 5.17:
- Fix a minor locking inconsistency in readdir
  - Fix incorrect fs feature bit validation for secondary superblocks
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHfFFYACgkQ+H93GTRK
 tOvrQA//RnDMit4zLoPo3WeZTDjpv1CQkOXFgQSba+Owgz1RvGn5f29R4ICKW6Uy
 NfnPcq5AeNhysOAi0uZPl4EC6140QgMrD4aJjKY1JAc/BkeBVEN05NOUFfCuBXU/
 RMzL8nGD5zsN7fPIWVGS60iitou8IvoX8ky4dKx7XcsbFzbBMtnJIsUpfqVutY9u
 i+zub6sNRkstr4uBRk+1S8uAqHUAW+21YwfKqB6pgpCoO5BHu2e4eN01ohWvF8Ru
 0ujJ2j4YlfjYmtQypFk3rNgQoI0oXY6mYWZPKr7fYvhDpoKEodUvHRLIJEHqoD+y
 fX6Ey5XxpHmDxSJnWRC7Vznl56VEmMndUcWEq2ZqROp/r/zZp7StXyHLO7DR9nEs
 mp+55a4tcKhHa/KnjbqexAaN4a1NTpryMqjsPHP7VTNu5Dq8CK4kHtrcEVrKCZFq
 ExRFMfoUDxap6iJaxoKAz5CtZyJeuZO8bLCa7jq/2F91EWp+2aclxEU6VY5r6B2X
 dFlIY8XnZEfJxE3xnhH/aDs6IKWH4YmvgtwxNb+RIupyMTfJzpcysjV9NRER8fAv
 9rLfWNz+nx0efNyXle+h+vrzT/zXgyi/0PSjAQS9/xErTPRFAmXFefLl/0oMQSHC
 XFIVivcR0lxKpZhB5CIOfVj1Vu8VM52JvlXjHPU3ObqEbFDxtf4=
 =RYJ4
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.17-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "These are the last few obvious fixes that I found while stress testing
  online fsck for XFS prior to initiating a design review of the whole
  giant machinery.

   - Fix a minor locking inconsistency in readdir

   - Fix incorrect fs feature bit validation for secondary superblocks"

* tag 'xfs-5.17-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix online fsck handling of v5 feature bits on secondary supers
  xfs: take the ILOCK when readdir inspects directory mapping data
2022-01-15 07:47:40 +02:00
Michael S. Tsirkin
d9679d0013 virtio: wrap config->reset calls
This will enable cleanups down the road.
The idea is to disable cbs, then add "flush_queued_cbs" callback
as a parameter, this way drivers can flush any work
queued after callbacks have been disabled.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14 18:50:52 -05:00
Miklos Szeredi
94fd19752b ovl: don't fail copy up if no fileattr support on upper
Christoph Fritz is reporting that failure to copy up fileattr when upper
doesn't support fileattr or xattr results in a regression.

Return success in these failure cases; this reverts overlayfs to the old
behavior.

Add a pr_warn_once() in these cases to still let the user know about the
copy up failures.

Reported-by: Christoph Fritz <chf.fritz@googlemail.com>
Fixes: 72db82115d ("ovl: copy up sync/noatime fileattr flags")
Cc: <stable@vger.kernel.org> # v5.15
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-01-14 16:57:56 +01:00
Christoph Fritz
4ee7e4a6c9 ovl: fix NULL pointer dereference in copy up warning
This patch is fixing a NULL pointer dereference to get a recently
introduced warning message working.

Fixes: 5b0a414d06 ("ovl: fix filattr copy-up failure")
Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Cc: <stable@vger.kernel.org> # v5.15
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-01-14 15:59:22 +01:00
Pavel Begunkov
791f3465c4 io_uring: fix UAF due to missing POLLFREE handling
Fixes a problem described in 50252e4b5e
("aio: fix use-after-free due to missing POLLFREE handling")
and copies the approach used there.

In short, we have to forcibly eject a poll entry when we meet POLLFREE.
We can't rely on io_poll_get_ownership() as can't wait for potentially
running tw handlers, so we use the fact that wqs are RCU freed. See
Eric's patch and comments for more details.

Reported-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org
Reported-and-tested-by: syzbot+5426c7ed6868c705ca14@syzkaller.appspotmail.com
Fixes: 221c5eb233 ("io_uring: add support for IORING_OP_POLL")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4ed56b6f548f7ea337603a82315750449412748a.1642161259.git.asml.silence@gmail.com
[axboe: drop non-functional change from patch]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-14 06:48:35 -07:00
Jiapeng Chong
c84b8a3fef io_uring: Remove unused function req_ref_put
Fix the following clang warnings:

fs/io_uring.c:1195:20: warning: unused function 'req_ref_put'
[-Wunused-function].

Fixes: aa43477b04 ("io_uring: poll rework")
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220113162005.3011-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-01-13 12:43:05 -07:00
Olga Kornievskaia
4ca9f31a2b NFSv4.1 test and add 4.1 trunking transport
For each location returned in FS_LOCATION query, establish a
transport to the server, send EXCHANGE_ID and test for trunking,
if successful, add the transport to the exiting client.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-13 09:36:58 -05:00
Olga Kornievskaia
a8d54baba7 NFSv4 handle port presence in fs_location server string
An fs_location attribute returns a string that can be ipv4, ipv6,
or DNS name. An ip location can have a port appended to it and if
no port is present a default port needs to be set. If rpc_pton()
fails to parse, try calling rpc_uaddr2socaddr() that can convert
an universal address.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-13 09:36:58 -05:00
Olga Kornievskaia
f5b27cc676 NFSv4 expose nfs_parse_server_name function
Make nfs_parse_server_name available outside of nfs4namespace.c.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-13 09:36:58 -05:00
Olga Kornievskaia
1976b2b314 NFSv4.1 query for fs_location attr on a new file system
Query the server for other possible trunkable locations for a given
file system on a 4.1+ mount.

v2:
-- added missing static to nfs4_discover_trunking,
reported by the kernel test robot

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-13 09:30:48 -05:00
Jeff Layton
a0b3a15eab ceph: move CEPH_SUPER_MAGIC definition to magic.h
The uapi headers are missing the ceph definition. Move it there so
userland apps can ID cephfs.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-13 13:40:07 +01:00
Xiubo Li
76bdbc7ac7 ceph: remove redundant Lsx caps check
The newcaps has already included the Ls, no need to check it again.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-13 13:40:07 +01:00
Jeff Layton
94cc0877ca ceph: add new "nopagecache" option
CephFS is a bit unlike most other filesystems in that it only
conditionally does buffered I/O based on the caps that it gets from the
MDS. In most cases, unless there is contended access for an inode the
MDS does give Fbc caps to the client, so the unbuffered codepaths are
only infrequently traveled and are difficult to test.

At one time, the "-o sync" mount option would give you this behavior,
but that was removed in commit 7ab9b38070 ("ceph: Don't use
ceph-sync-mode for synchronous-fs.").

Add a new mount option to tell the client to ignore Fbc caps when doing
I/O, and to use the synchronous codepaths exclusively, even on
non-O_DIRECT file descriptors. We already have an ioctl that forces this
behavior on a per-file basis, so we can just always set the CEPH_F_SYNC
flag in the file description on such mounts.

Additionally, this patch also changes the client to not request Fbc when
doing direct I/O. We aren't using the cache with O_DIRECT so we don't
have any need for those caps.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-13 13:40:07 +01:00
Jeff Layton
0078ea3b05 ceph: don't check for quotas on MDS stray dirs
玮文 胡 reported seeing the WARN_RATELIMIT pop when writing to an
inode that had been transplanted into the stray dir. The client was
trying to look up the quotarealm info from the parent and that tripped
the warning.

Change the ceph_vino_is_reserved helper to not throw a warning for
MDS stray directories (0x100 - 0x1ff), only for reserved dirs that
are not in that range.

Also, fix ceph_has_realms_with_quotas to return false when encountering
a reserved inode.

URL: https://tracker.ceph.com/issues/53180
Reported-by: Hu Weiwen <sehuww@mail.scut.edu.cn>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-13 13:40:07 +01:00
Jeff Layton
af9ceae83c ceph: drop send metrics debug message
This pops every second and isn't very useful.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-13 13:40:06 +01:00
Kotresh HR
8e55ba8caa ceph: Fix incorrect statfs report for small quota
Problem:
The statfs reports incorrect free/available space for quota less then
CEPH_BLOCK size (4M).

Solution:
For quota less than CEPH_BLOCK size, smaller block size of 4K is used.
But if quota is less than 4K, it is decided to go with binary use/free
of 4K block. For quota size less than 4K size, report the
total=used=4K,free=0 when quota is full and total=free=4K,used=0
otherwise.

Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-13 13:40:06 +01:00
Venky Shankar
adbed05ed6 ceph: mount syntax module parameter
Add read-only module parameters for supported mount syntaxes. Primary
user is the user-space mount helper for catching v2 syntax bugs during
testing by cross verifying if the kernel supports v2 syntax on mount
failure.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-13 13:40:06 +01:00
Venky Shankar
2167f2cc68 ceph: record updated mon_addr on remount
Note that the new monitors are just shown in /proc/mounts.
Ceph does not (re)connect to new monitors yet.

[ jlayton: s/printk\(KERN_NOTICE/pr_notice(/
	   s/strcmp/strcmp_null/ ]

Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-13 13:40:06 +01:00
Venky Shankar
7b19b4db5a ceph: new device mount syntax
Old mount device syntax (source) has the following problems:

- mounts to the same cluster but with different fsnames
  and/or creds have identical device string which can
  confuse xfstests.

- Userspace mount helper tool resolves monitor addresses
  and fill in mon addrs automatically, but that means the
  device shown in /proc/mounts is different than what was
  used for mounting.

New device syntax is as follows:

  cephuser@fsid.mycephfs2=/path

Note, there is no "monitor address" in the device string.
That gets passed in as mount option. This keeps the device
string same when monitor addresses change (on remounts).

Also note that the userspace mount helper tool is backward
compatible. I.e., the mount helper will fallback to using
old syntax after trying to mount with the new syntax.

[ idryomov: drop CEPH_MON_ADDR_MNTOPT_DELIM ]

Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-13 13:40:06 +01:00
Venky Shankar
2d7c86a8f9 libceph: generalize addr/ip parsing based on delimiter
... and remove hardcoded function name in ceph_parse_ips().

[ idryomov: delim parameter, drop CEPH_ADDR_PARSE_DEFAULT_DELIM ]

Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-13 13:40:05 +01:00
Linus Torvalds
3acbdbf42e dax + libnvdimm for v5.17
- Simplify the dax_operations API
   - Eliminate bdev_dax_pgoff() in favor of the filesystem maintaining
     and applying a partition offset to all its DAX iomap operations.
   - Remove wrappers and device-mapper stacked callbacks for
     ->copy_from_iter() and ->copy_to_iter() in favor of moving
     block_device relative offset responsibility to the
     dax_direct_access() caller.
   - Remove the need for an @bdev in filesystem-DAX infrastructure
   - Remove unused uio helpers copy_from_iter_flushcache() and
     copy_mc_to_iter() as only the non-check_copy_size() versions are
     used for DAX.
 - Prepare XFS for the pending (next merge window) DAX+reflink support
 - Remove deprecated DEV_DAX_PMEM_COMPAT support
 - Cleanup a straggling misuse of the GUID api
 
 Tags offered after the branch was cut:
 Reviewed-by: Mike Snitzer <snitzer@redhat.com>
 Link: https://lore.kernel.org/r/Ydb/3P+8nvjCjYfO@redhat.com
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYd3dTAAKCRDfioYZHlFs
 Z//UAP9zetoTE+O7zJG7CXja4jSopSadbdbh6QKSXaqfKBPvQQD+N4US3wA2bGv8
 f/qCY62j2Hj3hUTGHs9RvTyw3JsSYAA=
 =QvDs
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull dax and libnvdimm updates from Dan Williams:
 "The bulk of this is a rework of the dax_operations API after
  discovering the obstacles it posed to the work-in-progress DAX+reflink
  support for XFS and other copy-on-write filesystem mechanics.

  Primarily the need to plumb a block_device through the API to handle
  partition offsets was a sticking point and Christoph untangled that
  dependency in addition to other cleanups to make landing the
  DAX+reflink support easier.

  The DAX_PMEM_COMPAT option has been around for 4 years and not only
  are distributions shipping userspace that understand the current
  configuration API, but some are not even bothering to turn this option
  on anymore, so it seems a good time to remove it per the deprecation
  schedule. Recall that this was added after the device-dax subsystem
  moved from /sys/class/dax to /sys/bus/dax for its sysfs organization.
  All recent functionality depends on /sys/bus/dax.

  Some other miscellaneous cleanups and reflink prep patches are
  included as well.

  Summary:

   - Simplify the dax_operations API:

      - Eliminate bdev_dax_pgoff() in favor of the filesystem
        maintaining and applying a partition offset to all its DAX iomap
        operations.

      - Remove wrappers and device-mapper stacked callbacks for
        ->copy_from_iter() and ->copy_to_iter() in favor of moving
        block_device relative offset responsibility to the
        dax_direct_access() caller.

      - Remove the need for an @bdev in filesystem-DAX infrastructure

      - Remove unused uio helpers copy_from_iter_flushcache() and
        copy_mc_to_iter() as only the non-check_copy_size() versions are
        used for DAX.

   - Prepare XFS for the pending (next merge window) DAX+reflink support

   - Remove deprecated DEV_DAX_PMEM_COMPAT support

   - Cleanup a straggling misuse of the GUID api"

* tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (38 commits)
  iomap: Fix error handling in iomap_zero_iter()
  ACPI: NFIT: Import GUID before use
  dax: remove the copy_from_iter and copy_to_iter methods
  dax: remove the DAXDEV_F_SYNC flag
  dax: simplify dax_synchronous and set_dax_synchronous
  uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()
  iomap: turn the byte variable in iomap_zero_iter into a ssize_t
  memremap: remove support for external pgmap refcounts
  fsdax: don't require CONFIG_BLOCK
  iomap: build the block based code conditionally
  dax: fix up some of the block device related ifdefs
  fsdax: shift partition offset handling into the file systems
  dax: return the partition offset from fs_dax_get_by_bdev
  iomap: add a IOMAP_DAX flag
  xfs: pass the mapping flags to xfs_bmbt_to_iomap
  xfs: use xfs_direct_write_iomap_ops for DAX zeroing
  xfs: move dax device handling into xfs_{alloc,free}_buftarg
  ext4: cleanup the dax handling in ext4_fill_super
  ext2: cleanup the dax handling in ext2_fill_super
  fsdax: decouple zeroing from the iomap buffered I/O code
  ...
2022-01-12 15:46:11 -08:00
Linus Torvalds
8834147f95 fscache rewrite
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmHeBGsACgkQ+7dXa6fL
 C2tyLw/8C2Gs/XvOZvRO7KPetKI9BbQSFoCe7uvGbiPq5CEmgcjWzQxvQGklBiZD
 qYa6pMNye1iGpsHOY3Yu210b7vMQiRLnnxvVle0UrjpZR7CcxYS0gGV+6yRdbDGy
 W1X6GFiX06qiNsgBH4msYp0SmbhhfkTyAx1BeBZAEtX8iFgaPfOldPY2nLMcTDD6
 6FT1nTzRcMHx9IUQZJtpeatzc70Qg8+fOr2UAY2nOIypXh6+vAMBO80xtUjGVU+1
 pWD1E+8cXSLfwEEzquFWoWTsTX7hNfsesEN10FmBf1bVCH9ZDFE01MOl6B8+CkFl
 +xfkvDNFC3yyUwAMVAV4+A4Be+cVLSqN2R91QIKJnAj9w1OjxASrwZJ1YeZp6KP4
 h0XKuPs3sRwwbNPVL/nP0UPNexoJnOUAaHesl4uKkRrExmxz9xGOIqIri2+tUIO+
 HkGyNns1huymj1K1ja4AQbDiZZX39GgYVleyg9g3uuy1FS4k+/myJcXo/CqWn3ON
 4oeNwxwLvlcqIQnPrESvwev50lFZYB4pfwvez6T2C5dL/Wk/xdeJK9iG81RWgx7y
 5XcDeoGDE08gMCGWVPjuhOCXypeiRGHhRNlcxTtq5kLwBZGkcYg/wFFnWn+6hzc4
 kyXw2kS5WZq4Q/FPh7BdY0eHp6xv0EpAOZwceneLB9lhNINdxcQ=
 =ISJ6
 -----END PGP SIGNATURE-----

Merge tag 'fscache-rewrite-20220111' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull fscache rewrite from David Howells:
 "This is a set of patches that rewrites the fscache driver and the
  cachefiles driver, significantly simplifying the code compared to
  what's upstream, removing the complex operation scheduling and object
  state machine in favour of something much smaller and simpler.

  The series is structured such that the first few patches disable
  fscache use by the network filesystems using it, remove the cachefiles
  driver entirely and as much of the fscache driver as can be got away
  with without causing build failures in the network filesystems.

  The patches after that recreate fscache and then cachefiles,
  attempting to add the pieces in a logical order. Finally, the
  filesystems are reenabled and then the very last patch changes the
  documentation.

  [!] Note: I have dropped the cifs patch for the moment, leaving local
      caching in cifs disabled. I've been having trouble getting that
      working. I think I have it done, but it needs more testing (there
      seem to be some test failures occurring with v5.16 also from
      xfstests), so I propose deferring that patch to the end of the
      merge window.

  WHY REWRITE?
  ============

  Fscache's operation scheduling API was intended to handle sequencing
  of cache operations, which were all required (where possible) to run
  asynchronously in parallel with the operations being done by the
  network filesystem, whilst allowing the cache to be brought online and
  offline and to interrupt service for invalidation.

  With the advent of the tmpfile capacity in the VFS, however, an
  opportunity arises to do invalidation much more simply, without having
  to wait for I/O that's actually in progress: Cachefiles can simply
  create a tmpfile, cut over the file pointer for the backing object
  attached to a cookie and abandon the in-progress I/O, dismissing it
  upon completion.

  Future work here would involve using Omar Sandoval's vfs_link() with
  AT_LINK_REPLACE[1] to allow an extant file to be displaced by a new
  hard link from a tmpfile as currently I have to unlink the old file
  first.

  These patches can also simplify the object state handling as I/O
  operations to the cache don't all have to be brought to a stop in
  order to invalidate a file. To that end, and with an eye on to writing
  a new backing cache model in the future, I've taken the opportunity to
  simplify the indexing structure.

  I've separated the index cookie concept from the file cookie concept
  by C type now. The former is now called a "volume cookie" (struct
  fscache_volume) and there is a container of file cookies. There are
  then just the two levels. All the index cookie levels are collapsed
  into a single volume cookie, and this has a single printable string as
  a key. For instance, an AFS volume would have a key of something like
  "afs,example.com,1000555", combining the filesystem name, cell name
  and volume ID. This is freeform, but must not have '/' chars in it.

  I've also eliminated all pointers back from fscache into the network
  filesystem. This required the duplication of a little bit of data in
  the cookie (cookie key, coherency data and file size), but it's not
  actually that much. This gets rid of problems with making sure we keep
  netfs data structures around so that the cache can access them.

  These patches mean that most of the code that was in the drivers
  before is simply gone and those drivers are now almost entirely new
  code. That being the case, there doesn't seem any particular reason to
  try and maintain bisectability across it. Further, there has to be a
  point in the middle where things are cut over as there's a single
  point everything has to go through (ie. /dev/cachefiles) and it can't
  be in use by two drivers at once.

  ISSUES YET OUTSTANDING
  ======================

  There are some issues still outstanding, unaddressed by this patchset,
  that will need fixing in future patchsets, but that don't stop this
  series from being usable:

  (1) The cachefiles driver needs to stop using the backing filesystem's
      metadata to store information about what parts of the cache are
      populated. This is not reliable with modern extent-based
      filesystems.

      Fixing this is deferred to a separate patchset as it involves
      negotiation with the network filesystem and the VM as to how much
      data to download to fulfil a read - which brings me on to (2)...

  (2) NFS (and CIFS with the dropped patch) do not take account of how
      the cache would like I/O to be structured to meet its granularity
      requirements. Previously, the cache used page granularity, which
      was fine as the network filesystems also dealt in page
      granularity, and the backing filesystem (ext4, xfs or whatever)
      did whatever it did out of sight. However, we now have folios to
      deal with and the cache will now have to store its own metadata to
      track its contents.

      The change I'm looking at making for cachefiles is to store
      content bitmaps in one or more xattrs and making a bit in the map
      correspond to something like a 256KiB block. However, the size of
      an xattr and the fact that they have to be read/updated in one go
      means that I'm looking at covering 1GiB of data per 512-byte map
      and storing each map in an xattr. Cachefiles has the potential to
      grow into a fully fledged filesystem of its very own if I'm not
      careful.

      However, I'm also looking at changing things even more radically
      and going to a different model of how the cache is arranged and
      managed - one that's more akin to the way, say, openafs does
      things - which brings me on to (3)...

  (3) The way cachefilesd does culling is very inefficient for large
      caches and it would be better to move it into the kernel if I can
      as cachefilesd has to keep asking the kernel if it can cull a
      file. Changing the way the backend works would allow this to be
      addressed.

  BITS THAT MAY BE CONTROVERSIAL
  ==============================

  There are some bits I've added that may be controversial:

  (1) I've provided a flag, S_KERNEL_FILE, that cachefiles uses to check
      if a files is already being used by some other kernel service
      (e.g. a duplicate cachefiles cache in the same directory) and
      reject it if it is. This isn't entirely necessary, but it helps
      prevent accidental data corruption.

      I don't want to use S_SWAPFILE as that has other effects, but
      quite possibly swapon() should set S_KERNEL_FILE too.

      Note that it doesn't prevent userspace from interfering, though
      perhaps it should. (I have made it prevent a marked directory from
      being rmdir-able).

  (2) Cachefiles wants to keep the backing file for a cookie open whilst
      we might need to write to it from network filesystem writeback.
      The problem is that the network filesystem unuses its cookie when
      its file is closed, and so we have nothing pinning the cachefiles
      file open and it will get closed automatically after a short time
      to avoid EMFILE/ENFILE problems.

      Reopening the cache file, however, is a problem if this is being
      done due to writeback triggered by exit(). Some filesystems will
      oops if we try to open a file in that context because they want to
      access current->fs or suchlike.

      To get around this, I added the following:

      (A) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network
          filesystem inode to indicate that we have a usage count on the
          cookie caching that inode.

      (B) A flag in struct writeback_control, unpinned_fscache_wb, that
          is set when __writeback_single_inode() clears the last dirty
          page from i_pages - at which point it clears
          I_PINNING_FSCACHE_WB and sets this flag.

          This has to be done here so that clearing I_PINNING_FSCACHE_WB
          can be done atomically with the check of PAGECACHE_TAG_DIRTY
          that clears I_DIRTY_PAGES.

      (C) A function, fscache_set_page_dirty(), which if it is not set,
          sets I_PINNING_FSCACHE_WB and calls fscache_use_cookie() to
          pin the cache resources.

      (D) A function, fscache_unpin_writeback(), to be called by
          ->write_inode() to unuse the cookie.

      (E) A function, fscache_clear_inode_writeback(), to be called when
          the inode is evicted, before clear_inode() is called. This
          cleans up any lingering I_PINNING_FSCACHE_WB.

      The network filesystem can then use these tools to make sure that
      fscache_write_to_cache() can write locally modified data to the
      cache as well as to the server.

      For the future, I'm working on write helpers for netfs lib that
      should allow this facility to be removed by keeping track of the
      dirty regions separately - but that's incomplete at the moment and
      is also going to be affected by folios, one way or another, since
      it deals with pages"

Link: https://lore.kernel.org/all/510611.1641942444@warthog.procyon.org.uk/
Tested-by: Dominique Martinet <asmadeus@codewreck.org> # 9p
Tested-by: kafs-testing@auristor.com # afs
Tested-by: Jeff Layton <jlayton@kernel.org> # ceph
Tested-by: Dave Wysochanski <dwysocha@redhat.com> # nfs
Tested-by: Daire Byrne <daire@dneg.com> # nfs

* tag 'fscache-rewrite-20220111' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (67 commits)
  9p, afs, ceph, nfs: Use current_is_kswapd() rather than gfpflags_allow_blocking()
  fscache: Add a tracepoint for cookie use/unuse
  fscache: Rewrite documentation
  ceph: add fscache writeback support
  ceph: conversion to new fscache API
  nfs: Implement cache I/O by accessing the cache directly
  nfs: Convert to new fscache volume/cookie API
  9p: Copy local writes to the cache when writing to the server
  9p: Use fscache indexing rewrite and reenable caching
  afs: Skip truncation on the server of data we haven't written yet
  afs: Copy local writes to the cache when writing to the server
  afs: Convert afs to use the new fscache API
  fscache, cachefiles: Display stat of culling events
  fscache, cachefiles: Display stats of no-space events
  cachefiles: Allow cachefiles to actually function
  fscache, cachefiles: Store the volume coherency data
  cachefiles: Implement the I/O routines
  cachefiles: Implement cookie resize for truncate
  cachefiles: Implement begin and end I/O operation
  cachefiles: Implement backing file wrangling
  ...
2022-01-12 13:45:12 -08:00
Linus Torvalds
8975f89748 fuse update for 5.17
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYdw8xgAKCRDh3BK/laaZ
 PFy3AQCHSltzy6f234CcsFk3mtJn0im0tDbRoEYFD731JOR1YAD9HQKtJRn/sMCF
 r0PnZnOWJ35RWB3o8uEqptsgrZXJkQo=
 =HosR
 -----END PGP SIGNATURE-----

Merge tag 'fuse-update-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

 - Fix a regression introduced in 5.15

 - Extend the size of the FUSE_INIT request to accommodate for more
   flags. There's a slight possibility of a regression for obscure fuse
   servers; if this happens, then more complexity will need to be added
   to the protocol

 - Allow the DAX property to be controlled by the server on a per-inode
   basis in virtiofs

 - Allow sending security context to the server when creating a file or
   directory

* tag 'fuse-update-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  Documentation/filesystem/dax: DAX on virtiofs
  fuse: mark inode DONT_CACHE when per inode DAX hint changes
  fuse: negotiate per inode DAX in FUSE_INIT
  fuse: enable per inode DAX
  fuse: support per inode DAX in fuse protocol
  fuse: make DAX mount option a tri-state
  fuse: add fuse_should_enable_dax() helper
  fuse: Pass correct lend value to filemap_write_and_wait_range()
  fuse: send security context of inode on file
  fuse: extend init flags
2022-01-12 13:30:58 -08:00
Linus Torvalds
1fb38c934c \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmHek9YACgkQnJ2qBz9k
 QNmgUQf+NpBuhoBI83dWI7NvwMZ1srtOiFdtVctq4jT8rS5m3KnwaTbq9HZ2y+WL
 6+Tp8B2Qs/C2X6DbX87N1RBaP6mpMQCY4ADfbYNV6KOhUQqGBo/zxyM4nAk0WOBR
 wXFWNhd/c3Xr3eb5Ggaus11UdQxXBFtZ72Azm2eUXnIKuSeAH1KWbyElQGzLqMDE
 NLS68XFOcxLLScWA0lTExtouV6ZLUHm6EyYRH+TmIjAwb01JDwceXmdqzXNO7uiV
 cKalWV8GcUH2OOLdX/KXvAwDVwuJwq4PCGCajGPEI8Ebct5Eh0YZuHVcvmfReOde
 dnyZNiMOH8GwYVDHUDCNI8zD/BpyJQ==
 =4gnz
 -----END PGP SIGNATURE-----

Merge tag 'fs_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull UDF / reiserfs updates from Jan Kara:
 "One UDF fix and one reiserfs cleanup"

* tag 'fs_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Fix error handling in udf_new_inode()
  reiserfs: don't use congestion_wait()
2022-01-12 13:28:06 -08:00
Linus Torvalds
3d3d673306 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmHektkACgkQnJ2qBz9k
 QNk73Qf9G/8C29GC7lTZvW8ZOqM2onvP16/wpMU98jkw6/xpsdGLMmQlP4JCdB6Y
 mTttvPxwMMVrdRmt+MNIgvWPOK2LKHdkbNbj0DRw4JxLEIm+zXddAXAcvX2vLeUn
 N24idiVQxIKWS3zJhQ6H7sFjSrZ7ztjkeHWff70LIzURIupYO0+rRmKDgOqo6D5g
 cKAdb8x9ZQLr0CTQoHKQxA8IeCSirSblVaDMdes2KrJH+i05s3aqS9jBvY16nrLo
 Xvm4Vv7ecQXcmKDswuxVcmBVlVVsWxoYv1w8+v9qRGJnbgpQkjjgCGjCh54M18Do
 /OovTVlqGWRfmf2Eo1FCXXz90ppGlQ==
 =gvMh
 -----END PGP SIGNATURE-----

Merge tag 'fsnotify_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fanotify updates from Jan Kara:
 "Support for new FAN_RENAME fanotify event and support for reporting
  child info in directory fanotify events (FAN_REPORT_TARGET_FID)"

* tag 'fsnotify_for_v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fanotify: wire up FAN_RENAME event
  fanotify: report old and/or new parent+name in FAN_RENAME event
  fanotify: record either old name new name or both for FAN_RENAME
  fanotify: record old and new parent and name in FAN_RENAME event
  fanotify: support secondary dir fh and name in fanotify_info
  fanotify: use helpers to parcel fanotify_info buffer
  fanotify: use macros to get the offset to fanotify_info buffer
  fsnotify: generate FS_RENAME event with rich information
  fanotify: introduce group flag FAN_REPORT_TARGET_FID
  fsnotify: separate mark iterator type from object type enum
  fsnotify: clarify object type argument
2022-01-12 13:19:35 -08:00
Linus Torvalds
f079ab01b5 Convert xfs/iomap to use folios
This should be all that is needed for XFS to use large folios.
 There is no code in this pull request to create large folios, but
 no additional changes should be needed to XFS or iomap once they
 are created.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmHcpaUACgkQDpNsjXcp
 gj4MUAf+ItcKfgFo1QCMT+6Y0mohVqPme/vdyOCNv6yOOfZZqN5ZQc+2hmxXrRz9
 XPOPwZKL0TttlHSYEJmrm8mqwN8UXl0kqMu4kQqOXMziiD9qpVlaLXOZ7iLdkQxu
 z/xe1iACcGfJUaQCsaMP6BZqp6iETA4qP72dBE4jc6PC4H3OI0pN/900gEbAcLxD
 Yn0a5NhrdS/EySU2aHLB6OcwhqnSiHBVjUbFiuXxuvOYyzLaERIh00Kx3jLdj4DR
 82K4TF8h2IZpALfIDSt0JG+gHLCc+EfF7Yd/xkeEv0md3ncyi+jWvFCFPNJbyFjm
 cYoDTSunfbxwszA2n01R4JM8/KkGwA==
 =IeFX
 -----END PGP SIGNATURE-----

Merge tag 'iomap-5.17' of git://git.infradead.org/users/willy/linux

Pull iomap updates from Matthew Wilcox:
 "Convert xfs/iomap to use folios.

  This should be all that is needed for XFS to use large folios. There
  is no code in this pull request to create large folios, but no
  additional changes should be needed to XFS or iomap once they are
  created.

  Usually this would have come from Darrick, and we had intended that it
  would come that route. Between the holidays and various things which
  Darrick needed to work on, he asked if I could send things directly.

  There weren't any other iomap patches pending for this release, which
  probably also played a role"

* tag 'iomap-5.17' of git://git.infradead.org/users/willy/linux: (26 commits)
  iomap: Inline __iomap_zero_iter into its caller
  xfs: Support large folios
  iomap: Support large folios in invalidatepage
  iomap: Convert iomap_migrate_page() to use folios
  iomap: Convert iomap_add_to_ioend() to take a folio
  iomap: Simplify iomap_do_writepage()
  iomap: Simplify iomap_writepage_map()
  iomap,xfs: Convert ->discard_page to ->discard_folio
  iomap: Convert iomap_write_end_inline to take a folio
  iomap: Convert iomap_write_begin() and iomap_write_end() to folios
  iomap: Convert __iomap_zero_iter to use a folio
  iomap: Allow iomap_write_begin() to be called with the full length
  iomap: Convert iomap_page_mkwrite to use a folio
  iomap: Convert readahead and readpage to use a folio
  iomap: Convert iomap_read_inline_data to take a folio
  iomap: Use folio offsets instead of page offsets
  iomap: Convert bio completions to use folios
  iomap: Pass the iomap_page into iomap_set_range_uptodate
  iomap: Add iomap_invalidate_folio
  iomap: Convert iomap_releasepage to use a folio
  ...
2022-01-12 12:51:41 -08:00
Linus Torvalds
6020c204be Convert much of the page cache to use folios
This patchset stops just short of actually enabling large folios.
 It converts everything that I noticed needs to be converted, but there may
 still be places I've overlooked which still have page size assumptions.
 The big change here is using large entries in the page cache XArray
 instead of many small entries.  That only affects shmem for now, but
 it's a pretty big change for shmem since it changes where memory needs
 to be allocated (at split time instead of insertion).
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmHcraoACgkQDpNsjXcp
 gj7C3wgAl0cjtdVzTpkLmbnInsicW1m3thnbkSXYbpqRccFjpu2kEBGj31PT+oGz
 dzgXP7SNZ/VkFT+qWtmHSRF/J41B6f9bFojO81B2aQdpRiziU+5QbSbXbfUjwVhE
 GJF0WGSJtVqySKynXP/iYTEt2zj6BiVperAwIqzhZpPY7gNoyDgeRD34Xy5bQqdD
 ey6/Uwkh7oFHLEDcgxsEnyF0tUR3q+gpe5XZW1fb79p3crWw44xATc3UvKv8qCLC
 Rd4oHmKkOj4MvdiUxJEfXI+XxgrkQ8XRO70B+p6ZljhDaoDZYw7ullxA0gvlSpNX
 6pnjSQlKA1VQXsi6PMSt+9vf26XxaQ==
 =KeYZ
 -----END PGP SIGNATURE-----

Merge tag 'folio-5.17' of git://git.infradead.org/users/willy/pagecache

Pull folio conversion updates from Matthew Wilcox:
 "Convert much of the page cache to use folios

  This stops just short of actually enabling large folios. It converts
  everything that I noticed needs to be converted, but there may still
  be places I've overlooked which still have page size assumptions.

  The big change here is using large entries in the page cache XArray
  instead of many small entries. That only affects shmem for now, but
  it's a pretty big change for shmem since it changes where memory needs
  to be allocated (at split time instead of insertion)"

* tag 'folio-5.17' of git://git.infradead.org/users/willy/pagecache: (49 commits)
  mm: Use multi-index entries in the page cache
  XArray: Add xas_advance()
  truncate,shmem: Handle truncates that split large folios
  truncate: Convert invalidate_inode_pages2_range to folios
  fs: Convert vfs_dedupe_file_range_compare to folios
  mm: Remove pagevec_remove_exceptionals()
  mm: Convert find_lock_entries() to use a folio_batch
  filemap: Return only folios from find_get_entries()
  filemap: Convert filemap_get_read_batch() to use a folio_batch
  filemap: Convert filemap_read() to use a folio
  truncate: Add invalidate_complete_folio2()
  truncate: Convert invalidate_inode_pages2_range() to use a folio
  truncate: Skip known-truncated indices
  truncate,shmem: Add truncate_inode_folio()
  shmem: Convert part of shmem_undo_range() to use a folio
  mm: Add unmap_mapping_folio()
  truncate: Add truncate_cleanup_folio()
  filemap: Add filemap_release_folio()
  filemap: Use a folio in filemap_page_mkwrite
  filemap: Use a folio in filemap_map_pages
  ...
2022-01-12 12:37:02 -08:00
Olga Kornievskaia
8a59bb93b7 NFSv4 store server support for fs_location attribute
Define and store if server returns it supports fs_locations attribute
as a capability.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-12 14:26:30 -05:00
Olga Kornievskaia
90e12a3191 NFSv4 remove zero number of fs_locations entries error check
Remove the check for the zero length fs_locations reply in the
xdr decoding, and instead check for that in the migration code.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-12 14:26:05 -05:00
Linus Torvalds
6dc69d3d0d driver core changes for 5.17-rc1
Here is the set of changes for the driver core for 5.17-rc1.
 
 Lots of little things here, including:
 	- kobj_type cleanups
 	- auxiliary_bus documentation updates
 	- auxiliary_device conversions for some drivers (relevant
 	  subsystems all have provided acks for these)
 	- kernfs lock contention reduction for some workloads
 	- other tiny cleanups and changes.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYd7deA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ym8ngCgw0ANwrRPE5b1dthEmfU2f8Knk5kAn0pHQv6R
 VRZJypgNfU/Pt0ykstZD
 =CO9J
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the set of changes for the driver core for 5.17-rc1.

  Lots of little things here, including:

   - kobj_type cleanups

   - auxiliary_bus documentation updates

   - auxiliary_device conversions for some drivers (relevant subsystems
     all have provided acks for these)

   - kernfs lock contention reduction for some workloads

   - other tiny cleanups and changes.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (43 commits)
  kobject documentation: remove default_attrs information
  drivers/firmware: Add missing platform_device_put() in sysfb_create_simplefb
  debugfs: lockdown: Allow reading debugfs files that are not world readable
  driver core: Make bus notifiers in right order in really_probe()
  driver core: Move driver_sysfs_remove() after driver_sysfs_add()
  firmware: edd: remove empty default_attrs array
  firmware: dmi-sysfs: use default_groups in kobj_type
  qemu_fw_cfg: use default_groups in kobj_type
  firmware: memmap: use default_groups in kobj_type
  sh: sq: use default_groups in kobj_type
  headers/uninline: Uninline single-use function: kobject_has_children()
  devtmpfs: mount with noexec and nosuid
  driver core: Simplify async probe test code by using ktime_ms_delta()
  nilfs2: use default_groups in kobj_type
  kobject: remove kset from struct kset_uevent_ops callbacks
  driver core: make kobj_type constant.
  driver core: platform: document registration-failure requirement
  vdpa/mlx5: Use auxiliary_device driver data helpers
  net/mlx5e: Use auxiliary_device driver data helpers
  soundwire: intel: Use auxiliary_device driver data helpers
  ...
2022-01-12 11:11:34 -08:00
Linus Torvalds
d3c8108035 for-5.17/block-2022-01-11
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmHd8DAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpnhRD/wMAjsNO65PCA+o/bPpVi4ulx9EejAzrJnB
 5vHFvREAoOOGKvRpYGe4w3TcKyW+zPb+GtlXFjPfK+wuVzWhrQtW/+vkjKlBt8wK
 o7rzeMwTKJ9ZGvYaaQpp1yC0WURBB3qnCRQhb8dOQzhJgEXinhIOznZsut4mniLv
 fTqcDmKAb/+G6K6CQCCqnH0I/+OJZyUeSFo1kk2i4ZqCBepQpBkOL6H2rBOtGxUg
 bt1jiGHbbhCRYEE3u2kV0HP10qAChNaMQC705jV4Qpf4+3EntSxs+6nSb74dvMkX
 3+Wmp8Ctq6lpPnDL1nrAFGz3jZnB0Y+GdgOclQn3ViQd1FCXZzuYWQ3fTaBfURCZ
 /RE5nc047SqpwCFLOynM++OkaeQZ1zSxeyoFTtzDaPF4tLuaX3JHswvTzNGPw8SN
 BnexseNnNBCjJliZSEE7fOkjJDcev2dvRxPtI8/wkF4lHUgETc5IW563C53xo/Tx
 32yFjZwCVIpNWk21su/0H3iEq80wZ7PnriiN/E3JA6XbnevlRPu0NPMb0D258GCm
 yCcdPVDNZsQCB8hluqZcu0g6LSgZRo90Yg1oqKqEpAllJJMBaEAPPPuUIJh998mo
 iKGxZzgr7d9jrbGJTInp0F8b3B3/oV/hxgzy0Hu/mHP3AsnaAk9o/oEQZ7rX4Khr
 6biloqkIMA==
 =RWnJ
 -----END PGP SIGNATURE-----

Merge tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - Unify where the struct request handling code is located in the blk-mq
   code (Christoph)

 - Header cleanups (Christoph)

 - Clean up the io_context handling code (Christoph, me)

 - Get rid of ->rq_disk in struct request (Christoph)

 - Error handling fix for add_disk() (Christoph)

 - request allocation cleanusp (Christoph)

 - Documentation updates (Eric, Matthew)

 - Remove trivial crypto unregister helper (Eric)

 - Reduce shared tag overhead (John)

 - Reduce poll_stats memory overhead (me)

 - Known indirect function call for dio (me)

 - Use atomic references for struct request (me)

 - Support request list issue for block and NVMe (me)

 - Improve queue dispatch pinning (Ming)

 - Improve the direct list issue code (Keith)

 - BFQ improvements (Jan)

 - Direct completion helper and use it in mmc block (Sebastian)

 - Use raw spinlock for the blktrace code (Wander)

 - fsync error handling fix (Ye)

 - Various fixes and cleanups (Lukas, Randy, Yang, Tetsuo, Ming, me)

* tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block: (132 commits)
  MAINTAINERS: add entries for block layer documentation
  docs: block: remove queue-sysfs.rst
  docs: sysfs-block: document virt_boundary_mask
  docs: sysfs-block: document stable_writes
  docs: sysfs-block: fill in missing documentation from queue-sysfs.rst
  docs: sysfs-block: add contact for nomerges
  docs: sysfs-block: sort alphabetically
  docs: sysfs-block: move to stable directory
  block: don't protect submit_bio_checks by q_usage_counter
  block: fix old-style declaration
  nvme-pci: fix queue_rqs list splitting
  block: introduce rq_list_move
  block: introduce rq_list_for_each_safe macro
  block: move rq_list macros to blk-mq.h
  block: drop needless assignment in set_task_ioprio()
  block: remove unnecessary trailing '\'
  bio.h: fix kernel-doc warnings
  block: check minor range in device_add_disk()
  block: use "unsigned long" for blk_validate_block_size().
  block: fix error unwinding in device_add_disk
  ...
2022-01-12 10:26:52 -08:00
Linus Torvalds
42a7b4ed45 for-5.17/io_uring-2022-01-11
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmHd8BkQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpkgED/4vyoNlKfuTqRzuHAZzq+PBYlMpVbn54ebm
 z2AJ2StLrZ8rflz/9ERNVYxbSS6ELf/8H2Mdjnb5uCoRza/QkMf4JnGtgT0pBavV
 uoruKTiYgQAQJmGuhxAo0TQwyW7F6qadp1eZH8CqxB2Cwj5wgSb3SB/yIlNkj6+2
 fSzVGCh0CljhrkgFG10z507VGJYmgBqE9kl2tnfYkAA4blqyowZN4boTMx3l9fm5
 2GWt0xg80lgMCzHb5m/sbqdOrMJV77TkuiAuVc+FjIfdGiVWo7XCn3q3e0qqJhRM
 A3x9mmUyF/xNrRLMgTyYxHzKLrytBibr+VxvDLr/LmLtV8viVCsgJiP5EIiAWIAt
 BCBks31QUxHaB9fyiQ43/nsRZLHmhXErLEK4D0loAXa50p4Xl4ZdaegD6txSH3FI
 mGprv4Si1kMNqSxmzrOUfFk8Xdv/vC08RWL4UbRa8xkgwWUAIhpRaYmAtmw913kb
 MgwFQuGpE9b7ae7HNdgWzyI424srCwY5UawgpzI25ZwGAVXTXDQ2qw1Lmyyo6swT
 bsYVyH/vJLvCS1tjmBtrKfJQ0Mokm1sJpaMeTs/SfSyHmAUXEsEFdXVu6bRnVkYF
 9vjHZKOl5jA5nx6/JDpW993GnHV3FGkCDfENXs3wUYY7Hu0DDYZt0CGiqGLjC7Oq
 Ow4q3aEJQA==
 =syIB
 -----END PGP SIGNATURE-----

Merge tag 'for-5.17/io_uring-2022-01-11' of git://git.kernel.dk/linux-block

Pull io_uring updates from Jens Axboe:

 - Support for prioritized work completions (Hao)

 - Simplification of reissue (Pavel)

 - Add support for CQE skip (Pavel)

 - Memory leak fix going to 5.15-stable (Pavel)

 - Re-write of internal poll. This both cleans up that code, and gets us
   ready to fix the POLLFREE issue (Pavel)

 - Various cleanups (GuoYong, Pavel, Hao)

* tag 'for-5.17/io_uring-2022-01-11' of git://git.kernel.dk/linux-block: (31 commits)
  io_uring: fix not released cached task refs
  io_uring: remove redundant tab space
  io_uring: remove unused function parameter
  io_uring: use completion batching for poll rem/upd
  io_uring: single shot poll removal optimisation
  io_uring: poll rework
  io_uring: kill poll linking optimisation
  io_uring: move common poll bits
  io_uring: refactor poll update
  io_uring: remove double poll on poll update
  io_uring: code clean for some ctx usage
  io_uring: batch completion in prior_task_list
  io_uring: split io_req_complete_post() and add a helper
  io_uring: add helper for task work execution code
  io_uring: add a priority tw list for irq completion work
  io-wq: add helper to merge two wq_lists
  io_uring: reuse io_req_task_complete for timeouts
  io_uring: tweak iopoll CQE_SKIP event counting
  io_uring: simplify selected buf handling
  io_uring: move up io_put_kbuf() and io_put_rw_kbuf()
  ...
2022-01-12 10:20:35 -08:00
Darrick J. Wong
4a9bca8680 xfs: fix online fsck handling of v5 feature bits on secondary supers
While I was auditing the code in xfs_repair that adds feature bits to
existing V5 filesystems, I decided to have a look at how online fsck
handles feature bits, and I found a few problems:

1) ATTR2 is added to the primary super when an xattr is set to a file,
but that isn't consistently propagated to secondary supers.  This isn't
a corruption, merely a discrepancy that repair will fix if it ever has
to restore the primary from a secondary.  Hence, if we find a mismatch
on a secondary, this is a preen condition, not a corruption.

2) There are more compat and ro_compat features now than there used to
be, but we mask off the newer features from testing.  This means we
ignore inconsistencies in the INOBTCOUNT and BIGTIME features, which is
wrong.  Get rid of the masking and compare directly.

3) NEEDSREPAIR, when set on a secondary, is ignored by everyone.  Hence
a mismatch here should also be flagged for preening, and online repair
should clear the flag.  Right now we ignore it due to (2).

4) log_incompat features are ephemeral, since we can clear the feature
bit as soon as the log no longer contains live records for a particular
log feature.  As such, the only copy we care about is the one in the
primary super.  If we find any bits set in the secondary super, we
should flag that for preening, and clear the bits if the user elects to
repair it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-01-12 09:45:21 -08:00
Linus Torvalds
f692121142 This pull request contains the following changes for UML:
- set_fs removal
 - Devicetree support
 - Many cleanups from Al
 - Various virtio and build related fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmHbPpwWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wXkcD/9UfDRFqvgtZIlacQ/0IvN24xeq
 +f4aXoXEVyVsCd02jv9pUk3IAezIQyJf3MGtNJ4D/UFXtYfjEYjK5kJpPDP6umaZ
 ZDnpTzn29HW2aGlgOxW9gU7a3Yze629QasIRP6x7Ht+Hk5eXrvRYrgcmKtw1mm04
 SA5v5ZqP3P5r623fpsFiw4Dvl7l6MhDyFeyA2tabNnmv93HgB76PHDtV2Z+SWrC+
 ubjlfBQc87QGHW+eTvce+0qw9APMoJpNFjNN4H8P/9VcDTvw+KL2JqQ02HSMWh4z
 HeHKsv6hbty+GskBhbaWDW7867fPJ3e08TFAAAjeEiBP/CDBwjOTSr3eOw1eHgzU
 xdAqC2Bz0e5G3shClmVEzzvcP6R2cgNZjeBze5m3wQ1NKHEddk6N9t5K+4NrOpgp
 gbNN5Q4FAVOBKeQsZWG81bJKGcu7SbShgiKjlxcaRpMyp6LwyD4naauGjmCzYsbf
 Pd4ilLO1Yocf7nFs2C4vWxE4iAZ6hfQtukerIxCQfb/Y2BaWT3bcWWYHFRFy6Lq+
 hTFGnjf+Ro65QCoa1idaLaUdhwAGi6U9sjjL6G/JdQCCE3ftcXLVA9TJz9CNdMb5
 98IGznxhczOZc7rHNXOF4km5+OUrU6N+C0WRp3yOoUWcI+Ms4PXHzqIwC5cde/V7
 O/o9O1BAoBP6LE1pPg==
 =5J6E
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml

Pull UML updates from Richard Weinberger:

 - set_fs removal

 - Devicetree support

 - Many cleanups from Al

 - Various virtio and build related fixes

* tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (31 commits)
  um: virtio_uml: Allow probing from devicetree
  um: Add devicetree support
  um: Extract load file helper from initrd.c
  um: remove set_fs
  hostfs: Fix writeback of dirty pages
  um: Use swap() to make code cleaner
  um: header debriding - sigio.h
  um: header debriding - os.h
  um: header debriding - net_*.h
  um: header debriding - mem_user.h
  um: header debriding - activate_ipi()
  um: common-offsets.h debriding...
  um, x86: bury crypto_tfm_ctx_offset
  um: unexport handle_page_fault()
  um: remove a dangling extern of syscall_trace()
  um: kill unused cpu()
  uml/i386: missing include in barrier.h
  um: stop polluting the namespace with registers.h contents
  logic_io instance of iounmap() needs volatile on argument
  um: move amd64 variant of mmap(2) to arch/x86/um/syscalls_64.c
  ...
2022-01-11 15:26:52 -08:00
Linus Torvalds
5672cdfba4 This pull request contains the following changes for JFFS2, UBI and UBIFS:
JFFS2:
 - Fix for a deadlock in jffs2_write_begin()
 
 UBI:
 - Fixes in comments
 
 UBIFS:
 - Expose error counters in sysfs
 - Many bugfixes found by Hulk Robot and others
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmHbQUIWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wQRhD/9dMymYBFc0pku9SPgQrbieCXTf
 O73rtO/MQTgIZR7y0VueosoUE+6P5672VKMLKt8Dp67tqoqI/LI8YiC0eYGQwklm
 U7Ui3ze3ClNoRoPWflFKpm7c9/5UthWpUdVIRQ0Q3U7qvUPHswlPMoEKzTdMZqXa
 zaOKGuUGO+4USBod4Jooo5I8KyTUezvjVQRii2xcjIy9LOgKNXImuGIX4QnLkH8Q
 GbzgSMtYsHvdo8pqE68sZJ8D7vgEpOHEKTqcVroqO3dO8cmnk41pI5hNMnYherN3
 +2yPefRza0ukxowTquM+eJB7D+/Req1ZjpthRMGVtStmmhXxYwp+wX4JLWta6Tr2
 A6mgd3j8IxU0BlhP+AAXwIcIPvYyPtGYcfY8tqjorErHHOESvk8Hhifmq49pB8qy
 UpIPMGSohZDrIIN4DIvb5tahVI0lYq7ozgPz9MJbvVfTI5Z63lntbfwGydsLCJLy
 VLuVl3tZaMg/Pwr0qIGERQbgAqTJ0nzeH68CumSTmQxOFR9J51H2jQZLJMkZOohn
 b7B6XkJ4H8xv6U2cFU9JjaaNJCJK6A3DcdCJKyzIYKwoWKPXjEGPmYEg2SgKFIhY
 jzAUYUlN+ZBI4UTZH/ua627pdgzjR/ho1Y6tqd1OetlGzZq1a6UnTmXifud8h/mX
 B3kRcodjNzbEm9dx2w==
 =rpfR
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull JFFS2, UBI and UBIFS updates from Richard Weinberger:
 "JFFS2:
   - Fix for a deadlock in jffs2_write_begin()

  UBI:
   - Fixes in comments

  UBIFS:
   - Expose error counters in sysfs
   - Many bugfixes found by Hulk Robot and others"

* tag 'for-linus-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  jffs2: GC deadlock reading a page that is used in jffs2_write_begin()
  ubifs: read-only if LEB may always be taken in ubifs_garbage_collect
  ubifs: fix double return leb in ubifs_garbage_collect
  ubifs: fix slab-out-of-bounds in ubifs_change_lp
  ubifs: fix snprintf() length check
  ubifs: Document sysfs nodes
  ubifs: Export filesystem error counters
  ubifs: Error path in ubifs_remount_rw() seems to wrongly free write buffers
  ubifs: Make use of the helper macro kthread_run()
  ubi: Fix a mistake in comment
  ubifs: Fix spelling mistakes
2022-01-11 15:23:27 -08:00
Linus Torvalds
3f67eaed57 dlm for 5.17
This set includes the normal collection of minor fixes and
 cleanups, new kmem caches for network messaging structs,
 a start on some basic tracepoints, and some new debugfs
 files for inserting test messages.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJh3J8BAAoJEDgbc8f8gGmqC0IP/0xCyLKDLmv1KcR/SbUxiP9Y
 ZmiuF1FdxqRLPN5HfU1GU4734EbWEiIZFGXDk36b4kE3UpbeddUMA6dDf9sDOJ73
 H5BYgNfSgzybIq1D3wZe4C3nH4YTSrqvZhAI2bD3mcNnUDx20LzUMaGqLkzJcsOa
 iFDVDlkNrXee/RPSuZDv/5U+L5YEC7WlKEHJ2u/INAFitakb0i1ChJGAgrJZADCw
 giznPhfkmWyesgfOxOQ7JcxcVQSedQ+7vtHYhZ5tZ5v1h2yM9LPYJ0bNpNKfHUiA
 0gf0DIpOjt4q9ClYxCsaLeK0t32qWOjiNoZaAm+lrLvZWE75CC+WP6KrDR+2x3p5
 JlgBZ24h3v7t/ldYmEo7SSZ/1lfb4+0fyk7UjPzko9ErhRFrAopAfl924bRLN3ES
 9iSgDslBT5apivNwrByDKI9flhDpMnfWtSufnYeMzFAbHzJCWJuUS8bbi9M+K0mK
 ENjmXQu1OVOmht8WM7HOA91tIUcLhr/YD2fKtYNx054TErKKRjlRb/V+NHffyYJc
 SqqyaLsAamhMBcLrAk3a88hj6bp8N4NYwrMQMus0xv3aZ23OaGCvtrj7fXuYMlcu
 sjLOTQfFFDPN1B1WvI18TDdh/TIgP/8AC+KbHf4Y+hbntO0BgEzL1/CKqT3WqN8y
 hAnK+Avj6CILRvO8ardm
 =Cst4
 -----END PGP SIGNATURE-----

Merge tag 'dlm-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm

Pull dlm updates from David Teigland:
 "This set includes the normal collection of minor fixes and cleanups,
  new kmem caches for network messaging structs, a start on some basic
  tracepoints, and some new debugfs files for inserting test messages"

* tag 'dlm-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: (32 commits)
  fs: dlm: print cluster addr if non-cluster node connects
  fs: dlm: memory cache for lowcomms hotpath
  fs: dlm: memory cache for writequeue_entry
  fs: dlm: memory cache for midcomms hotpath
  fs: dlm: remove wq_alloc mutex
  fs: dlm: use event based wait for pending remove
  fs: dlm: check for pending users filling buffers
  fs: dlm: use list_empty() to check last iteration
  fs: dlm: fix build with CONFIG_IPV6 disabled
  fs: dlm: replace use of socket sk_callback_lock with sock_lock
  fs: dlm: don't call kernel_getpeername() in error_report()
  fs: dlm: fix potential buffer overflow
  fs: dlm:Remove unneeded semicolon
  fs: dlm: remove double list_first_entry call
  fs: dlm: filter user dlm messages for kernel locks
  fs: dlm: add lkb waiters debugfs functionality
  fs: dlm: add lkb debugfs functionality
  fs: dlm: allow create lkb with specific id range
  fs: dlm: add debugfs rawmsg send functionality
  fs: dlm: let handle callback data as void
  ...
2022-01-11 15:21:54 -08:00
Linus Torvalds
8481c323e4 Various minor gfs2 cleanups and fixes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmHdubYUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpngxAAg+N74AZhPeUoPsvbVf4LKJ8eJRTY
 4329sUjORwpy2g2wN1I53vOnpSjFQ0uWxaOwsuBTmTzM2lr9+J3w/a7Chx5q6M7D
 o8dHh4BAwKGmXoCqMxvSiBtFiD/cht7YjjGGb+Q6+qa8viLky3Oksps9SsaZqMCf
 RD+33fflp3eppPZTVgo9n+zuxbMz4wT9GpS/GNZy8GhKhPM+v5cPCw1pV0Wy1b7T
 2FXsKBtk+odTzbfj88gQ+b2hoRMMQ2FLqIsqyJKajMuBwiBpuPzrm03J2B9N1OJ0
 R0Ir0mXV627SoQ4N7v12MfmSF3+XMUTEMrdxMGbjsOz35TtspfOc5+mmveVF4DBD
 T4zsK9ju6N/4OtsetOB3MF1AV5Hj0zOvuFZ1y71Bk9RBAJvLrzbHc17KtMEYbVgQ
 XUQYY6IGYiVBr8n8x9t3LaPhOzYXwvgFwAmetZE8vYkLZQEQlgRVYJymFR8iluLo
 oIPy1ciZgyMRDCa1/OHAIV604qwTChRLzO6WnDhdwWPwF/Sx4cHnhNVyR/5cp1ub
 GG6E4o/jopeblftQHO5j+GIZAAwo8abU4ahJXUJaPQVdyY3VjOPhd+ooTrNbEEug
 qijL1fIojcalDyCG2cIsXlFtXSb9Txr3L1fxOkluwmwVdR9LHJSEeDxYvIN20L/+
 yqnHctyg9+HNsIc=
 =icLi
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:
 "Various minor gfs2 cleanups and fixes"

* tag 'gfs2-v5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: dump inode object for iopen glocks
  gfs2: Fix gfs2_instantiate description
  gfs2: Remove redundant check for GLF_INSTANTIATE_NEEDED
  gfs2: remove redundant set of INSTANTIATE_NEEDED
  gfs2: Fix __gfs2_holder_init function name in kernel-doc comment
2022-01-11 15:20:32 -08:00
Darrick J. Wong
65552b02a1 xfs: take the ILOCK when readdir inspects directory mapping data
I was poking around in the directory code while diagnosing online fsck
bugs, and noticed that xfs_readdir doesn't actually take the directory
ILOCK when it calls xfs_dir2_isblock.  xfs_dir_open most probably loaded
the data fork mappings and the VFS took i_rwsem (aka IOLOCK_SHARED) so
we're protected against writer threads, but we really need to follow the
locking model like we do in other places.

To avoid unnecessarily cycling the ILOCK for fairly small directories,
change the block/leaf _getdents functions to consume the ILOCK hold that
the parent readdir function took to decide on a _getdents implementation.

It is ok to cycle the ILOCK in readdir because the VFS takes the IOLOCK
in the appropriate mode during lookups and writes, and we don't want to
be holding the ILOCK when we copy directory entries to userspace in case
there's a page fault.  We really only need it to protect against data
fork lookups, like we do for other files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-01-11 15:11:04 -08:00
Linus Torvalds
1dbfae0113 Convert ext4 to use the new mount API, and add support for the
FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls.  In addition the usual
 large number of clean ups and bug fixes, in particular for the
 fast_commit feature.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmHcr8sACgkQ8vlZVpUN
 gaOuMwgAgunXEni8yED1QXAp+F1nigxx8K6O6fpjzVjiUFlH/cgc3M5PEZiV7Pgj
 rIsNliCLQjDQPA7XxInvMuDXEvGRdleaqGS5MCKBR9898acaCTn0eBCK8ZKfhlIs
 tJus706kOjy8IXRgEJkCrxEdrsJ57rgqHk/bstipNVg4lnfYYSUKH8k0ob4aaoiW
 irufs1dSo7KmpwL0c/z9yIei3ST4DA2Dc/eBS0hsdgKtr/zMmwSanxqbQJTXz8ra
 FTZ23HAu/zxWZ9GM4p3DhuVaMWaulm6eEqJKO+loeU9xOeItn0WmSJW260KR0cmj
 0nPAflMmpQVBhtO9gjXcSiORbd6/zg==
 =9HNa
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Convert ext4 to use the new mount API, and add support for the
  FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls.

  In addition the usual large number of clean ups and bug fixes, in
  particular for the fast_commit feature"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (48 commits)
  ext4: don't use the orphan list when migrating an inode
  ext4: use BUG_ON instead of if condition followed by BUG
  ext4: fix a copy and paste typo
  ext4: set csum seed in tmp inode while migrating to extents
  ext4: remove unnecessary 'offset' assignment
  ext4: remove redundant o_start statement
  ext4: drop an always true check
  ext4: remove unused assignments
  ext4: remove redundant statement
  ext4: remove useless resetting io_end_size in mpage_process_page()
  ext4: allow to change s_last_trim_minblks via sysfs
  ext4: change s_last_trim_minblks type to unsigned long
  ext4: implement support for get/set fs label
  ext4: only set EXT4_MOUNT_QUOTA when journalled quota file is specified
  ext4: don't use kfree() on rcu protected pointer sbi->s_qf_names
  ext4: avoid trim error on fs with small groups
  ext4: fix an use-after-free issue about data=journal writeback mode
  ext4: fix null-ptr-deref in '__ext4_journal_ensure_credits'
  ext4: initialize err_blk before calling __ext4_get_inode_loc
  ext4: fix a possible ABBA deadlock due to busy PA
  ...
2022-01-11 15:07:49 -08:00
Linus Torvalds
11fc88c2e4 New code for 5.17:
- Fix log recovery with da btree buffers when metauuid is in use.
  - Fix type coercion problems in xattr buffer size validation.
  - Fix a bug in online scrub dir leaf bestcount checking.
  - Only run COW recovery when recovering the log.
  - Fix symlink target buffer UAF problems and symlink locking problems
    by not exposing xfs innards to the VFS.
  - Fix incorrect quotaoff lock usage.
  - Don't let transactions cancel cleanly if they have deferred work
    items attached.
  - Fix a UAF when we're deciding if we need to relog an intent item.
  - Reduce kvmalloc overhead for log shadow buffers.
  - Clean up sysfs attr group usage.
  - Fix a bug where scrub's bmap/rmap checking could race with a quota
    file block allocation due to insufficient locking.
  - Teach scrub to complain about invalid project ids.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmHXOL0ACgkQ+H93GTRK
 tOulDQ/+PwvFyROQePQo+5Z7fDb92VAKX6UxFYzWZGXj3zhRmTX7fvNSniPxKqHO
 K0GMnZ9rJxvuo/MRuSEPILl+2rKh59Efo9IPcs2TR+xO+vI/EKnch2/gPLBN1y7Y
 A4peQuK6CvUjYTyMazR2PcK8y76ZW8iXyuQMxnpmYldZBCZ+v7myVoVlPCRrZ68W
 EoEQ/cDf4fmsycSG4Eh9WTddbqd9bfl1+jXEz/pzlbhJSvQOaqynR+21ahxhEQ/f
 CzjSM4kXQ9AWTfHOPnEXs+Hf02NfcKRMuguzO69gJZhooEXJlgeYAZXUKqXpdmbC
 2Tc+x/rJnQv9PtzEw/WoWMem4rhwjBX7cVi2E/SJHB46MLI5rjhQilIdWzecWd8q
 nOrjIZlReI0vxYSqe1ifejxiCsV6gv3XdyLNacjxf7ISkU4XF0caO8f+3/ocIScV
 E7k+XN13pArUiRZXOt8UL571Hoa6nhtHjyAUg6i62AFI2R9XYUEjEXKMwX7ovRVO
 8eGdeQqioHkmOkF8HGyyurB9+itccuW8dtl7YOqEQJMGXeyk7rsvpcPahKKKoiDr
 8w3bpZlNiOfZEnRcNjO9B/Egu6LLMudti2ptwlUvfvWsfa8VTO8v0h5pqVhwuwrk
 lwQThWJ5THe/MptWybdZ3kSjhMvzqi4GO75T30Hj6XhecC3dFfo=
 =RYEb
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.17-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Darrick Wong:
 "The big new feature here is that the mount code now only bothers to
  try to free stale COW staging extents if the fs unmounted uncleanly.
  This should reduce mount times, particularly on filesystems supporting
  reflink and containing a large number of allocation groups.

  Everything else this cycle are bugfixes, as the iomap folios
  conversion should be plenty enough excitement for anyone. That and I
  ran out of brain bandwidth after Thanksgiving last year.

  Summary:

   - Fix log recovery with da btree buffers when metauuid is in use.

   - Fix type coercion problems in xattr buffer size validation.

   - Fix a bug in online scrub dir leaf bestcount checking.

   - Only run COW recovery when recovering the log.

   - Fix symlink target buffer UAF problems and symlink locking problems
     by not exposing xfs innards to the VFS.

   - Fix incorrect quotaoff lock usage.

   - Don't let transactions cancel cleanly if they have deferred work
     items attached.

   - Fix a UAF when we're deciding if we need to relog an intent item.

   - Reduce kvmalloc overhead for log shadow buffers.

   - Clean up sysfs attr group usage.

   - Fix a bug where scrub's bmap/rmap checking could race with a quota
     file block allocation due to insufficient locking.

   - Teach scrub to complain about invalid project ids"

* tag 'xfs-5.17-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: warn about inodes with project id of -1
  xfs: hold quota inode ILOCK_EXCL until the end of dqalloc
  xfs: Remove redundant assignment of mp
  xfs: reduce kvmalloc overhead for CIL shadow buffers
  xfs: sysfs: use default_groups in kobj_type
  xfs: prevent UAF in xfs_log_item_in_current_chkpt
  xfs: prevent a WARN_ONCE() in xfs_ioc_attr_list()
  xfs: Fix comments mentioning xfs_ialloc
  xfs: check sb_meta_uuid for dabuf buffer recovery
  xfs: fix a bug in the online fsck directory leaf1 bestcount check
  xfs: only run COW extent recovery when there are no live extents
  xfs: don't expose internal symlink metadata buffers to the vfs
  xfs: fix quotaoff mutex usage now that we don't support disabling it
  xfs: shut down filesystem if we xfs_trans_cancel with deferred work items
2022-01-11 15:01:50 -08:00
Linus Torvalds
d601e58c5f for-5.17-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmHcX64ACgkQxWXV+ddt
 WDsIyA//UGMG0waxz40RksQL+4AnTIJ+apmmxq5VGEE8y6aUPAGKm++5Cs9cC+Os
 5xuZhNnD8hjB+eWH+tRCx8ll+T4g4skGKDCyD0fWNtZAlW8pnsIbztdO0nNYx9C0
 V++vu/hQR6M8E7ORlayEKBWy2/UnBG5p/XVLPG4RJ4vMETJPl2RLWVDpu3dj09kf
 YnD3AY0vmKEyCu/b9NtSzfZMO2/lXT2U41ezLJJfmPAXcMJ0EeSAazACVDQyq24p
 wnnr6xmdo7ZR0oGFLUmBmfxbKwd3l0JIUsi/XRysXe+8y7raIE0gKCg7NpCS276T
 BZrKmefxHhdMCA1HLlH6AkrKmQUgIaceLkXTanTYv4cnVzb6XoeV6R4IO9/JQ0yv
 YsdCL7eZ4Vl3ToPlQkYWdwUNP5UjVMg5qMwxchigbwq7jViLJtiu3WKy0O0TitzB
 n4o0cCdlv3lHRp8FS6cFmbCrsavFT8/q3vz/aRdkrojOKE0jEWVJiz0bf39j5BVO
 IiJ3RF3kbpG7TZz0+eNUbgebME8zwaWfyd6t2L7Ztvx9F4elzSW95iwGc9//TZvl
 ciNFI9LQvnZRymUItxg0HNXfbJMXJAE9ImTxA8HLkfQYpaBlwbt4avkQK56LhpN+
 nFCd5cxJy5HLFIvkRDfqpF+C247p5FICzd/hs59/cXlCynulglg=
 =9lg+
 -----END PGP SIGNATURE-----

Merge tag 'for-5.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "This end of the year branch is intentionally not that exciting. Most
  of the changes are under the hood, but there are some minor user
  visible improvements and several performance improvements too.

  Features:

   - make send work with concurrent block group relocation.

     We're not allowed to prevent send failing or silently producing
     some bad stream but with more fine grained locking and checks it's
     possible. The send vs deduplication exclusion could reuse the same
     logic in the future.

   - new exclusive operation 'balance paused' to allow adding a device
     to filesystem with paused balance

   - new sysfs file for fsid stored in the per-device directory to help
     distinguish devices when seeding is enabled, the fsid may differ
     from the one reported by the filesystem

  Performance improvements:

   - less metadata needed for directory logging, directory deletion is
     20-40% faster

   - in zoned mode, cache zone information during mount to speed up
     repeated queries (about 50% speedup)

   - free space tree entries get indexed and searched by size (latency
     -30%, search run time -30%)

   - less contention in tree node locking when inserting a key and no
     splits are needed (files/sec in fsmark improves by 1-20%)

  Fixes:

   - fix ENOSPC failure when attempting direct IO write into NOCOW range

   - fix deadlock between quota enable and other quota operations

   - global reserve minimum calculations fixed to account for free space
     tree

   - in zoned mode, fix condition for chunk allocation that may not find
     the right zone for reuse and could lead to early ENOSPC

  Core:

   - global reserve stealing got simplified and cleaned up in evict

   - remove async transaction commit based on manual transaction refs,
     reuse existing kthread and mechanisms to let it commit transaction
     before timeout

   - preparatory work for extent tree v2, add wrappers for global tree
     roots, truncation path cleanups

   - remove readahead framework, it's a bit overengineered and used only
     for scrub, and yet it does not cover all its needs, there is
     another readahead built in the b-tree search that is now used,
     performance drop on HDD is about 5% which is acceptable and scrub
     is often throttled anyway, on SSDs there's no reported drop but
     slight improvement

   - self tests report extent tree state when error occurs

   - replace assert with debugging information when an uncommitted
     transaction is found at unmount time

  Other:

   - error handling improvements

   - other cleanups and refactoring"

* tag 'for-5.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (115 commits)
  btrfs: output more debug messages for uncommitted transaction
  btrfs: respect the max size in the header when activating swap file
  btrfs: fix argument list that the kdoc format and script verified
  btrfs: remove unnecessary parameter type from compression_decompress_bio
  btrfs: selftests: dump extent io tree if extent-io-tree test failed
  btrfs: scrub: cleanup the argument list of scrub_stripe()
  btrfs: scrub: cleanup the argument list of scrub_chunk()
  btrfs: remove reada infrastructure
  btrfs: scrub: use btrfs_path::reada for extent tree readahead
  btrfs: scrub: remove the unnecessary path parameter for scrub_raid56_parity()
  btrfs: refactor unlock_up
  btrfs: skip transaction commit after failure to create subvolume
  btrfs: zoned: fix chunk allocation condition for zoned allocator
  btrfs: add extent allocator hook to decide to allocate chunk or not
  btrfs: zoned: unset dedicated block group on allocation failure
  btrfs: zoned: drop redundant check for REQ_OP_ZONE_APPEND and btrfs_is_zoned
  btrfs: zoned: sink zone check into btrfs_repair_one_zone
  btrfs: zoned: simplify btrfs_check_meta_write_pointer
  btrfs: zoned: encapsulate inode locking for zoned relocation
  btrfs: sysfs: add devinfo/fsid to retrieve actual fsid from the device
  ...
2022-01-11 14:53:40 -08:00
Linus Torvalds
9149fe8ba7 Changes since last update:
- add sysfs interface and a sysfs node to control sync decompression;
 
  - add tail-packing inline support for compressed files;
 
  - get rid of erofs_get_meta_page().
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYduH/BEceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBNPdAP9hKomD1hRiFeCWlLA1nDXYkqGbbt6+D3HT
 cm4G7DgVBAD+O+RWv6JVYg1zAAFlKmxqEKEfoDLKI65wAjH1V/h/dQE=
 =cEam
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs updates from Gao Xiang:
 "In this cycle, tail-packing data inline for compressed files is now
  supported so that tail pcluster can be stored and read together with
  inode metadata in order to save data I/O and storage space.

  In addition to that, to prepare for the upcoming subpage, folio and
  fscache features, we also introduce meta buffers to get rid of
  erofs_get_meta_page() since it was too close to the page itself.

  In addition, in order to show supported kernel features and control
  sync decompression strategy, new sysfs nodes are introduced in this
  cycle as well.

  Summary:

   - add sysfs interface and a sysfs node to control sync decompression

   - add tail-packing inline support for compressed files

   - get rid of erofs_get_meta_page()"

* tag 'erofs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: use meta buffers for zmap operations
  erofs: use meta buffers for xattr operations
  erofs: use meta buffers for super operations
  erofs: use meta buffers for inode operations
  erofs: introduce meta buffer operations
  erofs: add on-disk compressed tail-packing inline support
  erofs: support inline data decompression
  erofs: support unaligned data decompression
  erofs: introduce z_erofs_fixup_insize
  erofs: tidy up z_erofs_lz4_decompress
  erofs: clean up erofs_map_blocks tracepoints
  erofs: Replace zero-length array with flexible-array member
  erofs: add sysfs node to control sync decompression strategy
  erofs: add sysfs interface
  erofs: rename lz4_0pading to zero_padding
2022-01-11 14:51:10 -08:00
David Howells
d7bdba1c81 9p, afs, ceph, nfs: Use current_is_kswapd() rather than gfpflags_allow_blocking()
In 9p, afs ceph, and nfs, gfpflags_allow_blocking() (which wraps a
test for __GFP_DIRECT_RECLAIM being set) is used to determine if
->releasepage() should wait for the completion of a DIO write to fscache
with something like:

	if (folio_test_fscache(folio)) {
		if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
			return false;
		folio_wait_fscache(folio);
	}

Instead, current_is_kswapd() should be used instead.

Note that this is based on a patch originally by Zhaoyang Huang[1].  In
addition to extending it to the other network filesystems and putting it on
top of my fscache rewrite, it also needs to include linux/swap.h in a bunch
of places.  Can current_is_kswapd() be moved to linux/mm.h?

Changes
=======
ver #5:
 - Dropping the changes for cifs.

Originally-signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Co-developed-by: David Howells <dhowells@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Steve French <smfrench@gmail.com>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: linux-cachefs@redhat.com
cc: v9fs-developer@lists.sourceforge.net
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/1638952658-20285-1-git-send-email-huangzhaoyang@gmail.com/ [1]
Link: https://lore.kernel.org/r/164021590773.640689.16777975200823659231.stgit@warthog.procyon.org.uk/ # v4
2022-01-11 22:27:42 +00:00
Linus Torvalds
5dfbfe71e3 fs.idmapped.v5.17
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYdRCkgAKCRCRxhvAZXjc
 olrvAQCdp8LWkT8TauJSl8wmUm3mZhNy+5+fXuCUSwe3PyUtTQEAq4fxm41JpG8u
 WCZTrrxVhaXwgUY3aWzzeQnLCZjtEQw=
 =woqV
 -----END PGP SIGNATURE-----

Merge tag 'fs.idmapped.v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull fs idmapping updates from Christian Brauner:
 "This contains the work to enable the idmapping infrastructure to
  support idmapped mounts of filesystems mounted with an idmapping.

  In addition this contains various cleanups that avoid repeated
  open-coding of the same functionality and simplify the code in quite a
  few places.

  We also finish the renaming of the mapping helpers we started a few
  kernel releases back and move them to a dedicated header to not
  continue polluting the fs header needlessly with low-level idmapping
  helpers. With this series the fs header only contains idmapping
  helpers that interact with fs objects.

  Currently we only support idmapped mounts for filesystems mounted
  without an idmapping themselves. This was a conscious decision
  mentioned in multiple places (cf. [1]).

  As explained at length in [3] it is perfectly fine to extend support
  for idmapped mounts to filesystem's mounted with an idmapping should
  the need arise. The need has been there for some time now (cf. [2]).

  Before we can port any filesystem that is mountable with an idmapping
  to support idmapped mounts in the coming cycles, we need to first
  extend the mapping helpers to account for the filesystem's idmapping.
  This again, is explained at length in our documentation at [3] and
  also in the individual commit messages so here's an overview.

  Currently, the low-level mapping helpers implement the remapping
  algorithms described in [3] in a simplified manner as we could rely on
  the fact that all filesystems supporting idmapped mounts are mounted
  without an idmapping.

  In contrast, filesystems mounted with an idmapping are very likely to
  not use an identity mapping and will instead use a non-identity
  mapping. So the translation step from or into the filesystem's
  idmapping in the remapping algorithm cannot be skipped for such
  filesystems.

  Non-idmapped filesystems and filesystems not supporting idmapped
  mounts are unaffected by this change as the remapping algorithms can
  take the same shortcut as before. If the low-level helpers detect that
  they are dealing with an idmapped mount but the underlying filesystem
  is mounted without an idmapping we can rely on the previous shortcut
  and can continue to skip the translation step from or into the
  filesystem's idmapping. And of course, if the low-level helpers detect
  that they are not dealing with an idmapped mount they can simply
  return the relevant id unchanged; no remapping needs to be performed
  at all.

  These checks guarantee that only the minimal amount of work is
  performed. As before, if idmapped mounts aren't used the low-level
  helpers are idempotent and no work is performed at all"

Link: 2ca4dcc490 ("fs/mount_setattr: tighten permission checks") [1]
Link: https://github.com/containers/podman/issues/10374 [2]
Link: Documentations/filesystems/idmappings.rst [3]
Link: a65e58e791 ("fs: document and rename fsid helpers") [4]

* tag 'fs.idmapped.v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  fs: support mapped mounts of mapped filesystems
  fs: add i_user_ns() helper
  fs: port higher-level mapping helpers
  fs: remove unused low-level mapping helpers
  fs: use low-level mapping helpers
  docs: update mapping documentation
  fs: account for filesystem mappings
  fs: tweak fsuidgid_has_mapping()
  fs: move mapping helpers
  fs: add is_idmapped_mnt() helper
2022-01-11 14:26:55 -08:00
David Howells
e6435f1e02 fscache: Add a tracepoint for cookie use/unuse
Add a tracepoint to track fscache_use/unuse_cookie().

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/164021588628.640689.12942919367404043608.stgit@warthog.procyon.org.uk/ # v4
2022-01-11 22:13:01 +00:00
Jeff Layton
1702e79734 ceph: add fscache writeback support
When updating the backing store from the pagecache (a'la writepage or
writepages), write to the cache first. This allows us to keep caching
files even when they are being written, as long as we have appropriate
caps.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20211129162907.149445-3-jlayton@kernel.org/ # v1
Link: https://lore.kernel.org/r/20211207134451.66296-3-jlayton@kernel.org/ # v2
Link: https://lore.kernel.org/r/163906985808.143852.1383891557313186623.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967190257.1823006.16713609520911954804.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021585020.640689.6765214932458435472.stgit@warthog.procyon.org.uk/ # v4
2022-01-11 22:13:01 +00:00
Jeff Layton
400e1286c0 ceph: conversion to new fscache API
Now that the fscache API has been reworked and simplified, change ceph
over to use it.

With the old API, we would only instantiate a cookie when the file was
open for reads. Change it to instantiate the cookie when the inode is
instantiated and call use/unuse when the file is opened/closed.

Also, ensure we resize the cached data on truncates, and invalidate the
cache in response to the appropriate events. This will allow us to
plumb in write support later.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20211129162907.149445-2-jlayton@kernel.org/ # v1
Link: https://lore.kernel.org/r/20211207134451.66296-2-jlayton@kernel.org/ # v2
Link: https://lore.kernel.org/r/163906984277.143852.14697110691303589000.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967188351.1823006.5065634844099079351.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021581427.640689.14128682147127509264.stgit@warthog.procyon.org.uk/ # v4
2022-01-11 22:13:01 +00:00