Commit graph

461 commits

Author SHA1 Message Date
Christian Brauner
7fd2ca4a86 nfsd: use vfs setgid helper
commit 2d8ae8c417 upstream.

We've aligned setgid behavior over multiple kernel releases. The details
can be found in commit cf619f8919 ("Merge tag 'fs.ovl.setgid.v6.2' of
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping") and
commit 426b4ca2d6 ("Merge tag 'fs.setgid.v6.0' of
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux").
Consistent setgid stripping behavior is now encapsulated in the
setattr_should_drop_sgid() helper which is used by all filesystems that
strip setgid bits outside of vfs proper. Usually ATTR_KILL_SGID is
raised in e.g., chown_common() and is subject to the
setattr_should_drop_sgid() check to determine whether the setgid bit can
be retained. Since nfsd is raising ATTR_KILL_SGID unconditionally it
will cause notify_change() to strip it even if the caller had the
necessary privileges to retain it. Ensure that nfsd only raises
ATR_KILL_SGID if the caller lacks the necessary privileges to retain the
setgid bit.

Without this patch the setgid stripping tests in LTP will fail:

> As you can see, the problem is S_ISGID (0002000) was dropped on a
> non-group-executable file while chown was invoked by super-user, while

[...]

> fchown02.c:66: TFAIL: testfile2: wrong mode permissions 0100700, expected 0102700

[...]

> chown02.c:57: TFAIL: testfile2: wrong mode permissions 0100700, expected 0102700

With this patch all tests pass.

Reported-by: Sherry Yang <sherry.yang@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19 16:36:54 +02:00
Jeff Layton
d53d70084d nfsd: make a copy of struct iattr before calling notify_change
notify_change can modify the iattr structure. In particular it can
end up setting ATTR_MODE when ATTR_KILL_SUID is already set, causing
a BUG() if the same iattr is passed to notify_change more than once.

Make a copy of the struct iattr before calling notify_change.

Reported-by: Zhi Li <yieli@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2207969
Tested-by: Zhi Li <yieli@redhat.com>
Fixes: 34b91dda71 ("NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-05-23 09:46:26 -04:00
Chuck Lever
22b620ec0b NFSD: Clean up xattr memory allocation flags
Tetsuo Handa points out:
> Since GFP_KERNEL is "GFP_NOFS | __GFP_FS", usage like
> "GFP_KERNEL | GFP_NOFS" does not make sense.

The original intent was to hold the inode lock while estimating
the buffer requirements for the requested information. Frank van
der Linden, the author of NFSD's xattr code, says:

> ... you need inode_lock to get an atomic view of an xattr. Since
> both nfsd_getxattr and nfsd_listxattr to the standard trick of
> querying the xattr length with a NULL buf argument (just getting
> the length back), allocating the right buffer size, and then
> querying again, they need to hold the inode lock to avoid having
> the xattr changed from under them while doing that.
>
> From that then flows the requirement that GFP_FS could cause
> problems while holding i_rwsem, so I added GFP_NOFS.

However, Dave Chinner states:
> You can do GFP_KERNEL allocations holding the i_rwsem just fine.
> All that it requires is the caller holds a reference to the
> inode ...

Since these code paths acquire a dentry, they do indeed hold a
reference. It is therefore safe to use GFP_KERNEL for these memory
allocations. In particular, that's what this code is already doing;
but now the C source code looks sane too.

At a later time we can revisit in order to remove the inode lock in
favor of simply retrying if the estimated buffer size is too small.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-04-27 18:49:24 -04:00
Chuck Lever
0f5162480b NFSD: Watch for rq_pages bounds checking errors in nfsd_splice_actor()
There have been several bugs over the years where the NFSD splice
actor has attempted to write outside the rq_pages array.

This is a "should never happen" condition, but if for some reason
the pipe splice actor should attempt to walk past the end of
rq_pages, it needs to terminate the READ operation to prevent
corruption of the pointer addresses in the fields just beyond the
array.

A server crash is thus prevented. Since the code is not behaving,
the READ operation returns -EIO to the client. None of the READ
payload data can be trusted if the splice actor isn't operating as
expected.

Suggested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2023-04-26 09:05:01 -04:00
Linus Torvalds
a2eaf246f5 nfsd-6.3 fixes:
- Fix a crash during NFS READs from certain client implementations
 - Address a minor kbuild regression in v6.3
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmQaEJgACgkQM2qzM29m
 f5ceYBAAiGHz+ISIIirg2+B5DttDL5ssPoqbgfZ16JZfBgaDduOg8oK7pbILv1nf
 4ufZQRcLpJDSnzhqNXxQE/L4Vsei8+tqkbwILSqFsCBZk5ei2AYSsZMu7Ptf8X8X
 Y+KZ55Bko0uTYAsglnPiEaDEGeq0ZYwfUc5/bnw7ZzXdAHhlwgjoAads6Q25A9f1
 6ZuKQ6nIis/ok9q88GdLf7x51IdKrXTrkC1iYAeVTXCgR3qXPezqmPOzW44rZs0t
 TMzUgDuyPgvB5eT/N+pwXoI9Bli9v1PFoXZjm2NchhJ/Iy6Gt1eEEB93J/qRSU/x
 ziJLkk0jnJEPWJ0Ht5li4+60b58t2OoWCTJTr3FzHCcF5Iqr7YjenSi2U8IT84ct
 EeFdqR2nZpJ4Y1wbAhTjQ2w6rw5WpaXDLHCRRdE7OF/D121n3FkSNo8DtSn+fbR8
 sFDpOruZn4tjgpf3AjCR1zbyKw9baC6clhnLrW6AE6T4ht9XwfE6eCUSy48FSMzg
 4IZKLcofQh2UEvlVtCvGOjtcK151eFKocsq+p6/yhYE6BM20Z3dBqhw9iBzqOAPg
 0qZk60AWl9W5h7M+ovkGAZpwXv4djsdWf7LbFuYOQ0kodzXOQ/5sFQm4zC5F3I1Z
 8WPPA9SkYyNiNSBUt65tca4JNw5h6x9A3scPWtuHB7Lz3fuGTnE=
 =frHb
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix a crash during NFS READs from certain client implementations

 - Address a minor kbuild regression in v6.3

* tag 'nfsd-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: don't replace page in rq_pages if it's a continuation of last page
  NFS & NFSD: Update GSS dependencies
2023-03-21 14:48:38 -07:00
Jeff Layton
27c934dd88 nfsd: don't replace page in rq_pages if it's a continuation of last page
The splice read calls nfsd_splice_actor to put the pages containing file
data into the svc_rqst->rq_pages array. It's possible however to get a
splice result that only has a partial page at the end, if (e.g.) the
filesystem hands back a short read that doesn't cover the whole page.

nfsd_splice_actor will plop the partial page into its rq_pages array and
return. Then later, when nfsd_splice_actor is called again, the
remainder of the page may end up being filled out. At this point,
nfsd_splice_actor will put the page into the array _again_ corrupting
the reply. If this is done enough times, rq_next_page will overrun the
array and corrupt the trailing fields -- the rq_respages and
rq_next_page pointers themselves.

If we've already added the page to the array in the last pass, don't add
it to the array a second time when dealing with a splice continuation.
This was originally handled properly in nfsd_splice_actor, but commit
91e23b1c39 ("NFSD: Clean up nfsd_splice_actor()") removed the check
for it.

Fixes: 91e23b1c39 ("NFSD: Clean up nfsd_splice_actor()")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Dario Lesca <d.lesca@solinos.it>
Tested-by: David Critch <dcritch@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2150630
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-03-17 18:18:15 -04:00
Linus Torvalds
92cadfcffa nfsd-6.3 fixes:
- Protect NFSD writes against filesystem freezing
 - Fix a potential memory leak during server shutdown
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmQLPvQACgkQM2qzM29m
 f5crgRAAiuZvYynBppTxeEZ9ITLRJCgeYhTCJDBFngh3NZtW0UxrTUfYIry/MJTm
 j520mvdzx8+OVyasUJCMoy+dY/1toeCaCspfEdyul47JRVrxuzJokbYzyV5fFZav
 q3vDNHdBXOkrJkXvJd+VtbdZa2iYBMIQ1pTc6vG2FbUaxfNqZyg7uKug/R2Q/Ipj
 tF587jXQuMvpPwlXOeFol3fhLQCO9KjTrZF/d4+m5+KTSQ4sgzrgoWuVD/39Is4Q
 bVx5hBX74PFNHaFMqviIjtIEFjrOfe6wVjdNgbmpzbEiUm9VGZ6zJ2k+LsnawMIw
 8W3178Yjp+SEb21X+b+pe7/efIqD01SOX36zUOETL9gOvX3aMHbTuDGERQlCBUdZ
 9pcUwWfzeGzQSsnEvEi05HlnrDQpeavAc4tVYWgiEvO2S8cngXgsuoGhUO2ov8qY
 scq4PQGclpmR6IMYDJLL6JbOvfBDIMoFHGbDbdkjpkkhoRHgtQgnRebOgB20bxom
 D3XPk39U2CbVX9UJqgPN0yvA8L4biG4RPNZZNOjFtOup6oggpzwgN2M9h1ZhuulG
 bTOjHWl9NJmSNIYUDzOyH7m98UCI1pEXcH1kkV7vjjAIAPbn3ovXgzMjSBqDVvNj
 tyqGujS4snRb5/NI8g6O+VyTbMSnimu+tZpUvKH+4oSBH9ZAUVI=
 =z6NN
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Protect NFSD writes against filesystem freezing

 - Fix a potential memory leak during server shutdown

* tag 'nfsd-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix a server shutdown leak
  NFSD: Protect against filesystem freezing
2023-03-10 08:45:30 -08:00
Chuck Lever
fd9a2e1d51 NFSD: Protect against filesystem freezing
Flole observes this WARNING on occasion:

[1210423.486503] WARNING: CPU: 8 PID: 1524732 at fs/ext4/ext4_jbd2.c:75 ext4_journal_check_start+0x68/0xb0

Reported-by: <flole@flole.de>
Suggested-by: Jan Kara <jack@suse.cz>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217123
Fixes: 73da852e38 ("nfsd: use vfs_iter_read/write")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-03-07 09:33:42 -05:00
Linus Torvalds
9fc2f99030 NFSD 6.3 Release Notes
Two significant security enhancements are part of this release:
 
 * NFSD's RPC header encoding and decoding, including RPCSEC GSS
   and gssproxy header parsing, has been overhauled to make it
   more memory-safe.
 
 * Support for Kerberos AES-SHA2-based encryption types has been
   added for both the NFS client and server. This provides a clean
   path for deprecating and removing insecure encryption types
   based on DES and SHA-1. AES-SHA2 is also FIPS-140 compliant, so
   that NFS with Kerberos may now be used on systems with fips
   enabled.
 
 In addition to these, NFSD is now able to handle crossing into an
 auto-mounted mount point on an exported NFS mount. A number of
 fixes have been made to NFSD's server-side copy implementation.
 
 RPC metrics have been converted to per-CPU variables. This helps
 reduce unnecessary cross-CPU and cross-node memory bus traffic,
 and significantly reduces noise when KCSAN is enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmPzgiYACgkQM2qzM29m
 f5dB2A//eqjpj+FgAN+UjygrwMC4ahAsPX3Sc3FG8/lTAiao3NFVFY2gxAiCPyVE
 CFk+tUyfL23oXvbyfIBe3LhxSBOf621xU6up2OzqAzJqh1Q9iUWB6as3I14to8ZU
 sWpxXo5ofwk1hzkbrvOAVkyfY0emwsr00iBeWMawkpBe8FZEQA31OYj3/xHr6bBI
 zEVlZPBZAZlp0DZ74tb+bBLs/EOnqKj+XLWcogCH13JB3sn2umF6cQNkYgsxvHGa
 TNQi4LEdzWZGme242LfBRiGGwm1xuVIjlAhYV/R1wIjaknE3QBzqfXc6lJx74WII
 HaqpRJGrKqdo7B+1gaXCl/AMS7YluED1CBrxuej0wBG7l2JEB7m2MFMQ4LTQjgsn
 nrr3P70DgbB4LuPCPyUS7dtsMmUXabIqP7niiCR4T1toH6lBmHAgEi4cFmkzg7Cd
 EoFzn888mtDpfx4fghcsRWS5oKXEzbPJfu5+IZOD63+UB+NGpi0Xo2s23sJPK8vz
 kqK/X63JYOUxWUvK0zkj/c/wW1cLqIaBwnSKbShou5/BL+cZVI+uJYrnEesgpoB2
 5fh/cZv3hdcoOPO7OfcjCLQYy4J6RCWajptnk/hcS3lMvBTBrnq697iAqCVURDKU
 Xfmlf7XbBwje+sk4eHgqVGEqqVjrEmoqbmA2OS44WSS5LDvxXdI=
 =ZG/7
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Two significant security enhancements are part of this release:

   - NFSD's RPC header encoding and decoding, including RPCSEC GSS and
     gssproxy header parsing, has been overhauled to make it more
     memory-safe.

   - Support for Kerberos AES-SHA2-based encryption types has been added
     for both the NFS client and server. This provides a clean path for
     deprecating and removing insecure encryption types based on DES and
     SHA-1. AES-SHA2 is also FIPS-140 compliant, so that NFS with
     Kerberos may now be used on systems with fips enabled.

  In addition to these, NFSD is now able to handle crossing into an
  auto-mounted mount point on an exported NFS mount. A number of fixes
  have been made to NFSD's server-side copy implementation.

  RPC metrics have been converted to per-CPU variables. This helps
  reduce unnecessary cross-CPU and cross-node memory bus traffic, and
  significantly reduces noise when KCSAN is enabled"

* tag 'nfsd-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (121 commits)
  NFSD: Clean up nfsd_symlink()
  NFSD: copy the whole verifier in nfsd_copy_write_verifier
  nfsd: don't fsync nfsd_files on last close
  SUNRPC: Fix occasional warning when destroying gss_krb5_enctypes
  nfsd: fix courtesy client with deny mode handling in nfs4_upgrade_open
  NFSD: fix problems with cleanup on errors in nfsd4_copy
  nfsd: fix race to check ls_layouts
  nfsd: don't hand out delegation on setuid files being opened for write
  SUNRPC: Remove ->xpo_secure_port()
  SUNRPC: Clean up the svc_xprt_flags() macro
  nfsd: remove fs/nfsd/fault_inject.c
  NFSD: fix leaked reference count of nfsd4_ssc_umount_item
  nfsd: clean up potential nfsd_file refcount leaks in COPY codepath
  nfsd: zero out pointers after putting nfsd_files on COPY setup error
  SUNRPC: Fix whitespace damage in svcauth_unix.c
  nfsd: eliminate __nfs4_get_fd
  nfsd: add some kerneldoc comments for stateid preprocessing functions
  nfsd: eliminate find_deleg_file_locked
  nfsd: don't take nfsd4_copy ref for OP_OFFLOAD_STATUS
  SUNRPC: Add encryption self-tests
  ...
2023-02-22 14:21:40 -08:00
Richard Weinberger
e1f19857f9 fs: namei: Allow follow_down() to uncover auto mounts
This function is only used by NFSD to cross mount points.
If a mount point is of type auto mount, follow_down() will
not uncover it. Add LOOKUP_AUTOMOUNT to the lookup flags
to have ->d_automount() called when NFSD walks down the
mount tree.

Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Ian Kent <raven@themaw.net>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:08 -05:00
Richard Weinberger
50f5fdaea9 NFSD: Teach nfsd_mountpoint() auto mounts
Currently nfsd_mountpoint() tests for mount points using d_mountpoint(),
this works only when a mount point is already uncovered.
In our case the mount point is of type auto mount and can be coverted.
i.e. ->d_automount() was not called.

Using d_managed() nfsd_mountpoint() can test whether a mount point is
either already uncovered or can be uncovered later.

Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Ian Kent <raven@themaw.net>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:08 -05:00
Christian Brauner
4609e1f18e
fs: port ->permission() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:28 +01:00
Christian Brauner
13e83a4923
fs: port ->set_acl() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:27 +01:00
Christian Brauner
abf08576af
fs: port vfs_*() helpers to struct mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-18 17:51:45 +01:00
Linus Torvalds
764822972d NFSD 6.2 Release Notes
This release introduces support for the CB_RECALL_ANY operation.
 NFSD can send this operation to request that clients return any
 delegations they choose. The server uses this operation to handle
 low memory scenarios or indicate to a client when that client has
 reached the maximum number of delegations the server supports.
 
 The NFSv4.2 READ_PLUS operation has been simplified temporarily
 whilst support for sparse files in local filesystems and the VFS is
 improved.
 
 Two major data structure fixes appear in this release:
 
 * The nfs4_file hash table is replaced with a resizable hash table
   to reduce the latency of NFSv4 OPEN operations.
 
 * Reference counting in the NFSD filecache has been hardened against
   races.
 
 In furtherance of removing support for NFSv2 in a subsequent kernel
 release, a new Kconfig option enables server-side support for NFSv2
 to be left out of a kernel build.
 
 MAINTAINERS has been updated to indicate that changes to fs/exportfs
 should go through the NFSD tree.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmOXPE4ACgkQM2qzM29m
 f5faaRAAh7YT5V61afPbfgBybO5AbDzztpZSNjNjLZs78piSnFp6hP75yNtTviwQ
 1o7St13/NkCmDaIdGUpr02U01zbM1BDOq2wGckImOJLNSgb7xHV5r4PqkRiFkh0t
 QYSnwG+wp8fDUJeCL/nAOAu9I9EQUqHzWchxiU/h8ln2hN3rXUlIRSeo17Wy7zkD
 cNIcoAjTi9fzY3dE6H4r+lZTdNCYH+AdzChmKrHdRZQwq0Xs3FWv4gAMTLbDuD4P
 B6NDHz0Umn6XnFsJGptwozkwaWeMQw4GyJj/3iUiO8JF209SaoYXMPjJAyG6tYYa
 fUrgv4UXGeXjigDbLBA5IYxfhX7GXjMQSaj3edhzyrl8P74q4/Cq/8fDUnAZ841m
 E+TGSCPIQD0QuIjdXxLv9KLY8JNThSfcAt6jr5GBXhPZQr8xpS0BqK/Onr68fgZC
 Lpull5xN68L4A1B7cf2GNPuMyvkBKxwSGXOehldh/BkvpVMjFwqd4/q5xWC+6CcQ
 tbOkjTbbSS71nzJwZip0NphaYCa3qQPzKT4SZzn/I4I9W5otbwYBx734Bw46gTDE
 ZPUXTuJ00VPgX07wbLRahg521Fwzr+8sk1WnVYq82PoaMh1l9FjzLNGouQWBdo3E
 UzIo/KUfQKmoZce6O723L6OI4ffdK5oMtfaTpe+SiUPpV1lUAcA=
 =jNlu
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "This release introduces support for the CB_RECALL_ANY operation. NFSD
  can send this operation to request that clients return any delegations
  they choose. The server uses this operation to handle low memory
  scenarios or indicate to a client when that client has reached the
  maximum number of delegations the server supports.

  The NFSv4.2 READ_PLUS operation has been simplified temporarily whilst
  support for sparse files in local filesystems and the VFS is improved.

  Two major data structure fixes appear in this release:

   - The nfs4_file hash table is replaced with a resizable hash table to
     reduce the latency of NFSv4 OPEN operations.

   - Reference counting in the NFSD filecache has been hardened against
     races.

  In furtherance of removing support for NFSv2 in a subsequent kernel
  release, a new Kconfig option enables server-side support for NFSv2 to
  be left out of a kernel build.

  MAINTAINERS has been updated to indicate that changes to fs/exportfs
  should go through the NFSD tree"

* tag 'nfsd-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (49 commits)
  NFSD: Avoid clashing function prototypes
  SUNRPC: Fix crasher in unwrap_integ_data()
  SUNRPC: Make the svc_authenticate tracepoint conditional
  NFSD: Use only RQ_DROPME to signal the need to drop a reply
  SUNRPC: Clean up xdr_write_pages()
  SUNRPC: Don't leak netobj memory when gss_read_proxy_verf() fails
  NFSD: add CB_RECALL_ANY tracepoints
  NFSD: add delegation reaper to react to low memory condition
  NFSD: add support for sending CB_RECALL_ANY
  NFSD: refactoring courtesy_client_reaper to a generic low memory shrinker
  trace: Relocate event helper files
  NFSD: pass range end to vfs_fsync_range() instead of count
  lockd: fix file selection in nlmsvc_cancel_blocked
  lockd: ensure we use the correct file descriptor when unlocking
  lockd: set missing fl_flags field when retrieving args
  NFSD: Use struct_size() helper in alloc_session()
  nfsd: return error if nfs4_setacl fails
  lockd: set other missing fields when unlocking files
  NFSD: Add an nfsd_file_fsync tracepoint
  sunrpc: svc: Remove an unused static function svc_ungetu32()
  ...
2022-12-12 20:54:39 -08:00
Linus Torvalds
6a518afcc2 fs.acl.rework.v6.2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY5bwTgAKCRCRxhvAZXjc
 ovd2AQCK00NAtGjQCjQPQGyTa4GAPqvWgq1ef0lnhv+TL5US5gD9FncQ8UofeMXt
 pBfjtAD6ettTPCTxUQfnTwWEU4rc7Qg=
 =27Wm
 -----END PGP SIGNATURE-----

Merge tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull VFS acl updates from Christian Brauner:
 "This contains the work that builds a dedicated vfs posix acl api.

  The origins of this work trace back to v5.19 but it took quite a while
  to understand the various filesystem specific implementations in
  sufficient detail and also come up with an acceptable solution.

  As we discussed and seen multiple times the current state of how posix
  acls are handled isn't nice and comes with a lot of problems: The
  current way of handling posix acls via the generic xattr api is error
  prone, hard to maintain, and type unsafe for the vfs until we call
  into the filesystem's dedicated get and set inode operations.

  It is already the case that posix acls are special-cased to death all
  the way through the vfs. There are an uncounted number of hacks that
  operate on the uapi posix acl struct instead of the dedicated vfs
  struct posix_acl. And the vfs must be involved in order to interpret
  and fixup posix acls before storing them to the backing store, caching
  them, reporting them to userspace, or for permission checking.

  Currently a range of hacks and duct tape exist to make this work. As
  with most things this is really no ones fault it's just something that
  happened over time. But the code is hard to understand and difficult
  to maintain and one is constantly at risk of introducing bugs and
  regressions when having to touch it.

  Instead of continuing to hack posix acls through the xattr handlers
  this series builds a dedicated posix acl api solely around the get and
  set inode operations.

  Going forward, the vfs_get_acl(), vfs_remove_acl(), and vfs_set_acl()
  helpers must be used in order to interact with posix acls. They
  operate directly on the vfs internal struct posix_acl instead of
  abusing the uapi posix acl struct as we currently do. In the end this
  removes all of the hackiness, makes the codepaths easier to maintain,
  and gets us type safety.

  This series passes the LTP and xfstests suites without any
  regressions. For xfstests the following combinations were tested:
   - xfs
   - ext4
   - btrfs
   - overlayfs
   - overlayfs on top of idmapped mounts
   - orangefs
   - (limited) cifs

  There's more simplifications for posix acls that we can make in the
  future if the basic api has made it.

  A few implementation details:

   - The series makes sure to retain exactly the same security and
     integrity module permission checks. Especially for the integrity
     modules this api is a win because right now they convert the uapi
     posix acl struct passed to them via a void pointer into the vfs
     struct posix_acl format to perform permission checking on the mode.

     There's a new dedicated security hook for setting posix acls which
     passes the vfs struct posix_acl not a void pointer. Basing checking
     on the posix acl stored in the uapi format is really unreliable.
     The vfs currently hacks around directly in the uapi struct storing
     values that frankly the security and integrity modules can't
     correctly interpret as evidenced by bugs we reported and fixed in
     this area. It's not necessarily even their fault it's just that the
     format we provide to them is sub optimal.

   - Some filesystems like 9p and cifs need access to the dentry in
     order to get and set posix acls which is why they either only
     partially or not even at all implement get and set inode
     operations. For example, cifs allows setxattr() and getxattr()
     operations but doesn't allow permission checking based on posix
     acls because it can't implement a get acl inode operation.

     Thus, this patch series updates the set acl inode operation to take
     a dentry instead of an inode argument. However, for the get acl
     inode operation we can't do this as the old get acl method is
     called in e.g., generic_permission() and inode_permission(). These
     helpers in turn are called in various filesystem's permission inode
     operation. So passing a dentry argument to the old get acl inode
     operation would amount to passing a dentry to the permission inode
     operation which we shouldn't and probably can't do.

     So instead of extending the existing inode operation Christoph
     suggested to add a new one. He also requested to ensure that the
     get and set acl inode operation taking a dentry are consistently
     named. So for this version the old get acl operation is renamed to
     ->get_inode_acl() and a new ->get_acl() inode operation taking a
     dentry is added. With this we can give both 9p and cifs get and set
     acl inode operations and in turn remove their complex custom posix
     xattr handlers.

     In the future I hope to get rid of the inode method duplication but
     it isn't like we have never had this situation. Readdir is just one
     example. And frankly, the overall gain in type safety and the more
     pleasant api wise are simply too big of a benefit to not accept
     this duplication for a while.

   - We've done a full audit of every codepaths using variant of the
     current generic xattr api to get and set posix acls and
     surprisingly it isn't that many places. There's of course always a
     chance that we might have missed some and if so I'm sure we'll find
     them soon enough.

     The crucial codepaths to be converted are obviously stacking
     filesystems such as ecryptfs and overlayfs.

     For a list of all callers currently using generic xattr api helpers
     see [2] including comments whether they support posix acls or not.

   - The old vfs generic posix acl infrastructure doesn't obey the
     create and replace semantics promised on the setxattr(2) manpage.
     This patch series doesn't address this. It really is something we
     should revisit later though.

  The patches are roughly organized as follows:

   (1) Change existing set acl inode operation to take a dentry
       argument (Intended to be a non-functional change)

   (2) Rename existing get acl method (Intended to be a non-functional
       change)

   (3) Implement get and set acl inode operations for filesystems that
       couldn't implement one before because of the missing dentry.
       That's mostly 9p and cifs (Intended to be a non-functional
       change)

   (4) Build posix acl api, i.e., add vfs_get_acl(), vfs_remove_acl(),
       and vfs_set_acl() including security and integrity hooks
       (Intended to be a non-functional change)

   (5) Implement get and set acl inode operations for stacking
       filesystems (Intended to be a non-functional change)

   (6) Switch posix acl handling in stacking filesystems to new posix
       acl api now that all filesystems it can stack upon support it.

   (7) Switch vfs to new posix acl api (semantical change)

   (8) Remove all now unused helpers

   (9) Additional regression fixes reported after we merged this into
       linux-next

  Thanks to Seth for a lot of good discussion around this and
  encouragement and input from Christoph"

* tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (36 commits)
  posix_acl: Fix the type of sentinel in get_acl
  orangefs: fix mode handling
  ovl: call posix_acl_release() after error checking
  evm: remove dead code in evm_inode_set_acl()
  cifs: check whether acl is valid early
  acl: make vfs_posix_acl_to_xattr() static
  acl: remove a slew of now unused helpers
  9p: use stub posix acl handlers
  cifs: use stub posix acl handlers
  ovl: use stub posix acl handlers
  ecryptfs: use stub posix acl handlers
  evm: remove evm_xattr_acl_change()
  xattr: use posix acl api
  ovl: use posix acl api
  ovl: implement set acl method
  ovl: implement get acl method
  ecryptfs: implement set acl method
  ecryptfs: implement get acl method
  ksmbd: use vfs_remove_acl()
  acl: add vfs_remove_acl()
  ...
2022-12-12 18:46:39 -08:00
Linus Torvalds
75f4d9af8b iov_iter work; most of that is about getting rid of
direction misannotations and (hopefully) preventing
 more of the same for the future.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCY5ZzQAAKCRBZ7Krx/gZQ
 65RZAP4nTkvOn0NZLVFkuGOx8pgJelXAvrteyAuecVL8V6CR4AD40qCVY51PJp8N
 MzwiRTeqnGDxTTF7mgd//IB6hoatAA==
 =bcvF
 -----END PGP SIGNATURE-----

Merge tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull iov_iter updates from Al Viro:
 "iov_iter work; most of that is about getting rid of direction
  misannotations and (hopefully) preventing more of the same for the
  future"

* tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  use less confusing names for iov_iter direction initializers
  iov_iter: saner checks for attempt to copy to/from iterator
  [xen] fix "direction" argument of iov_iter_kvec()
  [vhost] fix 'direction' argument of iov_iter_{init,bvec}()
  [target] fix iov_iter_bvec() "direction" argument
  [s390] memcpy_real(): WRITE is "data source", not destination...
  [s390] zcore: WRITE is "data source", not destination...
  [infiniband] READ is "data destination", not source...
  [fsi] WRITE is "data source", not destination...
  [s390] copy_oldmem_kernel() - WRITE is "data source", not destination
  csum_and_copy_to_iter(): handle ITER_DISCARD
  get rid of unlikely() on page_copy_sane() calls
2022-12-12 18:29:54 -08:00
Chuck Lever
4d1ea84557 NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection
NFSv4 operations manage the lifetime of nfsd_file items they use by
means of NFSv4 OPEN and CLOSE. Hence there's no need for them to be
garbage collected.

Introduce a mechanism to enable garbage collection for nfsd_file
items used only by NFSv2/3 callers.

Note that the change in nfsd_file_put() ensures that both CLOSE and
DELEGRETURN will actually close out and free an nfsd_file on last
reference of a non-garbage-collected file.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=394
Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-11-28 12:54:45 -05:00
Chuck Lever
c252849082 NFSD: Pass the target nfsd_file to nfsd_commit()
In a moment I'm going to introduce separate nfsd_file types, one of
which is garbage-collected; the other, not. The garbage-collected
variety is to be used by NFSv2 and v3, and the non-garbage-collected
variety is to be used by NFSv4.

nfsd_commit() is invoked by both NFSv3 and NFSv4 consumers. We want
nfsd_commit() to find and use the correct variety of cached
nfsd_file object for the NFS version that is in use.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
2022-11-28 12:54:45 -05:00
Jeff Layton
cb12fae1c3 nfsd: move nfserrno() to vfs.c
nfserrno() is common to all nfs versions, but nfsproc.c is specifically
for NFSv2. Move it to vfs.c, and the prototype to vfs.h.

While we're in here, remove the #ifdef EDQUOT check in this function.
It's apparently a holdover from the initial merge of the nfsd code in
1997. No other place in the kernel checks that that symbol is defined
before using it, so I think we can dispense with it here.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-28 12:54:44 -05:00
Colin Ian King
69eed23baf NFSD: Remove redundant assignment to variable host_err
Variable host_err is assigned a value that is never read, it is being
re-assigned a value in every different execution path in the following
switch statement. The assignment is redundant and can be removed.

Cleans up clang-scan warning:
warning: Value stored to 'host_err' is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-28 12:54:44 -05:00
Linus Torvalds
cf562a45a0 Amir's copy_file_range() fix
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCY4OtEwAKCRBZ7Krx/gZQ
 66LvAP9tMMKsXoZY5dNjkAeQo/I5PHx81iLYu5GyigqTsf0g8gD+MeM2qxQE9QTt
 6gngWpnNif7Pe5Jj5yuwl4IGbjDG9AQ=
 =Tx7P
 -----END PGP SIGNATURE-----

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fix from Al Viro:
 "Amir's copy_file_range() fix"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: fix copy_file_range() averts filesystem freeze protection
2022-11-27 12:40:06 -08:00
Linus Torvalds
e5f3ec38c8 Fixes:
- Fix rare data corruption on READ operations
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmOCPhcACgkQM2qzM29m
 f5ewTRAAuAjJvtnsRxZBILs76U3KpAnn3ghtYKtzCmanq1UEJAlTz5vLLCqJRTXw
 OajN4f/T/kD3bY5XxYcsbSMhVJhgd7JMx/DyPjp3rY+nS1cS/CB+TxN3VqhOZJDl
 MfEJTN0MxWrq0pXKYxuPECmwk/C7jAU4vjWU+sDWhu4Anig564fdPxGGWw9vRfh5
 YP8U8cuM5LIFCM2ZSjPsqWEXgd7pbYpYdOx6JPA74ftia8RBU1YVbIqZ6uFXiTnV
 tpKAuQYem2ixV6UfsrVbeHDfsSkQcX2iaTGlcLB5y12nprV0wTOQM6lONTD878vh
 37oc8zG9cdpWfcRzkyB8a58jHfSk74pMkV300C38CeV1KtajFVoRkoTvLYJdxbf8
 WPcRcbsXbotb4+A1D2H6QvPKUbrlK+HIHe8POU67tKq5BI8xRw+ojTsJ2ANYr9Jt
 OviLUraUnKFvCK2LN03+Op4l+UBTIdOGzGMqveILfxPD6zlTYJbQQ64Nih4JLfJG
 uDN2892H4He3he5oD9Dzoh2xSWfLV5PJEAaN9wW7pyv8D5HfYBXWPan6JeaMHnfW
 +NcoFQ3cSewyMZ1Rw0aCsKfXRGwikJrbAsrlvNe+uFRD6ueeWULp5fnTlOn+0tW1
 tp+EzLOMoNkL9NW/rLeCIPN1MNXzgabgd4Diq8U3iTDRxf0FahU=
 =zxjk
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Fix rare data corruption on READ operations

* tag 'nfsd-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Fix reads with a non-zero offset that don't end on a page boundary
2022-11-26 12:25:49 -08:00
Al Viro
de4eda9de2 use less confusing names for iov_iter direction initializers
READ/WRITE proved to be actively confusing - the meanings are
"data destination, as used with read(2)" and "data source, as
used with write(2)", but people keep interpreting those as
"we read data from it" and "we write data to it", i.e. exactly
the wrong way.

Call them ITER_DEST and ITER_SOURCE - at least that is harder
to misinterpret...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-25 13:01:55 -05:00
Amir Goldstein
10bc8e4af6 vfs: fix copy_file_range() averts filesystem freeze protection
Commit 868f9f2f8e ("vfs: fix copy_file_range() regression in cross-fs
copies") removed fallback to generic_copy_file_range() for cross-fs
cases inside vfs_copy_file_range().

To preserve behavior of nfsd and ksmbd server-side-copy, the fallback to
generic_copy_file_range() was added in nfsd and ksmbd code, but that
call is missing sb_start_write(), fsnotify hooks and more.

Ideally, nfsd and ksmbd would pass a flag to vfs_copy_file_range() that
will take care of the fallback, but that code would be subtle and we got
vfs_copy_file_range() logic wrong too many times already.

Instead, add a flag to explicitly request vfs_copy_file_range() to
perform only generic_copy_file_range() and let nfsd and ksmbd use this
flag only in the fallback path.

This choise keeps the logic changes to minimum in the non-nfsd/ksmbd code
paths to reduce the risk of further regressions.

Fixes: 868f9f2f8e ("vfs: fix copy_file_range() regression in cross-fs copies")
Tested-by: Namjae Jeon <linkinjeon@kernel.org>
Tested-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-25 00:52:28 -05:00
Chuck Lever
ac8db824ea NFSD: Fix reads with a non-zero offset that don't end on a page boundary
This was found when virtual machines with nfs-mounted qcow2 disks
failed to boot properly.

Reported-by: Anders Blomdell <anders.blomdell@control.lth.se>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2142132
Fixes: bfbfb6182a ("nfsd_splice_actor(): handle compound pages")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-23 14:32:35 -05:00
Christian Brauner
138060ba92
fs: pass dentry to set acl method
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].

Since some filesystem rely on the dentry being available to them when
setting posix acls (e.g., 9p and cifs) they cannot rely on set acl inode
operation. But since ->set_acl() is required in order to use the generic
posix acl xattr handlers filesystems that do not implement this inode
operation cannot use the handler and need to implement their own
dedicated posix acl handlers.

Update the ->set_acl() inode method to take a dentry argument. This
allows all filesystems to rely on ->set_acl().

As far as I can tell all codepaths can be switched to rely on the dentry
instead of just the inode. Note that the original motivation for passing
the dentry separate from the inode instead of just the dentry in the
xattr handlers was because of security modules that call
security_d_instantiate(). This hook is called during
d_instantiate_new(), d_add(), __d_instantiate_anon(), and
d_splice_alias() to initialize the inode's security context and possibly
to set security.* xattrs. Since this only affects security.* xattrs this
is completely irrelevant for posix acls.

Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-19 12:55:42 +02:00
Linus Torvalds
7a3353c5c4 struct file-related stuff
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYzxjIQAKCRBZ7Krx/gZQ
 6/FPAQCNCZygQzd+54//vo4kTwv5T2Bv3hS8J51rASPJT87/BQD/TfCLS5urt/Gt
 81A1dFOfnTXseofuBKyGSXwQm0dWpgA=
 =PLre
 -----END PGP SIGNATURE-----

Merge tag 'pull-file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs file updates from Al Viro:
 "struct file-related stuff"

* tag 'pull-file' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  dma_buf_getfile(): don't bother with ->f_flags reassignments
  Change calling conventions for filldir_t
  locks: fix TOCTOU race when granting write lease
2022-10-06 17:13:18 -07:00
Chuck Lever
5f5f8b6d65 NFSD: Make nfsd4_remove() wait before returning NFS4ERR_DELAY
nfsd_unlink() can kick off a CB_RECALL (via
vfs_unlink() -> leases_conflict()) if a delegation is present.
Before returning NFS4ERR_DELAY, give the client holding that
delegation a chance to return it and then retry the nfsd_unlink()
again, once.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354
Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-09-26 14:02:41 -04:00
Chuck Lever
68c522afd0 NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY
nfsd_rename() can kick off a CB_RECALL (via
vfs_rename() -> leases_conflict()) if a delegation is present.
Before returning NFS4ERR_DELAY, give the client holding that
delegation a chance to return it and then retry the nfsd_rename()
again, once.

This version of the patch handles renaming an existing file,
but does not deal with renaming onto an existing file. That
case will still always trigger an NFS4ERR_DELAY.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354
Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-09-26 14:02:40 -04:00
Chuck Lever
34b91dda71 NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY
nfsd_setattr() can kick off a CB_RECALL (via
notify_change() -> break_lease()) if a delegation is present. Before
returning NFS4ERR_DELAY, give the client holding that delegation a
chance to return it and then retry the nfsd_setattr() again, once.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354
Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-09-26 14:02:36 -04:00
Chuck Lever
c0aa1913db NFSD: Refactor nfsd_setattr()
Move code that will be retried (in a subsequent patch) into a helper
function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-09-26 14:02:32 -04:00
NeilBrown
9558f9304c NFSD: drop fname and flen args from nfsd_create_locked()
nfsd_create_locked() does not use the "fname" and "flen" arguments, so
drop them from declaration and all callers.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26 14:02:30 -04:00
Linus Torvalds
d1221cea11 fix for nfsd regression caused by iov_iter stuff this window
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYx/tpQAKCRBZ7Krx/gZQ
 6wuPAQCrJ18XdncE/VnpRfcpQ+UQJXryURXwijaVBswv3LpM/gEAgQcmFff7pR0P
 ZXzS56F/D0faE2evltbAMQCWUoeapQA=
 =mTrE
 -----END PGP SIGNATURE-----

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull iov_iter fix from Al Viro:
 "Fix for a nfsd regression caused by the iov_iter stuff this window"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  nfsd_splice_actor(): handle compound pages
2022-09-13 15:11:38 +02:00
Al Viro
bfbfb6182a nfsd_splice_actor(): handle compound pages
pipe_buffer might refer to a compound page (and contain more than a PAGE_SIZE
worth of data).  Theoretically it had been possible since way back, but
nfsd_splice_actor() hadn't run into that until copy_page_to_iter() change.
Fortunately, the only thing that changes for compound pages is that we
need to stuff each relevant subpage in and convert the offset into offset
in the first subpage.

Acked-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: f0f6b614f8 "copy_page_to_iter(): don't split high-order page in case of ITER_PIPE"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-12 22:38:36 -04:00
NeilBrown
00801cd92d NFSD: fix regression with setting ACLs.
A recent patch moved ACL setting into nfsd_setattr().
Unfortunately it didn't work as nfsd_setattr() aborts early if
iap->ia_valid is 0.

Remove this test, and instead avoid calling notify_change() when
ia_valid is 0.

This means that nfsd_setattr() will now *always* lock the inode.
Previously it didn't if only a ATTR_MODE change was requested on a
symlink (see Commit 15b7a1b86d ("[PATCH] knfsd: fix setattr-on-symlink
error return")). I don't think this change really matters.

Fixes: c0cbe70742 ("NFSD: add posix ACLs to struct nfsd_attrs")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-08 17:53:24 -04:00
Al Viro
25885a35a7 Change calling conventions for filldir_t
filldir_t instances (directory iterators callbacks) used to return 0 for
"OK, keep going" or -E... for "stop".  Note that it's *NOT* how the
error values are reported - the rules for those are callback-dependent
and ->iterate{,_shared}() instances only care about zero vs. non-zero
(look at emit_dir() and friends).

So let's just return bool ("should we keep going?") - it's less confusing
that way.  The choice between "true means keep going" and "true means
stop" is bikesheddable; we have two groups of callbacks -
	do something for everything in directory, until we run into problem
and
	find an entry in directory and do something to it.

The former tended to use 0/-E... conventions - -E<something> on failure.
The latter tended to use 0/1, 1 being "stop, we are done".
The callers treated anything non-zero as "stop", ignoring which
non-zero value did they get.

"true means stop" would be more natural for the second group; "true
means keep going" - for the first one.  I tried both variants and
the things like
	if allocation failed
		something = -ENOMEM;
		return true;
just looked unnatural and asking for trouble.

[folded suggestion from Matthew Wilcox <willy@infradead.org>]
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-17 17:25:04 -04:00
NeilBrown
dd8dd403d7 NFSD: discard fh_locked flag and fh_lock/fh_unlock
As all inode locking is now fully balanced, fh_put() does not need to
call fh_unlock().
fh_lock() and fh_unlock() are no longer used, so discard them.
These are the only real users of ->fh_locked, so discard that too.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-08-04 10:28:48 -04:00
NeilBrown
bb4d53d66e NFSD: use (un)lock_inode instead of fh_(un)lock for file operations
When locking a file to access ACLs and xattrs etc, use explicit locking
with inode_lock() instead of fh_lock().  This means that the calls to
fh_fill_pre/post_attr() are also explicit which improves readability and
allows us to place them only where they are needed.  Only the xattr
calls need pre/post information.

When locking a file we don't need I_MUTEX_PARENT as the file is not a
parent of anything, so we can use inode_lock() directly rather than the
inode_lock_nested() call that fh_lock() uses.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-08-04 10:28:41 -04:00
NeilBrown
debf16f0c6 NFSD: use explicit lock/unlock for directory ops
When creating or unlinking a name in a directory use explicit
inode_lock_nested() instead of fh_lock(), and explicit calls to
fh_fill_pre_attrs() and fh_fill_post_attrs().  This is already done
for renames, with lock_rename() as the explicit locking.

Also move the 'fill' calls closer to the operation that might change the
attributes.  This way they are avoided on some error paths.

For the v2-only code in nfsproc.c, the fill calls are not replaced as
they aren't needed.

Making the locking explicit will simplify proposed future changes to
locking for directories.  It also makes it easily visible exactly where
pre/post attributes are used - not all callers of fh_lock() actually
need the pre/post attributes.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-08-04 10:28:20 -04:00
NeilBrown
19d008b469 NFSD: reduce locking in nfsd_lookup()
nfsd_lookup() takes an exclusive lock on the parent inode, but no
callers want the lock and it may not be needed at all if the
result is in the dcache.

Change nfsd_lookup_dentry() to not take the lock, and call
lookup_one_len_locked() which takes lock only if needed.

nfsd4_open() currently expects the lock to still be held, but that isn't
necessary as nfsd_validate_delegated_dentry() provides required
guarantees without the lock.

NOTE: NFSv4 requires directory changeinfo for OPEN even when a create
  wasn't requested and no change happened.  Now that nfsd_lookup()
  doesn't use fh_lock(), we need to explicitly fill the attributes
  when no create happens.  A new fh_fill_both_attrs() is provided
  for that task.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-08-04 10:28:20 -04:00
NeilBrown
e18bcb33bc NFSD: only call fh_unlock() once in nfsd_link()
On non-error paths, nfsd_link() calls fh_unlock() twice.  This is safe
because fh_unlock() records that the unlock has been done and doesn't
repeat it.
However it makes the code a little confusing and interferes with changes
that are planned for directory locking.

So rearrange the code to ensure fh_unlock() is called exactly once if
fh_lock() was called.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-08-04 10:28:19 -04:00
NeilBrown
b677c0c63a NFSD: always drop directory lock in nfsd_unlink()
Some error paths in nfsd_unlink() allow it to exit without unlocking the
directory.  This is not a problem in practice as the directory will be
locked with an fh_put(), but it is untidy and potentially confusing.

This allows us to remove all the fh_unlock() calls that are immediately
after nfsd_unlink() calls.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-08-04 10:28:19 -04:00
NeilBrown
927bfc5600 NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning.
nfsd_create() usually returns with the directory still locked.
nfsd_symlink() usually returns with it unlocked.  This is clumsy.

Until recently nfsd_create() needed to keep the directory locked until
ACLs and security label had been set.  These are now set inside
nfsd_create() (in nfsd_setattr()) so this need is gone.

So change nfsd_create() and nfsd_symlink() to always unlock, and remove
any fh_unlock() calls that follow calls to these functions.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-08-04 10:28:19 -04:00
NeilBrown
c0cbe70742 NFSD: add posix ACLs to struct nfsd_attrs
pacl and dpacl pointers are added to struct nfsd_attrs, which requires
that we have an nfsd_attrs_free() function to free them.
Those nfsv4 functions that can set ACLs now set up these pointers
based on the passed in NFSv4 ACL.

nfsd_setattr() sets the acls as appropriate.

Errors are handled as with security labels.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-08-04 10:28:03 -04:00
NeilBrown
d6a97d3f58 NFSD: add security label to struct nfsd_attrs
nfsd_setattr() now sets a security label if provided, and nfsv4 provides
it in the 'open' and 'create' paths and the 'setattr' path.
If setting the label failed (including because the kernel doesn't
support labels), an error field in 'struct nfsd_attrs' is set, and the
caller can respond.  The open/create callers clear
FATTR4_WORD2_SECURITY_LABEL in the returned attr set in this case.
The setattr caller returns the error.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:17:00 -04:00
NeilBrown
93adc1e391 NFSD: set attributes when creating symlinks
The NFS protocol includes attributes when creating symlinks.
Linux does store attributes for symlinks and allows them to be set,
though they are not used for permission checking.

NFSD currently doesn't set standard (struct iattr) attributes when
creating symlinks, but for NFSv4 it does set ACLs and security labels.
This is inconsistent.

To improve consistency, pass the provided attributes into nfsd_symlink()
and call nfsd_create_setattr() to set them.

NOTE: this results in a behaviour change for all NFS versions when the
client sends non-default attributes with a SYMLINK request. With the
Linux client, the only attributes are:
	attr.ia_mode = S_IFLNK | S_IRWXUGO;
	attr.ia_valid = ATTR_MODE;
so the final outcome will be unchanged. Other clients might sent
different attributes, and if they did they probably expect them to be
honoured.

We ignore any error from nfsd_create_setattr().  It isn't really clear
what should be done if a file is successfully created, but the
attributes cannot be set.  NFS doesn't allow partial success to be
reported.  Reporting failure is probably more misleading than reporting
success, so the status is ignored.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:17:00 -04:00
NeilBrown
7fe2a71dda NFSD: introduce struct nfsd_attrs
The attributes that nfsd might want to set on a file include 'struct
iattr' as well as an ACL and security label.
The latter two are passed around quite separately from the first, in
part because they are only needed for NFSv4.  This leads to some
clumsiness in the code, such as the attributes NOT being set in
nfsd_create_setattr().

We need to keep the directory locked until all attributes are set to
ensure the file is never visibile without all its attributes.  This need
combined with the inconsistent handling of attributes leads to more
clumsiness.

As a first step towards tidying this up, introduce 'struct nfsd_attrs'.
This is passed (by reference) to vfs.c functions that work with
attributes, and is assembled by the various nfs*proc functions which
call them.  As yet only iattr is included, but future patches will
expand this.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:17:00 -04:00
Linus Torvalds
69cb6c6556 Notable regression fixes:
- Fix NFSD crash during NFSv4.2 READ_PLUS operation
 - Fix incorrect status code returned by COMMIT operation
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmLAhFIACgkQM2qzM29m
 f5fK2w//TyUAzwoTZQEgmFbxNhXgUYC6uwWPqTCLMieStKtPwT8GsuXviY34QziF
 2vER0NW9Am2GQyL3gtiYFM07OoHhQ4gr86ltaeIHAHhcwm3eIs879ARwEsyN6eDR
 +RDpqnONwtg+yaepfCMc4Bki9Jex+mmoXro86nFPmH+TDM5QiIRY0ncBWSLVWvYT
 YciAgvL6vfo2G79NYOzohoTb15ydotmy6m9H70nN+a2l6bKOIT8cF4S8lZETJZXA
 Rlj+R0eE0iXZTtp7VsAfVAHHfOzexGJjE85hpVzGiZWbxe6o0WNmBpoHICXs9VoP
 WRkq6vBC9P6wJ7EOu84/SRlR/rQaxfrYB7beqTC2kO6t5Ka5/xpJMKZOhfukCMV+
 7+uPAUnSIRqpHG0hWm1kPto00bfXm9fMETQbOEw7UQ9dH/322iOJnXwy03ZEXtJq
 9G0k+yNcydoIs2g5OpNrw4f/wTQcdlbf1ZA5O9dAsxwS1ZTNpKUiE0Sd0Ez/0VIJ
 t5DZFWH44rs/60avxRYxtw965gNugTxb1YiKdXObvbFGf+xYtH0zePce++4kTJlE
 t886stLcHbuPYs4/XItwTxLMSnDM5UR22Icho7ElRHvPiWjZmc6o2fxCkWTzCOE2
 ylZowLFMYZ/KbrP5mcqLARKEP6oFJERXt/U8U0CWeVbZomy9GVY=
 =z5W8
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
 "Notable regression fixes:

   - Fix NFSD crash during NFSv4.2 READ_PLUS operation

   - Fix incorrect status code returned by COMMIT operation"

* tag 'nfsd-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix READ_PLUS crasher
  NFSD: restore EINVAL error translation in nfsd_commit()
2022-07-02 11:20:56 -07:00
Amir Goldstein
868f9f2f8e vfs: fix copy_file_range() regression in cross-fs copies
A regression has been reported by Nicolas Boichat, found while using the
copy_file_range syscall to copy a tracefs file.

Before commit 5dae222a5f ("vfs: allow copy_file_range to copy across
devices") the kernel would return -EXDEV to userspace when trying to
copy a file across different filesystems.  After this commit, the
syscall doesn't fail anymore and instead returns zero (zero bytes
copied), as this file's content is generated on-the-fly and thus reports
a size of zero.

Another regression has been reported by He Zhe - the assertion of
WARN_ON_ONCE(ret == -EOPNOTSUPP) can be triggered from userspace when
copying from a sysfs file whose read operation may return -EOPNOTSUPP.

Since we do not have test coverage for copy_file_range() between any two
types of filesystems, the best way to avoid these sort of issues in the
future is for the kernel to be more picky about filesystems that are
allowed to do copy_file_range().

This patch restores some cross-filesystem copy restrictions that existed
prior to commit 5dae222a5f ("vfs: allow copy_file_range to copy across
devices"), namely, cross-sb copy is not allowed for filesystems that do
not implement ->copy_file_range().

Filesystems that do implement ->copy_file_range() have full control of
the result - if this method returns an error, the error is returned to
the user.  Before this change this was only true for fs that did not
implement the ->remap_file_range() operation (i.e.  nfsv3).

Filesystems that do not implement ->copy_file_range() still fall-back to
the generic_copy_file_range() implementation when the copy is within the
same sb.  This helps the kernel can maintain a more consistent story
about which filesystems support copy_file_range().

nfsd and ksmbd servers are modified to fall-back to the
generic_copy_file_range() implementation in case vfs_copy_file_range()
fails with -EOPNOTSUPP or -EXDEV, which preserves behavior of
server-side-copy.

fall-back to generic_copy_file_range() is not implemented for the smb
operation FSCTL_DUPLICATE_EXTENTS_TO_FILE, which is arguably a correct
change of behavior.

Fixes: 5dae222a5f ("vfs: allow copy_file_range to copy across devices")
Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/
Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/
Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/
Link: https://lore.kernel.org/linux-fsdevel/20210630161320.29006-1-lhenriques@suse.de/
Reported-by: Nicolas Boichat <drinkcat@chromium.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Fixes: 64bf5ff58d ("vfs: no fallback for ->copy_file_range")
Link: https://lore.kernel.org/linux-fsdevel/20f17f64-88cb-4e80-07c1-85cb96c83619@windriver.com/
Reported-by: He Zhe <zhe.he@windriver.com>
Tested-by: Namjae Jeon <linkinjeon@kernel.org>
Tested-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-30 15:16:38 -07:00