Commit graph

5722 commits

Author SHA1 Message Date
Dai Ngo
227823d207 nfs: optimise readdir cache page invalidation
When the directory is large and it's being modified by one client
while another client is doing the 'ls -l' on the same directory then
the cache page invalidation from nfs_force_use_readdirplus causes
the reading client to keep restarting READDIRPLUS from cookie 0
which causes the 'ls -l' to take a very long time to complete,
possibly never completing.

Currently when nfs_force_use_readdirplus is called to switch from
READDIR to READDIRPLUS, it invalidates all the cached pages of the
directory. This cache page invalidation causes the next nfs_readdir
to re-read the directory content from cookie 0.

This patch is to optimise the cache invalidation in
nfs_force_use_readdirplus by only truncating the cached pages from
last page index accessed to the end the file. It also marks the
inode to delay invalidating all the cached page of the directory
until the next initial nfs_readdir of the next 'ls' instance.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com>
[Anna - Fix conflicts with Trond's readdir patches]
[Anna - Remove redundant call to nfs_zap_mapping()]
[Anna - Replace d_inode(file_dentry(desc->file)) with file_inode(desc->file)]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-04 10:50:44 -05:00
Trond Myklebust
93a6ab7b69 NFS: Switch readdir to using iterate_shared()
Now that the page cache locking is repaired, we should be able to
switch to using iterate_shared() for improved concurrency when
doing readdir().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:37:51 -05:00
Trond Myklebust
3803d6721b NFS: Use kmemdup_nul() in nfs_readdir_make_qstr()
The directory strings stored in the readdir cache may be used with
printk(), so it is better to ensure they are nul-terminated.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:37:45 -05:00
Trond Myklebust
114de38225 NFS: Directory page cache pages need to be locked when read
When a NFS directory page cache page is removed from the page cache,
its contents are freed through a call to nfs_readdir_clear_array().
To prevent the removal of the page cache entry until after we've
finished reading it, we must take the page lock.

Fixes: 11de3b11e0 ("NFS: Fix a memory leak in nfs_readdir")
Cc: stable@vger.kernel.org # v2.6.37+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:37:17 -05:00
Trond Myklebust
4b310319c6 NFS: Fix memory leaks and corruption in readdir
nfs_readdir_xdr_to_array() must not exit without having initialised
the array, so that the page cache deletion routines can safely
call nfs_readdir_clear_array().
Furthermore, we should ensure that if we exit nfs_readdir_filler()
with an error, we free up any page contents to prevent a leak
if we try to fill the page again.

Fixes: 11de3b11e0 ("NFS: Fix a memory leak in nfs_readdir")
Cc: stable@vger.kernel.org # v2.6.37+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:17 -05:00
Trond Myklebust
a8bd9ddf39 NFS: Replace various occurrences of kstrndup() with kmemdup_nul()
When we already know the string length, it is more efficient to
use kmemdup_nul().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
[Anna - Changes to super.c were already made during fscontext conversion]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Trond Myklebust
10717f4563 NFSv4: Limit the total number of cached delegations
Delegations can be expensive to return, and can cause scalability issues
for the server. Let's therefore try to limit the number of inactive
delegations we hold.
Once the number of delegations is above a certain threshold, start
to return them on close.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Trond Myklebust
d2269ea14e NFSv4: Add accounting for the number of active delegations held
In order to better manage our delegation caching, add a counter
to track the number of active delegations.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Trond Myklebust
b7b7dac684 NFSv4: Try to return the delegation immediately when marked for return on close
Add a routine to return the delegation immediately upon close of the
file if it was marked for return-on-close.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Trond Myklebust
0d10416797 NFS: Clear NFS_DELEGATION_RETURN_IF_CLOSED when the delegation is returned
If a delegation is marked as needing to be returned when the file is
closed, then don't clear that marking until we're ready to return
it.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Trond Myklebust
f885ea640d NFSv4: nfs_inode_evict_delegation() should set NFS_DELEGATION_RETURNING
In particular, the pnfs return-on-close code will check for that flag,
so ensure we set it appropriately.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Trond Myklebust
65f5160376 NFS: nfs_find_open_context() should use cred_fscmp()
We want to find open contexts that match our filesystem access
properties. They don't have to exactly match the cred.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Trond Myklebust
9a206de2ea NFS: nfs_access_get_cached_rcu() should use cred_fscmp()
We do not need to have the rcu lookup method fail in the case where
the fsuid/fsgid and supplemental groups match.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Trond Myklebust
3871224787 NFSv4: pnfs_roc() must use cred_fscmp() to compare creds
When comparing two 'struct cred' for equality w.r.t. behaviour under
filesystem access, we need to use cred_fscmp().

Fixes: a52458b48a ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
Alex Shi
c0399cf668 NFS: remove unused macros
MNT_fhs_status_sz/MNT_fhandle3_sz are never used after they were
introduced. So better to remove them.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 10:43:06 -05:00
Linus Torvalds
22b17db4ea y2038: core, driver and file system changes
These are updates to device drivers and file systems that for some reason
 or another were not included in the kernel in the previous y2038 series.
 
 I've gone through all users of time_t again to make sure the kernel is
 in a long-term maintainable state, replacing all remaining references
 to time_t with safe alternatives.
 
 Some related parts of the series were picked up into the nfsd, xfs,
 alsa and v4l2 trees. A final set of patches in linux-mm removes the now
 unused time_t/timeval/timespec types and helper functions after all five
 branches are merged for linux-5.6, ensuring that no new users get merged.
 
 As a result, linux-5.6, or my backport of the patches to 5.4 [1], should
 be the first release that can serve as a base for a 32-bit system designed
 to run beyond year 2038, with a few remaining caveats:
 
 - All user space must be compiled with a 64-bit time_t, which will be
   supported in the coming musl-1.2 and glibc-2.32 releases, along with
   installed kernel headers from linux-5.6 or higher.
 
 - Applications that use the system call interfaces directly need to be
   ported to use the time64 syscalls added in linux-5.1 in place of the
   existing system calls. This impacts most users of futex() and seccomp()
   as well as programming languages that have their own runtime environment
   not based on libc.
 
 - Applications that use a private copy of kernel uapi header files or
   their contents may need to update to the linux-5.6 version, in
   particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h,
   linux/elfcore.h, linux/sockios.h, linux/timex.h and linux/can/bcm.h.
 
 - A few remaining interfaces cannot be changed to pass a 64-bit time_t
   in a compatible way, so they must be configured to use CLOCK_MONOTONIC
   times or (with a y2106 problem) unsigned 32-bit timestamps. Most
   importantly this impacts all users of 'struct input_event'.
 
 - All y2038 problems that are present on 64-bit machines also apply to
   32-bit machines. In particular this affects file systems with on-disk
   timestamps using signed 32-bit seconds: ext4 with ext3-style small
   inodes, ext2, xfs (to be fixed soon) and ufs.
 
 Changes since v1 [2]:
 
 - Add Acks I received
 - Rebase to v5.5-rc1, dropping patches that got merged already
 - Add NFS, XFS and the final three patches from another series
 - Rewrite etnaviv patches
 - Add one late revert to avoid an etnaviv regression
 
 [1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame
 [2] https://lore.kernel.org/lkml/20191108213257.3097633-1-arnd@arndb.de/
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJeMYy3AAoJEGCrR//JCVInEGwP/0R+S+ok7vw9OdLVT0lFl07D
 IcVabgOWf24imN7m7L7Mlt3nDfxIT4tMpiAXq7eMO3spcyViG18O2LXdSQ4/7QBp
 +BlhoMjOP9w34Jyd7mnkFr4vqQALvfIqkS8rFObDtDub2Rfj9PC36MRMIu8BPXlv
 RK8bigwJeH/DV38yc5/JeUcD+WuewYLsK9XPWN+4yB4vgGsNU3ZQQ6nnzbR3hMsN
 DN8WZ68Y7IBs0Kyxkf+s2zmRXtCa2RiFg/2TUsk5olVAJVaenvte69hq5RSbg1vW
 vLi6K8cBoPWL59nqCzcNE+TUhSUg3LOj/a/KWyl76yovz7AlJaNjssOf8ZjHw6sL
 MhQqz3hXTxiJDS2Jvbf1yojiYGlzrq/gqcRFGe9jPcZdieMc4/yZCx60G/Exa5Pu
 YdMcqMyDWPFyUAFQNWEF59HPheOdj6tb1KpJ6bwgCo3P7QqhLrU4z9w3Py4/ZfBO
 4sWcWteSsD6MN/ADJ2WQ56nNxzM2AvkeVJKcF6FCkdngXX9T0GExmZz7SqB5Du99
 9lNjIiD5E+LBa/Swo/7n49aYa8x06V1pmHYTZVh9Wkl+CZiO21umezQFrWsfaMTp
 xt3c6pFdMG5xNMGpreTAXOmf2R+T6O8IO2qQq/TYjzqOLH7QC830P7avkmml+cK1
 LjOBE2TfSeO8Ru1dXV4t
 =wx0A
 -----END PGP SIGNATURE-----

Merge tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground

Pull y2038 updates from Arnd Bergmann:
 "Core, driver and file system changes

  These are updates to device drivers and file systems that for some
  reason or another were not included in the kernel in the previous
  y2038 series.

  I've gone through all users of time_t again to make sure the kernel is
  in a long-term maintainable state, replacing all remaining references
  to time_t with safe alternatives.

  Some related parts of the series were picked up into the nfsd, xfs,
  alsa and v4l2 trees. A final set of patches in linux-mm removes the
  now unused time_t/timeval/timespec types and helper functions after
  all five branches are merged for linux-5.6, ensuring that no new users
  get merged.

  As a result, linux-5.6, or my backport of the patches to 5.4 [1],
  should be the first release that can serve as a base for a 32-bit
  system designed to run beyond year 2038, with a few remaining caveats:

   - All user space must be compiled with a 64-bit time_t, which will be
     supported in the coming musl-1.2 and glibc-2.32 releases, along
     with installed kernel headers from linux-5.6 or higher.

   - Applications that use the system call interfaces directly need to
     be ported to use the time64 syscalls added in linux-5.1 in place of
     the existing system calls. This impacts most users of futex() and
     seccomp() as well as programming languages that have their own
     runtime environment not based on libc.

   - Applications that use a private copy of kernel uapi header files or
     their contents may need to update to the linux-5.6 version, in
     particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h,
     linux/elfcore.h, linux/sockios.h, linux/timex.h and
     linux/can/bcm.h.

   - A few remaining interfaces cannot be changed to pass a 64-bit
     time_t in a compatible way, so they must be configured to use
     CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit
     timestamps. Most importantly this impacts all users of 'struct
     input_event'.

   - All y2038 problems that are present on 64-bit machines also apply
     to 32-bit machines. In particular this affects file systems with
     on-disk timestamps using signed 32-bit seconds: ext4 with
     ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs"

[1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame

* tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (21 commits)
  Revert "drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC"
  y2038: sh: remove timeval/timespec usage from headers
  y2038: sparc: remove use of struct timex
  y2038: rename itimerval to __kernel_old_itimerval
  y2038: remove obsolete jiffies conversion functions
  nfs: fscache: use timespec64 in inode auxdata
  nfs: fix timstamp debug prints
  nfs: use time64_t internally
  sunrpc: convert to time64_t for expiry
  drm/etnaviv: avoid deprecated timespec
  drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC
  drm/msm: avoid using 'timespec'
  hfs/hfsplus: use 64-bit inode timestamps
  hostfs: pass 64-bit timestamps to/from user space
  packet: clarify timestamp overflow
  tsacct: add 64-bit btime field
  acct: stop using get_seconds()
  um: ubd: use 64-bit time_t where possible
  xtensa: ISS: avoid struct timeval
  dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD
  ...
2020-01-29 14:55:47 -08:00
David Howells
3a21409a0b nfs: Return EINVAL rather than ERANGE for mount parse errors
Return EINVAL rather than ERANGE for mount parse errors as the userspace
mount command doesn't necessarily understand what to do with anything other
than EINVAL.

The old code returned -ERANGE as an intermediate error that then get
converted to -EINVAL, whereas the new code returns -ERANGE.

This was induced by passing minorversion=1 to a v4 mount where
CONFIG_NFS_V4_1 was disabled in the kernel build.

Fixes: 68f65ef40e1e ("NFS: Convert mount option parsing to use functionality from fs_parser.h")
Reported-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-24 16:51:13 -05:00
Olga Kornievskaia
b24ee6c64c NFS: allow deprecation of NFS UDP protocol
Add a kernel config CONFIG_NFS_DISABLE_UDP_SUPPORT to disallow NFS
UDP mounts and enable it by default.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-24 16:51:13 -05:00
Trond Myklebust
f7b37b8b13 NFS: Add softreval behaviour to nfs_lookup_revalidate()
If the server is unavaliable, we want to allow the revalidating
lookup to time out, and to default to validating the cached dentry
if the 'softreval' mount option is set.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-24 16:51:13 -05:00
Su Yanjun
fe1e8dbec1 NFSv3: FIx bug when using chacl and chmod to change acl
We find a bug when running test under nfsv3  as below.
1)
chacl u::r--,g::rwx,o:rw- file1
2)
chmod u+w file1
3)
chacl -l file1

We expect u::rw-, but it shows u::r--, more likely it returns the
cached acl in inode.

We dig the code find that the code path is different.

chacl->..->__nfs3_proc_setacls->nfs_zap_acl_cache
Then nfs_zap_acl_cache clears the NFS_INO_INVALID_ACL in
NFS_I(inode)->cache_validity.

chmod->..->nfs3_proc_setattr
Because NFS_INO_INVALID_ACL has been cleared by chacl path,
nfs_zap_acl_cache wont be called.

nfs_setattr_update_inode will set NFS_INO_INVALID_ACL so let it
before nfs_zap_acl_cache call.

Signed-off-by: Su Yanjun <suyanjun218@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Olga Kornievskaia
d826e5b827 NFSv4.x recover from pre-mature loss of openstateid
Ever since the commit 0e0cb35b41, it's possible to lose an open stateid
while retrying a CLOSE due to ERR_OLD_STATEID. Once that happens,
operations that require openstateid fail with EAGAIN which is propagated
to the application then tests like generic/446 and generic/168 fail with
"Resource temporarily unavailable".

Instead of returning this error, initiate state recovery when possible to
recover the open stateid and then try calling nfs4_select_rw_stateid()
again.

Fixes: 0e0cb35b41 ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Olga Kornievskaia
62a1573fcf NFSv4 fix acl retrieval over krb5i/krb5p mounts
For the krb5i and krb5p mount, it was problematic to truncate the
received ACL to the provided buffer because an integrity check
could not be preformed.

Instead, provide enough pages to accommodate the largest buffer
bounded by the largest RPC receive buffer size.

Note: I don't think it's possible for the ACL to be truncated now.
Thus NFS4_ACL_TRUNC flag and related code could be possibly
removed but since I'm unsure, I'm leaving it.

v2: needs +1 page.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust
c74dfe97c1 NFS: Add mount option 'softreval'
Add a mount option 'softreval' that allows attribute revalidation 'getattr'
calls to time out, and causes them to fall back to using the cached
attributes.
The use case for this option is for ensuring that we can still (slowly)
traverse paths and use cached information even when the server is down.
Once the server comes back up again, the getattr calls start succeeding,
and the caches will revalidate as usual.

The 'softreval' mount option is automatically enabled if you have
specified 'softerr'.  It can be turned off using the options
'nosoftreval', or 'hard'.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust
5c965db86e NFS: Trust cached access if we've already revalidated the inode once
If we've already revalidated the inode once then don't distrust the
access cache unless the NFS_INO_INVALID_ACCESS flag is actually set.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust
4daaeba938 NFS: Fix nfs_direct_write_reschedule_io()
The 'hdr->good_bytes' is defined as the number of bytes we expect to
read or write starting at offset hdr->io_start. In the case of a partial
read/write we may end up adjusting hdr->args.offset and hdr->args.count
to skip I/O for data that was already read/written, and so we must ensure
the calculation takes that into account.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust
8c9cb71491 NFS: When resending after a short write, reset the reply count to zero
If we're resending a write due to a short read or write, ensure we
reset the reply count to zero.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust
e8194b7dd3 NFS: Improve tracing of permission calls
On exit from nfs_do_access(), record the mask representing the requested
permissions, as well as the server-supplied set of access rights for
this user.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust
088f3e68d8 pNFS/flexfiles: Add tracing for layout errors
Trace layout errors for pNFS/flexfiles on read/write/commit operations.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust
7bdd297ea6 NFS: Clean up generic file commit tracepoint
Clean up the generic file commit tracepoints to use a 64-bit value
for the verifier, and to display the pNFS filehandle, if it exists.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Trond Myklebust
5bb2a7cb9f NFS: Clean up generic writeback tracepoints
Clean up the generic writeback tracepoints so they do pass the
full structures as arguments. Also ensure we report the number
of bytes actually written.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00
Trond Myklebust
2343172d34 NFS: Clean up generic file read tracepoints
Clean up the generic file read tracepoints so they do pass the
full structures as arguments. Also ensure we report the number
of bytes actually read.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00
Trond Myklebust
0722dc9fea pNFS/flexfiles: Record resend attempts on I/O failure
If the attempt to do pNFS fails, then record what action we
take to recover (resend, reset to pnfs or reset to mds).

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00
Trond Myklebust
118b629219 NFS: Fix fix of show_nfs_errors
Casting a negative value to an unsigned long is not the same as
converting it to its absolute value.

Fixes: 96650e2eff ("NFS: Fix show_nfs_errors macros again")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00
Trond Myklebust
25925b00a9 NFSv4: Improve read/write/commit tracing
Ensure we always return the number of bytes read/written. Also display
the pnfs filehandle if it is in use.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00
Trond Myklebust
221203ce64 NFS/pnfs: Fix pnfs_generic_prepare_to_resend_writes()
Instead of making assumptions about the commit verifier contents, change
the commit code to ensure we always check that the verifier was set
by the XDR code.

Fixes: f54bcf2ece ("pnfs: Prepare for flexfiles by pulling out common code")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00
Trond Myklebust
2197e9b06c NFS: Fix up fsync() when the server rebooted
Don't clear the NFS_CONTEXT_RESEND_WRITES flag until after calling
nfs_commit_inode(). Otherwise, if nfs_commit_inode() returns an
error, we end up with dirty pages in the page cache, but no tag
to tell us that those pages need resending.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00
Trond Myklebust
b8946d7bfb NFS: Revalidate the file mapping on all fatal writeback errors
If a write or commit failed, and the mapping sees a fatal error, we
need to revalidate the contents of that mapping.

Fixes: 06c9fdf3b9 ("NFS: On fatal writeback errors, we need to call nfs_inode_remove_request()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00
Trond Myklebust
0df68ced55 NFS: Revalidate the file size on a fatal write error
If we suffer a fatal error upon writing a file, which causes us to
need to revalidate the entire mapping, then we should also revalidate
the file size.

Fixes: d2ceb7e570 ("NFS: Don't use page_file_mapping after removing the page")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:32 -05:00
Colin Ian King
e0b27d98bf NFS: Add missing null check for failed allocation
Currently the allocation of buf is not being null checked and
a null pointer dereference can occur when the memory allocation fails.
Fix this by adding a check and returning -ENOMEM.

Addresses-Coverity: ("Dereference null return")
Fixes: 6d972518b821 ("NFS: Add fs_context support.")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:31 -05:00
Geert Uytterhoeven
474c4f306e nfs: NFS_SWAP should depend on SWAP
If CONFIG_SWAP=n, it does not make much sense to offer the user the
option to enable support for swapping over NFS, as that will still fail
at run time:

    # swapon /swap
    swapon: /swap: swapon failed: Function not implemented

Fix this by adding a dependency on CONFIG_SWAP.

Fixes: a564b8f039 ("nfs: enable swap on NFS")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:31 -05:00
Murphy Zhou
bd89bc67f6 fs/nfs, swapon: check holes in swapfile
swapon over NFS does not go through generic_swapfile_activate
code path when setting up extents. This makes holes in NFS
swapfiles possible which is not expected for swapon.

Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:31 -05:00
Chuck Lever
2bb50aabb6 NFS4: Report callback authentication errors
This seems to be a somewhat common issue with Kerberos NFSv4.0
set-ups.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:31 -05:00
Chuck Lever
861e1671bc NFS: Introduce trace events triggered by page writeback errors
Try to capture the reason for the writeback path tagging an error on
a page.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:31 -05:00
zhengbin
6ed2144a80 NFS: move dprintk after nfs_alloc_fattr in nfs3_proc_lookup
In nfs3_proc_lookup, if nfs_alloc_fattr fails, will only print
"NFS call lookup". This may be confusing, move dprintk after
nfs_alloc_fattr.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:31 -05:00
zhengbin
8b98a53248 NFS4: Remove unneeded semicolon
Fixes coccicheck warning:

fs/nfs/nfs4state.c:1138:2-3: Unneeded semicolon
fs/nfs/nfs4proc.c:6862:2-3: Unneeded semicolon
fs/nfs/nfs4proc.c:8629:2-3: Unneeded semicolon

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:31 -05:00
Arnd Bergmann
a3167dacba nfs: encode nfsv4 timestamps as 64-bit
On 32-bit architectures, xdr_encode_nfstime4() needlessly
truncates timestamps to a 32-bit value in the range between
year 1902 and 2038.

Change it to use 'struct timespec64' to allow the entire range
of values supported by the server.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:30 -05:00
Arnd Bergmann
e5189e9a51 nfs: remove timespec from xdr_encode_nfstime
For NFSv2 and NFSv3, timestamps are stored using 32-bit entities
and overflow in y2038. For historic reasons we truncate the
64-bit timestamps by converting from a timespec64 to a timespec
first.

Remove this unnecessary conversion step and do the truncation
in the final functions that take a timestamp.

This is transparent to users, but avoids one of the last uses
of 'timespec' and lets us remove it later.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:30 -05:00
Arnd Bergmann
bc35b6b0cf nfs: fscache: use timespec64 in inode auxdata
nfs currently behaves differently on 32-bit and 64-bit kernels regarding
the on-disk format of nfs_fscache_inode_auxdata.

That format should really be the same on any kernel, and we should avoid
the 'timespec' type in order to remove that from the kernel later on.

Using plain 'timespec64' would not be good here, since that includes
implied padding and would possibly leak kernel stack data to the on-disk
format on 32-bit architectures.

struct __kernel_timespec would work as a replacement, but open-coding
the two struct members in nfs_fscache_inode_auxdata makes it more
obvious what's going on here, and keeps the current format for 64-bit
architectures.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:30 -05:00
Arnd Bergmann
ae08483cdd nfs: use timespec64 in nfs_fattr
Push down the use of timespec64 into NFS nfs_fattr, to avoid needless
conversions, and get closer to having 64-bit time_t support on 32-bit
NFSv4 and removing some old interfaces from the kernel.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:30 -05:00
Scott Mayhew
ce8866f091 NFS: Attach supplementary error information to fs_context.
Split out from commit "NFS: Add fs_context support."

Add wrappers nfs_errorf(), nfs_invalf(), and nfs_warnf() which log error
information to the fs_context.  Convert some printk's to use these new
wrappers instead.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
Scott Mayhew
62a55d088c NFS: Additional refactoring for fs_context conversion
Split out from commit "NFS: Add fs_context support."

This patch adds additional refactoring for the conversion of NFS to use
fs_context, namely:

 (*) Merge nfs_mount_info and nfs_clone_mount into nfs_fs_context.
     nfs_clone_mount has had several fields removed, and nfs_mount_info
     has been removed altogether.
 (*) Various functions now take an fs_context as an argument instead
     of nfs_mount_info, nfs_fs_context, etc.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells
f2aedb713c NFS: Add fs_context support.
Add filesystem context support to NFS, parsing the options in advance and
attaching the information to struct nfs_fs_context.  The highlights are:

 (*) Merge nfs_mount_info and nfs_clone_mount into nfs_fs_context.  This
     structure represents NFS's superblock config.

 (*) Make use of the VFS's parsing support to split comma-separated lists

 (*) Pin the NFS protocol module in the nfs_fs_context.

 (*) Attach supplementary error information to fs_context.  This has the
     downside that these strings must be static and can't be formatted.

 (*) Remove the auxiliary file_system_type structs since the information
     necessary can be conveyed in the nfs_fs_context struct instead.

 (*) Root mounts are made by duplicating the config for the requested mount
     so as to have the same parameters.  Submounts pick up their parameters
     from the parent superblock.

[AV -- retrans is u32, not string]
[SM -- Renamed cfg to ctx in a few functions in an earlier patch]
[SM -- Moved fs_context mount option parsing to an earlier patch]
[SM -- Moved fs_context error logging to a later patch]
[SM -- Fixed printks in nfs4_try_get_tree() and nfs4_get_referral_tree()]
[SM -- Added is_remount_fc() helper]
[SM -- Deferred some refactoring to a later patch]
[SM -- Fixed referral mounts, which were broken in the original patch]
[SM -- Fixed leak of nfs_fattr when fs_context is freed]

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
Scott Mayhew
e38bb238ed NFS: Convert mount option parsing to use functionality from fs_parser.h
Split out from commit "NFS: Add fs_context support."

Convert existing mount option definitions to fs_parameter_enum's and
fs_parameter_spec's.  Parse mount options using fs_parse() and
lookup_constant().

Notes:

1) Fixed a typo in the udp6 definition in nfs_xprt_protocol_tokens
from the original commit.

2) fs_parse() expects an fs_context as the first arg so that any
errors can be logged to the fs_context.  We're passing NULL for the
fs_context (this will change in commit "NFS: Add fs_context support.")
which is okay as it will cause logfc() to do a printk() instead.

3) fs_parse() expects an fs_paramter as the third arg.  We're
building an fs_parameter manually in nfs_fs_context_parse_option(),
which will go away in commit "NFS: Add fs_context support.".

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
Scott Mayhew
38465f5d1a NFS: rename nfs_fs_context pointer arg in a few functions
Split out from commit "NFS: Add fs_context support."

Rename cfg to ctx in nfs_init_server(), nfs_verify_authflavors(),
and nfs_request_mount().  No functional changes.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells
e558100fda NFS: Do some tidying of the parsing code
Do some tidying of the parsing code, including:

 (*) Returning 0/error rather than true/false.

 (*) Putting the nfs_fs_context pointer first in some arg lists.

 (*) Unwrap some lines that will now fit on one line.

 (*) Provide unioned sockaddr/sockaddr_storage fields to avoid casts.

 (*) nfs_parse_devname() can paste its return values directly into the
     nfs_fs_context struct as that's where the caller puts them.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells
48be8a66cf NFS: Add a small buffer in nfs_fs_context to avoid string dup
Add a small buffer in nfs_fs_context to avoid string duplication when
parsing numbers.  Also make the parsing function wrapper place the parsed
integer directly in the appropriate nfs_fs_context struct member.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells
cbd071b5da NFS: Deindent nfs_fs_context_parse_option()
Deindent nfs_fs_context_parse_option().

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells
f8ee01e3e2 NFS: Split nfs_parse_mount_options()
Split nfs_parse_mount_options() to move the prologue, list-splitting and
epilogue into one function and the per-option processing into another.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells
5eb005caf5 NFS: Rename struct nfs_parsed_mount_data to struct nfs_fs_context
Rename struct nfs_parsed_mount_data to struct nfs_fs_context and rename
pointers to it to "ctx".  At some point this will be pointed to by an
fs_context struct's fs_private pointer.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells
e0a626b124 NFS: Constify mount argument match tables
The mount argument match tables should never be altered so constify them.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells
9954bf92c0 NFS: Move mount parameterisation bits into their own file
Split various bits relating to mount parameterisation out from
fs/nfs/super.c into their own file to form the basis of filesystem context
handling for NFS.

No other changes are made to the code beyond removing 'static' qualifiers.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
Al Viro
adf2314fe6 nfs: get rid of ->set_security()
it's always either nfs_set_sb_security() or nfs_clone_sb_security(),
the choice being controlled by mount_info->cloned != NULL.  No need
to add methods, especially when both instances live right next to
the caller and are never accessed anywhere else.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
ba8b614806 nfs_clone_sb_security(): simplify the check for server bogosity
We used to check ->i_op for being nfs_dir_inode_operations.  With
separate inode_operations for v3 and v4 that became bogus, but
rather than going for protocol-dependent comparison we could've
just checked ->i_fop instead; _that_ is the same for all protocol
versions.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
ab88dca311 nfs: get rid of mount_info ->fill_super()
The only possible values are nfs_fill_super and nfs_clone_super.  The
latter is used only when crossing into a submount and it is almost
identical to the former; the only differences are
	* ->s_time_gran unconditionally set to 1 (even for v2 mounts).
Regression dating back to 2012, actually.
	* ->s_blocksize/->s_blocksize_bits set to that of parent.

Rather than messing with the method, stash ->s_blocksize_bits in
mount_info in submount case and after the (now unconditional)
call of nfs_fill_super() override ->s_blocksize/->s_blocksize_bits
if that has been set.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
0c38f2131d nfs: don't pass nfs_subversion to ->create_server()
pick it from mount_info

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
1bc3a2cbf2 nfs: unexport nfs_fs_mount_common()
Make it static, even.  And remove a stale extern of (long-gone)
nfs_xdev_mount_common() from internal.h, while we are at it.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
82eaed2bee nfs: merge xdev and remote file_system_type
they are identical now...

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
a55d3297be nfs: don't bother passing nfs_subversion to ->try_mount() and nfs_fs_mount_common()
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
6a3f7a399e nfs: stash nfs_subversion reference into nfs_mount_info
That will allow to get rid of passing those references around in
quite a few places.  Moreover, that will allow to merge xdev and
remote file_system_type.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
250d69f6a4 nfs: lift setting mount_info from nfs_xdev_mount()
Do it in nfs_do_submount() instead.  As a side benefit, nfs_clone_data
doesn't need ->fh and ->fattr anymore.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
4e357761bd nfs4: fold nfs_do_root_mount/nfs_follow_remote_path
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
6654f8e246 nfs: don't bother setting/restoring export_path around do_nfs_root_mount()
nothing in it will be looking at that thing anyway

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
15a9c4eff6 nfs: fold nfs4_remote_fs_type and nfs4_remote_referral_fs_type
They are identical now.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
7643c12e95 nfs: lift setting mount_info from nfs4_remote{,_referral}_mount
Do that (fhandle allocation, setting struct server up) in
nfs4_referral_mount() and nfs4_try_mount() resp. and pass the
server and pointer to mount_info into nfs_do_root_mount() so that
nfs4_remote_referral_mount()/nfs_remote_mount() could be merged.

Since we are moving stuff from ->mount() instances to the points
prior to vfs_kern_mount() that would trigger those, we need to
make sure that do_nfs_root_mount() will do the corresponding
cleanup itself if it doesn't trigger those ->mount() instances.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
d0b779d47c nfs: stash server into struct nfs_mount_info
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
444a52960c saner calling conventions for nfs_fs_mount_common()
Allow it to take ERR_PTR() for server and return ERR_CAST() of it in
such case.  All callers used to open-code that...

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
c64cd6e34e reimplement path_mountpoint() with less magic
... and get rid of a bunch of bugs in it.  Background:
the reason for path_mountpoint() is that umount() really doesn't
want attempts to revalidate the root of what it's trying to umount.
The thing we want to avoid actually happen from complete_walk();
solution was to do something parallel to normal path_lookupat()
and it both went overboard and got the boilerplate subtly
(and not so subtly) wrong.

A better solution is to do pretty much what the normal path_lookupat()
does, but instead of complete_walk() do unlazy_walk().  All it takes
to avoid that ->d_weak_revalidate() call...  mountpoint_last() goes
away, along with everything it got wrong, and so does the magic around
LOOKUP_NO_REVAL.

Another source of bugs is that when we traverse mounts at the final
location (and we need to do that - umount . expects to get whatever's
overmounting ., if any, out of the lookup) we really ought to take
care of ->d_manage() - as it is, manual umount of autofs automount
in progress can lead to unpleasant surprises for the daemon.  Easily
solved by using handle_lookup_down() instead of follow_mount().

Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-15 01:36:06 -05:00
Arnd Bergmann
6e31ded689 nfs: fscache: use timespec64 in inode auxdata
nfs currently behaves differently on 32-bit and 64-bit kernels regarding
the on-disk format of nfs_fscache_inode_auxdata.

That format should really be the same on any kernel, and we should avoid
the 'timespec' type in order to remove that from the kernel later on.

Using plain 'timespec64' would not be good here, since that includes
implied padding and would possibly leak kernel stack data to the on-disk
format on 32-bit architectures.

struct __kernel_timespec would work as a replacement, but open-coding
the two struct members in nfs_fscache_inode_auxdata makes it more
obvious what's going on here, and keeps the current format for 64-bit
architectures.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-12-18 18:07:33 +01:00
Arnd Bergmann
057f184b12 nfs: fix timstamp debug prints
Starting in v5.5, the timestamps are correctly passed down as
64-bit seconds with NFSv4 on 32-bit machines, but some debug
statements still truncate them to 'long'.

Fixes: e86d5a0287 ("NFS: Convert struct nfs_fattr to use struct timespec64")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-12-18 18:07:32 +01:00
Chuck Lever
21f86d2d63 NFS4: Trace lock reclaims
One of the most frustrating messages our sustaining team sees is
the "Lock reclaim failed!" message. Add some observability in the
client's lock reclaim logic so we can capture better data the
first time a problem occurs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-18 11:04:32 +01:00
Chuck Lever
511ba52e4c NFS4: Trace state recovery operation
Add a trace point in the main state manager loop to observe state
recovery operation. Help track down state recovery bugs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-18 10:58:39 +01:00
Olga Kornievskaia
f751c54525 NFSv4.2 fix memory leak in nfs42_ssc_open
Static analysis with Coverity detected a memory leak

Reported-by: Colin King <colin.king@canonical.com>
Fixes: ec4b092508 ("NFS: inter ssc open")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-18 10:50:41 +01:00
Olga Kornievskaia
66588abe2d NFSv4.2 fix kfree in __nfs42_copy_file_range
This is triggering problems with static analysis with Coverity

Reported-by: Colin King <colin.king@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-18 10:50:30 +01:00
YueHaibing
843aa17a35 NFS: remove duplicated include from nfs4file.c
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-18 10:47:39 +01:00
YueHaibing
0003010424 NFSv4: Make _nfs42_proc_copy_notify() static
Fix sparse warning:

fs/nfs/nfs42proc.c:527:5: warning:
 symbol '_nfs42_proc_copy_notify' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-18 10:47:38 +01:00
Anna Schumaker
913eca1aea NFS: Fallocate should use the nfs4_fattr_bitmap
Changing a sparse file could have an effect not only on the file size,
but also on the number of blocks used by the file in the underlying
filesystem. The server's cache_consistency_bitmap doesn't update the
SPACE_USED attribute, so let's switch to the nfs4_fattr_bitmap to catch
this update whenever we do an ALLOCATE or DEALLOCATE.

This patch fixes xfstests generic/568, which tests that fallocating an
unaligned range allocates all blocks touched by that range. Without this
patch, `stat` reports 0 bytes used immediately after the fallocate.
Adding a `sleep 5` to the test also catches the update, but it's better
to do so when we know something has changed.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-18 10:47:05 +01:00
Anna Schumaker
89658c4d04 NFS: Return -ETXTBSY when attempting to write to a swapfile
My understanding is that -EBUSY refers to the underlying device, and
that -ETXTBSY is used when attempting to access a file in use by the
kernel (like a swapfile). Changing this return code helps us pass
xfstests generic/569

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-18 10:43:24 +01:00
Saurav Girepunje
0e96322b24 fs: nfs: sysfs: Remove NULL check before kfree
Remove NULL check before kfree, NULL check is taken care
on kfree.

Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-18 10:38:04 +01:00
YueHaibing
9c91fa36b6 NFS: remove unneeded semicolon
remove unneeded semicolon.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-18 10:37:58 +01:00
Ben Dooks
d49dd11753 NFSv4: add declaration of current_stateid
The current_stateid is exported from nfs4state.c but not
declared in any of the headers. Add to nfs4_fs.h to
remove the following warning:

fs/nfs/nfs4state.c:80:20: warning: symbol 'current_stateid' was not declared. Should it be static?

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-18 10:36:45 +01:00
Trond Myklebust
5326de9e94 NFSv4.x: Drop the slot if nfs4_delegreturn_prepare waits for layoutreturn
If nfs4_delegreturn_prepare needs to wait for a layoutreturn to complete
then make sure we drop the sequence slot if we hold it.

Fixes: 1c5bd76d17 ("pNFS: Enable layoutreturn operation for return-on-close")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-13 16:37:17 +01:00
Trond Myklebust
5c441544f0 NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process()
If the server returns a bad or dead session error, the we don't want
to update the session slot number, but just immediately schedule
recovery and allow it to proceed.

We can/should then remove handling in other places

Fixes: 3453d5708b ("NFSv4.1: Avoid false retries when RPC calls are interrupted")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-13 16:37:17 +01:00
Trond Myklebust
807ce06c24 Merge branch 'linux-ssc-for-5.5' 2019-11-06 08:55:23 -05:00
Trond Myklebust
43622eab8d NFS: Add a tracepoint in nfs_fh_to_dentry()
Add a tracepoint in nfs_fh_to_dentry() for debugging issues with bad
userspace filehandles.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:46 -05:00
Trond Myklebust
70d136b2dc NFSv4: Don't retry the GETATTR on old stateid in nfs4_delegreturn_done()
If the server returns NFS4ERR_OLD_STATEID, then just skip retrying the
GETATTR when replaying the delegreturn compound. We know nothing will
have changed on the server.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:46 -05:00
Trond Myklebust
246afc0aa5 NFSv4: Handle NFS4ERR_OLD_STATEID in delegreturn
If the server returns NFS4ERR_OLD_STATEID in response to our delegreturn,
we want to sync to the most recent seqid for the delegation stateid. However
if we are already at the most recent, we have two possibilities:

- an OPEN reply is still outstanding and will return a new seqid
- an earlier OPEN reply was dropped on the floor due to a timeout.

In the latter case, we may end up unable to complete the delegreturn,
so we want to bump the seqid to a value greater than the cached value.
While this may cause us to lose the delegation in the former case,
it should now be safe to assume that the client will replay the OPEN
if necessary in order to get a new valid stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:46 -05:00
Trond Myklebust
ee05f45677 NFSv4: Fix races between open and delegreturn
If the server returns the same delegation in an open that we just used
in a delegreturn, we need to ensure we don't apply that stateid if
the delegreturn has freed it on the server.
To do so, we ensure that we do not free the storage for the delegation
until either it is replaced by a new one, or we throw the inode out of
cache.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:46 -05:00
Trond Myklebust
42c304c34e NFS: nfs_inode_find_state_and_recover() fix stateid matching
In nfs_inode_find_state_and_recover() we want to mark for recovery
only those stateids that match or are older than the supplied
stateid parameter.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:46 -05:00
Trond Myklebust
3887ce1aac NFSv4: Fix nfs4_inode_make_writeable()
Fix the checks in nfs4_inode_make_writeable() to ignore the case where
we hold no delegations. Currently, in such a case, we automatically
flush writes.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:46 -05:00
Trond Myklebust
40e6aa10aa NFSv4: nfs4_return_incompatible_delegation() should check delegation validity
Ensure that we check that the delegation is valid in
nfs4_return_incompatible_delegation() before we try to return it.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:46 -05:00
Trond Myklebust
1deed57235 NFSv4: Don't reclaim delegations that have been returned or revoked
If the delegation has already been revoked, we want to avoid reclaiming
it on reboot.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:46 -05:00
Trond Myklebust
af20b7b850 NFSv4: Ignore requests to return the delegation if it was revoked
If the delegation was revoked, or is already being returned, just
clear the NFS_DELEGATION_RETURN and NFS_DELEGATION_RETURN_IF_CLOSED
flags and keep going.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:46 -05:00
Trond Myklebust
d51f91d262 NFSv4: Revoke the delegation on success in nfs4_delegreturn_done()
If the delegation was successfully returned, then mark it as revoked.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:46 -05:00
Trond Myklebust
f2d47b5502 NFSv4: Update the stateid seqid in nfs_revoke_delegation()
If we revoke a delegation, but the stateid's seqid is newer, then
ensure we update the seqid when marking the delegation as revoked.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:46 -05:00
Trond Myklebust
ae084a32ee NFSv4: Clear the NFS_DELEGATION_REVOKED flag in nfs_update_inplace_delegation()
If the server sent us a new delegation stateid that is more recent than
the one that got revoked, then clear the NFS_DELEGATION_REVOKED flag.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:45 -05:00
Trond Myklebust
e0f07896af NFSv4: Hold the delegation spinlock when updating the seqid
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:45 -05:00
Trond Myklebust
f9e0cc9c97 NFSv4: Don't remove the delegation from the super_list more than once
Add a check to ensure that we haven't already removed the delegation
from the inode after we take all the relevant locks.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:45 -05:00
Trond Myklebust
b47e0e478c NFS: Rename nfs_inode_return_delegation_noreclaim()
Rename nfs_inode_return_delegation_noreclaim() to
nfs_inode_evict_delegation(), which better describes what it
does.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:45 -05:00
Trond Myklebust
b57562087b NFSv4: fail nfs4_refresh_delegation_stateid() when the delegation was revoked
If the delegation was revoked, we don't want to retry the delegreturn.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:45 -05:00
Trond Myklebust
457a50424b NFSv4: Delegation recalls should not find revoked delegations
If we're processsing a delegation recall, ignore the delegations that
have already been revoked or returned.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:45 -05:00
Trond Myklebust
5decae1623 NFSv4: nfs4_callback_getattr() should ignore revoked delegations
If the delegation has been revoked, ignore it.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:45 -05:00
Trond Myklebust
333ac786a1 NFSv4: Fix delegation handling in update_open_stateid()
If the delegation is marked as being revoked, then don't use it in
the open state structure.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:45 -05:00
Trond Myklebust
e6237b6feb NFSv4.1: Don't rebind to the same source port when reconnecting to the server
NFSv2, v3 and NFSv4 servers often have duplicate replay caches that look
at the source port when deciding whether or not an RPC call is a replay
of a previous call. This requires clients to perform strange TCP gymnastics
in order to ensure that when they reconnect to the server, they bind
to the same source port.

NFSv4.1 and NFSv4.2 have sessions that provide proper replay semantics,
that do not look at the source port of the connection. This patch therefore
ensures they can ignore the rebind requirement.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:45 -05:00
Trond Myklebust
52f98f1a2d NFS/pnfs: Separate NFSv3 DS and MDS traffic
If a NFSv3 server is being used as both a DS and as a regular NFSv3 server,
we may want to keep the IO traffic on a separate TCP connection, since
it will typically have very different timeout characteristics.

This patch therefore sets up a flag to separate the two modes of operation
for the nfs_client.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:45 -05:00
Trond Myklebust
c6eb58435b pNFS: nfs3_set_ds_client should set NFS_CS_NOPING
Connecting to the DS is a non-interactive, asynchronous task, so there is
no reason to fire up an extra RPC null ping in order to ensure that the
server is up.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:45 -05:00
Trond Myklebust
4b1b69cedf NFS: Add a flag to tell nfs_client to set RPC_CLNT_CREATE_NOPING
Add a flag to tell the nfs_client it should set RPC_CLNT_CREATE_NOPING when
creating the rpc client.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:44 -05:00
Trond Myklebust
d0372b679c NFS: Use non-atomic bit ops when initialising struct nfs_client_initdata
We don't need atomic bit ops when initialising a local structure on the
stack.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:44 -05:00
Trond Myklebust
6430b323ae NFSv3: Clean up timespec encode
Simplify the struct iattr timestamp encoding by skipping the step of
an intermediate struct timespec.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:44 -05:00
Trond Myklebust
c9dbfd961b NFSv2: Clean up timespec encode
Simplify the struct iattr timestamp encoding by skipping the step of
an intermediate struct timespec.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:44 -05:00
Trond Myklebust
ad97a995d8 NFSv2: Fix a typo in encode_sattr()
Encode the mtime correctly.

Fixes: 95582b0083 ("vfs: change inode times to use struct timespec64")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:44 -05:00
Trond Myklebust
7d34ff5141 NFSv4: NFSv4 callbacks also support 64-bit timestamps
Convert the NFSv4 callbacks to use struct timestamp64, rather than
truncating times to 32-bit values.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:44 -05:00
Trond Myklebust
e7d4b05c5e NFSv4: Encode 64-bit timestamps
NFSv4 supports 64-bit timestamps, so there is no point in converting
the struct iattr timestamps to 32-bits before encoding.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:44 -05:00
Trond Myklebust
e86d5a0287 NFS: Convert struct nfs_fattr to use struct timespec64
NFSv4 supports 64-bit times, so we should switch to using struct
timespec64 when decoding attributes.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:44 -05:00
Trond Myklebust
22a1ae9a93 NFS: If nfs_mountpoint_expiry_timeout < 0, do not expire submounts
If we set nfs_mountpoint_expiry_timeout to a negative value, then
allow that to imply that we do not expire NFSv4 submounts.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:44 -05:00
Trond Myklebust
79cc55422c NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
A typo in nfs4_refresh_delegation_stateid() means we're leaking an
RCU lock, and always returning a value of 'false'. As the function
description states, we were always supposed to return 'true' if a
matching delegation was found.

Fixes: 12f275cdd1 ("NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID.")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-11-01 11:03:56 -04:00
Trond Myklebust
be3df3dd4c NFSv4: Don't allow a cached open with a revoked delegation
If the delegation is marked as being revoked, we must not use it
for cached opens.

Fixes: 869f9dfa4d ("NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-11-01 10:59:26 -04:00
Chuck Lever
1047ec8683 NFSv4: Fix leak of clp->cl_acceptor string
Our client can issue multiple SETCLIENTID operations to the same
server in some circumstances. Ensure that calls to
nfs4_proc_setclientid() after the first one do not overwrite the
previously allocated cl_acceptor string.

unreferenced object 0xffff888461031800 (size 32):
  comm "mount.nfs", pid 2227, jiffies 4294822467 (age 1407.749s)
  hex dump (first 32 bytes):
    6e 66 73 40 6b 6c 69 6d 74 2e 69 62 2e 31 30 31  nfs@klimt.ib.101
    35 67 72 61 6e 67 65 72 2e 6e 65 74 00 00 00 00  5granger.net....
  backtrace:
    [<00000000ab820188>] __kmalloc+0x128/0x176
    [<00000000eeaf4ec8>] gss_stringify_acceptor+0xbd/0x1a7 [auth_rpcgss]
    [<00000000e85e3382>] nfs4_proc_setclientid+0x34e/0x46c [nfsv4]
    [<000000003d9cf1fa>] nfs40_discover_server_trunking+0x7a/0xed [nfsv4]
    [<00000000b81c3787>] nfs4_discover_server_trunking+0x81/0x244 [nfsv4]
    [<000000000801b55f>] nfs4_init_client+0x1b0/0x238 [nfsv4]
    [<00000000977daf7f>] nfs4_set_client+0xfe/0x14d [nfsv4]
    [<0000000053a68a2a>] nfs4_create_server+0x107/0x1db [nfsv4]
    [<0000000088262019>] nfs4_remote_mount+0x2c/0x59 [nfsv4]
    [<00000000e84a2fd0>] legacy_get_tree+0x2d/0x4c
    [<00000000797e947c>] vfs_get_tree+0x20/0xc7
    [<00000000ecabaaa8>] fc_mount+0xe/0x36
    [<00000000f15fafc2>] vfs_kern_mount+0x74/0x8d
    [<00000000a3ff4e26>] nfs_do_root_mount+0x8a/0xa3 [nfsv4]
    [<00000000d1c2b337>] nfs4_try_mount+0x58/0xad [nfsv4]
    [<000000004c9bddee>] nfs_fs_mount+0x820/0x869 [nfs]

Fixes: f11b2a1cfb ("nfs4: copy acceptor name from context ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-10 16:14:02 -04:00
Olga Kornievskaia
8dff1df551 NFS: replace cross device check in copy_file_range
Add a check to disallow cross file systems copy offload, both
files are expected to be of NFS4.2+ type.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-10-09 12:06:26 -04:00
Olga Kornievskaia
1275101026 NFS based on file size issue sync copy or fallback to generic copy offload
For small file sizes, it make sense to issue a synchronous copy (and
save an RPC callback operation). Also, for the inter copy offload,
copy len must be larger than the cost of doing a mount between the
destination and source server (14RPCs are sent during 4.x mount).

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-10-09 12:06:22 -04:00
Olga Kornievskaia
0e65a32c8a NFS: handle source server reboot
When the source server reboots after a server-to-server copy was
issued, we need to retry the copy from COPY_NOTIFY. We need to
detect that the source server rebooted and there is a copy waiting
on a destination server and wake it up.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-10-09 12:06:19 -04:00
Olga Kornievskaia
fefa1a812a NFS handle NFS4ERR_PARTNER_NO_AUTH error
When a destination server sends a READ to the source server it can
get a NFS4ERR_PARTNER_NO_AUTH, which means a copy needs to be
restarted.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-10-09 12:06:15 -04:00
Olga Kornievskaia
124060255d NFS: also send OFFLOAD_CANCEL to source server
In case of copy is cancelled, also send OFFLOAD_CANCEL to the source
server.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-10-09 12:06:10 -04:00
Olga Kornievskaia
6b61c969d5 NFS: COPY handle ERR_OFFLOAD_DENIED
If server sends ERR_OFFLOAD_DENIED error, the client must fall
back on doing copy the normal way. Return ENOTSUPP to the vfs and
fallback to regular copy.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-10-09 12:06:05 -04:00
Olga Kornievskaia
7e350197a1 NFS: for "inter" copy treat ESTALE as ENOTSUPP
If the client sends an "inter" copy to the destination server but
it only supports "intra" copy, it can return ESTALE (since it
doesn't know anything about the file handle from the other server
and does not recognize the special case of "inter" copy). Translate
this error as ENOTSUPP and also send OFFLOAD_CANCEL to the source
server so that it can clean up state.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-10-09 12:06:00 -04:00
Olga Kornievskaia
0b9018b9ca NFS: skip recovery of copy open on dest server
Mark the open created for the source file on the destination
server. Then if this open is going thru a recovery, then fail
the recovery as we don't need to be recoving a "fake" open.
We need to fail the ongoing READs and vfs_copy_file_range().

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-10-09 12:05:56 -04:00
Olga Kornievskaia
ec4b092508 NFS: inter ssc open
NFSv4.2 inter server to server copy requires the destination server to
READ the data from the source server using the provided stateid and
file handle.

Given an NFSv4 stateid and filehandle from the COPY operaion, provide the
destination server with an NFS client function to create a struct file
suitable for the destiniation server to READ the data to be copied.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
2019-10-09 12:05:52 -04:00
Olga Kornievskaia
1d38f3f0d7 NFS: add ca_source_server<> to COPY
Support only one source server address: the same address that
the client and source server use.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-10-09 12:05:49 -04:00
Olga Kornievskaia
0491567b51 NFS: add COPY_NOTIFY operation
Try using the delegation stateid, then the open stateid.

Only NL4_NETATTR, No support for NL4_NAME and NL4_URL.
Allow only one source server address to be returned for now.

To distinguish between same server copy offload ("intra") and
a copy between different server ("inter"), do a check of server
owner identity and also make sure server is capable of doing
a copy offload.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
2019-10-09 12:05:45 -04:00
Trond Myklebust
0b57484779 NFS: Remove redundant mirror tracking in O_DIRECT
We no longer need the extra mirror length tracking in the O_DIRECT code,
as we are able to track the maximum contiguous length in dreq->max_count.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-09 11:45:59 -04:00
Trond Myklebust
031d73ed76 NFS: Fix O_DIRECT accounting of number of bytes read/written
When a series of O_DIRECT reads or writes are truncated, either due to
eof or due to an error, then we should return the number of contiguous
bytes that were received/sent starting at the offset specified by the
application.

Currently, we are failing to correctly check contiguity, and so we're
failing the generic/465 in xfstests when the race between the read
and write RPCs causes the file to get extended while the 2 reads are
outstanding. If the first read RPC call wins the race and returns with
eof set, we should treat the second read RPC as being truncated.

Reported-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Fixes: 1ccbad9f9f ("nfs: fix DIO good bytes calculation")
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-09 11:45:59 -04:00
ZhangXiaoxu
33ea5aaa87 nfs: Fix nfsi->nrequests count error on nfs_inode_remove_request
When xfstests testing, there are some WARNING as below:

WARNING: CPU: 0 PID: 6235 at fs/nfs/inode.c:122 nfs_clear_inode+0x9c/0xd8
Modules linked in:
CPU: 0 PID: 6235 Comm: umount.nfs
Hardware name: linux,dummy-virt (DT)
pstate: 60000005 (nZCv daif -PAN -UAO)
pc : nfs_clear_inode+0x9c/0xd8
lr : nfs_evict_inode+0x60/0x78
sp : fffffc000f68fc00
x29: fffffc000f68fc00 x28: fffffe00c53155c0
x27: fffffe00c5315000 x26: fffffc0009a63748
x25: fffffc000f68fd18 x24: fffffc000bfaaf40
x23: fffffc000936d3c0 x22: fffffe00c4ff5e20
x21: fffffc000bfaaf40 x20: fffffe00c4ff5d10
x19: fffffc000c056000 x18: 000000000000003c
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000040 x14: 0000000000000228
x13: fffffc000c3a2000 x12: 0000000000000045
x11: 0000000000000000 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 0000000000000000 x6 : fffffc00084b027c
x5 : fffffc0009a64000 x4 : fffffe00c0e77400
x3 : fffffc000c0563a8 x2 : fffffffffffffffb
x1 : 000000000000764e x0 : 0000000000000001
Call trace:
 nfs_clear_inode+0x9c/0xd8
 nfs_evict_inode+0x60/0x78
 evict+0x108/0x380
 dispose_list+0x70/0xa0
 evict_inodes+0x194/0x210
 generic_shutdown_super+0xb0/0x220
 nfs_kill_super+0x40/0x88
 deactivate_locked_super+0xb4/0x120
 deactivate_super+0x144/0x160
 cleanup_mnt+0x98/0x148
 __cleanup_mnt+0x38/0x50
 task_work_run+0x114/0x160
 do_notify_resume+0x2f8/0x308
 work_pending+0x8/0x14

The nrequest should be increased/decreased only if PG_INODE_REF flag
was setted.

But in the nfs_inode_remove_request function, it maybe decrease when
no PG_INODE_REF flag, this maybe lead nrequests count error.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-02 08:52:17 -04:00
Linus Torvalds
972a2bf7df NFS Client Updates for Linux 5.3
Stable bugfixes:
 - Dequeue the request from the receive queue while we're re-encoding # v4.20+
 - Fix buffer handling of GSS MIC without slack # 5.1
 
 Features:
 - Increase xprtrdma maximum transport header and slot table sizes
 - Add support for nfs4_call_sync() calls using a custom rpc_task_struct
 - Optimize the default readahead size
 - Enable pNFS filelayout LAYOUTGET on OPEN
 
 Other bugfixes and cleanups:
 - Fix possible null-pointer dereferences and memory leaks
 - Various NFS over RDMA cleanups
 - Various NFS over RDMA comment updates
 - Don't receive TCP data into a reset request buffer
 - Don't try to parse incomplete RPC messages
 - Fix congestion window race with disconnect
 - Clean up pNFS return-on-close error handling
 - Fixes for NFS4ERR_OLD_STATEID handling
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl2NC04ACgkQ18tUv7Cl
 QOs4Tg//bAlGs+dIKixAmeMKmTd6I34laUnuyV/12yPQDgo6bryLrTngfe2BYvmG
 2l+8H7yHfR4/gQE4vhR0c15xFgu6pvjBGR0/nNRaXienIPXO4xsQkcaxVA7SFRY2
 HjffZwyoBfjyRps0jL+2sTsKbRtSkf9Dn+BONRgesg51jK1jyWkXqXpmgi4uMO4i
 ojpTrW81dwo7Yhv08U2A/Q1ifMJ8F9dVYuL5sm+fEbVI/Nxoz766qyB8rs8+b4Xj
 3gkfyh/Y1zoMmu6c+r2Q67rhj9WYbDKpa6HH9yX1zM/RLTiU7czMX+kjuQuOHWxY
 YiEk73NjJ48WJEep3odess1q/6WiAXX7UiJM1SnDFgAa9NZMdfhqMm6XduNO1m60
 sy0i8AdxdQciWYexOXMsBuDUCzlcoj4WYs1QGpY3uqO1MznQS/QUfu65fx8CzaT5
 snm6ki5ivqXH/js/0Z4MX2n/sd1PGJ5ynMkekxJ8G3gw+GC/oeSeGNawfedifLKK
 OdzyDdeiel5Me1p4I28j1WYVLHvtFmEWEU9oytdG0D/rjC/pgYgW/NYvAao8lQ4Z
 06wdcyAM66ViAPrbYeE7Bx4jy8zYRkiw6Y3kIbLgrlMugu3BhIW5Mi3BsgL4f4am
 KsqkzUqPZMCOVwDuUILSuPp4uHaR+JTJttywiLniTL6reF5kTiA=
 =4Ey6
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Stable bugfixes:
   - Dequeue the request from the receive queue while we're re-encoding
     # v4.20+
   - Fix buffer handling of GSS MIC without slack # 5.1

  Features:
   - Increase xprtrdma maximum transport header and slot table sizes
   - Add support for nfs4_call_sync() calls using a custom
     rpc_task_struct
   - Optimize the default readahead size
   - Enable pNFS filelayout LAYOUTGET on OPEN

  Other bugfixes and cleanups:
   - Fix possible null-pointer dereferences and memory leaks
   - Various NFS over RDMA cleanups
   - Various NFS over RDMA comment updates
   - Don't receive TCP data into a reset request buffer
   - Don't try to parse incomplete RPC messages
   - Fix congestion window race with disconnect
   - Clean up pNFS return-on-close error handling
   - Fixes for NFS4ERR_OLD_STATEID handling"

* tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (53 commits)
  pNFS/filelayout: enable LAYOUTGET on OPEN
  NFS: Optimise the default readahead size
  NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
  NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
  NFSv4: Fix OPEN_DOWNGRADE error handling
  pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
  NFSv4: Add a helper to increment stateid seqids
  NFSv4: Handle RPC level errors in LAYOUTRETURN
  NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
  NFSv4: Clean up pNFS return-on-close error handling
  pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
  NFS: remove unused check for negative dentry
  NFSv3: use nfs_add_or_obtain() to create and reference inodes
  NFS: Refactor nfs_instantiate() for dentry referencing callers
  SUNRPC: Fix congestion window race with disconnect
  SUNRPC: Don't try to parse incomplete RPC messages
  SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic
  SUNRPC: Fix buffer handling of GSS MIC without slack
  SUNRPC: RPC level errors should always set task->tk_rpc_status
  SUNRPC: Don't receive TCP data into a request buffer that has been reset
  ...
2019-09-26 12:20:14 -07:00
Olga Kornievskaia
a8fd0feeca pNFS/filelayout: enable LAYOUTGET on OPEN
Add the flag to the filelayout driver to add LAYOUTGET to
the OPEN compound.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-09-24 16:28:38 -04:00
Trond Myklebust
c128e57551 NFS: Optimise the default readahead size
In the years since the max readahead size was fixed in NFS, a number of
things have happened:
- Users can now set the value directly using /sys/class/bdi
- NFS max supported block sizes have increased by several orders of
  magnitude from 64K to 1MB.
- Disk access latencies are orders of magnitude faster due to SSD + NVME.

In particular note that if the server is advertising 1MB as the optimal
read size, as that will set the readahead size to 15MB.
Let's therefore adjust down, and try to default to VM_READAHEAD_PAGES.
However let's inform the VM about our preferred block size so that it
can choose to round up in cases where that makes sense.

Reported-by: Alkis Georgopoulos <alkisg@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-09-24 15:58:20 -04:00
Trond Myklebust
32c6e7eee3 NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
If a LOCKU request receives a NFS4ERR_OLD_STATEID, then bump the
seqid before resending. Ensure we only bump the seqid by 1.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-09-20 16:00:14 -04:00
Trond Myklebust
0e0cb35b41 NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
If a CLOSE or OPEN_DOWNGRADE operation receives a NFS4ERR_OLD_STATEID
then bump the seqid before resending. Ensure we only bump the seqid
by 1.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-09-20 15:56:19 -04:00
Trond Myklebust
e217e825dc NFSv4: Fix OPEN_DOWNGRADE error handling
If OPEN_DOWNGRADE returns a state error, then we want to initiate
state recovery in addition to marking the stateid as closed.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-09-20 15:52:44 -04:00
Trond Myklebust
30cb3ee299 pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
If a LAYOUTRETURN receives a reply of NFS4ERR_OLD_STATEID then assume we've
missed an update, and just bump the stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-09-20 15:48:35 -04:00
Trond Myklebust
9228395709 NFSv4: Add a helper to increment stateid seqids
Add a helper function to increment stateid seqids according to the
rules specified in RFC5661 Section 8.2.2.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-09-20 15:44:45 -04:00
Trond Myklebust
6109bcf713 NFSv4: Handle RPC level errors in LAYOUTRETURN
Handle RPC level errors by assuming that the RPC call was successful.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-09-20 15:39:56 -04:00