443 commits

Author SHA1 Message Date
Anna Schumaker
f989065218 NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()
commit a421d21860 upstream.

Commit de144ff423 changes _pnfs_return_layout() to call
pnfs_mark_matching_lsegs_return() passing NULL as the struct
pnfs_layout_range argument. Unfortunately,
pnfs_mark_matching_lsegs_return() doesn't check if we have a value here
before dereferencing it, causing an oops.

I'm able to hit this crash consistently when running connectathon basic
tests on NFS v4.1/v4.2 against Ontap.

Fixes: de144ff423 ("NFSv4: Don't discard segments marked for return in _pnfs_return_layout()")
Cc: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-03 09:00:27 +02:00
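
The failure mode fixed by f989065218 above can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel code: `struct range_sketch` and `segment_matches()` are hypothetical stand-ins, and treating a NULL range as "match every segment" is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for struct pnfs_layout_range. */
struct range_sketch {
	uint64_t offset;
	uint64_t length;
};

/*
 * Returns true if the segment should be marked for return.
 * A NULL recall range is treated as "match every segment" (an assumption
 * of this sketch) instead of being dereferenced.
 * Overflow of offset + length is ignored here for brevity.
 */
static bool segment_matches(const struct range_sketch *seg,
			    const struct range_sketch *recall)
{
	assert(seg != NULL);
	if (recall == NULL)
		return true;
	/* Half-open intervals [offset, offset + length) overlap. */
	return seg->offset < recall->offset + recall->length &&
	       recall->offset < seg->offset + seg->length;
}
```

The point of the sketch is only the ordering: the NULL test comes before any field of the range is read.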
Trond Myklebust
2fafe7d504 NFSv4: Don't discard segments marked for return in _pnfs_return_layout()
commit de144ff423 upstream.

If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN
flag, then the assumption is that it has some reporting requirement
to perform through a layoutreturn (e.g. flexfiles layout stats or error
information).

Fixes: 6d597e1750 ("pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11 14:47:34 +02:00
Trond Myklebust
334165d9fb NFS: Don't discard pNFS layout segments that are marked for return
commit 39fd018636 upstream.

If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN
flag, then the assumption is that it has some reporting requirement
to perform through a layoutreturn (e.g. flexfiles layout stats or error
information).

Fixes: e0b7d420f7 ("pNFS: Don't discard layout segments that are marked for return")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11 14:47:34 +02:00
Trond Myklebust
ff557bf971 pNFS/NFSv4: Improve rejection of out-of-order layouts
[ Upstream commit d29b468da4 ]

If a layoutget ends up being reordered w.r.t. a layoutreturn, e.g. due
to a layoutget-on-open not knowing a priori which file to lock, then we
must assume the layout is no longer being considered valid state by the
server.
Incrementally improve our ability to reject such states by using the
cached old stateid in conjunction with the plh_barrier to try to
identify them.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-13 13:55:06 +01:00
Trond Myklebust
386b142945 pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process()
[ Upstream commit 08bd8dbe88 ]

If the server returns a new stateid that does not match the one in our
cache, then try to return the one we hold instead of just invalidating
it on the client side. This ensures that both client and server will
agree that the stateid is invalid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-13 13:55:05 +01:00
Trond Myklebust
d46c0d64db pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn
[ Upstream commit 1bcf34fdac ]

When we're scheduling a layoutreturn, we need to ignore any further
incoming layouts with sequence ids that are going to be affected by the
layout return.

Fixes: 44ea8dfce0 ("NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-03 23:28:47 +01:00
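
The barrier idea described in d46c0d64db above can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel implementation: `struct layout_hdr_sketch`, `seqid_is_newer()`, `schedule_layoutreturn()` and `layout_passes_barrier()` are hypothetical names, and a barrier of 0 is assumed to mean "no barrier set".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the layout header; only the barrier is modelled. */
struct layout_hdr_sketch {
	uint32_t barrier;	/* 0 means "no barrier set" in this sketch */
};

/*
 * True if s1 is newer than s2 under 32-bit wraparound.
 * The unsigned-to-signed conversion is implementation-defined in C,
 * but behaves as two's complement on common compilers.
 */
static bool seqid_is_newer(uint32_t s1, uint32_t s2)
{
	return (int32_t)(s1 - s2) > 0;
}

/* Scheduling a layoutreturn moves the barrier forward, never backward. */
static void schedule_layoutreturn(struct layout_hdr_sketch *lo, uint32_t seq)
{
	assert(lo != NULL);
	if (lo->barrier == 0 || seqid_is_newer(seq, lo->barrier))
		lo->barrier = seq;
}

/* An incoming layout is accepted only if its seqid is past the barrier. */
static bool layout_passes_barrier(const struct layout_hdr_sketch *lo,
				  uint32_t incoming)
{
	assert(lo != NULL);
	if (lo->barrier == 0)
		return true;
	return seqid_is_newer(incoming, lo->barrier);
}
```

Comparing via a signed difference rather than `s1 > s2` keeps the ordering meaningful when the 32-bit sequence id wraps around.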
Trond Myklebust
dba0d4b150 pNFS/NFSv4: Fix a layout segment leak in pnfs_layout_process()
[ Upstream commit 814b849713 ]

If the server returns a new stateid that does not match the one in our
cache, then pnfs_layout_process() will leak the layout segments returned
by pnfs_mark_layout_stateid_invalid().

Fixes: 9888d837f3 ("pNFS: Force a retry of LAYOUTGET if the stateid doesn't match our cache")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-03 23:28:47 +01:00
Trond Myklebust
01a12a24f9 NFS/pNFS: Fix a leak of the layout 'plh_outstanding' counter
commit cb2856c597 upstream.

If we exit _lgopen_prepare_attached() without setting a layout, we will
currently leak the plh_outstanding counter.

Fixes: 411ae722d1 ("pNFS: Wait for stale layoutget calls to complete in pnfs_update_layout()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:31 +01:00
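
The counter leak described in 01a12a24f9 above follows a common pattern: a counter is incremented on entry and one exit path forgets to undo it. A minimal user-space sketch, with hypothetical names (`struct lgopen_sketch`, `prepare_attached()`, `layoutget_done()`) and a boolean standing in for whatever condition prevents a layout from being set:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the layout header; only the counter is modelled. */
struct lgopen_sketch {
	int outstanding;	/* layoutgets currently in flight */
};

/* Returns 0 if a layout was attached, -1 on the early-exit path. */
static int prepare_attached(struct lgopen_sketch *lo, bool can_attach)
{
	assert(lo != NULL);
	lo->outstanding++;		/* announce a layoutget in flight */
	if (!can_attach) {
		lo->outstanding--;	/* early exit: undo the increment */
		return -1;
	}
	return 0;	/* the completion path drops the count later */
}

/* Completion path for the successful case. */
static void layoutget_done(struct lgopen_sketch *lo)
{
	assert(lo != NULL);
	lo->outstanding--;
}
```

Without the decrement on the early-exit path, anything waiting for `outstanding` to reach zero would wait forever.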
Trond Myklebust
06f58dbc49 pNFS: Stricter ordering of layoutget and layoutreturn
commit 2c8d5fc37f upstream.

If a layout return is in progress, we should wait for it to complete,
in case the layout segment we are picking up gets returned too.

Fixes: 30cb3ee299 ("pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:30 +01:00
Trond Myklebust
ecaaad1801 pNFS: Mark layout for return if return-on-close was not sent
commit 67bbceedc9 upstream.

If the layout return-on-close failed because the layoutreturn was never
sent, then we should mark the layout for return again.

Fixes: 9c47b18cf7 ("pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:30 +01:00
Trond Myklebust
f128de17c8 pNFS: We want return-on-close to complete when evicting the inode
commit 078000d02d upstream.

If the inode is being evicted, it should be safe to run return-on-close,
so we should do it to ensure we don't inadvertently leak layout segments.

Fixes: 1c5bd76d17 ("pNFS: Enable layoutreturn operation for return-on-close")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:30 +01:00
Trond Myklebust
3c0f0f5f58 NFSv4: Fix a pNFS layout related use-after-free race when freeing the inode
[ Upstream commit b6d49ecd10 ]

When returning the layout in nfs4_evict_inode(), we need to ensure that
the layout is actually done being freed before we can proceed to free the
inode itself.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-06 14:56:54 +01:00
Wang Qing
9f26645127 nfs: fix spelling typo in pnfs.c
Change the comment typo: "manger" -> "manager".

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-24 10:42:49 -04:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and their variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings where they occur.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Trond Myklebust
563c53e73b NFS: Fix flexfiles read failover
The current mirrored read failover code correctly resets the mirror
index between failed reads; however, it is not able to actually flip the
RPC call over to the next RPC client.
The end result is that we keep resending the RPC call to the same client
over and over.

The fix is to use the pnfs_read_resend_pnfs() mechanism to schedule a
new RPC call, but we need to add the ability to pass in a mirror
index so that we always retry the next mirror in the list.

Fixes: 166bd5b889 ("pNFS/flexfiles: Fix layoutstats handling during read failovers")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-12 11:20:29 -04:00
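
The retry rule in 563c53e73b above ("always retry the next mirror in the list") can be sketched as a one-line index computation. This is a hypothetical helper, not the flexfiles code; wrapping around to the first mirror after the last one is an assumption of the sketch.

```c
#include <assert.h>

/*
 * Pick the mirror to try after a failed read on mirror `failed_idx`.
 * Wrapping back to index 0 after the last mirror is an assumption.
 */
static unsigned int next_mirror(unsigned int failed_idx,
				unsigned int mirror_count)
{
	assert(mirror_count > 0);
	return (failed_idx + 1) % mirror_count;
}
```

Passing the index explicitly is what lets the resend path target a different mirror instead of reusing the one that just failed.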
Trond Myklebust
d474f96104 NFS: Don't return layout segments that are in use
If the NFS_LAYOUT_RETURN_REQUESTED flag is set, we want to return the
layout as soon as possible, meaning that the affected layout segments
should be marked as invalid, and should no longer be in use for I/O.

Fixes: f0b429819b ("pNFS: Ignore non-recalled layouts in pnfs_layout_need_return()")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-12 09:46:06 -04:00
Trond Myklebust
ff041727e9 NFS: Don't move layouts to plh_return_segs list while in use
If the layout segment is still in use for a read or a write, we should
not move it to the layout plh_return_segs list. If we do, we can end
up returning the layout while I/O is still in progress.

Fixes: e0b7d420f7 ("pNFS: Don't discard layout segments that are marked for return")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-12 09:46:05 -04:00
Trond Myklebust
a19b4785d9 NFS: Report the stateid + status in trace_nfs4_layoutreturn_on_close()
Ensure we correctly report the stateid and status in the layoutreturn on
close tracepoint.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-05 07:26:42 -04:00
Trond Myklebust
4d8948c733 NFS/pnfs: Fix a credential use-after-free issue in pnfs_roc()
If the credential returned by pnfs_prepare_layoutreturn()
does not match the credential of the RPC call, then we do
end up calling pnfs_send_layoutreturn() with that credential,
so don't free it!

Fixes: 44ea8dfce0 ("NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-19 23:53:52 -04:00
Trond Myklebust
7bcc10585b NFS/pnfs: Ensure that _pnfs_return_layout() waits for layoutreturn completion
We require that any outstanding layout return completes before we can
free up the inode so that the layout itself can be freed.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-19 19:27:26 -04:00
Trond Myklebust
fbf4bcc9a8 NFS: Fix an ABBA spinlock issue in pnfs_update_layout()
We need to drop the inode spinlock while calling nfs4_select_rw_stateid(),
since nfs4_copy_delegation_stateid() could take the delegation lock.
Note that it is safe to do this, since all other calls to
pnfs_update_layout() for that inode will find themselves blocked by
the lock we hold on NFS_LAYOUT_FIRST_LAYOUTGET.

Fixes: fc51b1cf39 ("NFS: Beware when dereferencing the delegation cred")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-13 15:55:21 -04:00
Trond Myklebust
44ea8dfce0 NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()
When we're sending a layoutreturn, ensure that we reference the
layout cred atomically with the copy of the stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-03 18:29:10 -04:00
Trond Myklebust
97a728f5e2 NFS/pnfs: Fix dereference of layout cred in pnfs_layoutcommit_inode()
Ensure that the dereference of the layout cred is atomic with the
stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-03 18:29:10 -04:00
Trond Myklebust
e1e54ab710 pNFS/flexfiles: Check the layout segment range before doing I/O
When starting to read or write with a layout segment, check that the
range matches our request.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:35 -04:00
Trond Myklebust
a9901899b6 pNFS: Add infrastructure for cleaning up per-layout commit structures
Ensure that both the file and flexfiles layout types clean up when
freeing the layout segments.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-27 16:34:34 -04:00
Trond Myklebust
b5fdf8418c NFSv4: Add support for CB_RECALL_ANY for flexfiles layouts
When we receive a CB_RECALL_ANY that asks us to return flexfiles
layouts, we iterate through all the layouts and look at whether or
not there are active open file descriptors that might need them
for I/O. If there are no such descriptors, we return the layouts.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:30 -04:00
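
The CB_RECALL_ANY policy described in b5fdf8418c above (return a layout only if no open file descriptor might still need it) can be sketched in user space as a simple scan. The types and names here (`struct recall_sketch`, `recall_any()`) are hypothetical, and the open-descriptor check is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-layout state for the sketch. */
struct recall_sketch {
	int open_fds;			/* open descriptors that may do I/O */
	bool marked_for_return;
};

/* Mark every idle layout for return; report how many were marked. */
static size_t recall_any(struct recall_sketch *layouts, size_t n)
{
	size_t returned = 0;

	assert(layouts != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (layouts[i].open_fds == 0) {
			layouts[i].marked_for_return = true;
			returned++;
		}
	}
	return returned;
}
```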
Trond Myklebust
cf6605d194 NFSv4: Ensure layout headers are RCU safe
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:29 -04:00
Trond Myklebust
63ec2b69e9 NFSv4: Avoid unnecessary credential references in layoutget
Layoutget is just using the credential attached to the open context.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:29 -04:00
Trond Myklebust
59b5639490 NFSv4/pnfs: pnfs_set_layout_stateid() should update the layout cred
If the cred assigned to the layout that we're updating differs from
the one used to retrieve the new layout segment, then we need to
update the layout plh_lc_cred field.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-16 08:34:28 -04:00
Trond Myklebust
3871224787 NFSv4: pnfs_roc() must use cred_fscmp() to compare creds
When comparing two 'struct cred' for equality w.r.t. behaviour under
filesystem access, we need to use cred_fscmp().

Fixes: a52458b48a ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-03 16:35:07 -05:00
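
The distinction behind 3871224787 above is between comparing credential pointers and comparing the fields that matter for filesystem access. A minimal user-space sketch, assuming a reduced credential with only a filesystem uid and gid; `struct cred_sketch` and `cred_fscmp_sketch()` are hypothetical, and the sketch leaves out anything else a real comparison might examine, such as supplementary groups.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced credential: only the filesystem ids are modelled. */
struct cred_sketch {
	unsigned int fsuid;
	unsigned int fsgid;
};

/*
 * Returns 0 if the two credentials behave the same for filesystem access
 * in this reduced model, otherwise a negative or positive ordering value.
 */
static int cred_fscmp_sketch(const struct cred_sketch *a,
			     const struct cred_sketch *b)
{
	assert(a != NULL && b != NULL);
	if (a == b)
		return 0;
	if (a->fsuid != b->fsuid)
		return a->fsuid < b->fsuid ? -1 : 1;
	if (a->fsgid != b->fsgid)
		return a->fsgid < b->fsgid ? -1 : 1;
	return 0;
}
```

Two distinct objects with equal ids compare as equal here, which a plain pointer comparison would miss.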
Olga Kornievskaia
d826e5b827 NFSv4.x recover from pre-mature loss of openstateid
Ever since commit 0e0cb35b41, it's possible to lose an open stateid
while retrying a CLOSE due to ERR_OLD_STATEID. Once that happens,
operations that require an open stateid fail with EAGAIN, which is
propagated to the application; tests like generic/446 and generic/168
then fail with "Resource temporarily unavailable".

Instead of returning this error, initiate state recovery when possible to
recover the open stateid and then try calling nfs4_select_rw_stateid()
again.

Fixes: 0e0cb35b41 ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:54:33 -05:00
Ben Dooks
d49dd11753 NFSv4: add declaration of current_stateid
The current_stateid is exported from nfs4state.c but not
declared in any of the headers. Add to nfs4_fs.h to
remove the following warning:

fs/nfs/nfs4state.c:80:20: warning: symbol 'current_stateid' was not declared. Should it be static?

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-18 10:36:45 +01:00
Trond Myklebust
30cb3ee299 pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
If a LAYOUTRETURN receives a reply of NFS4ERR_OLD_STATEID then assume we've
missed an update, and just bump the stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-09-20 15:48:35 -04:00
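
"Bumping the stateid" in 30cb3ee299 above means advancing its sequence id before retrying. A minimal user-space sketch, with a hypothetical `struct stateid_sketch` that models only the sequence id; skipping 0 on wraparound is an assumption of the sketch, made because a seqid of 0 is treated here as reserved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stateid: only the sequence id is modelled. */
struct stateid_sketch {
	uint32_t seqid;
};

/* Advance the sequence id by one, skipping 0 when it wraps around. */
static void stateid_seqid_bump(struct stateid_sketch *sid)
{
	assert(sid != NULL);
	if (++sid->seqid == 0)
		++sid->seqid;
}
```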
Trond Myklebust
6109bcf713 NFSv4: Handle RPC level errors in LAYOUTRETURN
Handle RPC level errors by assuming that the RPC call was successful.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-09-20 15:39:56 -04:00
Trond Myklebust
078a432d1c NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
If the server sends a NFS4ERR_DELAY, then allow the caller to retry.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-09-20 15:36:58 -04:00
Trond Myklebust
287a9c558b NFSv4: Clean up pNFS return-on-close error handling
Both close and delegreturn have identical code to handle pNFS
return-on-close. This patch refactors that code and places it
in pnfs.c

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-09-20 15:27:51 -04:00
Trond Myklebust
9c47b18cf7 pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
If the server rejected our layout return with a state error such as
NFS4ERR_BAD_STATEID, or even a stale inode error, then we do want
to clear out all the remaining layout segments and mark that stateid
as invalid.

Fixes: 1c5bd76d17 ("pNFS: Enable layoutreturn operation for...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-09-20 15:17:42 -04:00
Trond Myklebust
731c74dd98 NFSv4: Report the error from nfs4_select_rw_stateid()
In pnfs_update_layout() ensure that we do report any fatal errors from
nfs4_select_rw_stateid().

Fixes: d9aba2b40d ("NFSv4: Don't use the zero stateid with layoutget")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-04 22:35:40 -04:00
Trond Myklebust
d5b9216fd5 pnfs/flexfiles: Add tracepoints for detecting pnfs fallback to MDS
Add tracepoints to allow debugging of the event chain leading to
a pnfs fallback to doing I/O through the MDS.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-18 15:50:28 -04:00
Trond Myklebust
58bbeab425 pnfs: Fix a problem where we gratuitously start doing I/O through the MDS
If the client has to stop in pnfs_update_layout() to wait for another
layoutget to complete, it currently exits and defaults to I/O through
the MDS if the layoutget was successful.

Fixes: d03360aaf5 ("pNFS: Ensure we return the error if someone kills...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.20+
2019-07-18 15:33:42 -04:00
Trond Myklebust
d9aba2b40d NFSv4: Don't use the zero stateid with layoutget
The NFSv4.1 protocol explicitly forbids us from using the zero stateid
together with layoutget, so when we see that nfs4_select_rw_stateid()
is unable to return a valid delegation, lock or open stateid, then
we should initiate recovery and retry.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-18 14:43:52 -04:00
Trond Myklebust
2b17d725f9 NFS: Clean up writeback code
Now that the VM promises never to recurse back into the filesystem
layer on writeback, remove all the GFP_NOFS references etc from
the generic writeback code.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-06 14:54:52 -04:00
Trond Myklebust
9fcd5960e8 NFS: Add a helper to return a pointer to the open context of a struct nfs_page
Add a helper for when we remove the explicit pointer to the open
context.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-04-25 14:18:15 -04:00
Trond Myklebust
400417b05f pNFS: Fix a typo in pnfs_update_layout
We're supposed to wait for the outstanding layout count to go to zero,
but that got lost somehow.

Fixes: d03360aaf5 ("pNFS: Ensure we return the error if someone...")
Reported-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-03-12 16:04:51 -04:00
Trond Myklebust
5085607d20 NFS/pnfs: Bulk destroy of layouts needs to be safe w.r.t. umount
If a bulk layout recall or a metadata server reboot coincides with a
umount, then holding a reference to an inode is unsafe unless we
also hold a reference to the super block.

Fixes: fd9a8d7160 ("NFSv4.1: Fix bulk recall and destroy of layouts")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-23 13:59:29 -05:00
NeilBrown
a52458b48a NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.
SUNRPC has two sorts of credentials, both of which appear as
"struct rpc_cred".
There are "generic credentials" which are supplied by clients
such as NFS and passed in 'struct rpc_message' to indicate
which user should be used to authorize the request, and there
are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS
which describe the credential to be sent over the wire.

This patch replaces all the generic credentials by 'struct cred'
pointers - the credential structure used throughout Linux.

For machine credentials, there is a special 'struct cred *' pointer
which is statically allocated and recognized where needed as
having a special meaning.  A look-up of a low-level cred will
map this to a machine credential.

Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:46 -05:00
Trond Myklebust
0de43976fb NFS: Convert lookups of the open context to RCU
Reduce contention on the inode->i_lock by ensuring that we use RCU
when looking up the NFS open context.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:17 -04:00
Trond Myklebust
28ced9a84c pNFS: Don't allocate more pages than we need to fit a layoutget response
For the 'files' and 'flexfiles' layout types, we do not expect the reply
to be any larger than 4k. The block and scsi layout types are a little more
greedy, so we keep allocating the maximum response size for now.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
a2791d3a2c pNFS: Don't zero out the array in nfs4_alloc_pages()
We don't need a zeroed out array, since it is immediately being filled.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
d03360aaf5 pNFS: Ensure we return the error if someone kills a waiting layoutget
If someone interrupts a wait on one or more outstanding layoutgets in
pnfs_update_layout() then return the ERESTARTSYS/EINTR error.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-09-14 16:24:08 -04:00