Commit graph

375 commits

Author SHA1 Message Date
Fred Isaman
78746a384c pnfs: Add LAYOUTGET to OPEN of an existing file
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Trond Myklebust
29a8bfe52d pNFS: Refactor nfs4_layoutget_release()
Move the actual freeing of the struct nfs4_layoutget into fs/nfs/pnfs.c
where it can be reused by the layoutget on open code.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
2409a976a2 pnfs: Add LAYOUTGET to OPEN of a new file
This triggers when we have no pre-existing inode to attach to.
The pre-existing case is saved for later.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
5e36e2a941 pnfs: Change pnfs_alloc_init_layoutget_args call signature
Don't send in a layout, instead use the (possibly NULL) inode.

This is needed for LAYOUTGET attached to an OPEN where the inode is not
yet set.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
1b146fcff7 pnfs: Move nfs4_opendata into nfs4_fs.h
It will be needed now by the pnfs code.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
dacb452db8 pnfs: move allocations out of nfs4_proc_layoutget
They work better in the new alloc_init function.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Fred Isaman
587f03deb6 pnfs: refactor send_layoutget
Pull out the alloc/init part for eventual reuse by OPEN.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-05-31 15:03:11 -04:00
Trond Myklebust
9c6376ebdd pNFS: Prevent the layout header refcount going to zero in pnfs_roc()
Ensure that we hold a reference to the layout header when processing
the pNFS return-on-close so that the refcount value does not inadvertently
go to zero.

Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.10+
Tested-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
2018-03-08 12:56:31 -05:00
Scott Mayhew
ba4a76f703 nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mds
Currently when falling back to doing I/O through the MDS (via
pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header
without releasing the reference taken on the dreq
via pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init ->
nfs_direct_pgio_init.  It then takes another reference on the dreq via
nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and
as a result the requester will become stuck in inode_dio_wait.  Once
that happens, other processes accessing the inode will become stuck as
well.

Ensure that pnfs_read_through_mds() and pnfs_write_through_mds() clean
up correctly by calling hdr->completion_ops->completion() instead of
calling hdr->release() directly.

This can be reproduced (sometimes) by performing "storage failover
takeover" commands on a NetApp filer while doing direct I/O from a client.

This can also be reproduced using SystemTap to simulate a failure while
doing direct I/O from a client (from Dave Wysochanski
<dwysocha@redhat.com>):

stap -v -g -e 'probe module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return { $return=NULL; exit(); }'

Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Fixes: 1ca018d28d ("pNFS: Fix a memory leak when attempted pnfs fails")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14 23:06:29 -05:00
Benjamin Coddington
b3dce6a2f0 pnfs/blocklayout: handle transient devices
PNFS block/SCSI layouts should gracefully handle cases where block devices
are not available when a layout is retrieved, or the block devices are
removed while the client holds a layout.

While setting up a layout segment, keep a record of an unavailable or
un-parsable block device in cache with a flag so that subsequent layouts do
not spam the server with GETDEVINFO.  We can reuse the current
NFS_DEVICEID_UNAVAILABLE handling with one variation: instead of reusing
the device, we will discard it and send a fresh GETDEVINFO after the
timeout, since the lookup and validation of the device occurs within the
GETDEVINFO response handling.

A lookup of a layout segment that references an unavailable device will
return a segment with the NFS_LSEG_UNAVAILABLE flag set.  This will allow
the pgio layer to mark the layout with the appropriate fail bit, which
forces subsequent IO to the MDS, and prevents spamming the server with
LAYOUTGET, LAYOUTRETURN.

Finally, when IO to a block device fails, look up the block device(s)
referenced by the pgio header, and mark them as unavailable.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2018-01-14 23:06:29 -05:00
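The caching behaviour described in b3dce6a2f0 can be pictured with a small userspace sketch. It is not the kernel code: the names `demo_device`, `demo_check_device`, `demo_mark_unavailable` and the 120-second window are all assumptions made for illustration. It shows only the decision the message describes: use the device, fall back to the MDS while the entry is flagged, or discard the entry and refetch after the timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical retry window in seconds; the real value is not given above. */
#define DEMO_RETRY_SECS 120L

/* Hypothetical cached device entry. */
struct demo_device {
	bool unavailable;	/* set when the device could not be opened or parsed */
	long failed_at;		/* time in seconds when it was flagged */
};

enum demo_action {
	DEMO_USE_DEVICE,	/* device is usable: do I/O through the layout */
	DEMO_USE_MDS,		/* still inside the window: send I/O to the MDS */
	DEMO_REFETCH		/* window expired: discard entry, ask the server again */
};

/* Flag a device after a failed lookup or a failed I/O. */
void demo_mark_unavailable(struct demo_device *dev, long now)
{
	assert(dev);
	dev->unavailable = true;
	dev->failed_at = now;
}

/* Decide what to do with a cached device entry at time 'now'. */
enum demo_action demo_check_device(const struct demo_device *dev, long now)
{
	assert(dev);
	if (!dev->unavailable)
		return DEMO_USE_DEVICE;
	if (now - dev->failed_at < DEMO_RETRY_SECS)
		return DEMO_USE_MDS;
	return DEMO_REFETCH;
}
```

Keeping the flagged entry in the cache is what avoids repeated GETDEVINFO calls; refetching rather than reusing it after the timeout mirrors the "one variation" the message mentions.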
Trond Myklebust
7380020e77 pNFS: Retry NFS4ERR_OLD_STATEID errors in layoutreturn-on-close
If our layoutreturn on close operation returns an NFS4ERR_OLD_STATEID,
then try to update the stateid and retry. We know that there should
be no further LAYOUTGET requests being launched.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17 16:43:47 -05:00
Thomas Meyer
6089dd0d73 NFS: Fix bool initialization/comparison
Bool initializations should use true and false. Bool tests don't need
comparisons.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17 16:43:43 -05:00
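As a generic illustration of the style rule in 6089dd0d73 (not taken from the patch itself), the sketch below uses a hypothetical `demo_state` structure. The comments show the form the rule discourages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag holder, used only for illustration. */
struct demo_state {
	bool ready;
};

void demo_init(struct demo_state *s)
{
	assert(s);
	s->ready = false;	/* rather than: s->ready = 0; */
}

int demo_check(const struct demo_state *s)
{
	assert(s);
	if (s->ready)		/* rather than: if (s->ready == true) */
		return 1;
	return 0;
}
```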
Elena Reshetova
2b28a7bee4 fs, nfs: convert pnfs_layout_hdr.plh_refcount from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to the newly provided
refcount_t type and API, which prevent accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations that may be exploitable.

The variable pnfs_layout_hdr.plh_refcount is used as a pure reference
counter. Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17 13:47:59 -05:00
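The properties listed in 2b28a7bee4 (no increment from zero, no wrap-around on overflow, no decrement below zero) can be sketched in userspace with C11 atomics. This is an assumption-laden illustration, not the kernel's refcount_t implementation: `demo_refcount` and its helpers are hypothetical names, and the saturation policy shown here is only one possible choice.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a saturating reference counter. */
struct demo_refcount {
	atomic_uint refs;
};

#define DEMO_REFCOUNT_SATURATED UINT_MAX

void demo_refcount_set(struct demo_refcount *r, unsigned int n)
{
	assert(r);
	atomic_store(&r->refs, n);
}

/* Take a reference unless the count is zero; saturate instead of wrapping. */
bool demo_refcount_inc_not_zero(struct demo_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false;	/* object is already on its way to being freed */
		if (old == DEMO_REFCOUNT_SATURATED)
			return true;	/* pinned at the maximum: never wraps to zero */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
	return true;
}

/* Drop a reference; returns true only for the caller that reaches zero. */
bool demo_refcount_dec_and_test(struct demo_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0 || old == DEMO_REFCOUNT_SATURATED)
			return false;	/* refuse to underflow or to unpin */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
	return old == 1;
}
```

A saturated counter leaks the object rather than freeing it too early, trading a memory leak for the avoidance of a use-after-free.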
Elena Reshetova
eba6dd6917 fs, nfs: convert pnfs_layout_segment.pls_refcount from atomic_t to refcount_t
The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This avoids accidental
reference counter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-17 13:47:59 -05:00
Trond Myklebust
70d2f7b1ea pNFS: Use the standard I/O stateid when calling LAYOUTGET
Instead of having a private method for copying the open/delegation stateid,
use the same call that is used for standard I/O through the MDS.

Note that this means we transmit the stateid with a zero seqid, avoiding
issues with NFS4ERR_OLD_STATEID.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-11 22:19:00 -04:00
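The note in 70d2f7b1ea about transmitting a zero seqid can be illustrated as follows. An NFSv4 stateid consists of a 32-bit seqid followed by 12 opaque bytes; the structure and helper below (`demo_stateid`, `demo_copy_io_stateid`) are hypothetical and do not reproduce the kernel function the commit switches to.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of an NFSv4 stateid. */
struct demo_stateid {
	uint32_t seqid;
	unsigned char other[12];
};

/*
 * Copy the identifying part of the stateid but send seqid 0, so the
 * server does not compare the request against a specific sequence
 * number and reject it as stale.
 */
void demo_copy_io_stateid(struct demo_stateid *dst,
			  const struct demo_stateid *src)
{
	assert(dst && src);
	memcpy(dst->other, src->other, sizeof(dst->other));
	dst->seqid = 0;
}
```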
Trond Myklebust
196639ebbe NFS: Fix 2 use after free issues in the I/O code
The writeback code wants to send a commit after processing the pages,
which is why we want to delay releasing the struct path until after
that's done.

Also, the layout code expects that we do not free the inode before
we've put the layout segments in pnfs_writehdr_free() and
pnfs_readhdr_free().

Fixes: 919e3bd9a8 ("NFS: Ensure we commit after writeback is complete")
Fixes: 4714fb51fd ("nfs: remove pgio_header refcount, related cleanup")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-08 22:07:52 -04:00
Trond Myklebust
8205b9ce03 NFSv4/pnfs: Replace pnfs_put_lseg_locked() with pnfs_put_lseg()
Now that we no longer hold the inode->i_lock when manipulating the
commit lists, it is safe to call pnfs_put_lseg() again.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-08-15 11:54:48 -04:00
Benjamin Coddington
08cb5b0f05 pnfs: Fix the check for requests in range of layout segment
It's possible and acceptable for NFS to attempt to add requests beyond the
range of the current pgio->pg_lseg, a case which should be caught and
limited by the pg_test operation.  However, the current handling of this
case replaces pgio->pg_lseg with a new layout segment (after a WARN) within
that pg_test operation.  That will cause all the previously added requests
to be submitted with this new layout segment, which may not be valid for
those requests.

Fix this problem by only returning zero for the number of bytes to coalesce
from pg_test for this case which allows any previously added requests to
complete on the current layout segment.  The check for requests starting
out of range of the layout segment moves to pg_init, so that the
replacement of pgio->pg_lseg will be done when the next request is added.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-24 07:55:02 -04:00
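The split of responsibilities described in 08cb5b0f05 can be sketched as two small helpers. This is a simplified model under stated assumptions: `demo_lseg`, `demo_lseg_covers` and `demo_pg_test` are hypothetical names, and the real pg_test and pg_init callbacks take different arguments. The point is that the test step only reports how many bytes may be coalesced (zero if the request starts outside the segment) and never swaps the segment itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout segment: a byte range [offset, offset + length). */
struct demo_lseg {
	uint64_t offset;
	uint64_t length;
};

/* End of the segment, clamped so that offset + length cannot overflow. */
static uint64_t demo_lseg_end(const struct demo_lseg *lseg)
{
	if (lseg->length > UINT64_MAX - lseg->offset)
		return UINT64_MAX;
	return lseg->offset + lseg->length;
}

/* Check meant for the init step: does the request start inside the segment? */
bool demo_lseg_covers(const struct demo_lseg *lseg, uint64_t req_offset)
{
	assert(lseg);
	return req_offset >= lseg->offset && req_offset < demo_lseg_end(lseg);
}

/*
 * Test step: return how many bytes of the request may be coalesced.
 * Zero means "do not add this request"; requests already queued then
 * complete on the current segment, which is left untouched.
 */
size_t demo_pg_test(const struct demo_lseg *lseg, uint64_t req_offset,
		    size_t req_len)
{
	uint64_t room;

	if (!demo_lseg_covers(lseg, req_offset))
		return 0;
	room = demo_lseg_end(lseg) - req_offset;
	return room < req_len ? (size_t)room : req_len;
}
```

In this model, a caller that gets zero back would flush what it has queued and then let the init step pick a new segment for the next request.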
Trond Myklebust
61f454e30c pNFS: Fix a deadlock when coalescing writes and returning the layout
Consider the following deadlock:

Process P1	Process P2		Process P3
==========	==========		==========
					lock_page(page)

		lseg = pnfs_update_layout(inode)

lo = NFS_I(inode)->layout
pnfs_error_mark_layout_for_return(lo)

		lock_page(page)

					lseg = pnfs_update_layout(inode)

In this scenario,
- P1 has declared the layout to be in error, but P2 holds a reference to
  a layout segment on that inode, so the layoutreturn is deferred.
- P2 is waiting for a page lock held by P3.
- P3 is asking for a new layout segment, but is blocked waiting
  for the layoutreturn.

The fix is to ensure that pnfs_error_mark_layout_for_return() does
not set the NFS_LAYOUT_RETURN flag, which blocks P3. Instead, we allow
the latter to call LAYOUTGET so that it can make progress and unblock
P2.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-02 12:35:33 -04:00
Trond Myklebust
5466d21411 pNFS: Don't clear the layout return info if there are segments to return
In pnfs_clear_layoutreturn_info, ensure that we don't clear the layout
return info if there are new segments queued for return due to, for
instance, a race between a LAYOUTRETURN and a failed I/O attempt.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-02 12:35:33 -04:00
Trond Myklebust
1f18b82c34 pNFS: Ensure we commit the layout if it has been invalidated
If the layout is being invalidated on the server, then we must
invoke nfs_commit_inode() to ensure any commits to the DS get
cleared out.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-29 11:29:30 -04:00
Trond Myklebust
37f8aa16da pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure path
If the attempt to write through pNFS fails, we need to use the same
failure semantics as for the read path: If the FF_FLAGS_NO_IO_THRU_MDS
flag is set or we have sufficient valid DSes, then we must retry through
pNFS.

Fixes: d67ae825a5 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-29 00:02:37 -04:00
Trond Myklebust
bdebfccd0e pNFS: Ensure we check layout validity before marking it for return
pnfs_error_mark_layout_for_return needs to check that the layout is
valid before calling pnfs_set_plh_return_info().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-28 13:07:01 -04:00
Trond Myklebust
6aeafd05ec pNFS: Fix use after free issues in pnfs_do_read()
The assumption should be that if the caller returns PNFS_ATTEMPTED, then hdr
has been consumed, and so we should not be testing hdr->task.tk_status.
If the caller returns PNFS_TRY_AGAIN, then we need to recoalesce and
free hdr.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-25 15:42:34 -04:00
Trond Myklebust
b3230e80a6 pNFS: Ensure we check layout segment validity in the pg_init() callback
If we have a layout segment cached in pgio->pg_lseg, we should check it
for validity before reusing it in a new RPC request. Otherwise, if we
recoalesce, we can end up looping forever.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-25 10:56:19 -04:00
Trond Myklebust
b94196888f pNFS: Unexport pnfs_put_lseg_locked and _pnfs_return_layout
They are not used outside the NFSv4 module.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-20 16:53:58 -04:00
Trond Myklebust
ee6625a948 pNFS: Fix a reference leak in _pnfs_return_layout
If NFS_LAYOUT_RETURN_REQUESTED is not set, then we currently exit
without freeing the list of invalidated layout segments, leading
to a reference leak.

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: 24408f5282 ("pNFS: Fix bugs in _pnfs_return_layout")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-01-26 15:50:41 -05:00
Trond Myklebust
e71708d4df pNFS: Return RW layouts on OPEN_DOWNGRADE
If the client holds no more writeable open state, and does not hold a
write delegation, then send a layoutreturn as part of the OPEN_DOWNGRADE.

We do this only for writes, since some layout drivers may require you to
also hold a read layout if you are doing a R/W workload.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-19 17:29:55 -05:00
Trond Myklebust
362fb578a5 pNFS: Release NFS_LAYOUT_RETURN when invalidating the layout stateid
Ensure we release the NFS_LAYOUT_RETURN lock when we invalidate the
layout stateid, so that processes and RPC tasks that are waiting on
the layout return can continue.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-05 22:52:01 -05:00
Trond Myklebust
287bd3e954 pNFS: Add a layoutreturn callback to perform a layout-private setup
Add a callback to allow the flexfiles layout driver to initialise the
layout private payload.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-03 13:12:16 -05:00
Trond Myklebust
4d796d751c pNFS: Allow layout drivers to manage private data in struct nfs4_layoutreturn
Cleanup to allow layout drivers to attach private data to layoutreturn,
and manage the data.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-02 23:37:45 -05:00
Trond Myklebust
b85f562049 pNFS: Skip invalid stateids when doing a bulk destroy
If the layout stateid is already invalid, we have no work to do.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:51 -05:00
Trond Myklebust
29ade5db12 pNFS: Wait on outstanding layoutreturns to complete in pnfs_roc()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:50 -05:00
Trond Myklebust
abb3e1c877 pNFS: Don't mark the layout as freed if the last lseg is marked for return
Address another memory leak.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:50 -05:00
Trond Myklebust
4aab97327f pNFS: Sync the layout state bits in pnfs_cache_lseg_for_layoutreturn
Ensure that the layout state bits are synced when we cache a layout
segment for layoutreturn using an appropriate call to
pnfs_set_plh_return_info.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:49 -05:00
Trond Myklebust
24408f5282 pNFS: Fix bugs in _pnfs_return_layout
We need to honour the NFS_LAYOUT_RETURN_REQUESTED bit regardless of
whether or not there are layout segments pending.
Furthermore, we should ensure that we leave the plh_return_segs list
empty.

This patch fixes a memory leak of the layout segments on plh_return_segs.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:49 -05:00
Trond Myklebust
fe1cf9469d pNFS: Clear all layout segment state in pnfs_mark_layout_stateid_invalid
When the layout state is invalidated, then so is the layout segment
state, and hence we do need to clean up the state bits.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:48 -05:00
Trond Myklebust
1c5bd76d17 pNFS: Enable layoutreturn operation for return-on-close
Amend the pnfs return on close helper functions to enable sending the
layoutreturn op in CLOSE/DELEGRETURN. This closes a potential race between
CLOSE/DELEGRETURN and parallel OPEN calls to the same file, and allows the
client and the server to agree on whether or not there is an outstanding
layout.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:47 -05:00
Trond Myklebust
828ed9ec1b pNFS: Clean up - add a helper to initialise struct layoutreturn_args
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:47 -05:00
Trond Myklebust
69820d22c5 pNFS: Don't mark layout segments invalid on layoutreturn in pnfs_roc
The layoutreturn call will take care of invalidating the layout segments
once the call is successful.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:45 -05:00
Trond Myklebust
0cdc329ec9 pNFS: Skip checking for return-on-close if the layout is invalid
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:44 -05:00
Trond Myklebust
e685d237e6 pNFS: Remove spurious wake up in pnfs_layout_remove_lseg()
There is no change to the value of NFS_LAYOUT_RETURN, so we should
not be waking up the RPC call.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:43 -05:00
Trond Myklebust
2a974425e5 NFSv4: Ignore LAYOUTRETURN result if the layout doesn't match or is invalid
Fix a potential race with CB_LAYOUTRECALL in which the server recalls the
remaining layout segments while our LAYOUTRETURN is still in transit.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:43 -05:00
Trond Myklebust
68f744797e pNFS: Do not free layout segments that are marked for return
We may want to process and transmit layout stat information for the
layout segments that are being returned, so we should defer freeing
them until after the layoutreturn has completed.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:42 -05:00
Trond Myklebust
17822b207f pNFS: consolidate the different range intersection tests
Both pnfs.c and the flexfiles code have their own versions of the
range intersection testing, and the "end_offset" helper.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:41 -05:00
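For readers unfamiliar with the helpers that 17822b207f consolidates, the following is a minimal sketch of an "end offset" helper and a range intersection test over half-open ranges. The names `demo_end_offset` and `demo_ranges_intersect` are hypothetical, and treating the maximum 64-bit value as "extends to end of file" is an assumption of this sketch rather than something stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sentinel: a length or end of UINT64_MAX means "to end of file". */
#define DEMO_MAX_OFFSET UINT64_MAX

/* Exclusive end of [start, start + len), clamped instead of overflowing. */
uint64_t demo_end_offset(uint64_t start, uint64_t len)
{
	if (len == DEMO_MAX_OFFSET || len > DEMO_MAX_OFFSET - start)
		return DEMO_MAX_OFFSET;
	return start + len;
}

/* Do the half-open ranges [start1, end1) and [start2, end2) overlap? */
bool demo_ranges_intersect(uint64_t start1, uint64_t end1,
			   uint64_t start2, uint64_t end2)
{
	assert(start1 <= end1 && start2 <= end2);
	return (end1 == DEMO_MAX_OFFSET || start2 < end1) &&
	       (end2 == DEMO_MAX_OFFSET || start1 < end2);
}
```

Keeping a single copy of such helpers means the overflow clamp and the end-of-file convention are defined in one place.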
Trond Myklebust
ee284e35d8 pNFS: Fix race in pnfs_wait_on_layoutreturn
We must put the task to sleep while holding the inode->i_lock in order
to ensure atomicity with the test for NFS_LAYOUT_RETURN.

Fixes: 500d701f33 ("NFS41: make close wait for layoutreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:41 -05:00
Trond Myklebust
6604b203fb pNFS: On error, do not send LAYOUTGET until the LAYOUTRETURN has completed
If there is an I/O error, we should not call LAYOUTGET until the
LAYOUTRETURN that reports the error is complete.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.8+
2016-12-01 17:21:40 -05:00
Trond Myklebust
9888d837f3 pNFS: Force a retry of LAYOUTGET if the stateid doesn't match our cache
If the server sends us a completely new stateid, and the client thinks
it already holds a layout, then force a retry of the LAYOUTGET after
invalidating the existing layout in order to avoid corruption due to
races.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-01 17:21:40 -05:00
Trond Myklebust
ae5a459d5f pNFS: Clear NFS_LAYOUT_RETURN_REQUESTED when invalidating the layout stateid
We must ensure that we don't schedule a layoutreturn if the layout stateid
has been marked as invalid.

Fixes: 2a59a04116 ("pNFS: Fix pnfs_set_layout_stateid() to clear...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.8+
2016-12-01 17:21:39 -05:00
Trond Myklebust
7b650994ab pNFS: Don't clear the layout stateid if a layout return is outstanding
If we no longer hold any layout segments, we're normally expected to
consider the layout stateid to be invalid. However we cannot assume this
if we're about to, or in the process of sending a layoutreturn.

Fixes: 334a8f3711 ("pNFS: Don't forget the layout stateid if...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.8+
2016-12-01 17:21:39 -05:00