Revert "NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"

The reverted commit attempted to enable NFSD to retransmit pending
callback operations if an NFS client disconnects, but
unintentionally introduces a hazardous behavior regression if the
client becomes permanently unreachable while callback operations are
still pending.

A disconnect can occur due to network partition or if the NFS server
needs to force the NFS client to retransmit (for example, if a GSS
window under-run occurs).

Reverting the commit will make NFSD behave the same as it did in
v6.8 and before. Pending callback operations are permanently lost if
the client connection is terminated before the client receives them.

For some callback operations, this loss is not harmful.

However, for CB_RECALL, the loss means a delegation might be revoked
unnecessarily. For CB_OFFLOAD, pending COPY operations will never
complete unless the NFS client subsequently sends an OFFLOAD_STATUS
operation, which the Linux NFS client does not currently implement.

These issues still need to be addressed somehow.

Reported-by: Dai Ngo <dai.ngo@oracle.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218735
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This commit is contained in:
Chuck Lever 2024-04-23 17:48:43 -04:00
parent 32cf5a4eda
commit 9c8ecb9308
1 changed files with 2 additions and 18 deletions

View File

@ -986,14 +986,6 @@ static bool nfsd4_queue_cb(struct nfsd4_callback *cb)
return queue_delayed_work(callback_wq, &cb->cb_work, 0);
}
static void nfsd4_queue_cb_delayed(struct nfsd4_callback *cb,
unsigned long msecs)
{
trace_nfsd_cb_queue(cb->cb_clp, cb);
queue_delayed_work(callback_wq, &cb->cb_work,
msecs_to_jiffies(msecs));
}
static void nfsd41_cb_inflight_begin(struct nfs4_client *clp)
{
atomic_inc(&clp->cl_cb_inflight);
@ -1502,16 +1494,8 @@ nfsd4_run_cb_work(struct work_struct *work)
clnt = clp->cl_cb_client;
if (!clnt) {
if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
nfsd41_destroy_cb(cb);
else {
/*
* XXX: Ideally, we could wait for the client to
* reconnect, but I haven't figured out how
* to do that yet.
*/
nfsd4_queue_cb_delayed(cb, 25);
}
/* Callback channel broken, or client killed; give up: */
nfsd41_destroy_cb(cb);
return;
}