Commit graph

3562 commits

Author SHA1 Message Date
NeilBrown
c0cbe70742 NFSD: add posix ACLs to struct nfsd_attrs
pacl and dpacl pointers are added to struct nfsd_attrs, which requires
that we have an nfsd_attrs_free() function to free them.
Those nfsv4 functions that can set ACLs now set up these pointers
based on the passed in NFSv4 ACL.

nfsd_setattr() sets the acls as appropriate.

Errors are handled as with security labels.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-08-04 10:28:03 -04:00
NeilBrown
d6a97d3f58 NFSD: add security label to struct nfsd_attrs
nfsd_setattr() now sets a security label if provided, and nfsv4 provides
it in the 'open' and 'create' paths and the 'setattr' path.
If setting the label failed (including because the kernel doesn't
support labels), an error field in 'struct nfsd_attrs' is set, and the
caller can respond.  The open/create callers clear
FATTR4_WORD2_SECURITY_LABEL in the returned attr set in this case.
The setattr caller returns the error.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:17:00 -04:00
NeilBrown
93adc1e391 NFSD: set attributes when creating symlinks
The NFS protocol includes attributes when creating symlinks.
Linux does store attributes for symlinks and allows them to be set,
though they are not used for permission checking.

NFSD currently doesn't set standard (struct iattr) attributes when
creating symlinks, but for NFSv4 it does set ACLs and security labels.
This is inconsistent.

To improve consistency, pass the provided attributes into nfsd_symlink()
and call nfsd_create_setattr() to set them.

NOTE: this results in a behaviour change for all NFS versions when the
client sends non-default attributes with a SYMLINK request. With the
Linux client, the only attributes are:
	attr.ia_mode = S_IFLNK | S_IRWXUGO;
	attr.ia_valid = ATTR_MODE;
so the final outcome will be unchanged. Other clients might sent
different attributes, and if they did they probably expect them to be
honoured.

We ignore any error from nfsd_create_setattr().  It isn't really clear
what should be done if a file is successfully created, but the
attributes cannot be set.  NFS doesn't allow partial success to be
reported.  Reporting failure is probably more misleading than reporting
success, so the status is ignored.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:17:00 -04:00
NeilBrown
7fe2a71dda NFSD: introduce struct nfsd_attrs
The attributes that nfsd might want to set on a file include 'struct
iattr' as well as an ACL and security label.
The latter two are passed around quite separately from the first, in
part because they are only needed for NFSv4.  This leads to some
clumsiness in the code, such as the attributes NOT being set in
nfsd_create_setattr().

We need to keep the directory locked until all attributes are set to
ensure the file is never visibile without all its attributes.  This need
combined with the inconsistent handling of attributes leads to more
clumsiness.

As a first step towards tidying this up, introduce 'struct nfsd_attrs'.
This is passed (by reference) to vfs.c functions that work with
attributes, and is assembled by the various nfs*proc functions which
call them.  As yet only iattr is included, but future patches will
expand this.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:17:00 -04:00
Jeff Layton
876c553cb4 NFSD: verify the opened dentry after setting a delegation
Between opening a file and setting a delegation on it, someone could
rename or unlink the dentry. If this happens, we do not want to grant a
delegation on the open.

On a CLAIM_NULL open, we're opening by filename, and we may (in the
non-create case) or may not (in the create case) be holding i_rwsem
when attempting to set a delegation.  The latter case allows a
race.

After getting a lease, redo the lookup of the file being opened and
validate that the resulting dentry matches the one in the open file
description.

To properly redo the lookup we need an rqst pointer to pass to
nfsd_lookup_dentry(), so make sure that is available.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:17:00 -04:00
Jeff Layton
bbf936edd5 NFSD: drop fh argument from alloc_init_deleg
Currently, we pass the fh of the opened file down through several
functions so that alloc_init_deleg can pass it to delegation_blocked.
The filehandle of the open file is available in the nfs4_file however,
so there's no need to pass it in a separate argument.

Drop the argument from alloc_init_deleg, nfs4_open_delegation and
nfs4_set_delegation.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:17:00 -04:00
Chuck Lever
a11ada99ce NFSD: Move copy offload callback arguments into a separate structure
Refactor so that CB_OFFLOAD arguments can be passed without
allocating a whole struct nfsd4_copy object. On my system (x86_64)
this removes another 96 bytes from struct nfsd4_copy.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:17:00 -04:00
Chuck Lever
e72f9bc006 NFSD: Add nfsd4_send_cb_offload()
Refactor for legibility.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:59 -04:00
Chuck Lever
ad1e46c9b0 NFSD: Remove kmalloc from nfsd4_do_async_copy()
Instead of manufacturing a phony struct nfsd_file, pass the
struct file returned by nfs42_ssc_open() directly to
nfsd4_do_copy().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:59 -04:00
Chuck Lever
3b7bf5933c NFSD: Refactor nfsd4_do_copy()
Refactor: Now that nfsd4_do_copy() no longer calls the cleanup
helpers, plumb the use of struct file pointers all the way down to
_nfsd_copy_file_range().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:59 -04:00
Chuck Lever
478ed7b10d NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)
Move the nfsd4_cleanup_*() call sites out of nfsd4_do_copy(). A
subsequent patch will modify one of the new call sites to avoid
the need to manufacture the phony struct nfsd_file.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:59 -04:00
Chuck Lever
24d796ea38 NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)
The @src parameter is sometimes a pointer to a struct nfsd_file and
sometimes a pointer to struct file hiding in a phony struct
nfsd_file. Refactor nfsd4_cleanup_inter_ssc() so the @src parameter
is always an explicit struct file.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:59 -04:00
Chuck Lever
1913cdf56c NFSD: Replace boolean fields in struct nfsd4_copy
Clean up: saves 8 bytes, and we can replace check_and_set_stop_copy()
with an atomic bitop.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:59 -04:00
Chuck Lever
8ea6e2c90b NFSD: Make nfs4_put_copy() static
Clean up: All call sites are in fs/nfsd/nfs4proc.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:59 -04:00
Chuck Lever
d314309425 NFSD: Reorder the fields in struct nfsd4_op
Pack the fields to reduce the size of struct nfsd4_op, which is used
an array in struct nfsd4_compoundargs.

sizeof(struct nfsd4_op):
Before: /* size: 672, cachelines: 11, members: 5 */
After:  /* size: 640, cachelines: 10, members: 5 */

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:58 -04:00
Chuck Lever
87689df694 NFSD: Shrink size of struct nfsd4_copy
struct nfsd4_copy is part of struct nfsd4_op, which resides in an
8-element array.

sizeof(struct nfsd4_op):
Before: /* size: 1696, cachelines: 27, members: 5 */
After:  /* size: 672, cachelines: 11, members: 5 */

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:58 -04:00
Chuck Lever
09426ef2a6 NFSD: Shrink size of struct nfsd4_copy_notify
struct nfsd4_copy_notify is part of struct nfsd4_op, which resides
in an 8-element array.

sizeof(struct nfsd4_op):
Before: /* size: 2208, cachelines: 35, members: 5 */
After:  /* size: 1696, cachelines: 27, members: 5 */

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:58 -04:00
Chuck Lever
bb4d842722 NFSD: nfserrno(-ENOMEM) is nfserr_jukebox
Suggested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:58 -04:00
Chuck Lever
5304877936 NFSD: Fix strncpy() fortify warning
In function ‘strncpy’,
    inlined from ‘nfsd4_ssc_setup_dul’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1392:3,
    inlined from ‘nfsd4_interssc_connect’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1489:11:
/home/cel/src/linux/manet/include/linux/fortify-string.h:52:33: warning: ‘__builtin_strncpy’ specified bound 63 equals destination size [-Wstringop-truncation]
   52 | #define __underlying_strncpy    __builtin_strncpy
      |                                 ^
/home/cel/src/linux/manet/include/linux/fortify-string.h:89:16: note: in expansion of macro ‘__underlying_strncpy’
   89 |         return __underlying_strncpy(p, q, size);
      |                ^~~~~~~~~~~~~~~~~~~~

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:58 -04:00
Chuck Lever
99b002a1fa NFSD: Clean up nfsd4_encode_readlink()
Similar changes to nfsd4_encode_readv(), all bundled into a single
patch.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:58 -04:00
Chuck Lever
5e64d85c7d NFSD: Use xdr_pad_size()
Clean up: Use a helper instead of open-coding the calculation of
the XDR pad size.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:57 -04:00
Chuck Lever
071ae99fea NFSD: Simplify starting_len
Clean-up: Now that nfsd4_encode_readv() does not have to encode the
EOF or rd_length values, it no longer needs to subtract 8 from
@starting_len.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:57 -04:00
Chuck Lever
28d5bc468e NFSD: Optimize nfsd4_encode_readv()
write_bytes_to_xdr_buf() is pretty expensive to use for inserting
an XDR data item that is always 1 XDR_UNIT at an address that is
always XDR word-aligned.

Since both the readv and splice read paths encode EOF and maxcount
values, move both to a common code path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:57 -04:00
Chuck Lever
24c7fb8549 NFSD: Add an nfsd4_read::rd_eof field
Refactor: Make the EOF result available in the entire NFSv4 READ
path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:57 -04:00
Chuck Lever
c738b218a2 NFSD: Clean up SPLICE_OK in nfsd4_encode_read()
Do the test_bit() once -- this reduces the number of locked-bus
operations and makes the function a little easier to read.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:57 -04:00
Chuck Lever
ab04de60ae NFSD: Optimize nfsd4_encode_fattr()
write_bytes_to_xdr_buf() is a generic way to place a variable-length
data item in an already-reserved spot in the encoding buffer.

However, it is costly. In nfsd4_encode_fattr(), it is unnecessary
because the data item is fixed in size and the buffer destination
address is always word-aligned.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:57 -04:00
Chuck Lever
095a764b7a NFSD: Optimize nfsd4_encode_operation()
write_bytes_to_xdr_buf() is a generic way to place a variable-length
data item in an already-reserved spot in the encoding buffer.
However, it is costly, and here, it is unnecessary because the
data item is fixed in size, the buffer destination address is
always word-aligned, and the destination location is already in
@p.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:57 -04:00
Jeff Layton
3a5940bfa1 nfsd: silence extraneous printk on nfsd.ko insertion
This printk pops every time nfsd.ko gets plugged in. Most kmods don't do
that and this one is not very informative. Olaf's email address seems to
be defunct at this point anyway. Just drop it.

Cc: Olaf Kirch <okir@suse.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:56 -04:00
Dai Ngo
4271c2c088 NFSD: limit the number of v4 clients to 1024 per 1GB of system memory
Currently there is no limit on how many v4 clients are supported
by the system. This can be a problem in systems with small memory
configuration to function properly when a very large number of
clients exist that creates memory shortage conditions.

This patch enforces a limit of 1024 NFSv4 clients, including courtesy
clients, per 1GB of system memory.  When the number of the clients
reaches the limit, requests that create new clients are returned
with NFS4ERR_DELAY and the laundromat is kicked start to trim old
clients. Due to the overhead of the upcall to remove the client
record, the maximun number of clients the laundromat removes on
each run is limited to 128. This is done to ensure the laundromat
can still process the other tasks in a timely manner.

Since there is now a limit of the number of clients, the 24-hr
idle time limit of courtesy client is no longer needed and was
removed.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:56 -04:00
Dai Ngo
0926c39515 NFSD: keep track of the number of v4 clients in the system
Add counter nfs4_client_count to keep track of the total number
of v4 clients, including courtesy clients, in the system.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:56 -04:00
Dai Ngo
6867137ebc NFSD: refactoring v4 specific code to a helper in nfs4state.c
This patch moves the v4 specific code from nfsd_init_net() to
nfsd4_init_leases_net() helper in nfs4state.c

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:56 -04:00
Chuck Lever
427f5f83a3 NFSD: Ensure nf_inode is never dereferenced
The documenting comment for struct nf_file states:

/*
 * A representation of a file that has been opened by knfsd. These are hashed
 * in the hashtable by inode pointer value. Note that this object doesn't
 * hold a reference to the inode by itself, so the nf_inode pointer should
 * never be dereferenced, only used for comparison.
 */

Replace the two existing dereferences to make the comment always
true.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:56 -04:00
Chuck Lever
5e138c4a75 NFSD: NFSv4 CLOSE should release an nfsd_file immediately
The last close of a file should enable other accessors to open and
use that file immediately. Leaving the file open in the filecache
prevents other users from accessing that file until the filecache
garbage-collects the file -- sometimes that takes several seconds.

Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?387
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:42 -04:00
Chuck Lever
b40a283947 NFSD: Move nfsd_file_trace_alloc() tracepoint
Avoid recording the allocation of an nfsd_file item that is
immediately released because a matching item was already
inserted in the hash.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:16:07 -04:00
Chuck Lever
be0230069f NFSD: Separate tracepoints for acquire and create
These tracepoints collect different information: the create case does
not open a file, so there's no nf_file available.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:15:54 -04:00
Chuck Lever
0ec8e9d153 NFSD: Clean up unused code after rhashtable conversion
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:15:38 -04:00
Chuck Lever
ce502f81ba NFSD: Convert the filecache to use rhashtable
Enable the filecache hash table to start small, then grow with the
workload. Smaller server deployments benefit because there should
be lower memory utilization. Larger server deployments should see
improved scaling with the number of open files.

Suggested-by: Jeff Layton <jlayton@kernel.org>
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:14:25 -04:00
Chuck Lever
fc22945ecc NFSD: Set up an rhashtable for the filecache
Add code to initialize and tear down an rhashtable. The rhashtable
is not used yet.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:12:02 -04:00
Chuck Lever
c7b824c3d0 NFSD: Replace the "init once" mechanism
In a moment, the nfsd_file_hashtbl global will be replaced with an
rhashtable. Replace the one or two spots that need to check if the
hash table is available. We can easily reuse the SHUTDOWN flag for
this purpose.

Document that this mechanism relies on callers to hold the
nfsd_mutex to prevent init, shutdown, and purging to run
concurrently.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:12:01 -04:00
Chuck Lever
f0743c2b25 NFSD: Remove nfsd_file::nf_hashval
The value in this field can always be computed from nf_inode, thus
it is no longer used.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:12:01 -04:00
Chuck Lever
cb7ec76e73 NFSD: nfsd_file_hash_remove can compute hashval
Remove an unnecessary use of nf_hashval.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:12:01 -04:00
Chuck Lever
a845511007 NFSD: Refactor __nfsd_file_close_inode()
The code that computes the hashval is the same in both callers.

To prevent them from going stale, reframe the documenting comments
to remove descriptions of the underlying hash table structure, which
is about to be replaced.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:11:50 -04:00
Chuck Lever
8755326399 NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode
Remove an unnecessary usage of nf_hashval.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:11:42 -04:00
Chuck Lever
f53cef15dd NFSD: Remove lockdep assertion from unhash_and_release_locked()
IIUC, holding the hash bucket lock is needed only in
nfsd_file_unhash, and there is already a lockdep assertion there.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:11:41 -04:00
Chuck Lever
54f7df7094 NFSD: No longer record nf_hashval in the trace log
I'm about to replace nfsd_file_hashtbl with an rhashtable. The
individual hash values will no longer be visible or relevant, so
remove them from the tracepoints.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:11:29 -04:00
Chuck Lever
6df1941136 NFSD: Never call nfsd_file_gc() in foreground paths
The checks in nfsd_file_acquire() and nfsd_file_put() that directly
invoke filecache garbage collection are intended to keep cache
occupancy between a low- and high-watermark. The reason to limit the
capacity of the filecache is to keep filecache lookups reasonably
fast.

However, invoking garbage collection at those points has some
undesirable negative impacts. Files that are held open by NFSv4
clients often push the occupancy of the filecache over these
watermarks. At that point:

- Every call to nfsd_file_acquire() and nfsd_file_put() results in
  an LRU walk. This has the same effect on lookup latency as long
  chains in the hash table.
- Garbage collection will then run on every nfsd thread, causing a
  lot of unnecessary lock contention.
- Limiting cache capacity pushes out files used only by NFSv3
  clients, which are the type of files the filecache is supposed to
  help.

To address those negative impacts, remove the direct calls to the
garbage collector. Subsequent patches will address maintaining
lookup efficiency as cache capacity increases.

Suggested-by: Wang Yugui <wangyugui@e16-tech.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:11:19 -04:00
Chuck Lever
edead3a558 NFSD: Fix the filecache LRU shrinker
Without LRU item rotation, the shrinker visits only a few items on
the end of the LRU list, and those would always be long-term OPEN
files for NFSv4 workloads. That makes the filecache shrinker
completely ineffective.

Adopt the same strategy as the inode LRU by using LRU_ROTATE.

Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:11:19 -04:00
Chuck Lever
4a0e73e635 NFSD: Leave open files out of the filecache LRU
There have been reports of problems when running fstests generic/531
against Linux NFS servers with NFSv4. The NFS server that hosts the
test's SCRATCH_DEV suffers from CPU soft lock-ups during the test.
Analysis shows that:

fs/nfsd/filecache.c
 482                 ret = list_lru_walk(&nfsd_file_lru,
 483                                 nfsd_file_lru_cb,
 484                                 &head, LONG_MAX);

causes nfsd_file_gc() to walk the entire length of the filecache LRU
list every time it is called (which is quite frequently). The walk
holds a spinlock the entire time that prevents other nfsd threads
from accessing the filecache.

What's more, for NFSv4 workloads, none of the items that are visited
during this walk may be evicted, since they are all files that are
held OPEN by NFS clients.

Address this by ensuring that open files are not kept on the LRU
list.

Reported-by: Frank van der Linden <fllinden@amazon.com>
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=386
Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:10:08 -04:00
Chuck Lever
c46203acdd NFSD: Trace filecache LRU activity
Observe the operation of garbage collection and the lifetime of
filecache items.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:10:07 -04:00
Chuck Lever
668ed92e65 NFSD: WARN when freeing an item still linked via nf_lru
Add a guardrail to prevent freeing memory that is still on a list.
This includes either a dispose list or the LRU list.

This is the sign of a bug, but this class of bugs can be detected
so that they don't endanger system stability, especially while
debugging.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:10:07 -04:00
Chuck Lever
2e6c6e4c43 NFSD: Hook up the filecache stat file
There has always been the capability of exporting filecache metrics
via /proc, but it was never hooked up. Let's surface these metrics
to enable better observability of the filecache.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:10:07 -04:00
Chuck Lever
8b330f7804 NFSD: Zero counters when the filecache is re-initialized
If nfsd_file_cache_init() is called after a shutdown, be sure the
stat counters are reset.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:10:07 -04:00
Chuck Lever
df2aff524f NFSD: Record number of flush calls
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:10:07 -04:00
Chuck Lever
94660cc19c NFSD: Report the number of items evicted by the LRU walk
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:10:07 -04:00
Chuck Lever
39f1d1ff81 NFSD: Refactor nfsd_file_lru_scan()
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:09:55 -04:00
Chuck Lever
3bc6d3470f NFSD: Refactor nfsd_file_gc()
Refactor nfsd_file_gc() to use the new list_lru helper.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:09:48 -04:00
Chuck Lever
0bac5a264d NFSD: Add nfsd_file_lru_dispose_list() helper
Refactor the invariant part of nfsd_file_lru_walk_list() into a
separate helper function.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:09:39 -04:00
Chuck Lever
904940e94a NFSD: Report average age of filecache items
This is a measure of how long items stay in the filecache, to help
assess how efficient the cache is.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:09:22 -04:00
Chuck Lever
d63293272a NFSD: Report count of freed filecache items
Surface the count of freed nfsd_file items.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:09:22 -04:00
Chuck Lever
29d4bdbbb9 NFSD: Report count of calls to nfsd_file_acquire()
Count the number of successful acquisitions that did not create a
file (ie, acquisitions that do not result in a compulsory cache
miss). This count can be compared directly with the reported hit
count to compute a hit ratio.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:09:22 -04:00
Chuck Lever
0fd244c115 NFSD: Report filecache LRU size
Surface the NFSD filecache's LRU list length to help field
troubleshooters monitor filecache issues.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:09:21 -04:00
Chuck Lever
ca3f9acb6d NFSD: Demote a WARN to a pr_warn()
The call trace doesn't add much value, but it sure is noisy.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:09:21 -04:00
Colin Ian King
842e00ac3a nfsd: remove redundant assignment to variable len
Variable len is being assigned a value zero and this is never
read, it is being re-assigned later. The assignment is redundant
and can be removed.

Cleans up clang scan-build warning:
fs/nfsd/nfsctl.c:636:2: warning: Value stored to 'len' is never read

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:08:56 -04:00
Zhang Jiaming
f532c9ff10 NFSD: Fix space and spelling mistake
Add a blank space after ','.
Change 'succesful' to 'successful'.

Signed-off-by: Zhang Jiaming <jiaming@nfschina.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:08:56 -04:00
Chuck Lever
0513828855 NFSD: Instrument fh_verify()
Capture file handles and how they map to local inodes. In particular,
NFSv4 PUTFH uses fh_verify() so we can now observe which file handles
are the target of OPEN, LOOKUP, RENAME, and so on.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:08:56 -04:00
Jeff Layton
23ba98de6d nfsd: eliminate the NFSD_FILE_BREAK_* flags
We had a report from the spring Bake-a-thon of data corruption in some
nfstest_interop tests. Looking at the traces showed the NFS server
allowing a v3 WRITE to proceed while a read delegation was still
outstanding.

Currently, we only set NFSD_FILE_BREAK_* flags if
NFSD_MAY_NOT_BREAK_LEASE was set when we call nfsd_file_alloc.
NFSD_MAY_NOT_BREAK_LEASE was intended to be set when finding files for
COMMIT ops, where we need a writeable filehandle but don't need to
break read leases.

It doesn't make any sense to consult that flag when allocating a file
since the file may be used on subsequent calls where we do want to break
the lease (and the usage of it here seems to be reverse from what it
should be anyway).

Also, after calling nfsd_open_break_lease, we don't want to clear the
BREAK_* bits. A lease could end up being set on it later (more than
once) and we need to be able to break those leases as well.

This means that the NFSD_FILE_BREAK_* flags now just mirror
NFSD_MAY_{READ,WRITE} flags, so there's no need for them at all. Just
drop those flags and unconditionally call nfsd_open_break_lease every
time.

Reported-by: Olga Kornieskaia <kolga@netapp.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2107360
Fixes: 65294c1f2c (nfsd: add a new struct file caching facility to nfsd)
Cc: <stable@vger.kernel.org> # 5.4.x : bb283ca18d NFSD: Clean up the show_nf_flags() macro
Cc: <stable@vger.kernel.org> # 5.4.x
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:08:56 -04:00
Linus Torvalds
a24a6c05ff Notable regression fixes:
- Enable SETATTR(time_create) to fix regression with Mac OS clients
 - Fix a lockd crasher and broken NLM UNLCK behavior
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmLQSzcACgkQM2qzM29m
 f5d5Sg/9Hln5dSnIpgJKyhIy+ic5QB29VL9RWZ7JtVCUsWsaVObkwobmne7UWAms
 X9eehGG2gYTsLcXH9oIGeLRrjbWGYbdg3B6tGMFeQZgI6NeSvBS9AeDzyG+rRTyL
 2czvODorTSfYokbsHNE3kWh0Tohkvd/4wV1y5GUcLYKUj6ZO21xsSbEj7381nriJ
 9nkudqgJTfCINxqRiIRBT+iCNfjvAc8EhfxP+ThcmcjJH9WeWi9kR0ZF050jloO/
 JH2RZRNN+crNdg77HlXyc/jcr2G/wv/IR5G40LqeLgrJbvPAKRGAltzeyErWY7eB
 6jsWf7/vil6+PfuUaPskt9h60hGe3DeBzraK4gFcEKci6YSqbE7h1GrUtM6/2E0a
 5HM2mYAZfYBZyT2HitZCoFXOvGssM6543QXW2noMo/6eAWFPfdzjlN55FLNssqkq
 8fPMLoiBElOvVLCYvi91CpnqSsmR4WW44dwY/d+H7DG4QnCurPGMH2OFHX3epk5J
 kJzhT63GuemgwrxZICT8KQ6aVWHreHRO2LTKv9yjhEohsqkCf3eV/xlWKzcRFXLj
 HQIBfQf1UIOonxOkdGUG1w0Es/Wz7btQehc/CwaMi74VUjZUp5qSMjCUWkckyALc
 3k1ojcODEyR/5rOepQF4kRzOzSbm7TETwhF7YG0TRPugIGri/sQ=
 =T8gK
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
 "Notable regression fixes:

   - Enable SETATTR(time_create) to fix regression with Mac OS clients

   - Fix a lockd crasher and broken NLM UNLCK behavior"

* tag 'nfsd-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  lockd: fix nlm_close_files
  lockd: set fl_owner when unlocking files
  NFSD: Decode NFSv4 birth time attribute
2022-07-14 12:29:43 -07:00
Chuck Lever
5b2f3e0777 NFSD: Decode NFSv4 birth time attribute
NFSD has advertised support for the NFSv4 time_create attribute
since commit e377a3e698 ("nfsd: Add support for the birth time
attribute").

Igor Mammedov reports that Mac OS clients attempt to set the NFSv4
birth time attribute via OPEN(CREATE) and SETATTR if the server
indicates that it supports it, but since the above commit was
merged, those attempts now fail.

Table 5 in RFC 8881 lists the time_create attribute as one that can
be both set and retrieved, but the above commit did not add server
support for clients to provide a time_create attribute. IMO that's
a bug in our implementation of the NFSv4 protocol, which this commit
addresses.

Whether NFSD silently ignores the new birth time or actually sets it
is another matter. I haven't found another filesystem service in the
Linux kernel that enables users or clients to modify a file's birth
time attribute.

This commit reflects my (perhaps incorrect) understanding of whether
Linux users can set a file's birth time. NFSD will now recognize a
time_create attribute but it ignores its value. It clears the
time_create bit in the returned attribute bitmask to indicate that
the value was not used.

Reported-by: Igor Mammedov <imammedo@redhat.com>
Fixes: e377a3e698 ("nfsd: Add support for the birth time attribute")
Tested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-11 13:52:22 -04:00
Roman Gushchin
e33c267ab7 mm: shrinkers: provide shrinkers with names
Currently shrinkers are anonymous objects.  For debugging purposes they
can be identified by count/scan function names, but it's not always
useful: e.g.  for superblock's shrinkers it's nice to have at least an
idea of to which superblock the shrinker belongs.

This commit adds names to shrinkers.  register_shrinker() and
prealloc_shrinker() functions are extended to take a format and arguments
to master a name.

In some cases it's not possible to determine a good name at the time when
a shrinker is allocated.  For such cases shrinker_debugfs_rename() is
provided.

The expected format is:
    <subsystem>-<shrinker_type>[:<instance>]-<id>
For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.

After this change the shrinker debugfs directory looks like:
  $ cd /sys/kernel/debug/shrinker/
  $ ls
    dquota-cache-16     sb-devpts-28     sb-proc-47       sb-tmpfs-42
    mm-shadow-18        sb-devtmpfs-5    sb-proc-48       sb-tmpfs-43
    mm-zspool:zram0-34  sb-hugetlbfs-17  sb-pstore-31     sb-tmpfs-44
    rcu-kfree-0         sb-hugetlbfs-33  sb-rootfs-2      sb-tmpfs-49
    sb-aio-20           sb-iomem-12      sb-securityfs-6  sb-tracefs-13
    sb-anon_inodefs-15  sb-mqueue-21     sb-selinuxfs-22  sb-xfs:vda1-36
    sb-bdev-3           sb-nsfs-4        sb-sockfs-8      sb-zsmalloc-19
    sb-bpf-32           sb-pipefs-14     sb-sysfs-26      thp-deferred_split-10
    sb-btrfs:vda2-24    sb-proc-25       sb-tmpfs-1       thp-zero-9
    sb-cgroup2-30       sb-proc-39       sb-tmpfs-27      xfs-buf:vda1-37
    sb-configfs-23      sb-proc-41       sb-tmpfs-29      xfs-inodegc:vda1-38
    sb-dax-11           sb-proc-45       sb-tmpfs-35
    sb-debugfs-7        sb-proc-46       sb-tmpfs-40

[roman.gushchin@linux.dev: fix build warnings]
  Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
  Reported-by: kernel test robot <lkp@intel.com>
Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-03 18:08:40 -07:00
Linus Torvalds
69cb6c6556 Notable regression fixes:
- Fix NFSD crash during NFSv4.2 READ_PLUS operation
 - Fix incorrect status code returned by COMMIT operation
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmLAhFIACgkQM2qzM29m
 f5fK2w//TyUAzwoTZQEgmFbxNhXgUYC6uwWPqTCLMieStKtPwT8GsuXviY34QziF
 2vER0NW9Am2GQyL3gtiYFM07OoHhQ4gr86ltaeIHAHhcwm3eIs879ARwEsyN6eDR
 +RDpqnONwtg+yaepfCMc4Bki9Jex+mmoXro86nFPmH+TDM5QiIRY0ncBWSLVWvYT
 YciAgvL6vfo2G79NYOzohoTb15ydotmy6m9H70nN+a2l6bKOIT8cF4S8lZETJZXA
 Rlj+R0eE0iXZTtp7VsAfVAHHfOzexGJjE85hpVzGiZWbxe6o0WNmBpoHICXs9VoP
 WRkq6vBC9P6wJ7EOu84/SRlR/rQaxfrYB7beqTC2kO6t5Ka5/xpJMKZOhfukCMV+
 7+uPAUnSIRqpHG0hWm1kPto00bfXm9fMETQbOEw7UQ9dH/322iOJnXwy03ZEXtJq
 9G0k+yNcydoIs2g5OpNrw4f/wTQcdlbf1ZA5O9dAsxwS1ZTNpKUiE0Sd0Ez/0VIJ
 t5DZFWH44rs/60avxRYxtw965gNugTxb1YiKdXObvbFGf+xYtH0zePce++4kTJlE
 t886stLcHbuPYs4/XItwTxLMSnDM5UR22Icho7ElRHvPiWjZmc6o2fxCkWTzCOE2
 ylZowLFMYZ/KbrP5mcqLARKEP6oFJERXt/U8U0CWeVbZomy9GVY=
 =z5W8
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
 "Notable regression fixes:

   - Fix NFSD crash during NFSv4.2 READ_PLUS operation

   - Fix incorrect status code returned by COMMIT operation"

* tag 'nfsd-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix READ_PLUS crasher
  NFSD: restore EINVAL error translation in nfsd_commit()
2022-07-02 11:20:56 -07:00
Amir Goldstein
868f9f2f8e vfs: fix copy_file_range() regression in cross-fs copies
A regression has been reported by Nicolas Boichat, found while using the
copy_file_range syscall to copy a tracefs file.

Before commit 5dae222a5f ("vfs: allow copy_file_range to copy across
devices") the kernel would return -EXDEV to userspace when trying to
copy a file across different filesystems.  After this commit, the
syscall doesn't fail anymore and instead returns zero (zero bytes
copied), as this file's content is generated on-the-fly and thus reports
a size of zero.

Another regression has been reported by He Zhe - the assertion of
WARN_ON_ONCE(ret == -EOPNOTSUPP) can be triggered from userspace when
copying from a sysfs file whose read operation may return -EOPNOTSUPP.

Since we do not have test coverage for copy_file_range() between any two
types of filesystems, the best way to avoid these sort of issues in the
future is for the kernel to be more picky about filesystems that are
allowed to do copy_file_range().

This patch restores some cross-filesystem copy restrictions that existed
prior to commit 5dae222a5f ("vfs: allow copy_file_range to copy across
devices"), namely, cross-sb copy is not allowed for filesystems that do
not implement ->copy_file_range().

Filesystems that do implement ->copy_file_range() have full control of
the result - if this method returns an error, the error is returned to
the user.  Before this change this was only true for fs that did not
implement the ->remap_file_range() operation (i.e.  nfsv3).

Filesystems that do not implement ->copy_file_range() still fall-back to
the generic_copy_file_range() implementation when the copy is within the
same sb.  This helps the kernel can maintain a more consistent story
about which filesystems support copy_file_range().

nfsd and ksmbd servers are modified to fall-back to the
generic_copy_file_range() implementation in case vfs_copy_file_range()
fails with -EOPNOTSUPP or -EXDEV, which preserves behavior of
server-side-copy.

fall-back to generic_copy_file_range() is not implemented for the smb
operation FSCTL_DUPLICATE_EXTENTS_TO_FILE, which is arguably a correct
change of behavior.

Fixes: 5dae222a5f ("vfs: allow copy_file_range to copy across devices")
Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/
Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/
Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/
Link: https://lore.kernel.org/linux-fsdevel/20210630161320.29006-1-lhenriques@suse.de/
Reported-by: Nicolas Boichat <drinkcat@chromium.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Fixes: 64bf5ff58d ("vfs: no fallback for ->copy_file_range")
Link: https://lore.kernel.org/linux-fsdevel/20f17f64-88cb-4e80-07c1-85cb96c83619@windriver.com/
Reported-by: He Zhe <zhe.he@windriver.com>
Tested-by: Namjae Jeon <linkinjeon@kernel.org>
Tested-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-30 15:16:38 -07:00
Alexey Khoroshilov
8a9ffb8c85 NFSD: restore EINVAL error translation in nfsd_commit()
commit 555dbf1a9a ("nfsd: Replace use of rwsem with errseq_t")
incidentally broke translation of -EINVAL to nfserr_notsupp.
The patch restores that.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Fixes: 555dbf1a9a ("nfsd: Replace use of rwsem with errseq_t")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-06-27 10:33:05 -04:00
Linus Torvalds
0885eacdc8 Notable changes:
- There is now a backup maintainer for NFSD
 
 Notable fixes:
 - Prevent array overruns in svc_rdma_build_writes()
 - Prevent buffer overruns when encoding NFSv3 READDIR results
 - Fix a potential UAF in nfsd_file_put()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmKjhbsACgkQM2qzM29m
 f5cgig/6A9gC2c9v4lR2fH6ufiCWvJBfuVaBbToubwktJHaDLvqH56JcvS3s/gKL
 PKGmbQTI/6lgmVgJqQSxJUnfe6wzHx8G1MdjlZEIwi3pUeiV+LpiprZz9TOhgYYV
 YDnXRGhb4wKOm75w8rb6X8k106XdKwdBaRQwb88FDawffWoEY0XNYrlmNmmWi8To
 ELlOlIwRCBbKJoJ6yEEWQRrBuVBXapbsn29tipZXbdo58g+vL0yDQq9s97b0mHhi
 C2apAN2+k18FiBJsA7b7pW1l/P6k9FNEeetvgWyN8OSMpPNmt0vz1HvKaIstPgg1
 BX6rgWe5eQBFEk2KNvSGHrV3R+wAp7jeuVpHUMjxXvzmfj4exJV/H8lu+qZJNDGN
 ybCJatomR4APFxk+s1kptlzNo7zfyPz15L80HmWIngYJ/lrBOoKPIIi3bwPQcBwW
 q2Rc+SlvpqbJvEcomgF/lqQN6inmx44J+KpOSA/S8qSIdSkz0iaZsDahFxgZNe82
 h+X/i1maRtnSIvWdGMR7O6kEFT5jky35WlTv/VutTOsUwA4mUU9vZUnufBBHJH07
 nOdLMi/QS/O5GOnlyegrODtN75wi+IeKt+WMNmnN+JB8Tsg0kZwjOsc/dbQfyJqP
 PrQJ5AUP0TMm90B1873z8yKmhWeXtB71vgAI/d53aadBG4ZEstU=
 =6ugk
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
 "Notable changes:

   - There is now a backup maintainer for NFSD

  Notable fixes:

   - Prevent array overruns in svc_rdma_build_writes()

   - Prevent buffer overruns when encoding NFSv3 READDIR results

   - Fix a potential UAF in nfsd_file_put()"

* tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Remove pointer type casts from xdr_get_next_encode_buffer()
  SUNRPC: Clean up xdr_get_next_encode_buffer()
  SUNRPC: Clean up xdr_commit_encode()
  SUNRPC: Optimize xdr_reserve_space()
  SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()
  SUNRPC: Trap RDMA segment overflows
  NFSD: Fix potential use-after-free in nfsd_file_put()
  MAINTAINERS: reciprocal co-maintainership for file locking and nfsd
2022-06-10 17:28:43 -07:00
Chuck Lever
b6c71c66b0 NFSD: Fix potential use-after-free in nfsd_file_put()
nfsd_file_put_noref() can free @nf, so don't dereference @nf
immediately upon return from nfsd_file_put_noref().

Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Fixes: 999397926a ("nfsd: Clean up nfsd_file_put()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-06-02 13:05:58 -04:00
Linus Torvalds
6d29d7fe4f NFSD 5.19 Release Notes
We introduce "courteous server" in this release. Previously NFSD
 would purge open and lock state for an unresponsive client after
 one lease period (typically 90 seconds). Now, after one lease
 period, another client can open and lock those files and the
 unresponsive client's lease is purged; otherwise if the unrespon-
 sive client's open and lock state is uncontended, the server retains
 that open and lock state for up to 24 hours, allowing the client's
 workload to resume after a lengthy network partition.
 
 A longstanding issue with NFSv4 file creation is also addressed.
 Previously a file creation can fail internally, returning an error
 to the client, but leave the newly created file in place as an
 artifact. The file creation code path has been reorganized so that
 internal failures and race conditions are less likely to result in
 an unwanted file creation.
 
 A fault injector has been added to help exercise paths that are run
 during kernel metadata cache invalidation. These caches contain
 information maintained by user space about exported filesystems.
 Many of our test workloads do not trigger cache invalidation.
 
 There is one patch that is needed to support PREEMPT_RT and a fix
 for an ancient "sleep while spin-locked" splat that seems to have
 become easier to hit since v5.18-rc3.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmKPliAACgkQM2qzM29m
 f5dB3BAAorPa2L8xu5P1Ge1oTNogNSOVRkLPDzEkfEwK07ZM2qvz78eMZGkMziJ/
 strorvBWl3SWBlVtTePgNpJUjgYQ75MRRwaX7Qh2WuHeRKm1JlZm0/NId3+zKgbh
 N40QI20jdswWcNDuhidxVFFWurd09GlcM4z1cu8gZLbfthkiUOjZoPiLkXeNcvhk
 7wC9GiueWxHefYQQDAKh1nQS/L0GG1EkzJdJo7WUVAldZ9qVY9LpmJVMRqrBBbta
 XrFYfpeY1zFFDY4Qolyz5PUJSeQuDj9PctlhoZ6B1hp56PD/6yaqVhYXiPxtlALj
 tITtktfiekULZkgfvfvyzssCv+wkbYiaEBZcSSCauR7dkGOmBmajO+cf7vpsERgE
 fbCU8DWGk78SMeehdCrO+26cV37VP+8c2t2Txq/rG5Eq4ZoCi++Hj5poRboFLqb+
 oom+0Ee0LfcAKXkxH5gWTPTblHo49GzGitPZtRzTgZ9uFnVwvEaJ4+t0ij0J8JpL
 HuVtWrg5/REhqpEvOSwF0sRmkYWLTu7KdueGn/iZ8xUi7GHEue01NsVkClohKJcR
 WOjWrbNCNF/LJaG88MX0z5u7IO7s9bOHphd7PJ92vR+4YsehW3uRhk+rNi2ZBqQz
 hzULfu8BiaicV9fdB/hDcMmKQD6U6due2AVVPtxTf5XY+CHQNRY=
 =phE1
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "We introduce 'courteous server' in this release. Previously NFSD would
  purge open and lock state for an unresponsive client after one lease
  period (typically 90 seconds). Now, after one lease period, another
  client can open and lock those files and the unresponsive client's
  lease is purged; otherwise if the unresponsive client's open and lock
  state is uncontended, the server retains that open and lock state for
  up to 24 hours, allowing the client's workload to resume after a
  lengthy network partition.

  A longstanding issue with NFSv4 file creation is also addressed.
  Previously a file creation can fail internally, returning an error to
  the client, but leave the newly created file in place as an artifact.
  The file creation code path has been reorganized so that internal
  failures and race conditions are less likely to result in an unwanted
  file creation.

  A fault injector has been added to help exercise paths that are run
  during kernel metadata cache invalidation. These caches contain
  information maintained by user space about exported filesystems. Many
  of our test workloads do not trigger cache invalidation.

  There is one patch that is needed to support PREEMPT_RT and a fix for
  an ancient 'sleep while spin-locked' splat that seems to have become
  easier to hit since v5.18-rc3"

* tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (36 commits)
  NFSD: nfsd_file_put() can sleep
  NFSD: Add documenting comment for nfsd4_release_lockowner()
  NFSD: Modernize nfsd4_release_lockowner()
  NFSD: Fix possible sleep during nfsd4_release_lockowner()
  nfsd: destroy percpu stats counters after reply cache shutdown
  nfsd: Fix null-ptr-deref in nfsd_fill_super()
  nfsd: Unregister the cld notifier when laundry_wq create failed
  SUNRPC: Use RMW bitops in single-threaded hot paths
  NFSD: Clean up the show_nf_flags() macro
  NFSD: Trace filecache opens
  NFSD: Move documenting comment for nfsd4_process_open2()
  NFSD: Fix whitespace
  NFSD: Remove dprintk call sites from tail of nfsd4_open()
  NFSD: Instantiate a struct file when creating a regular NFSv4 file
  NFSD: Clean up nfsd_open_verified()
  NFSD: Remove do_nfsd_create()
  NFSD: Refactor NFSv4 OPEN(CREATE)
  NFSD: Refactor NFSv3 CREATE
  NFSD: Refactor nfsd_create_setattr()
  NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
  ...
2022-05-26 20:52:24 -07:00
Chuck Lever
08af54b3e5 NFSD: nfsd_file_put() can sleep
Now that there are no more callers of nfsd_file_put() that might
hold a spin lock, ensure the lockdep infrastructure can catch
newly introduced calls to nfsd_file_put() made while a spinlock
is held.

Link: https://lore.kernel.org/linux-nfs/ece7fd1d-5fb3-5155-54ba-347cfc19bd9a@oracle.com/T/#mf1855552570cf9a9c80d1e49d91438cd9085aada
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-05-26 10:50:51 -04:00
Chuck Lever
043862b09c NFSD: Add documenting comment for nfsd4_release_lockowner()
And return explicit nfserr values that match what is documented in the
new comment / API contract.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-26 10:50:50 -04:00
Chuck Lever
bd8fdb6e54 NFSD: Modernize nfsd4_release_lockowner()
Refactor: Use existing helpers that other lock operations use. This
change removes several automatic variables, so re-organize the
variable declarations for readability.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-26 10:50:50 -04:00
Chuck Lever
ce3c4ad7f4 NFSD: Fix possible sleep during nfsd4_release_lockowner()
nfsd4_release_lockowner() holds clp->cl_lock when it calls
check_for_locks(). However, check_for_locks() calls nfsd_file_get()
/ nfsd_file_put() to access the backing inode's flc_posix list, and
nfsd_file_put() can sleep if the inode was recently removed.

Let's instead rely on the stateowner's reference count to gate
whether the release is permitted. This should be a reliable
indication of locks-in-use since file lock operations and
->lm_get_owner take appropriate references, which are released
appropriately when file locks are removed.

Reported-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
2022-05-26 10:50:49 -04:00
Julian Schroeder
fd5e363eac nfsd: destroy percpu stats counters after reply cache shutdown
Upon nfsd shutdown any pending DRC cache is freed. DRC cache use is
tracked via a percpu counter. In the current code the percpu counter
is destroyed before. If any pending cache is still present,
percpu_counter_add is called with a percpu counter==NULL. This causes
a kernel crash.
The solution is to destroy the percpu counter after the cache is freed.

Fixes: e567b98ce9 (“nfsd: protect concurrent access to nfsd stats counters”)
Signed-off-by: Julian Schroeder <jumaco@amazon.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23 15:22:17 -04:00
Zhang Xiaoxu
6f6f84aa21 nfsd: Fix null-ptr-deref in nfsd_fill_super()
KASAN report null-ptr-deref as follows:

  BUG: KASAN: null-ptr-deref in nfsd_fill_super+0xc6/0xe0 [nfsd]
  Write of size 8 at addr 000000000000005d by task a.out/852

  CPU: 7 PID: 852 Comm: a.out Not tainted 5.18.0-rc7-dirty #66
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x34/0x44
   kasan_report+0xab/0x120
   ? nfsd_mkdir+0x71/0x1c0 [nfsd]
   ? nfsd_fill_super+0xc6/0xe0 [nfsd]
   nfsd_fill_super+0xc6/0xe0 [nfsd]
   ? nfsd_mkdir+0x1c0/0x1c0 [nfsd]
   get_tree_keyed+0x8e/0x100
   vfs_get_tree+0x41/0xf0
   __do_sys_fsconfig+0x590/0x670
   ? fscontext_read+0x180/0x180
   ? anon_inode_getfd+0x4f/0x70
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae

This can be reproduce by concurrent operations:
	1. fsopen(nfsd)/fsconfig
	2. insmod/rmmod nfsd

Since the nfsd file system is registered before than nfsd_net allocated,
the caller may get the file_system_type and use the nfsd_net before it
allocated, then null-ptr-deref occurred.

So init_nfsd() should call register_filesystem() last.

Fixes: bd5ae9288d ("nfsd: register pernet ops last, unregister first")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23 11:06:29 -04:00
Zhang Xiaoxu
62fdb65edb nfsd: Unregister the cld notifier when laundry_wq create failed
If laundry_wq create failed, the cld notifier should be unregistered.

Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23 11:06:29 -04:00
Chuck Lever
28df098881 SUNRPC: Use RMW bitops in single-threaded hot paths
I noticed CPU pipeline stalls while using perf.

Once an svc thread is scheduled and executing an RPC, no other
processes will touch svc_rqst::rq_flags. Thus bus-locked atomics are
not needed outside the svc thread scheduler.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23 11:06:29 -04:00
Chuck Lever
bb283ca18d NFSD: Clean up the show_nf_flags() macro
The flags are defined using C macros, so TRACE_DEFINE_ENUM is
unnecessary.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23 11:06:29 -04:00
Chuck Lever
0122e88211 NFSD: Trace filecache opens
Instrument calls to nfsd_open_verified() to get a sense of the
filecache hit rate.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23 11:06:29 -04:00
Chuck Lever
7e2ce0cc15 NFSD: Move documenting comment for nfsd4_process_open2()
Clean up nfsd4_open() by converting a large comment at the only
call site for nfsd4_process_open2() to a kerneldoc comment in
front of that function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23 11:06:29 -04:00
Chuck Lever
26320d7e31 NFSD: Fix whitespace
Clean up: Pull case arms back one tab stop to conform every other
switch statement in fs/nfsd/nfs4proc.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23 11:06:29 -04:00
Chuck Lever
f67a16b147 NFSD: Remove dprintk call sites from tail of nfsd4_open()
Clean up: These relics are not likely to benefit server
administrators.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23 11:06:29 -04:00
Chuck Lever
fb70bf124b NFSD: Instantiate a struct file when creating a regular NFSv4 file
There have been reports of races that cause NFSv4 OPEN(CREATE) to
return an error even though the requested file was created. NFSv4
does not provide a status code for this case.

To mitigate some of these problems, reorganize the NFSv4
OPEN(CREATE) logic to allocate resources before the file is actually
created, and open the new file while the parent directory is still
locked.

Two new APIs are added:

+ Add an API that works like nfsd_file_acquire() but does not open
the underlying file. The OPEN(CREATE) path can use this API when it
already has an open file.

+ Add an API that is kin to dentry_open(). NFSD needs to create a
file and grab an open "struct file *" atomically. The
alloc_empty_file() has to be done before the inode create. If it
fails (for example, because the NFS server has exceeded its
max_files limit), we avoid creating the file and can still return
an error to the NFS client.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=382
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: JianHong Yin <jiyin@redhat.com>
2022-05-23 11:06:29 -04:00
Chuck Lever
f4d84c5264 NFSD: Clean up nfsd_open_verified()
Its only caller always passes S_IFREG as the @type parameter. As an
additional clean-up, add a kerneldoc comment.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-20 13:18:25 -04:00
Chuck Lever
1c388f2775 NFSD: Remove do_nfsd_create()
Now that its two callers have their own version-specific instance of
this function, do_nfsd_create() is no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-20 13:18:25 -04:00
Chuck Lever
254454a5aa NFSD: Refactor NFSv4 OPEN(CREATE)
Copy do_nfsd_create() to nfs4proc.c and remove NFSv3-specific logic.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-20 13:18:25 -04:00
Chuck Lever
df9606abdd NFSD: Refactor NFSv3 CREATE
The NFSv3 CREATE and NFSv4 OPEN(CREATE) use cases are about to
diverge such that it makes sense to split do_nfsd_create() into one
version for NFSv3 and one for NFSv4.

As a first step, copy do_nfsd_create() to nfs3proc.c and remove
NFSv4-specific logic.

One immediate legibility benefit is that the logic for handling
NFSv3 createhow is now quite straightforward. NFSv4 createhow
has some subtleties that IMO do not belong in generic code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-20 13:18:24 -04:00
Chuck Lever
5f46e950c3 NFSD: Refactor nfsd_create_setattr()
I'd like to move do_nfsd_create() out of vfs.c. Therefore
nfsd_create_setattr() needs to be made publicly visible.

Note that both call sites in vfs.c commit both the new object and
its parent directory, so just combine those common metadata commits
into nfsd_create_setattr().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-20 13:18:24 -04:00
Chuck Lever
14ee45b70d NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
Clean up: The "out" label already invokes fh_drop_write().

Note that fh_drop_write() is already careful not to invoke
mnt_drop_write() if either it has already been done or there is
nothing to drop. Therefore no change in behavior is expected.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-20 13:18:24 -04:00
Chuck Lever
e61568599c NFSD: Clean up nfsd3_proc_create()
As near as I can tell, mode bit masking and setting S_IFREG is
already done by do_nfsd_create() and vfs_create(). The NFSv4 path
(do_open_lookup), for example, does not bother with this special
processing.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-20 13:18:24 -04:00
Dai Ngo
e9488d5ae1 NFSD: Show state of courtesy client in client info
Update client_info_show to show state of courtesy client
and seconds since last renew.

Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-19 12:25:40 -04:00
Dai Ngo
27431affb0 NFSD: add support for lock conflict to courteous server
This patch allows expired client with lock state to be in COURTESY
state. Lock conflict with COURTESY client is resolved by the fs/lock
code using the lm_lock_expirable and lm_expire_lock callback in the
struct lock_manager_operations.

If conflict client is in COURTESY state, set it to EXPIRABLE and
schedule the laundromat to run immediately to expire the client. The
callback lm_expire_lock waits for the laundromat to flush its work
queue before returning to caller.

Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-19 12:25:39 -04:00
Dai Ngo
d76cc46b37 NFSD: move create/destroy of laundry_wq to init_nfsd and exit_nfsd
This patch moves create/destroy of laundry_wq from nfs4_state_start
and nfs4_state_shutdown_net to init_nfsd and exit_nfsd to prevent
the laundromat from being freed while a thread is processing a
conflicting lock.

Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-19 12:25:39 -04:00
Dai Ngo
3d69427151 NFSD: add support for share reservation conflict to courteous server
This patch allows expired client with open state to be in COURTESY
state. Share/access conflict with COURTESY client is resolved by
setting COURTESY client to EXPIRABLE state, schedule laundromat
to run and returning nfserr_jukebox to the request client.

Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-19 12:25:39 -04:00
Dai Ngo
66af257999 NFSD: add courteous server support for thread with only delegation
This patch provides courteous server support for delegation only.
Only expired client with delegation but no conflict and no open
or lock state is allowed to be in COURTESY state.

Delegation conflict with COURTESY/EXPIRABLE client is resolved by
setting it to EXPIRABLE, queue work for the laundromat and return
delay to the caller. Conflict is resolved when the laudromat runs
and expires the EXIRABLE client while the NFS client retries the
OPEN request. Local thread request that gets conflict is doing the
retry in _break_lease.

Client in COURTESY or EXPIRABLE state is allowed to reconnect and
continues to have access to its state. Access to the nfs4_client by
the reconnecting thread and the laundromat is serialized via the
client_lock.

Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-19 12:25:39 -04:00
Chuck Lever
91e23b1c39 NFSD: Clean up nfsd_splice_actor()
nfsd_splice_actor() checks that the page being spliced does not
match the previous element in the svc_rqst::rq_pages array. We
believe this is to prevent a double put_page() in cases where the
READ payload is partially contained in the xdr_buf's head buffer.

However, the NFSD READ proc functions no longer place any part of
the READ payload in the head buffer, in order to properly support
NFS/RDMA READ with Write chunks. Therefore, simplify the logic in
nfsd_splice_actor() to remove this unnecessary check.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-19 12:25:38 -04:00
Amir Goldstein
b8962a9d8c nfsd: use fsnotify group lock helpers
Before commit 9542e6a643 ("nfsd: Containerise filecache laundrette")
nfsd would close open files in direct reclaim context and that could
cause a deadlock when fsnotify mark allocation went into direct reclaim
and nfsd shrinker tried to free existing fsnotify marks.

To avoid issues like this in future code, set the FSNOTIFY_GROUP_NOFS
flag on nfsd fsnotify group to prevent going into direct reclaim from
fsnotify_add_inode_mark().

Link: https://lore.kernel.org/r/20220422120327.3459282-10-amir73il@gmail.com
Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25 14:37:31 +02:00
Amir Goldstein
867a448d58 fsnotify: pass flags argument to fsnotify_alloc_group()
Add flags argument to fsnotify_alloc_group(), define and use the flag
FSNOTIFY_GROUP_USER in inotify and fanotify instead of the helper
fsnotify_alloc_user_group() to indicate user allocation.

Although the flag FSNOTIFY_GROUP_USER is currently not used after group
allocation, we store the flags argument in the group struct for future
use of other group flags.

Link: https://lore.kernel.org/r/20220422120327.3459282-5-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-04-25 14:37:12 +02:00
Linus Torvalds
c1488c9751 NFSD bug fixes for 5.18-rc:
- Fix a write performance regression
 - Fix crashes during request deferral on RDMA transports
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmJVnIgACgkQM2qzM29m
 f5cdQA//SS2pbiAcjePL6zuVrd/eeALWRJIdCj8fP9eQ4dfi7OM0xQE3qHyLDC3n
 2fjciUSnoE+lIYiRXD7Ii+GqYThSrrxlR7aCT+vrIAL/ksnzTeUiUSuEihPOOYjn
 ZEz9vkP4wlCTFh4SdeBoETEjksvRU3ql37l4S04nxxgidpY/NkF5wZH5iUlV8Z14
 Q8OxJJk1tFJARPz6RqGrRkMws24NhYzeuXQktVA+AOoKjUeQSp4bN1ZSCJv3eX/E
 EjAPDFMYVdenp8OY9RJoP3Xpxb6e1mv5flXYQa7YpJUVH3AccMJ8aKuFadjGfGNS
 2tEzoipJR0fmxkxpDX1g0Bzm8fAIc467l4tskVzUxLnqNS/FjXv2P85h8p9U/R06
 BWbF5CxRu89h1tZ/ZIh4lfw/Ro94XvYaYCDeu9V1TndP+QroNDDY9Ypl13+uy6uS
 zLwEEo5nMUbc7FVmT7UidHqeEukwbNzmXEXYrgQD5hRaT9L+85R0L5/Y+gi+ODDK
 7SKu1Bomi3WN7WKzjvZspHMivVT6JNy/ngHlKSYkWYl/dJzMy5I4Z4sNRVCHKoKX
 17SpKHfKkZAt5oS54dZ40O1PhICZTMclAB7Vb8bs/yggrHTsqwqf21v8FTJW79K9
 MjnYigEWgwi62GC5WdZr5LqAkqRHi2K2rLnFwVSLtZvpb2sxPdM=
 =3AeS
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix a write performance regression

 - Fix crashes during request deferral on RDMA transports

* tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix the svc_deferred_event trace class
  SUNRPC: Fix NFSD's request deferral on RDMA transports
  nfsd: Clean up nfsd_file_put()
  nfsd: Fix a write performance regression
  SUNRPC: Return true/false (not 1/0) from bool functions
2022-04-12 14:23:19 -10:00
Trond Myklebust
999397926a nfsd: Clean up nfsd_file_put()
Make it a little less racy, by removing the refcount_read() test. Then
remove the redundant 'is_hashed' variable.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-31 10:39:59 -04:00
Trond Myklebust
6b8a94332e nfsd: Fix a write performance regression
The call to filemap_flush() in nfsd_file_put() is there to ensure that
we clear out any writes belonging to a NFSv3 client relatively quickly
and avoid situations where the file can't be evicted by the garbage
collector. It also ensures that we detect write errors quickly.

The problem is this causes a regression in performance for some
workloads.

So try to improve matters by deferring writeback until we're ready to
close the file, and need to detect errors so that we can force the
client to resend.

Tested-by: Jan Kara <jack@suse.cz>
Fixes: b6669305d3 ("nfsd: Reduce the number of calls to nfsd_file_gc()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Link: https://lore.kernel.org/all/20220330103457.r4xrhy2d6nhtouzk@quack3.lan
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-31 10:39:57 -04:00
Haowen Bai
5f7b839d47 SUNRPC: Return true/false (not 1/0) from bool functions
Return boolean values ("true" or "false") instead of 1 or 0 from bool
functions.  This fixes the following warnings from coccicheck:

./fs/nfsd/nfs2acl.c:289:9-10: WARNING: return of 0/1 in function
'nfsaclsvc_encode_accessres' with return type bool
./fs/nfsd/nfs2acl.c:252:9-10: WARNING: return of 0/1 in function
'nfsaclsvc_encode_getaclres' with return type bool

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-27 23:25:52 -04:00
Linus Torvalds
9030fb0bb9 Folio changes for 5.18
- Rewrite how munlock works to massively reduce the contention
    on i_mmap_rwsem (Hugh Dickins):
    https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
  - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig):
    https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
  - Convert GUP to use folios and make pincount available for order-1
    pages. (Matthew Wilcox)
  - Convert a few more truncation functions to use folios (Matthew Wilcox)
  - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox)
  - Convert rmap_walk to use folios (Matthew Wilcox)
  - Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
  - Add support for creating large folios in readahead (Matthew Wilcox)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmI4ucgACgkQDpNsjXcp
 gj69Wgf6AwqwmO5Tmy+fLScDPqWxmXJofbocae1kyoGHf7Ui91OK4U2j6IpvAr+g
 P/vLIK+JAAcTQcrSCjymuEkf4HkGZOR03QQn7maPIEe4eLrZRQDEsmHC1L9gpeJp
 s/GMvDWiGE0Tnxu0EOzfVi/yT+qjIl/S8VvqtCoJv1HdzxitZ7+1RDuqImaMC5MM
 Qi3uHag78vLmCltLXpIOdpgZhdZexCdL2Y/1npf+b6FVkAJRRNUnA0gRbS7YpoVp
 CbxEJcmAl9cpJLuj5i5kIfS9trr+/QcvbUlzRxh4ggC58iqnmF2V09l2MJ7YU3XL
 v1O/Elq4lRhXninZFQEm9zjrri7LDQ==
 =n9Ad
 -----END PGP SIGNATURE-----

Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache

Pull folio updates from Matthew Wilcox:

 - Rewrite how munlock works to massively reduce the contention on
   i_mmap_rwsem (Hugh Dickins):

     https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/

 - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
   Hellwig):

     https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/

 - Convert GUP to use folios and make pincount available for order-1
   pages. (Matthew Wilcox)

 - Convert a few more truncation functions to use folios (Matthew
   Wilcox)

 - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
   Wilcox)

 - Convert rmap_walk to use folios (Matthew Wilcox)

 - Convert most of shrink_page_list() to use a folio (Matthew Wilcox)

 - Add support for creating large folios in readahead (Matthew Wilcox)

* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
  mm/damon: minor cleanup for damon_pa_young
  selftests/vm/transhuge-stress: Support file-backed PMD folios
  mm/filemap: Support VM_HUGEPAGE for file mappings
  mm/readahead: Switch to page_cache_ra_order
  mm/readahead: Align file mappings for non-DAX
  mm/readahead: Add large folio readahead
  mm: Support arbitrary THP sizes
  mm: Make large folios depend on THP
  mm: Fix READ_ONLY_THP warning
  mm/filemap: Allow large folios to be added to the page cache
  mm: Turn can_split_huge_page() into can_split_folio()
  mm/vmscan: Convert pageout() to take a folio
  mm/vmscan: Turn page_check_references() into folio_check_references()
  mm/vmscan: Account large folios correctly
  mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
  mm/vmscan: Free non-shmem folios without splitting them
  mm/rmap: Constify the rmap_walk_control argument
  mm/rmap: Convert rmap_walk() to take a folio
  mm: Turn page_anon_vma() into folio_anon_vma()
  mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
  ...
2022-03-22 17:03:12 -07:00
Linus Torvalds
14705fda8f New features:
- NFSv3 support in NFSD is now always built
 - Added NFSD support for the NFSv4 birth-time file attribute
 - Added support for storing and displaying sockaddrs in trace points
 - NFSD now recognizes RPC_AUTH_TLS probes
 
 Performance improvements:
 - Optimized the svc transport enqueuing mechanism
 - Added micro-optimizations for the duplicate reply cache
 
 Notable bug fixes:
 - Allocation of the NFSD file cache hash table is more reliable
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmI3XNkACgkQM2qzM29m
 f5dNERAAqJ/nzfVp2H5BKLszJ7p/s13wFqbW719Rzymzl6/tUUHOqIsVdBsJsa/b
 BdZQLLDwa6ZB5zOAnC6FosRKYu4lwixOOf94pC6a9ZDD/glYVKF8mZG+RZXPAy16
 g3JUOi/bcyHXv0ZUhbv7DqW+HHM+owPP4vlNJ9ChiiLr/Xdp8NBKj+4Qtn/wcAo+
 Xuvx7fU/5Mbemh+dd5mWker4afHvpt9V6U6s04m5LiTPPnHVnxmeyekJGUCOY0QO
 cm/6SPNDqyn/VEfM/SRxEnLE9QcHRhZo/4PKRGF4wYolcviIogbILE1M7Ig/r/Gv
 6Du2kcRAhyZ3zgWnu799Ivn3Q6IrVjxZwqmsi7YHURTwYKyZtxYsUk0MCBcpnxrE
 WyTS2onpElbMop3viKCnQdpIetbsHnUNg3udUV6ugbdCbnZuhUw5B/d6x0o8ZWDE
 C0f+jnX+GnBstn0vkcj2H0+VQTd5hUJtXMrooI42ODJoboQRZcmePwoXjqCmw3sy
 PXTxLZC/5+4zNHGUuz4Pq4V7FKr4nHhDzaW4ZDO3mILx4ahceotulY1B/yoBUu8o
 /LAhu2kJ6nFQkmpzdrGzPeOstgJYHm9CaitRvMzg+NJxEAJdebypdQDbX5iNpgfW
 MDXH4n8eIqroTlQ/mQYEV0EbC7BaTqSCL6rQdcrcFUPu9n4Fcno=
 =5nac
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "New features:

   - NFSv3 support in NFSD is now always built

   - Added NFSD support for the NFSv4 birth-time file attribute

   - Added support for storing and displaying sockaddrs in trace points

   - NFSD now recognizes RPC_AUTH_TLS probes

  Performance improvements:

   - Optimized the svc transport enqueuing mechanism

   - Added micro-optimizations for the duplicate reply cache

  Notable bug fixes:

   - Allocation of the NFSD file cache hash table is more reliable"

* tag 'nfsd-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (30 commits)
  nfsd: fix using the correct variable for sizeof()
  nfsd: use correct format characters
  NFSD: prevent integer overflow on 32 bit systems
  NFSD: prevent underflow in nfssvc_decode_writeargs()
  fs/lock: documentation cleanup. Replace inode->i_lock with flc_lock.
  NFSD: Fix nfsd_breaker_owns_lease() return values
  NFSD: Clean up _lm_ operation names
  arch: Remove references to CONFIG_NFSD_V3 in the default configs
  NFSD: Remove CONFIG_NFSD_V3
  nfsd: more robust allocation failure handling in nfsd_file_cache_init
  SUNRPC: Teach server to recognize RPC_AUTH_TLS
  NFSD: Move svc_serv_ops::svo_function into struct svc_serv
  NFSD: Remove svc_serv_ops::svo_module
  SUNRPC: Remove svc_shutdown_net()
  SUNRPC: Rename svc_close_xprt()
  SUNRPC: Rename svc_create_xprt()
  SUNRPC: Remove svo_shutdown method
  SUNRPC: Merge svc_do_enqueue_xprt() into svc_enqueue_xprt()
  SUNRPC: Remove the .svo_enqueue_xprt method
  SUNRPC: Record endpoint information in trace log
  ...
2022-03-22 10:29:51 -07:00
Linus Torvalds
616355cc81 for-5.18/block-2022-03-18
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI0+GcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprUpD/9aTJEnj7VCw7UouSsg098sdjtoy9ilslU3
 ew47K8CIXHbCB4CDqLnFyvCwAdG1XGgS+fUmFAxvTr29R9SZeS5d+bXL6sZzEo0C
 bwxsJy9MM2QRtMvB+giAt1myXbwB8cG+ketMBWXqwXXRHRzPbbQfMZia7FqWMnfY
 KQanH9IwYHp1oa5U/W6Qcjm4oCnLgBMRwqByzUCtiF3y9qgaLkK+3IgkNwjJQjLA
 DTeUJ/9CgxGQQbzA+LPktbw2xfTqiUfcKq0mWx6Zt4wwNXn1ClqUDUXX6QSM8/5u
 3OimbscSkEPPTIYZbVBPkhFnAlQb4JaJEgOrbXvYKVV2Dh+eZY81XwNeE/E8gdBY
 TnHOTOCjkN/4sR3hIrWazlJzPLdpPA0eOYrhguCraQsX9mcsYNxlJ9otRv/Ve99g
 uqL0RZg3+NoK84fm79FCGy/ZmPQJvJttlBT9CKVwylv/Lky42xWe7AdM3OipKluY
 2nh+zN5Ai7WxZdTKXQFRhCSWfWQ+1qW51tB3dcGW+BooZr/oox47qKQVcHsEWbq1
 RNR45F5a4AuPwYUHF/P36WviLnEuq9AvX7OTTyYOplyVQohKIoDXp9chVzLNzBiZ
 KBR00W6MLKKKN+8foalQWgNyb2i2PH7Ib4xRXvXj/22Vwxg5UmUoBmSDSas9SZUS
 +dMo7CtNgA==
 =DpgP
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - BFQ cleanups and fixes (Yu, Zhang, Yahu, Paolo)

 - blk-rq-qos completion fix (Tejun)

 - blk-cgroup merge fix (Tejun)

 - Add offline error return value to distinguish it from an IO error on
   the device (Song)

 - IO stats fixes (Zhang, Christoph)

 - blkcg refcount fixes (Ming, Yu)

 - Fix for indefinite dispatch loop softlockup (Shin'ichiro)

 - blk-mq hardware queue management improvements (Ming)

 - sbitmap dead code removal (Ming, John)

 - Plugging merge improvements (me)

 - Show blk-crypto capabilities in sysfs (Eric)

 - Multiple delayed queue run improvement (David)

 - Block throttling fixes (Ming)

 - Start deprecating auto module loading based on dev_t (Christoph)

 - bio allocation improvements (Christoph, Chaitanya)

 - Get rid of bio_devname (Christoph)

 - bio clone improvements (Christoph)

 - Block plugging improvements (Christoph)

 - Get rid of genhd.h header (Christoph)

 - Ensure drivers use appropriate flush helpers (Christoph)

 - Refcounting improvements (Christoph)

 - Queue initialization and teardown improvements (Ming, Christoph)

 - Misc fixes/improvements (Barry, Chaitanya, Colin, Dan, Jiapeng,
   Lukas, Nian, Yang, Eric, Chengming)

* tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block: (127 commits)
  block: cancel all throttled bios in del_gendisk()
  block: let blkcg_gq grab request queue's refcnt
  block: avoid use-after-free on throttle data
  block: limit request dispatch loop duration
  block/bfq-iosched: Fix spelling mistake "tenative" -> "tentative"
  sr: simplify the local variable initialization in sr_block_open()
  block: don't merge across cgroup boundaries if blkcg is enabled
  block: fix rq-qos breakage from skipping rq_qos_done_bio()
  block: flush plug based on hardware and software queue order
  block: ensure plug merging checks the correct queue at least once
  block: move rq_qos_exit() into disk_release()
  block: do more work in elevator_exit
  block: move blk_exit_queue into disk_release
  block: move q_usage_counter release into blk_queue_release
  block: don't remove hctx debugfs dir from blk_mq_exit_queue
  block: move blkcg initialization/destroy into disk allocation/release handler
  sr: implement ->free_disk to simplify refcounting
  sd: implement ->free_disk to simplify refcounting
  sd: delay calling free_opal_dev
  sd: call sd_zbc_release_disk before releasing the scsi_device reference
  ...
2022-03-21 16:48:55 -07:00
Matthew Wilcox (Oracle)
cbcc268bb1 fs: Move many prototypes to pagemap.h
These functions are page cache functionality and don't need to be
declared in fs.h.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
2022-03-21 12:59:02 -04:00
Jakob Koschel
4fc5f53465 nfsd: fix using the correct variable for sizeof()
While the original code is valid, it is not the obvious choice for the
sizeof() call and in preparation to limit the scope of the list iterator
variable the sizeof should be changed to the size of the destination.

Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-20 12:49:38 -04:00
Bill Wendling
4e1b04af4f nfsd: use correct format characters
When compiling with -Wformat, clang emits the following warnings:

fs/nfsd/flexfilelayout.c:120:27: warning: format specifies type 'unsigned
char' but the argument has type 'int' [-Wformat]
                         "%s.%hhu.%hhu", addr, port >> 8, port & 0xff);
                             ~~~~              ^~~~~~~~~
                             %d
fs/nfsd/flexfilelayout.c:120:38: warning: format specifies type 'unsigned
char' but the argument has type 'int' [-Wformat]
                         "%s.%hhu.%hhu", addr, port >> 8, port & 0xff);
                                  ~~~~                    ^~~~~~~~~~~
                                  %d

The types of these arguments are unconditionally defined, so this patch
updates the format character to the correct ones for ints and unsigned
ints.

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Haynes <loghyr@hammerspace.com>
2022-03-17 19:47:38 -04:00
Dan Carpenter
184416d4b9 NFSD: prevent underflow in nfssvc_decode_writeargs()
Smatch complains:

	fs/nfsd/nfsxdr.c:341 nfssvc_decode_writeargs()
	warn: no lower bound on 'args->len'

Change the type to unsigned to prevent this issue.

Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-15 09:35:56 -04:00
Chuck Lever
50719bf344 NFSD: Fix nfsd_breaker_owns_lease() return values
These have been incorrect since the function was introduced.

A proper kerneldoc comment is added since this function, though
static, is part of an external interface.

Reported-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-11 10:25:16 -05:00
Chuck Lever
35aff0678f NFSD: Clean up _lm_ operation names
The common practice is to name function instances the same as the
method names, but with a uniquifying prefix. Commit aef9583b23
("NFSD: Get reference of lockowner when coping file_lock") missed
this -- the new function names should both have been of the form
"nfsd4_lm_*".

Before more lock manager operations are added in NFSD, rename these
two functions for consistency.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-11 10:25:16 -05:00
Chuck Lever
5f9a62ff7d NFSD: Remove CONFIG_NFSD_V3
Eventually support for NFSv2 in the Linux NFS server is to be
deprecated and then removed.

However, NFSv2 is the "always supported" version that is available
as soon as CONFIG_NFSD is set.  Before NFSv2 support can be removed,
we need to choose a different "always supported" version.

This patch removes CONFIG_NFSD_V3 so that NFSv3 is always supported,
as NFSv2 is today. When NFSv2 support is removed, NFSv3 will become
the only "always supported" NFS version.

The defconfigs still need to be updated to remove CONFIG_NFSD_V3=y.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-11 10:25:14 -05:00
Amir Goldstein
4d2eeafecd nfsd: more robust allocation failure handling in nfsd_file_cache_init
The nfsd file cache table can be pretty large and its allocation
may require as many as 80 contigious pages.

Employ the same fix that was employed for similar issue that was
reported for the reply cache hash table allocation several years ago
by commit 8f97514b42 ("nfsd: more robust allocation failure handling
in nfsd_reply_cache_init").

Fixes: 65294c1f2c ("nfsd: add a new struct file caching facility to nfsd")
Link: https://lore.kernel.org/linux-nfs/e3cdaeec85a6cfec980e87fc294327c0381c1778.camel@kernel.org/
Suggested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Amir Goldstein <amir73il@gmail.com>
2022-02-28 10:26:40 -05:00
Chuck Lever
37902c6313 NFSD: Move svc_serv_ops::svo_function into struct svc_serv
Hoist svo_function back into svc_serv and remove struct
svc_serv_ops, since the struct is now devoid of fields.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:40 -05:00
Chuck Lever
f49169c97f NFSD: Remove svc_serv_ops::svo_module
struct svc_serv_ops is about to be removed.

Neil Brown says:
> I suspect svo_module can go as well - I don't think the thread is
> ever the thing that primarily keeps a module active.

A random sample of kthread_create() callers shows sunrpc is the only
one that manages module reference count in this way.

Suggested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:40 -05:00
Chuck Lever
c7d7ec8f04 SUNRPC: Remove svc_shutdown_net()
Clean up: svc_shutdown_net() now does nothing but call
svc_close_net(). Replace all external call sites.

svc_close_net() is renamed to be the inverse of svc_xprt_create().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:40 -05:00
Chuck Lever
4355d767a2 SUNRPC: Rename svc_close_xprt()
Clean up: Use the "svc_xprt_<task>" function naming convention as
is used for other external APIs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:40 -05:00
Chuck Lever
352ad31448 SUNRPC: Rename svc_create_xprt()
Clean up: Use the "svc_xprt_<task>" function naming convention as
is used for other external APIs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:39 -05:00
Chuck Lever
87cdd8641c SUNRPC: Remove svo_shutdown method
Clean up. Neil observed that "any code that calls svc_shutdown_net()
knows what the shutdown function should be, and so can call it
directly."

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
2022-02-28 10:26:39 -05:00
Chuck Lever
a9ff2e99e9 SUNRPC: Remove the .svo_enqueue_xprt method
We have never been able to track down and address the underlying
cause of the performance issues with workqueue-based service
support. svo_enqueue_xprt is called multiple times per RPC, so
it adds instruction path length, but always ends up at the same
function: svc_xprt_do_enqueue(). We do not anticipate needing
this flexibility for dynamic nfsd thread management support.

As a micro-optimization, remove .svo_enqueue_xprt because
Spectre/Meltdown makes virtual function calls more costly.

This change essentially reverts commit b9e13cdfac ("nfsd/sunrpc:
turn enqueueing a svc_xprt into a svc_serv operation").

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:39 -05:00
Chuck Lever
c1a3f2ce66 NFSD: Remove NFSD_PROC_ARGS_* macros
Clean up.

The PROC_ARGS macros were added when I thought that NFSD tracepoints
would be reporting endpoint information. However, tracepoints in the
RPC server now report transport endpoint information, so in general
there's no need for the upper layers to do that any more, and these
macros can be retired.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:39 -05:00
Chuck Lever
9db0e15fb3 NFSD: Use __sockaddr field to store socket addresses
As an example usage of the new __sockaddr field, convert some NFSD
trace points to use it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:39 -05:00
Chuck Lever
add1511c38 NFSD: Streamline the rare "found" case
Move a rarely called function call site out of the hot path.

This is an exceptionally small improvement because the compiler
inlines most of the functions that nfsd_cache_lookup() calls.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:38 -05:00
Chuck Lever
0f29ce32fb NFSD: Skip extra computation for RC_NOCACHE case
Force the compiler to skip unneeded initialization for cases that
don't need those values. For example, NFSv4 COMPOUND operations are
RC_NOCACHE.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:38 -05:00
Chuck Lever
378a6109dd NFSD: De-duplicate hash bucket indexing
Clean up: The details of finding the right hash bucket are exactly
the same in both nfsd_cache_lookup() and nfsd_cache_update().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:38 -05:00
Ondrej Valousek
e377a3e698 nfsd: Add support for the birth time attribute
For filesystems that supports "btime" timestamp (i.e. most modern
filesystems do) we share it via kernel nfsd. Btime support for NFS
client has already been added by Trond recently.

Suggested-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Ondrej Valousek <ondrej.valousek.xm@renesas.com>
[ cel: addressed some whitespace/checkpatch nits ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28 10:26:38 -05:00
Linus Torvalds
f4bc5bbb5f Notable bug fixes:
Ensure that NFS clients cannot send file size or offset values that
 can cause the NFS server to crash or to return incorrect or
 surprising results. In particular, fix how the NFS server handles
 values larger than OFFSET_MAX.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmIDzr4ACgkQM2qzM29m
 f5eJRBAAikdh0PYOlZbvy9M1eY6wq3k+Y10JsnCZk4T8Uq0NJF/7CJ3R4/4+xGOh
 ZA/2vE1dN4IfqnIOdxw1cXbzzgAO5p/nDLMo9wC6NimrVLkE+S8j38oWvEHOCJXC
 TzUbIKkxqBBcfDw4pO4BT42iHx+cqVUuRFd2qkob1ZRoe+BKI+F4+7QNVc8iEw5z
 j85i2/h6JohsItzekRbMO1q1iXxBc+IZRYafjibtVRWxRuNUWP8C1cv0eXrlSy3O
 L07kZRwzrd52PAi1Q8K07Ip+yTHUMZptyHoB6S863uuz/mOzlpXewvXHMGA1btlr
 POHYG/lBXpDS0e2pjksyXXp2I7HJV/HuaMyyLveWRO0qleBc3G5PsvIJNBW7xl5f
 NPGdgfaa+8ZeOCGolvPruykL9Eh7QAyWTdPKz1J+NuhjkAB4p6ba9QcKVwP7kYTi
 I8zdeUPgbjuFW35hal0ZIlNi2RfcuSGk1FKjotrQ6J3XNIaqPkUWK+1Zz3MzqPUW
 +1ElzoXQugJASPBkEZuf1aXr8/vRjKT16l8EX1kbtJ5wjj2OPbnWWZk03ZncLVfv
 CzbJTZLqiM0JuRqXvYpUGAQdryWcwvTCAuWxcqrt4ALNWW6Z4Y35Vl8H4sTh8wkr
 Q3m6bAVYJx3FmFop7y5ubVH137k1SFJ0NzGJJK0mYoZQSMZoPZI=
 =64n/
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull more nfsd fixes from Chuck Lever:
 "Ensure that NFS clients cannot send file size or offset values that
  can cause the NFS server to crash or to return incorrect or surprising
  results.

  In particular, fix how the NFS server handles values larger than
  OFFSET_MAX"

* tag 'nfsd-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Deprecate NFS_OFFSET_MAX
  NFSD: Fix offset type in I/O trace points
  NFSD: COMMIT operations must not return NFS?ERR_INVAL
  NFSD: Clamp WRITE offsets
  NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizes
  NFSD: Fix ia_size underflow
  NFSD: Fix the behavior of READ near OFFSET_MAX
2022-02-09 09:56:57 -08:00
Chuck Lever
c306d73769 NFSD: Deprecate NFS_OFFSET_MAX
NFS_OFFSET_MAX was introduced way back in Linux v2.3.y before there
was a kernel-wide OFFSET_MAX value. As a clean up, replace the last
few uses of it with its generic equivalent, and get rid of it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09 09:24:40 -05:00
Chuck Lever
6a4d333d54 NFSD: Fix offset type in I/O trace points
NFSv3 and NFSv4 use u64 offset values on the wire. Record these values
verbatim without the implicit type case to loff_t.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09 09:24:30 -05:00
Chuck Lever
3f965021c8 NFSD: COMMIT operations must not return NFS?ERR_INVAL
Since, well, forever, the Linux NFS server's nfsd_commit() function
has returned nfserr_inval when the passed-in byte range arguments
were non-sensical.

However, according to RFC 1813 section 3.3.21, NFSv3 COMMIT requests
are permitted to return only the following non-zero status codes:

      NFS3ERR_IO
      NFS3ERR_STALE
      NFS3ERR_BADHANDLE
      NFS3ERR_SERVERFAULT

NFS3ERR_INVAL is not included in that list. Likewise, NFS4ERR_INVAL
is not listed in the COMMIT row of Table 6 in RFC 8881.

RFC 7530 does permit COMMIT to return NFS4ERR_INVAL, but does not
specify when it can or should be used.

Instead of dropping or failing a COMMIT request in a byte range that
is not supported, turn it into a valid request by treating one or
both arguments as zero. Offset zero means start-of-file, count zero
means until-end-of-file, so we only ever extend the commit range.
NFS servers are always allowed to commit more and sooner than
requested.

The range check is no longer bounded by NFS_OFFSET_MAX, but rather
by the value that is returned in the maxfilesize field of the NFSv3
FSINFO procedure or the NFSv4 maxfilesize file attribute.

Note that this change results in a new pynfs failure:

CMT4     st_commit.testCommitOverflow                             : RUNNING
CMT4     st_commit.testCommitOverflow                             : FAILURE
           COMMIT with offset + count overflow should return
           NFS4ERR_INVAL, instead got NFS4_OK

IMO the test is not correct as written: RFC 8881 does not allow the
COMMIT operation to return NFS4ERR_INVAL.

Reported-by: Dan Aloni <dan.aloni@vastdata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Bruce Fields <bfields@fieldses.org>
2022-02-09 09:24:23 -05:00
Chuck Lever
6260d9a56a NFSD: Clamp WRITE offsets
Ensure that a client cannot specify a WRITE range that falls in a
byte range outside what the kernel's internal types (such as loff_t,
which is signed) can represent. The kiocb iterators, invoked in
nfsd_vfs_write(), should properly limit write operations to within
the underlying file system's s_maxbytes.

Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09 09:24:01 -05:00
Chuck Lever
a648fdeb7c NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizes
iattr::ia_size is a loff_t, so these NFSv3 procedures must be
careful to deal with incoming client size values that are larger
than s64_max without corrupting the value.

Silently capping the value results in storing a different value
than the client passed in which is unexpected behavior, so remove
the min_t() check in decode_sattr3().

Note that RFC 1813 permits only the WRITE procedure to return
NFS3ERR_FBIG. We believe that NFSv3 reference implementations
also return NFS3ERR_FBIG when ia_size is too large.

Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09 09:23:38 -05:00
Chuck Lever
e6faac3f58 NFSD: Fix ia_size underflow
iattr::ia_size is a loff_t, which is a signed 64-bit type. NFSv3 and
NFSv4 both define file size as an unsigned 64-bit type. Thus there
is a range of valid file size values an NFS client can send that is
already larger than Linux can handle.

Currently decode_fattr4() dumps a full u64 value into ia_size. If
that value happens to be larger than S64_MAX, then ia_size
underflows. I'm about to fix up the NFSv3 behavior as well, so let's
catch the underflow in the common code path: nfsd_setattr().

Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09 09:22:48 -05:00
Chuck Lever
0cb4d23ae0 NFSD: Fix the behavior of READ near OFFSET_MAX
Dan Aloni reports:
> Due to commit 8cfb901528 ("NFS: Always provide aligned buffers to
> the RPC read layers") on the client, a read of 0xfff is aligned up
> to server rsize of 0x1000.
>
> As a result, in a test where the server has a file of size
> 0x7fffffffffffffff, and the client tries to read from the offset
> 0x7ffffffffffff000, the read causes loff_t overflow in the server
> and it returns an NFS code of EINVAL to the client. The client as
> a result indefinitely retries the request.

The Linux NFS client does not handle NFS?ERR_INVAL, even though all
NFS specifications permit servers to return that status code for a
READ.

Instead of NFS?ERR_INVAL, have out-of-range READ requests succeed
and return a short result. Set the EOF flag in the result to prevent
the client from retrying the READ request. This behavior appears to
be consistent with Solaris NFS servers.

Note that NFSv3 and NFSv4 use u64 offset values on the wire. These
must be converted to loff_t internally before use -- an implicit
type cast is not adequate for this purpose. Otherwise VFS checks
against sb->s_maxbytes do not work properly.

Reported-by: Dan Aloni <dan.aloni@vastdata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09 09:22:34 -05:00
Linus Torvalds
88808fbbea Notable bug fixes:
- Ensure SM_NOTIFY doesn't crash the NFS server host
 - Ensure NLM locks are cleaned up after client reboot
 - Fix a leak of internal NFSv4 lease information
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmH1em4ACgkQM2qzM29m
 f5drhRAAq8uU+tgABqZNj4aLivUOAionkSiV6Blk1V44DO00yhY2y3dAsOu8bO0k
 Kh1Yu0QSZeaYDSi2Ak9qCKAl8eNg8lvlxWJ5pQ+GERVJiZj3JJRPSUJI+5r/aQMi
 k774Y+DzLwPn6/r5iTyymm3vx1wcas+Y/v2nvmHob/G74UKngbhOhP05XS/1MDlM
 fdTtXVKqLx92grDljTXWCtT5q5mpOc+OFufo2a5+b1aJjUWiU/rraT1mArNlEC7F
 IMw/eZn6ZnZv+ywbVJFGeRib/Xa7jNeKA+4CQMH+quk/s8rHEaUJqeM5439HLBYk
 E0KrFAdn+VDV5A6I9TIB1vtykl0KzC/r2u8G4vbA++rfpuxW36lGS95JFnDctGG+
 uwk/f4p2+D7oSGt7gLXt8LTOAx0/NeT+OTtUqZRPcoKO7uXvkkCCu2irD9VpGSpD
 A83Qq0ewT9ntNy0Feik3FgmRSmPTgvywE78MeRFoundd3QhtghUunfY1N2soDt7t
 0hyqBhcH8ypWjFoKmv+wAHLPcGcdeg+8T0w3hFPcyTrrdYo/OJl4MNgrIczA2z8O
 nWCZ+lOZq3QtAkd0eGSFPhnTVebCP5n6yvIfDN4rZc+ASNAqXCR5e1yCDE1gfO+E
 I1uCcxzewWPe3DsuYWQznEx5u4Rpiml5JF1q5uKFwTNj4UTBFKQ=
 =IC/r
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
 "Notable bug fixes:

   - Ensure SM_NOTIFY doesn't crash the NFS server host

   - Ensure NLM locks are cleaned up after client reboot

   - Fix a leak of internal NFSv4 lease information"

* tag 'nfsd-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: nfsd4_setclientid_confirm mistakenly expires confirmed client.
  lockd: fix failure to cleanup client locks
  lockd: fix server crash on reboot of client holding lock
2022-02-02 10:14:31 -08:00
Christoph Hellwig
322cbb50de block: remove genhd.h
There is no good reason to keep genhd.h separate from the main blkdev.h
header that includes it.  So fold the contents of genhd.h into blkdev.h
and remove genhd.h entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:49:59 -07:00
Dai Ngo
ab451ea952 nfsd: nfsd4_setclientid_confirm mistakenly expires confirmed client.
From RFC 7530 Section 16.34.5:

o  The server has not recorded an unconfirmed { v, x, c, *, * } and
   has recorded a confirmed { v, x, c, *, s }.  If the principals of
   the record and of SETCLIENTID_CONFIRM do not match, the server
   returns NFS4ERR_CLID_INUSE without removing any relevant leased
   client state, and without changing recorded callback and
   callback_ident values for client { x }.

The current code intends to do what the spec describes above but
it forgot to set 'old' to NULL resulting to the confirmed client
to be expired.

Fixes: 2b63482185 ("nfsd: fix clid_inuse on mount with security change")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Bruce Fields <bfields@fieldses.org>
2022-01-28 09:04:00 -05:00
Amir Goldstein
29044dae2e fsnotify: fix fsnotify hooks in pseudo filesystems
Commit 49246466a9 ("fsnotify: move fsnotify_nameremove() hook out of
d_delete()") moved the fsnotify delete hook before d_delete() so fsnotify
will have access to a positive dentry.

This allowed a race where opening the deleted file via cached dentry
is now possible after receiving the IN_DELETE event.

To fix the regression in pseudo filesystems, convert d_delete() calls
to d_drop() (see commit 46c46f8df9 ("devpts_pty_kill(): don't bother
with d_delete()") and move the fsnotify hook after d_drop().

Add a missing fsnotify_unlink() hook in nfsdfs that was found during
the audit of fsnotify hooks in pseudo filesystems.

Note that the fsnotify hooks in simple_recursive_removal() follow
d_invalidate(), so they require no change.

Link: https://lore.kernel.org/r/20220120215305.282577-2-amir73il@gmail.com
Reported-by: Ivan Delalande <colona@arista.com>
Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/
Fixes: 49246466a9 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-01-24 14:17:02 +01:00
Linus Torvalds
35ce8ae9ae Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull signal/exit/ptrace updates from Eric Biederman:
 "This set of changes deletes some dead code, makes a lot of cleanups
  which hopefully make the code easier to follow, and fixes bugs found
  along the way.

  The end-game which I have not yet reached yet is for fatal signals
  that generate coredumps to be short-circuit deliverable from
  complete_signal, for force_siginfo_to_task not to require changing
  userspace configured signal delivery state, and for the ptrace stops
  to always happen in locations where we can guarantee on all
  architectures that the all of the registers are saved and available on
  the stack.

  Removal of profile_task_ext, profile_munmap, and profile_handoff_task
  are the big successes for dead code removal this round.

  A bunch of small bug fixes are included, as most of the issues
  reported were small enough that they would not affect bisection so I
  simply added the fixes and did not fold the fixes into the changes
  they were fixing.

  There was a bug that broke coredumps piped to systemd-coredump. I
  dropped the change that caused that bug and replaced it entirely with
  something much more restrained. Unfortunately that required some
  rebasing.

  Some successes after this set of changes: There are few enough calls
  to do_exit to audit in a reasonable amount of time. The lifetime of
  struct kthread now matches the lifetime of struct task, and the
  pointer to struct kthread is no longer stored in set_child_tid. The
  flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is
  removed. Issues where task->exit_code was examined with
  signal->group_exit_code should been examined were fixed.

  There are several loosely related changes included because I am
  cleaning up and if I don't include them they will probably get lost.

  The original postings of these changes can be found at:
     https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.org
     https://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.org
     https://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org

  I trimmed back the last set of changes to only the obviously correct
  once. Simply because there was less time for review than I had hoped"

* 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits)
  ptrace/m68k: Stop open coding ptrace_report_syscall
  ptrace: Remove unused regs argument from ptrace_report_syscall
  ptrace: Remove second setting of PT_SEIZED in ptrace_attach
  taskstats: Cleanup the use of task->exit_code
  exit: Use the correct exit_code in /proc/<pid>/stat
  exit: Fix the exit_code for wait_task_zombie
  exit: Coredumps reach do_group_exit
  exit: Remove profile_handoff_task
  exit: Remove profile_task_exit & profile_munmap
  signal: clean up kernel-doc comments
  signal: Remove the helper signal_group_exit
  signal: Rename group_exit_task group_exec_task
  coredump: Stop setting signal->group_exit_task
  signal: Remove SIGNAL_GROUP_COREDUMP
  signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process
  signal: Make coredump handling explicit in complete_signal
  signal: Have prepare_signal detect coredumps using signal->core_state
  signal: Have the oom killer detect coredumps using signal->core_state
  exit: Move force_uaccess back into do_exit
  exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit
  ...
2022-01-17 05:49:30 +02:00
Linus Torvalds
175398a097 Highlights:
- Bruce steps down as NFSD maintainer
 - Prepare for dynamic nfsd thread management
 - More work on supporting re-exporting NFS mounts
 - One fs/locks patch on behalf of Jeff Layton
 
 Notable bug fixes:
 - Fix zero-length NFSv3 WRITEs
 - Fix directory cinfo on FS's that do not support iversion
 - Fix WRITE verifiers for stable writes
 - Fix crash on COPY_NOTIFY with a special state ID
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmHcWOMACgkQM2qzM29m
 f5dh0Q/+MjEL0IK551FdChx9Es1JqKRggv9KwJkLIoa1bw/PMSwP2pnKz6eL0Yun
 mdhE9AZQgyFH1IAGdqjeLZKIYRin6bvAdDrnlqQ9SvTviPLWniSUI6AuyUqK6Zyk
 wMcXpyOze0fhpxkYmz8/g7i66w967tmLh5MRvV1dkpOYAe99rYwGhvj+9ZeEWfNI
 TgmptntMG6YEb+xY0E73otXZHMr2DL67ZYvOUYWemJA1uxcX4joaWBg8sx74dB6k
 DUB4BFuoURk6viDD1QYh3qPU3dz9RCJNMz/cWd8+2t7BdaujTSXRIcaFslrQnKfL
 Rm+O7pi5W+XohFDjeuMZ1g0c1ot/aoZSaAz00LoCVhejJ/sK9NiPAN1+LyY91Lja
 cUBMVPNfW7ClIpiZcORP/chNmVn2qlaL2nxzSY/Uegnd5pIIeVD0pFVgx4+NlEat
 mbrrQBcMpBRM0B+RzHS6AusqHrGdSEcwqWoVXWdxsBigJQT/AxWmii3U88k0Z54i
 ooMWLaQ9EBBmygV01JN/OBySW2M/dvbfz3eFROvAVqsIP9JWP3FlUOlRDl8GcjXA
 azi9fTysBom7WtL6NPcxDJbJ2t9hYr2YaztTpdo9YCHOuQbSQT6IWR5PAa3zvwMu
 Bfz6Y8Hoo/KZHCqmkPGYM+x1ENCyDPv788E+erdnw1PFP5F3Pbo=
 =/kX3
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Bruce has announced he is leaving Red Hat at the end of the month and
  is stepping back from his role as NFSD co-maintainer. As a result,
  this includes a patch removing him from the MAINTAINERS file.

  There is one patch in here that Jeff Layton was carrying in the locks
  tree. Since he had only one for this cycle, he asked us to send it to
  you via the nfsd tree.

  There continues to be 0-day reports from Robert Morris @MIT. This time
  we include a fix for a crash in the COPY_NOTIFY operation.

  Highlights:
   - Bruce steps down as NFSD maintainer
   - Prepare for dynamic nfsd thread management
   - More work on supporting re-exporting NFS mounts
   - One fs/locks patch on behalf of Jeff Layton

  Notable bug fixes:
   - Fix zero-length NFSv3 WRITEs
   - Fix directory cinfo on FS's that do not support iversion
   - Fix WRITE verifiers for stable writes
   - Fix crash on COPY_NOTIFY with a special state ID"

* tag 'nfsd-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (51 commits)
  SUNRPC: Fix sockaddr handling in svcsock_accept_class trace points
  SUNRPC: Fix sockaddr handling in the svc_xprt_create_error trace point
  fs/locks: fix fcntl_getlk64/fcntl_setlk64 stub prototypes
  nfsd: fix crash on COPY_NOTIFY with special stateid
  MAINTAINERS: remove bfields
  NFSD: Move fill_pre_wcc() and fill_post_wcc()
  Revert "nfsd: skip some unnecessary stats in the v4 case"
  NFSD: Trace boot verifier resets
  NFSD: Rename boot verifier functions
  NFSD: Clean up the nfsd_net::nfssvc_boot field
  NFSD: Write verifier might go backwards
  nfsd: Add a tracepoint for errors in nfsd4_clone_file_range()
  NFSD: De-duplicate net_generic(nf->nf_net, nfsd_net_id)
  NFSD: De-duplicate net_generic(SVC_NET(rqstp), nfsd_net_id)
  NFSD: Clean up nfsd_vfs_write()
  nfsd: Replace use of rwsem with errseq_t
  NFSD: Fix verifier returned in stable WRITEs
  nfsd: Retry once in nfsd_open on an -EOPENSTALE return
  nfsd: Add errno mapping for EREMOTEIO
  nfsd: map EBADF
  ...
2022-01-16 07:42:58 +02:00
Linus Torvalds
5dfbfe71e3 fs.idmapped.v5.17
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYdRCkgAKCRCRxhvAZXjc
 olrvAQCdp8LWkT8TauJSl8wmUm3mZhNy+5+fXuCUSwe3PyUtTQEAq4fxm41JpG8u
 WCZTrrxVhaXwgUY3aWzzeQnLCZjtEQw=
 =woqV
 -----END PGP SIGNATURE-----

Merge tag 'fs.idmapped.v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull fs idmapping updates from Christian Brauner:
 "This contains the work to enable the idmapping infrastructure to
  support idmapped mounts of filesystems mounted with an idmapping.

  In addition this contains various cleanups that avoid repeated
  open-coding of the same functionality and simplify the code in quite a
  few places.

  We also finish the renaming of the mapping helpers we started a few
  kernel releases back and move them to a dedicated header to not
  continue polluting the fs header needlessly with low-level idmapping
  helpers. With this series the fs header only contains idmapping
  helpers that interact with fs objects.

  Currently we only support idmapped mounts for filesystems mounted
  without an idmapping themselves. This was a conscious decision
  mentioned in multiple places (cf. [1]).

  As explained at length in [3] it is perfectly fine to extend support
  for idmapped mounts to filesystem's mounted with an idmapping should
  the need arise. The need has been there for some time now (cf. [2]).

  Before we can port any filesystem that is mountable with an idmapping
  to support idmapped mounts in the coming cycles, we need to first
  extend the mapping helpers to account for the filesystem's idmapping.
  This again, is explained at length in our documentation at [3] and
  also in the individual commit messages so here's an overview.

  Currently, the low-level mapping helpers implement the remapping
  algorithms described in [3] in a simplified manner as we could rely on
  the fact that all filesystems supporting idmapped mounts are mounted
  without an idmapping.

  In contrast, filesystems mounted with an idmapping are very likely to
  not use an identity mapping and will instead use a non-identity
  mapping. So the translation step from or into the filesystem's
  idmapping in the remapping algorithm cannot be skipped for such
  filesystems.

  Non-idmapped filesystems and filesystems not supporting idmapped
  mounts are unaffected by this change as the remapping algorithms can
  take the same shortcut as before. If the low-level helpers detect that
  they are dealing with an idmapped mount but the underlying filesystem
  is mounted without an idmapping we can rely on the previous shortcut
  and can continue to skip the translation step from or into the
  filesystem's idmapping. And of course, if the low-level helpers detect
  that they are not dealing with an idmapped mount they can simply
  return the relevant id unchanged; no remapping needs to be performed
  at all.

  These checks guarantee that only the minimal amount of work is
  performed. As before, if idmapped mounts aren't used the low-level
  helpers are idempotent and no work is performed at all"

Link: 2ca4dcc490 ("fs/mount_setattr: tighten permission checks") [1]
Link: https://github.com/containers/podman/issues/10374 [2]
Link: Documentations/filesystems/idmappings.rst [3]
Link: a65e58e791 ("fs: document and rename fsid helpers") [4]

* tag 'fs.idmapped.v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  fs: support mapped mounts of mapped filesystems
  fs: add i_user_ns() helper
  fs: port higher-level mapping helpers
  fs: remove unused low-level mapping helpers
  fs: use low-level mapping helpers
  docs: update mapping documentation
  fs: account for filesystem mappings
  fs: tweak fsuidgid_has_mapping()
  fs: move mapping helpers
  fs: add is_idmapped_mnt() helper
2022-01-11 14:26:55 -08:00
J. Bruce Fields
074b07d94e nfsd: fix crash on COPY_NOTIFY with special stateid
RTM says "If the special ONE stateid is passed to
nfs4_preprocess_stateid_op(), it returns status=0 but does not set
*cstid. nfsd4_copy_notify() depends on stid being set if status=0, and
thus can crash if the client sends the right COPY_NOTIFY RPC."

RFC 7862 says "The cna_src_stateid MUST refer to either open or locking
states provided earlier by the server.  If it is invalid, then the
operation MUST fail."

The RFC doesn't specify an error, and the choice doesn't matter much as
this is clearly illegal client behavior, but bad_stateid seems
reasonable.

Simplest is just to guarantee that nfs4_preprocess_stateid_op, called
with non-NULL cstid, errors out if it can't return a stateid.

Reported-by: rtm@csail.mit.edu
Fixes: 624322f1ad ("NFSD add COPY_NOTIFY operation")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Olga Kornievskaia <kolga@netapp.com>
Tested-by: Olga Kornievskaia <kolga@netapp.com>
2022-01-08 14:42:03 -05:00
Chuck Lever
fcb5e3fa01 NFSD: Move fill_pre_wcc() and fill_post_wcc()
These functions are related to file handle processing and have
nothing to do with XDR encoding or decoding. Also they are no longer
NFSv3-specific. As a clean-up, move their definitions to a more
appropriate location. WCC is also an NFSv3-specific term, so rename
them as general-purpose helpers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:03 -05:00
Chuck Lever
58f258f652 Revert "nfsd: skip some unnecessary stats in the v4 case"
On the wire, I observed NFSv4 OPEN(CREATE) operations sometimes
returning a reasonable-looking value in the cinfo.before field and
zero in the cinfo.after field.

RFC 8881 Section 10.8.1 says:
> When a client is making changes to a given directory, it needs to
> determine whether there have been changes made to the directory by
> other clients.  It does this by using the change attribute as
> reported before and after the directory operation in the associated
> change_info4 value returned for the operation.

and

> ... The post-operation change
> value needs to be saved as the basis for future change_info4
> comparisons.

A good quality client implementation therefore saves the zero
cinfo.after value. During a subsequent OPEN operation, it will
receive a different non-zero value in the cinfo.before field for
that directory, and it will incorrectly believe the directory has
changed, triggering an undesirable directory cache invalidation.

There are filesystem types where fs_supports_change_attribute()
returns false, tmpfs being one. On NFSv4 mounts, this means the
fh_getattr() call site in fill_pre_wcc() and fill_post_wcc() is
never invoked. Subsequently, nfsd4_change_attribute() is invoked
with an uninitialized @stat argument.

In fill_pre_wcc(), @stat contains stale stack garbage, which is
then placed on the wire. In fill_post_wcc(), ->fh_post_wc is all
zeroes, so zero is placed on the wire. Both of these values are
meaningless.

This fix can be applied immediately to stable kernels. Once there
are more regression tests in this area, this optimization can be
attempted again.

Fixes: 428a23d2bf ("nfsd: skip some unnecessary stats in the v4 case")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-08 14:42:03 -05:00