linux-stable/fs/nfs
Josef Bacik 31db25e314 nfs: fix panic when nfs4_ff_layout_prepare_ds() fails
[ Upstream commit 719fcafe07 ]

We've been seeing the following panic in production

BUG: kernel NULL pointer dereference, address: 0000000000000065
PGD 2f485f067 P4D 2f485f067 PUD 2cc5d8067 PMD 0
RIP: 0010:ff_layout_cancel_io+0x3a/0x90 [nfs_layout_flexfiles]
Call Trace:
 <TASK>
 ? __die+0x78/0xc0
 ? page_fault_oops+0x286/0x380
 ? __rpc_execute+0x2c3/0x470 [sunrpc]
 ? rpc_new_task+0x42/0x1c0 [sunrpc]
 ? exc_page_fault+0x5d/0x110
 ? asm_exc_page_fault+0x22/0x30
 ? ff_layout_free_layoutreturn+0x110/0x110 [nfs_layout_flexfiles]
 ? ff_layout_cancel_io+0x3a/0x90 [nfs_layout_flexfiles]
 ? ff_layout_cancel_io+0x6f/0x90 [nfs_layout_flexfiles]
 pnfs_mark_matching_lsegs_return+0x1b0/0x360 [nfsv4]
 pnfs_error_mark_layout_for_return+0x9e/0x110 [nfsv4]
 ? ff_layout_send_layouterror+0x50/0x160 [nfs_layout_flexfiles]
 nfs4_ff_layout_prepare_ds+0x11f/0x290 [nfs_layout_flexfiles]
 ff_layout_pg_init_write+0xf0/0x1f0 [nfs_layout_flexfiles]
 __nfs_pageio_add_request+0x154/0x6c0 [nfs]
 nfs_pageio_add_request+0x26b/0x380 [nfs]
 nfs_do_writepage+0x111/0x1e0 [nfs]
 nfs_writepages_callback+0xf/0x30 [nfs]
 write_cache_pages+0x17f/0x380
 ? nfs_pageio_init_write+0x50/0x50 [nfs]
 ? nfs_writepages+0x6d/0x210 [nfs]
 ? nfs_writepages+0x6d/0x210 [nfs]
 nfs_writepages+0x125/0x210 [nfs]
 do_writepages+0x67/0x220
 ? generic_perform_write+0x14b/0x210
 filemap_fdatawrite_wbc+0x5b/0x80
 file_write_and_wait_range+0x6d/0xc0
 nfs_file_fsync+0x81/0x170 [nfs]
 ? nfs_file_mmap+0x60/0x60 [nfs]
 __x64_sys_fsync+0x53/0x90
 do_syscall_64+0x3d/0x90
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

Inspecting the core with drgn I was able to pull this

  >>> prog.crashed_thread().stack_trace()[0]
  #0 at 0xffffffffa079657a (ff_layout_cancel_io+0x3a/0x84) in ff_layout_cancel_io at fs/nfs/flexfilelayout/flexfilelayout.c:2021:27
  >>> prog.crashed_thread().stack_trace()[0]['idx']
  (u32)1
  >>> prog.crashed_thread().stack_trace()[0]['flseg'].mirror_array[1].mirror_ds
  (struct nfs4_ff_layout_ds *)0xffffffffffffffed

This is clear from the stack trace, we call nfs4_ff_layout_prepare_ds()
which could error out initializing the mirror_ds, and then we go to
clean it all up and our check is only for if (!mirror->mirror_ds).  This
is inconsistent with the rest of the users of mirror_ds, which have

  if (IS_ERR_OR_NULL(mirror_ds))

to keep from tripping over this exact scenario.  Fix this up in
ff_layout_cancel_io() to make sure we don't panic when we get an error.
I also spot checked all the other instances of checking mirror_ds and we
appear to be doing the correct checks everywhere, only unconditionally
dereferencing mirror_ds when we know it would be valid.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Fixes: b739a5bd9d ("NFSv4/flexfiles: Cancel I/O if the layout is recalled or revoked")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26 18:20:57 -04:00
..
blocklayout pNFS: Fix the pnfs block driver's calculation of layoutget size 2024-01-25 15:27:23 -08:00
filelayout
flexfilelayout nfs: fix panic when nfs4_ff_layout_prepare_ds() fails 2024-03-26 18:20:57 -04:00
cache_lib.c
cache_lib.h
callback.c
callback.h
callback_proc.c
callback_xdr.c
client.c
delegation.c
delegation.h
dir.c
direct.c pNFS: Fix the pnfs block driver's calculation of layoutget size 2024-01-25 15:27:23 -08:00
dns_resolve.c
dns_resolve.h
export.c nfsd: allow reaping files still under writeback 2024-03-26 18:20:23 -04:00
file.c
fs_context.c
fscache.c mm, netfs, fscache: stop read optimisation when folio removed from pagecache 2024-01-10 17:10:31 +01:00
fscache.h
getroot.c
inode.c nfs: use vfs setgid helper 2023-08-30 16:11:10 +02:00
internal.h pNFS: Fix the pnfs block driver's calculation of layoutget size 2024-01-25 15:27:23 -08:00
io.c
iostat.h
Kconfig
Makefile
mount_clnt.c
namespace.c
netns.h
nfs.h
nfs2super.c
nfs2xdr.c NFS: Guard against READDIR loop when entry names exceed MAXNAMELEN 2023-09-13 09:42:49 +02:00
nfs3_fs.h
nfs3acl.c
nfs3client.c
nfs3proc.c
nfs3super.c
nfs3xdr.c NFS: Guard against READDIR loop when entry names exceed MAXNAMELEN 2023-09-13 09:42:49 +02:00
nfs4_fs.h
nfs4client.c NFSv4.1: fix pnfs MDS=DS session trunking 2023-10-06 14:56:31 +02:00
nfs4file.c
nfs4getroot.c
nfs4idmap.c
nfs4idmap.h
nfs4namespace.c
nfs4proc.c NFSv4.2: fix nfs4_listxattr kernel BUG at mm/usercopy.c:102 2024-03-26 18:20:56 -04:00
nfs4renewd.c
nfs4session.c
nfs4session.h
nfs4state.c NFSv4: Fix a nfs4_state_manager() race 2023-10-10 22:00:41 +02:00
nfs4super.c
nfs4sysctl.c
nfs4trace.c
nfs4trace.h trace: Relocate event helper files 2024-03-06 14:45:17 +00:00
nfs4xdr.c
nfs42.h NFSv4.2: fix listxattr maximum XDR buffer size 2024-03-26 18:20:56 -04:00
nfs42proc.c nfs42: client needs to strip file mode's suid/sgid bit after ALLOCATE op 2023-10-25 12:03:14 +02:00
nfs42xattr.c
nfs42xdr.c NFSv4.2: Rework scratch handling for READ_PLUS (again) 2023-09-13 09:43:05 +02:00
nfsroot.c NFS: Fix an off by one in root_nfs_cat() 2024-03-26 18:20:56 -04:00
nfstrace.c
nfstrace.h trace: Relocate event helper files 2024-03-06 14:45:17 +00:00
pagelist.c
pnfs.c pNFS: Fix the pnfs block driver's calculation of layoutget size 2024-01-25 15:27:23 -08:00
pnfs.h
pnfs_dev.c NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info 2023-09-19 12:27:58 +02:00
pnfs_nfs.c pNFS: Fix assignment of xprtdata.cred 2023-09-13 09:42:49 +02:00
proc.c
read.c NFSv4.2: Rework scratch handling for READ_PLUS (again) 2023-09-13 09:43:05 +02:00
super.c
symlink.c
sysctl.c
sysfs.c NFS: rename nfs_client_kset to nfs_kset 2023-10-10 22:00:35 +02:00
sysfs.h
unlink.c
write.c NFS: Fix data corruption caused by congestion. 2024-03-06 14:45:14 +00:00