linux-stable/fs
Tejun Heo 10b12027f8 kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in kernfs_find_and_get_node_by_id()
[ Upstream commit 4207b556e6 ]

The BPF helper bpf_cgroup_from_id() calls kernfs_find_and_get_node_by_id()
which acquires kernfs_idr_lock, which is an non-raw non-IRQ-safe lock. This
can lead to deadlocks as bpf_cgroup_from_id() can be called from any BPF
programs including e.g. the ones that attach to functions which are holding
the scheduler rq lock.

Consider the following BPF program:

  SEC("fentry/__set_cpus_allowed_ptr_locked")
  int BPF_PROG(__set_cpus_allowed_ptr_locked, struct task_struct *p,
	       struct affinity_context *affn_ctx, struct rq *rq, struct rq_flags *rf)
  {
	  struct cgroup *cgrp = bpf_cgroup_from_id(p->cgroups->dfl_cgrp->kn->id);

	  if (cgrp) {
		  bpf_printk("%d[%s] in %s", p->pid, p->comm, cgrp->kn->name);
		  bpf_cgroup_release(cgrp);
	  }
	  return 0;
  }

__set_cpus_allowed_ptr_locked() is called with rq lock held and the above
BPF program calls bpf_cgroup_from_id() within leading to the following
lockdep warning:

  =====================================================
  WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
  6.7.0-rc3-work-00053-g07124366a1d7-dirty #147 Not tainted
  -----------------------------------------------------
  repro/1620 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
  ffffffff833b3688 (kernfs_idr_lock){+.+.}-{2:2}, at: kernfs_find_and_get_node_by_id+0x1e/0x70

		and this task is already holding:
  ffff888237ced698 (&rq->__lock){-.-.}-{2:2}, at: task_rq_lock+0x4e/0xf0
  which would create a new lock dependency:
   (&rq->__lock){-.-.}-{2:2} -> (kernfs_idr_lock){+.+.}-{2:2}
  ...
   Possible interrupt unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(kernfs_idr_lock);
				 local_irq_disable();
				 lock(&rq->__lock);
				 lock(kernfs_idr_lock);
    <Interrupt>
      lock(&rq->__lock);

		 *** DEADLOCK ***
  ...
  Call Trace:
   dump_stack_lvl+0x55/0x70
   dump_stack+0x10/0x20
   __lock_acquire+0x781/0x2a40
   lock_acquire+0xbf/0x1f0
   _raw_spin_lock+0x2f/0x40
   kernfs_find_and_get_node_by_id+0x1e/0x70
   cgroup_get_from_id+0x21/0x240
   bpf_cgroup_from_id+0xe/0x20
   bpf_prog_98652316e9337a5a___set_cpus_allowed_ptr_locked+0x96/0x11a
   bpf_trampoline_6442545632+0x4f/0x1000
   __set_cpus_allowed_ptr_locked+0x5/0x5a0
   sched_setaffinity+0x1b3/0x290
   __x64_sys_sched_setaffinity+0x4f/0x60
   do_syscall_64+0x40/0xe0
   entry_SYSCALL_64_after_hwframe+0x46/0x4e

Let's fix it by protecting kernfs_node and kernfs_root with RCU and making
kernfs_find_and_get_node_by_id() acquire rcu_read_lock() instead of
kernfs_idr_lock.

This adds an rcu_head to kernfs_node making it larger by 16 bytes on 64bit.
Combined with the preceding rearrange patch, the net increase is 8 bytes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrea Righi <andrea.righi@canonical.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20240109214828.252092-4-tj@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13 13:10:09 +02:00
..
9p 9p: Use length of data written to the server in preference to error 2024-01-04 13:15:31 +00:00
adfs adfs: remove writepage implementation 2023-12-29 11:58:33 -08:00
affs affs: free affs_sb_info with kfree_rcu() 2024-02-25 02:10:31 -05:00
afs afs: Fix occasional rmdir-then-VNOVNODE with generic/011 2024-03-26 18:17:27 -04:00
autofs dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
bcachefs mean_and_variance: Drop always failing tests 2024-04-10 16:38:08 +02:00
befs befs: d_obtain_alias(ERR_PTR(...)) will do the right thing 2023-12-21 12:51:02 -05:00
bfs misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
btrfs btrfs: send: handle path ref underflow in header iterate_inode_ref() 2024-04-13 13:10:02 +02:00
cachefiles cachefiles: fix memory leak in cachefiles_add_cache() 2024-02-20 09:46:07 +01:00
ceph ceph: stop copying to iter at EOF on sync reads 2024-03-26 18:17:36 -04:00
coda dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
configfs
cramfs
crypto fscrypt: document that CephFS supports fscrypt now 2023-12-26 22:55:42 -06:00
debugfs debugfs: fix wait/cancellation handling during remove 2024-04-03 15:32:17 +02:00
devpts fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
dlm dlm: fix user space lkb refcounting 2024-04-03 15:32:20 +02:00
ecryptfs fix directory locking scheme on rename 2024-01-11 20:00:22 -08:00
efivarfs efivarfs: Drop 'duplicates' bool parameter on efivar_init() 2024-02-25 09:43:39 +01:00
efs
erofs erofs: fix lockdep false positives on initializing erofs_pseudo_mnt 2024-03-26 18:16:56 -04:00
exfat Description for this pull request: 2024-03-01 12:22:30 -08:00
exportfs
ext2 quota: Properly annotate i_dquot arrays with __rcu 2024-03-26 18:17:03 -04:00
ext4 ext4: forbid commit inconsistent quota data when errors=remount-ro 2024-04-13 13:10:05 +02:00
f2fs f2fs: truncate page cache before clearing flags when aborting atomic write 2024-04-03 15:32:22 +02:00
fat fat: fix uninitialized field in nostale filehandles 2024-04-03 15:32:06 +02:00
freevxfs freevxfs: lookup: fix function params kernel-doc 2023-12-20 15:02:58 -08:00
fuse fuse: don't unhash root 2024-04-03 15:32:12 +02:00
gfs2 Revert "gfs2: Use GL_NOBLOCK flag for non-blocking lookups" 2024-02-02 17:21:44 +01:00
hfs hfs: really remove hfs_writepage 2023-12-29 11:58:34 -08:00
hfsplus fs/hfsplus: use better @opf description 2024-03-26 18:16:27 -04:00
hostfs hostfs: use d_splice_alias() calling conventions to simplify failure exits 2023-12-21 12:51:00 -05:00
hpfs
hugetlbfs fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super 2024-02-07 21:20:36 -08:00
iomap iomap: clear the per-folio dirty bits on all writeback failures 2024-03-26 18:16:26 -04:00
isofs isofs: handle CDs with bad root inode but good Joliet root directory 2024-04-13 13:10:04 +02:00
jbd2 jbd2: abort journal when detecting metadata writeback error of fs dev 2024-01-04 23:42:21 -05:00
jffs2 jffs2: mark __jffs2_dbg_superblock_counts() static 2023-12-10 17:21:43 -08:00
jfs quota: Properly annotate i_dquot arrays with __rcu 2024-03-26 18:17:03 -04:00
kernfs kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in kernfs_find_and_get_node_by_id() 2024-04-13 13:10:09 +02:00
lockd sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
minix minixfs kmap_local_page() switchover and related fixes - very similar to sysv series. 2024-01-11 19:54:18 -08:00
netfs netfs: Fix missing zero-length check in unbuffered write 2024-01-29 14:53:21 +01:00
nfs NFS: Read unlock folio on nfs_page_create_from_folio() error 2024-04-03 15:32:18 +02:00
nfs_common
nfsd nfsd: hold a lighter-weight client reference over CB_RECALL_ANY 2024-04-10 16:38:14 +02:00
nilfs2 nilfs2: prevent kernel bug at submit_bh_wbc() 2024-04-03 15:32:23 +02:00
nls
notify dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
ntfs sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
ntfs3 fs/ntfs3: fix build without CONFIG_NTFS3_LZX_XPRESS 2024-02-26 09:32:23 -08:00
ocfs2 quota: Properly annotate i_dquot arrays with __rcu 2024-03-26 18:17:03 -04:00
omfs
openpromfs
orangefs Julia Lawall reported this null pointer dereference, this should fix it. 2024-04-13 13:10:05 +02:00
overlayfs ovl: relax WARN_ON in ovl_verify_area() 2024-03-26 18:17:28 -04:00
proc kbuild: make -Woverride-init warnings more consistent 2024-04-10 16:38:00 +02:00
pstore pstore/zone: Add a null pointer check to the psz_kmsg_read 2024-04-13 13:10:00 +02:00
qnx4 qnx4: Use get_directory_fname() in qnx4_match() 2023-12-13 11:19:18 -08:00
qnx6
quota quota: Properly annotate i_dquot arrays with __rcu 2024-03-26 18:17:03 -04:00
ramfs mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
reiserfs quota: Properly annotate i_dquot arrays with __rcu 2024-03-26 18:17:03 -04:00
romfs
smb smb: client: fix potential UAF in cifs_signal_cifsd_for_reconnect() 2024-04-10 16:38:21 +02:00
squashfs Squashfs: fix variable overflow triggered by sysbot 2023-12-10 17:21:26 -08:00
sysfs fs/sysfs/dir.c : Fix typo in comment 2023-12-07 11:35:23 +09:00
sysv sysv: don't call sb_bread() with pointers_lock held 2024-04-13 13:10:04 +02:00
tracefs eventfs: Keep all directory links at 1 2024-02-01 11:53:53 -05:00
ubifs ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path 2024-04-03 15:32:07 +02:00
udf misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
ufs Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
unicode
vboxsf vboxsf: Avoid an spurious warning if load_nls_xxx() fails 2024-04-10 16:38:03 +02:00
verity Networking changes for 6.8. 2024-01-11 10:07:29 -08:00
xfs xfs: drop experimental warning for FSDAX 2024-02-27 09:53:30 +05:30
zonefs zonefs: Improve error handling 2024-02-16 10:20:35 +09:00
Kconfig vfs-6.8.netfs 2024-01-19 09:10:23 -08:00
Kconfig.binfmt
Makefile vfs-6.8.netfs 2024-01-19 09:10:23 -08:00
aio.c aio: Fix null ptr deref in aio_complete() wakeup 2024-04-10 16:38:19 +02:00
anon_inodes.c
attr.c fs: fix doc comment typo fs tree wide 2023-12-21 13:17:54 +01:00
backing-file.c fs: factor out backing_file_mmap() helper 2023-12-23 16:35:09 +02:00
bad_inode.c
binfmt_elf.c
binfmt_elf_fdpic.c
binfmt_elf_test.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
buffer.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
char_dev.c
compat_binfmt_elf.c
coredump.c iov_iter: get rid of 'copy_mc' flag 2024-03-06 10:52:12 +01:00
d_path.c
dax.c
dcache.c Revert "get rid of DCACHE_GENOCIDE" 2024-02-09 23:31:16 -05:00
direct-io.c
drop_caches.c
eventfd.c eventfd: Remove usage of the deprecated ida_simple_xx() API 2023-12-12 14:24:55 +01:00
eventpoll.c fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
exec.c exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack() 2024-04-03 15:32:39 +02:00
fcntl.c fs: Fix rw_hint validation 2024-03-26 18:16:26 -04:00
fhandle.c do_sys_name_to_handle(): use kzalloc() to fix kernel-infoleak 2024-03-26 18:16:25 -04:00
file.c file: remove __receive_fd() 2023-12-12 14:24:14 +01:00
file_table.c dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
filesystems.c
fs-writeback.c netfs: Move pinning-for-writeback from fscache to netfs 2023-12-24 15:08:49 +00:00
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c
inode.c fix directory locking scheme on rename 2024-01-11 20:00:22 -08:00
internal.h dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
ioctl.c lsm: new security_file_ioctl_compat() hook 2023-12-24 15:48:03 -05:00
kernel_read_file.c
libfs.c dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
locks.c fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
mbcache.c
mnt_idmapping.c mnt_idmapping: decouple from namespaces 2023-11-28 14:08:47 +01:00
mount.h
mpage.c fs: convert block_write_full_page to block_write_full_folio 2023-12-29 11:58:35 -08:00
namei.c rcu pathwalk: prevent bogus hard errors from may_lookup() 2024-02-25 02:10:31 -05:00
namespace.c fs: relax mount_setattr() permission checks 2024-02-07 21:16:29 +01:00
nsfs.c
open.c vfs-6.8.rw 2024-01-08 11:11:51 -08:00
pipe.c sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
pnode.c
pnode.h
posix_acl.c fs: fix doc comment typo fs tree wide 2023-12-21 13:17:54 +01:00
proc_namespace.c
read_write.c fsnotify: optionally pass access range in file permission hooks 2023-12-12 16:20:02 +01:00
readdir.c fsnotify: optionally pass access range in file permission hooks 2023-12-12 16:20:02 +01:00
remap_range.c remap_range: merge do_clone_file_range() into vfs_clone_file_range() 2024-02-06 17:07:21 +01:00
select.c fs/select: rework stack allocation hack for clang 2024-03-26 18:16:27 -04:00
seq_file.c
signalfd.c
splice.c fs: use splice_copy_file_range() inline helper 2023-12-12 16:20:02 +01:00
stack.c
stat.c vfs-6.8.mount 2024-01-08 10:57:34 -08:00
statfs.c
super.c fs/super.c: don't drop ->s_user_ns until we free struct super_block itself 2024-02-25 02:10:31 -05:00
sync.c
sysctls.c fs: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
timerfd.c
userfaultfd.c Generic: 2024-01-17 13:03:37 -08:00
utimes.c
xattr.c