Go to file
Tejun Heo 10b12027f8 kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in kernfs_find_and_get_node_by_id()
[ Upstream commit 4207b556e6 ]

The BPF helper bpf_cgroup_from_id() calls kernfs_find_and_get_node_by_id()
which acquires kernfs_idr_lock, which is an non-raw non-IRQ-safe lock. This
can lead to deadlocks as bpf_cgroup_from_id() can be called from any BPF
programs including e.g. the ones that attach to functions which are holding
the scheduler rq lock.

Consider the following BPF program:

  SEC("fentry/__set_cpus_allowed_ptr_locked")
  int BPF_PROG(__set_cpus_allowed_ptr_locked, struct task_struct *p,
	       struct affinity_context *affn_ctx, struct rq *rq, struct rq_flags *rf)
  {
	  struct cgroup *cgrp = bpf_cgroup_from_id(p->cgroups->dfl_cgrp->kn->id);

	  if (cgrp) {
		  bpf_printk("%d[%s] in %s", p->pid, p->comm, cgrp->kn->name);
		  bpf_cgroup_release(cgrp);
	  }
	  return 0;
  }

__set_cpus_allowed_ptr_locked() is called with rq lock held and the above
BPF program calls bpf_cgroup_from_id() within leading to the following
lockdep warning:

  =====================================================
  WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
  6.7.0-rc3-work-00053-g07124366a1d7-dirty #147 Not tainted
  -----------------------------------------------------
  repro/1620 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
  ffffffff833b3688 (kernfs_idr_lock){+.+.}-{2:2}, at: kernfs_find_and_get_node_by_id+0x1e/0x70

		and this task is already holding:
  ffff888237ced698 (&rq->__lock){-.-.}-{2:2}, at: task_rq_lock+0x4e/0xf0
  which would create a new lock dependency:
   (&rq->__lock){-.-.}-{2:2} -> (kernfs_idr_lock){+.+.}-{2:2}
  ...
   Possible interrupt unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(kernfs_idr_lock);
				 local_irq_disable();
				 lock(&rq->__lock);
				 lock(kernfs_idr_lock);
    <Interrupt>
      lock(&rq->__lock);

		 *** DEADLOCK ***
  ...
  Call Trace:
   dump_stack_lvl+0x55/0x70
   dump_stack+0x10/0x20
   __lock_acquire+0x781/0x2a40
   lock_acquire+0xbf/0x1f0
   _raw_spin_lock+0x2f/0x40
   kernfs_find_and_get_node_by_id+0x1e/0x70
   cgroup_get_from_id+0x21/0x240
   bpf_cgroup_from_id+0xe/0x20
   bpf_prog_98652316e9337a5a___set_cpus_allowed_ptr_locked+0x96/0x11a
   bpf_trampoline_6442545632+0x4f/0x1000
   __set_cpus_allowed_ptr_locked+0x5/0x5a0
   sched_setaffinity+0x1b3/0x290
   __x64_sys_sched_setaffinity+0x4f/0x60
   do_syscall_64+0x40/0xe0
   entry_SYSCALL_64_after_hwframe+0x46/0x4e

Let's fix it by protecting kernfs_node and kernfs_root with RCU and making
kernfs_find_and_get_node_by_id() acquire rcu_read_lock() instead of
kernfs_idr_lock.

This adds an rcu_head to kernfs_node making it larger by 16 bytes on 64bit.
Combined with the preceding rearrange patch, the net increase is 8 bytes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrea Righi <andrea.righi@canonical.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20240109214828.252092-4-tj@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13 13:10:09 +02:00
Documentation x86/bhi: Mitigate KVM by default 2024-04-10 16:38:24 +02:00
LICENSES
arch perf/x86/amd/lbr: Discard erroneous branch entries 2024-04-13 13:10:08 +02:00
block block: prevent division by zero in blk_rq_stat_sum() 2024-04-13 13:10:07 +02:00
certs
crypto Revert "crypto: pkcs7 - remove sha1 support" 2024-04-03 15:32:31 +02:00
drivers bus: mhi: host: Add MHI_PM_SYS_ERR_FAIL state 2024-04-13 13:10:08 +02:00
fs kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in kernfs_find_and_get_node_by_id() 2024-04-13 13:10:09 +02:00
include kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in kernfs_find_and_get_node_by_id() 2024-04-13 13:10:09 +02:00
init init: open /initrd.image with O_LARGEFILE 2024-04-03 15:32:37 +02:00
io_uring io_uring/kbuf: hold io_buffer_list reference over mmap 2024-04-10 16:38:16 +02:00
ipc
kernel ring-buffer: use READ_ONCE() to read cpu_buffer->commit_page in concurrent environment 2024-04-13 13:10:08 +02:00
lib dump_stack: Do not get cpu_sync for panic CPU 2024-04-13 13:09:58 +02:00
mm x86/mm/pat: fix VM_PAT handling in COW mappings 2024-04-10 16:38:19 +02:00
net Bluetooth: Add new quirk for broken read key length on ATS2851 2024-04-13 13:10:02 +02:00
rust
samples work around gcc bugs with 'asm goto' with outputs 2024-02-09 15:57:48 -08:00
scripts kbuild: make -Woverride-init warnings more consistent 2024-04-10 16:38:00 +02:00
security selinux: avoid dereference of garbage after mount failure 2024-04-10 16:38:01 +02:00
sound ALSA: hda/realtek: Add quirk for Lenovo Yoga 9 14IMH9 2024-04-13 13:10:08 +02:00
tools tools: iio: replace seekdir() in iio_generic_buffer 2024-04-13 13:10:08 +02:00
usr
virt KVM: Always flush async #PF workqueue when vCPU is being destroyed 2024-04-03 15:32:03 +02:00
.clang-format
.cocciconfig
.editorconfig
.get_maintainer.ignore
.gitattributes
.gitignore
.mailmap drm fixes for 6.8 final 2024-03-08 12:44:56 -08:00
.rustfmt.toml
COPYING
CREDITS MAINTAINERS: supplement of zswap maintainers update 2024-01-25 23:52:21 -08:00
Kbuild
Kconfig
MAINTAINERS drm fixes for 6.8 final 2024-03-08 12:44:56 -08:00
Makefile Linux 6.8.5 2024-04-10 16:38:25 +02:00
README

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.