linux-stable/include
Tejun Heo 7657e39585 cgroup: Use separate src/dst nodes when preloading css_sets for migration
commit 07fd5b6cdf upstream.

Each cset (css_set) is pinned by its tasks. When we're moving tasks around
across csets for a migration, we need to hold the source and destination
csets to ensure that they don't go away while we're moving tasks about. This
is done by linking cset->mg_preload_node on either the
mgctx->preloaded_src_csets or mgctx->preloaded_dst_csets list. Using the
same cset->mg_preload_node for both the src and dst lists was deemed okay as
a cset can't be both the source and destination at the same time.

Unfortunately, this overloading becomes problematic when multiple tasks are
involved in a migration and some of them are identity noop migrations while
others are actually moving across cgroups. For example, this can happen with
the following sequence on cgroup1:

 #1> mkdir -p /sys/fs/cgroup/misc/a/b
 #2> echo $$ > /sys/fs/cgroup/misc/a/cgroup.procs
 #3> RUN_A_COMMAND_WHICH_CREATES_MULTIPLE_THREADS &
 #4> PID=$!
 #5> echo $PID > /sys/fs/cgroup/misc/a/b/tasks
 #6> echo $PID > /sys/fs/cgroup/misc/a/cgroup.procs

the process including the group leader back into a. In this final migration,
non-leader threads would be doing identity migration while the group leader
is doing an actual one.

After #3, let's say the whole process was in cset A, and that after #4, the
leader moves to cset B. Then, during #6, the following happens:

 1. cgroup_migrate_add_src() is called on B for the leader.

 2. cgroup_migrate_add_src() is called on A for the other threads.

 3. cgroup_migrate_prepare_dst() is called. It scans the src list.

 4. It notices that B wants to migrate to A, so it tries to A to the dst
    list but realizes that its ->mg_preload_node is already busy.

 5. and then it notices A wants to migrate to A as it's an identity
    migration, it culls it by list_del_init()'ing its ->mg_preload_node and
    putting references accordingly.

 6. The rest of migration takes place with B on the src list but nothing on
    the dst list.

This means that A isn't held while migration is in progress. If all tasks
leave A before the migration finishes and the incoming task pins it, the
cset will be destroyed leading to use-after-free.

This is caused by overloading cset->mg_preload_node for both src and dst
preload lists. We wanted to exclude the cset from the src list but ended up
inadvertently excluding it from the dst list too.

This patch fixes the issue by separating out cset->mg_preload_node into
->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst
preloadings don't interfere with each other.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reported-by: shisiyuan <shisiyuan19870131@gmail.com>
Link: http://lkml.kernel.org/r/1654187688-27411-1-git-send-email-shisiyuan@xiaomi.com
Link: https://www.spinics.net/lists/cgroups/msg33313.html
Fixes: f817de9851 ("cgroup: prepare migration path for unified hierarchy")
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-21 21:20:00 +02:00
..
acpi ACPICA: actypes.h: Expand the ACPI_ACCESS_ definitions 2022-01-27 10:54:18 +01:00
asm-generic tlb: hugetlb: Add more sizes to tlb_remove_huge_tlb_entry 2022-04-20 09:23:22 +02:00
clocksource
crypto crypto: drbg - make reseeding from get_random_bytes() synchronous 2022-06-06 08:42:42 +02:00
drm drm: fix EDID struct for old ARM OABI format 2022-06-09 10:20:59 +02:00
dt-bindings
keys
kunit
kvm
linux cgroup: Use separate src/dst nodes when preloading css_sets for migration 2022-07-21 21:20:00 +02:00
math-emu
media
memory memory: renesas-rpc-if: Fix HF/OSPI data transfer in Manual Mode 2022-05-09 09:05:02 +02:00
misc
net netfilter: nftables: add nft_parse_register_store() and use it 2022-06-29 08:59:46 +02:00
pcmcia
ras
rdma RDMA/netlink: Add __maybe_unused to static inline in C file 2021-11-26 10:39:21 +01:00
scsi scsi: fcoe: Fix Wstringop-overflow warnings in fcoe_wwn_from_mac() 2022-06-09 10:21:15 +02:00
soc firmware: raspberrypi: Keep count of all consumers 2021-09-15 09:50:41 +02:00
sound ALSA: jack: Access input_dev under mutex 2022-06-09 10:20:51 +02:00
target scsi: target: Fix ordered tag handling 2021-11-26 10:39:11 +01:00
trace net: sock: tracing: Fix sock_exceed_buf_limit not to dereference stale pointer 2022-07-21 21:20:00 +02:00
uapi include/uapi/linux/xfrm.h: Fix XFRM_MSG_MAPPING ABI breakage 2022-05-25 09:18:02 +02:00
vdso
video video: of_display_timing.h: include errno.h 2022-07-12 16:32:19 +02:00
xen xen/gnttab: fix gnttab_end_foreign_access() without page specified 2022-03-11 12:11:54 +01:00