linux-stable/kernel
Xiu Jianfeng 88b373d1c5 cgroup: Do not corrupt task iteration when rebinding subsystem
commit 6f363f5aa8 upstream.

We found a refcount UAF bug as follows:

refcount_t: addition on 0; use-after-free.
WARNING: CPU: 1 PID: 342 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x148
Workqueue: events cpuset_hotplug_workfn
Call trace:
 refcount_warn_saturate+0xa0/0x148
 __refcount_add.constprop.0+0x5c/0x80
 css_task_iter_advance_css_set+0xd8/0x210
 css_task_iter_advance+0xa8/0x120
 css_task_iter_next+0x94/0x158
 update_tasks_root_domain+0x58/0x98
 rebuild_root_domains+0xa0/0x1b0
 rebuild_sched_domains_locked+0x144/0x188
 cpuset_hotplug_workfn+0x138/0x5a0
 process_one_work+0x1e8/0x448
 worker_thread+0x228/0x3e0
 kthread+0xe0/0xf0
 ret_from_fork+0x10/0x20

then a kernel panic will be triggered as below:

Unable to handle kernel paging request at virtual address 00000000c0000010
Call trace:
 cgroup_apply_control_disable+0xa4/0x16c
 rebind_subsystems+0x224/0x590
 cgroup_destroy_root+0x64/0x2e0
 css_free_rwork_fn+0x198/0x2a0
 process_one_work+0x1d4/0x4bc
 worker_thread+0x158/0x410
 kthread+0x108/0x13c
 ret_from_fork+0x10/0x18

The race that cause this bug can be shown as below:

(hotplug cpu)                | (umount cpuset)
mutex_lock(&cpuset_mutex)    | mutex_lock(&cgroup_mutex)
cpuset_hotplug_workfn        |
 rebuild_root_domains        |  rebind_subsystems
  update_tasks_root_domain   |   spin_lock_irq(&css_set_lock)
   css_task_iter_start       |    list_move_tail(&cset->e_cset_node[ss->id]
   while(css_task_iter_next) |                  &dcgrp->e_csets[ss->id]);
   css_task_iter_end         |   spin_unlock_irq(&css_set_lock)
mutex_unlock(&cpuset_mutex)  | mutex_unlock(&cgroup_mutex)

Inside css_task_iter_start/next/end, css_set_lock is hold and then
released, so when iterating task(left side), the css_set may be moved to
another list(right side), then it->cset_head points to the old list head
and it->cset_pos->next points to the head node of new list, which can't
be used as struct css_set.

To fix this issue, switch from all css_sets to only scgrp's css_sets to
patch in-flight iterators to preserve correct iteration, and then
update it->cset_head as well.

Reported-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Link: https://www.spinics.net/lists/cgroups/msg37935.html
Suggested-by: Michal Koutný <mkoutny@suse.com>
Link: https://lore.kernel.org/all/20230526114139.70274-1-xiujianfeng@huaweicloud.com/
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Fixes: 2d8f243a5e ("cgroup: implement cgroup->e_csets[]")
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-28 10:18:36 +02:00
..
bpf bpf: Fix mask generation for 32-bit narrow loads of 64-bit fields 2023-05-30 12:44:10 +01:00
cgroup cgroup: Do not corrupt task iteration when rebinding subsystem 2023-06-28 10:18:36 +02:00
configs
debug treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() 2023-04-20 12:07:32 +02:00
dma treewide: Remove uninitialized_var() usage 2023-06-09 10:29:01 +02:00
events treewide: Remove uninitialized_var() usage 2023-06-09 10:29:01 +02:00
gcov gcov: add support for checksum field 2023-01-18 11:41:42 +01:00
irq genirq: Add IRQF_NO_AUTOEN for request_irq/nmi() 2023-05-17 11:35:46 +02:00
livepatch livepatch: fix race between fork and KLP transition 2022-10-26 13:22:18 +02:00
locking treewide: Remove uninitialized_var() usage 2023-06-09 10:29:01 +02:00
power PM: hibernate: Fix mistake in kerneldoc comment 2023-01-18 11:40:53 +01:00
printk printk: fix return value of printk.devkmsg __setup handler 2022-04-15 14:18:08 +02:00
rcu rcu: Suppress smp_processor_id() complaint in synchronize_rcu_expedited_wait() 2023-03-11 16:43:54 +01:00
sched list: add "list_del_init_careful()" to go with "list_empty_careful()" 2023-06-28 10:18:35 +02:00
time tick/common: Align tick period during sched_timer setup 2023-06-28 10:18:36 +02:00
trace tracing: Add tracing_reset_all_online_cpus_unlocked() function 2023-06-28 10:18:35 +02:00
.gitignore kbuild: update config_data.gz only when the content of .config is changed 2021-05-11 14:04:16 +02:00
acct.c acct: fix potential integer overflow in encode_comp_t() 2023-01-18 11:41:34 +01:00
async.c treewide: Remove uninitialized_var() usage 2023-06-09 10:29:01 +02:00
audit.c treewide: Remove uninitialized_var() usage 2023-06-09 10:29:01 +02:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-04-15 14:18:04 +02:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-09-05 10:27:38 +02:00
audit_tree.c audit: move put_tree() to avoid trim_trees refcount underflow and UAF 2021-09-03 10:08:16 +02:00
audit_watch.c audit: CONFIG_CHANGE don't log internal bookkeeping as an event 2020-10-01 13:17:32 +02:00
auditfilter.c audit: fix a net reference leak in audit_list_rules_send() 2020-06-22 09:30:59 +02:00
auditsc.c audit: log AUDIT_TIME_* records only from rules 2022-04-15 14:18:04 +02:00
backtracetest.c treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() 2023-04-20 12:07:32 +02:00
bounds.c
capability.c
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-04-05 11:16:42 +02:00
configs.c
context_tracking.c
cpu.c random: clear fast pool, crng, and batches in cpuhp bring up 2022-06-22 14:11:12 +02:00
cpu_pm.c kernel/cpu_pm: Fix uninitted local in cpu_pm 2020-06-22 09:31:22 +02:00
crash_core.c
crash_dump.c
cred.c keys: Fix request_key() cache 2020-01-17 19:48:42 +01:00
delayacct.c
dma.c
exec_domain.c
exit.c treewide: Remove uninitialized_var() usage 2023-06-09 10:29:01 +02:00
extable.c kernel/extable.c: use address-of operator on section symbols 2023-06-09 10:29:01 +02:00
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-03-11 16:44:15 +01:00
fork.c copy_process(): Move fd_install() out of sighand->siglock critical section 2022-02-23 12:00:00 +01:00
freezer.c Revert "libata, freezer: avoid block device removal while system is frozen" 2019-10-06 09:11:37 -06:00
futex.c treewide: Remove uninitialized_var() usage 2023-06-09 10:29:01 +02:00
gen_kheaders.sh kbuild: add variables for compression tools 2020-09-03 11:27:10 +02:00
groups.c
hung_task.c
iomem.c
irq_work.c
jump_label.c
kallsyms.c kallsyms: Refactor kallsyms_show_value() to take cred 2020-07-16 08:16:44 +02:00
kcmp.c exec: Transform exec_update_mutex into a rw_semaphore 2021-01-09 13:44:55 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c
kexec.c
kexec_core.c kernel: kexec: remove the lock operation of system_transition_mutex 2021-02-03 23:25:56 +01:00
kexec_elf.c
kexec_file.c kexec: support purgatories with .text.hot sections 2023-06-21 15:44:10 +02:00
kexec_internal.h
kheaders.c kheaders: Use array declaration instead of char 2023-05-17 11:35:33 +02:00
kmod.c kmod: make request_module() return an error when autoloading is disabled 2020-04-17 10:50:22 +02:00
kprobes.c x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range 2023-03-11 16:44:02 +01:00
ksysfs.c
kthread.c kthread: Fix PF_KTHREAD vs to_kthread() race 2021-09-12 08:56:39 +02:00
latencytop.c
Makefile kbuild: update config_data.gz only when the content of .config is changed 2021-05-11 14:04:16 +02:00
module-internal.h
module.c module: Don't wait for GOING modules 2023-02-06 07:52:43 +01:00
module_signature.c module: harden ELF info handling 2021-04-07 14:47:38 +02:00
module_signing.c module: harden ELF info handling 2021-04-07 14:47:38 +02:00
notifier.c kernel/notifier.c: intercept duplicate registrations to avoid infinite loops 2020-10-01 13:17:23 +02:00
nsproxy.c
padata.c padata: add separate cpuhp node for CPUHP_PADATA_DEAD 2020-06-17 16:40:22 +02:00
panic.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-02-06 07:52:50 +01:00
params.c
pid.c
pid_namespace.c memcg: enable accounting for pids in nested pid namespaces 2021-09-22 12:26:37 +02:00
profile.c profiling: fix shift too large makes kernel panic 2022-08-25 11:18:02 +02:00
ptrace.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2022-06-14 18:11:24 +02:00
range.c
reboot.c reboot: fix overflow parsing reboot cpu number 2020-11-18 19:20:30 +01:00
relay.c relayfs: fix out-of-bounds access in relay_file_read 2023-05-17 11:35:58 +02:00
resource.c /dev/mem: Revoke mappings when a driver claims the region 2020-06-24 17:50:35 +02:00
rseq.c
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2022-02-16 12:52:53 +01:00
signal.c signal handling: don't use BUG_ON() for debugging 2022-07-21 20:59:27 +02:00
smp.c smp: Fix offline cpu check in flush_smp_call_function_queue() 2022-04-20 09:19:39 +02:00
smpboot.c kthread: Extract KTHREAD_IS_PER_CPU 2021-02-07 15:35:49 +01:00
smpboot.h
softirq.c
stackleak.c
stacktrace.c stacktrace: Don't skip first entry on noncurrent tasks 2019-11-04 21:19:25 +01:00
stop_machine.c stop_machine: Avoid potential race behaviour 2019-10-17 12:47:12 +02:00
sys.c prlimit: do_prlimit needs to have a speculation check 2023-01-24 07:17:59 +01:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2022-09-05 10:27:38 +02:00
sysctl-test.c kernel/sysctl-test: Add null pointer test for sysctl.c:proc_dointvec() 2020-10-01 13:17:10 +02:00
sysctl.c proc: proc_skip_spaces() shouldn't think it is working on C strings 2022-12-08 11:23:06 +01:00
sysctl_binary.c
task_work.c
taskstats.c taskstats: fix data-race 2020-01-09 10:19:54 +01:00
test_kprobes.c
torture.c
tracepoint.c tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing 2021-07-14 16:53:08 +02:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-02-23 11:59:57 +01:00
ucount.c
uid16.c
uid16.h
umh.c usermodehelper: reset umask to default before executing user process 2020-10-14 10:32:58 +02:00
up.c smp: Fix smp_call_function_single_async prototype 2021-05-14 09:44:33 +02:00
user-return-notifier.c
user.c
user_namespace.c
utsname.c
utsname_sysctl.c
watchdog.c watchdog: export lockup_detector_reconfigure 2022-08-25 11:18:37 +02:00
watchdog_hld.c
workqueue.c workqueue: don't skip lockdep work dependency in cancel_work_sync() 2022-09-28 11:04:09 +02:00
workqueue_internal.h