linux-stable/kernel
Miaohe Lin abdb88dd27 fork: defer linking file vma until vma is fully initialized
commit 35e351780f upstream.

Thorvald reported a WARNING [1]. And the root cause is below race:

 CPU 1					CPU 2
 fork					hugetlbfs_fallocate
  dup_mmap				 hugetlbfs_punch_hole
   i_mmap_lock_write(mapping);
   vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
   i_mmap_unlock_write(mapping);
   hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
					 i_mmap_lock_write(mapping);
   					 hugetlb_vmdelete_list
					  vma_interval_tree_foreach
					   hugetlb_vma_trylock_write -- Vma_lock is cleared.
   tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
					   hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
					 i_mmap_unlock_write(mapping);

hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside
i_mmap_rwsem lock while vma lock can be used in the same time.  Fix this
by deferring linking file vma until vma is fully initialized.  Those vmas
should be initialized first before they can be used.

Link: https://lkml.kernel.org/r/20240410091441.3539905-1-linmiaohe@huawei.com
Fixes: 8d9bfb2608 ("hugetlb: add vma based lock for pmd sharing")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reported-by: Thorvald Natvig <thorvald@google.com>
Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1]
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Tycho Andersen <tandersen@netflix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-27 17:13:04 +02:00
..
bpf bpf: support deferring bpf_link dealloc to after RCU grace period 2024-04-10 16:38:23 +02:00
cgroup cgroup/cpuset: Fix retval in update_cpumask() 2024-02-29 10:30:35 -10:00
configs hardening: Provide Kconfig fragments for basic options 2023-09-22 09:50:55 -07:00
debug kdb: Fix a potential buffer overflow in kdb_local() 2024-01-17 17:19:06 +00:00
dma dma-direct: Leak pages on dma_set_decrypted() failure 2024-04-13 13:10:01 +02:00
entry entry: Respect changes to system call number by trace_sys_enter() 2024-04-03 15:32:33 +02:00
events uprobes: use pagesize-aligned virtual address when replacing pages 2024-01-25 23:52:20 -08:00
futex futex: Prevent the reuse of stale pi_state 2024-01-19 12:58:17 +01:00
gcov gcov: annotate struct gcov_iterator with __counted_by 2023-10-18 14:43:22 -07:00
irq genirq: Introduce IRQF_COND_ONESHOT and use it in pinctrl-amd 2024-04-03 15:32:45 +02:00
kcsan mm: delete checks for xor_unlock_is_negative_byte() 2023-10-18 14:34:17 -07:00
livepatch livepatch: Fix missing newline character in klp_resolve_symbols() 2023-09-20 11:24:18 +02:00
locking RCU pull request for v6.8 2024-01-12 16:35:58 -08:00
module Revert "crypto: pkcs7 - remove sha1 support" 2024-04-03 15:32:31 +02:00
power PM: s2idle: Make sure CPUs will wakeup directly on resume 2024-04-17 11:23:25 +02:00
printk dump_stack: Do not get cpu_sync for panic CPU 2024-04-13 13:09:58 +02:00
rcu rcu/nocb: Fix WARN_ON_ONCE() in the rcu_nocb_bypass_lock() 2024-04-13 13:10:04 +02:00
sched sched: Add missing memory barrier in switch_mm_cid 2024-04-27 17:13:01 +02:00
time Fix memory leak in posix_clock_open() 2024-04-03 15:32:32 +02:00
trace tracing: hide unused ftrace_event_id_fops 2024-04-17 11:23:35 +02:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.kexec kexec: select CRYPTO from KEXEC_FILE instead of depending on it 2023-12-20 13:46:19 -08:00
Kconfig.locks
Kconfig.preempt
Makefile kernel/numa.c: Move logging out of numa.h 2023-12-20 19:26:30 -05:00
acct.c fs: rename __mnt_{want,drop}_write*() helpers 2023-09-11 15:05:50 +02:00
async.c header cleanups for 6.8 2024-01-10 16:43:55 -08:00
audit.c audit: Send netlink ACK before setting connection in auditd_set 2023-11-12 22:33:49 -05:00
audit.h audit: correct audit_filter_inodes() definition 2023-07-21 12:17:25 -04:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-08-22 18:50:06 -04:00
audit_tree.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
audit_watch.c audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare() 2023-11-14 17:34:27 -05:00
auditfilter.c audit: move trailing statements to next line 2023-08-15 18:16:14 -04:00
auditsc.c audit,io_uring: io_uring openat triggers audit reference count underflow 2023-10-13 18:34:46 +02:00
backtracetest.c
bounds.c bounds: support non-power-of-two CONFIG_NR_CPUS 2024-04-03 15:32:06 +02:00
capability.c lsm: constify the 'target' parameter in security_capget() 2023-08-08 16:48:47 -04:00
cfi.c cfi: Switch to -fsanitize=kcfi 2022-09-26 10:13:13 -07:00
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-03-14 19:32:38 -07:00
configs.c
context_tracking.c locking/atomic: treewide: use raw_atomic*_<op>() 2023-06-05 09:57:20 +02:00
cpu.c x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n 2024-04-17 11:23:39 +02:00
cpu_pm.c cpuidle, cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}() 2023-01-13 11:48:15 +01:00
crash_core.c crash: use macro to add crashk_res into iomem early for specific arch 2024-04-03 15:32:49 +02:00
crash_dump.c
cred.c cred: get rid of CONFIG_DEBUG_CREDENTIALS 2023-12-15 14:19:48 -08:00
delayacct.c delayacct: track delays from IRQ/SOFTIRQ 2023-04-18 16:39:34 -07:00
dma.c
exec_domain.c
exit.c exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock) 2024-02-07 21:20:33 -08:00
exit.h exit: add internal include file with helpers 2023-09-21 12:03:50 -06:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-02-08 13:36:22 +01:00
fork.c fork: defer linking file vma until vma is fully initialized 2024-04-27 17:13:04 +02:00
freezer.c Linux 6.7-rc6 2023-12-23 15:52:13 +01:00
gen_kheaders.sh Revert "kheaders: substituting --sort in archive creation" 2023-05-28 16:20:21 +09:00
groups.c groups: Convert group_info.usage to refcount_t 2023-09-29 11:28:39 -07:00
hung_task.c kernel/hung_task.c: set some hung_task.c variables storage-class-specifier to static 2023-04-08 13:45:37 -07:00
iomem.c kernel/iomem.c: remove __weak ioremap_cache helper 2023-08-21 13:37:28 -07:00
irq_work.c trace: Add trace_ipi_send_cpu() 2023-03-24 11:01:29 +01:00
jump_label.c jump_label: Prevent key->enabled int overflow 2022-12-01 15:53:05 -08:00
kallsyms.c kallsyms: Change func signature for cleanup_symbol_name() 2023-08-25 15:00:36 -07:00
kallsyms_internal.h kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[] 2022-11-12 18:47:36 -08:00
kallsyms_selftest.c Modules changes for v6.6-rc1 2023-08-29 17:32:32 -07:00
kallsyms_selftest.h kallsyms: Add self-test facility 2022-11-15 00:42:02 -08:00
kcmp.c file: convert to SLAB_TYPESAFE_BY_RCU 2023-10-19 11:02:48 +02:00
kcov.c kcov: add prototypes for helper functions 2023-06-09 17:44:17 -07:00
kexec.c kernel: kexec: copy user-array safely 2023-10-09 16:59:47 +10:00
kexec_core.c kexec: do syscore_shutdown() in kernel_kexec 2024-01-12 15:20:47 -08:00
kexec_elf.c
kexec_file.c kexec_file: fix incorrect temp_start value in locate_mem_hole_top_down() 2023-12-29 12:22:25 -08:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2022-09-11 21:55:06 -07:00
kheaders.c kheaders: Use array declaration instead of char 2023-03-24 20:10:59 -07:00
kprobes.c kprobes: Fix possible use-after-free issue on kprobe registration 2024-04-17 11:23:36 +02:00
ksyms_common.c kallsyms: make kallsyms_show_value() as generic function 2023-06-08 12:27:20 -07:00
ksysfs.c crash: hotplug support for kexec_load() 2023-08-24 16:25:14 -07:00
kthread.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
latencytop.c latencytop: use the last element of latency_record of system 2022-09-11 21:55:12 -07:00
module_signature.c
notifier.c notifiers: add tracepoints to the notifiers infrastructure 2023-04-08 13:45:38 -07:00
nsproxy.c nsproxy: Convert nsproxy.count to refcount_t 2023-08-21 11:29:12 -07:00
numa.c kernel/numa.c: Move logging out of numa.h 2023-12-20 19:26:30 -05:00
padata.c padata: Fix refcnt handling in padata_free_shell() 2023-10-27 18:04:24 +08:00
panic.c panic: Flush kernel log buffer at the end 2024-04-13 13:09:58 +02:00
params.c params: Fix multi-line comment style 2023-12-01 09:51:44 -08:00
pid.c file: remove __receive_fd() 2023-12-12 14:24:14 +01:00
pid_namespace.c wait: Remove uapi header file from main header file 2023-12-20 19:26:31 -05:00
pid_sysctl.h memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy 2023-08-21 13:37:59 -07:00
profile.c kernel/profile.c: simplify duplicated code in profile_setup() 2022-09-11 21:55:12 -07:00
ptrace.c Quite a lot of kexec work this time around. Many singleton patches in 2024-01-09 11:46:20 -08:00
range.c
reboot.c Thermal control updates for 6.8-rc1 2024-01-09 16:20:17 -08:00
regset.c
relay.c kernel: relay: remove relay_file_splice_read dead code, doesn't work 2023-12-29 12:22:27 -08:00
resource.c Quite a lot of kexec work this time around. Many singleton patches in 2024-01-09 11:46:20 -08:00
resource_kunit.c
rseq.c rseq: Extend struct rseq with per-memory-map concurrency ID 2022-12-27 12:52:12 +01:00
scftorture.c scftorture: Pause testing after memory-allocation failure 2023-07-14 15:02:57 -07:00
scs.c scs: add support for dynamic shadow call stacks 2022-11-09 18:06:35 +00:00
seccomp.c file: remove __receive_fd() 2023-12-12 14:24:14 +01:00
signal.c kernel/signal.c: simplify force_sig_info_to_task(), kill recalc_sigpending_and_wake() 2023-12-10 17:21:32 -08:00
smp.c CSD lock commits for v6.7 2023-10-30 17:56:53 -10:00
smpboot.c kthread: add kthread_stop_put 2023-10-04 10:41:57 -07:00
smpboot.h
softirq.c sched/core: introduce sched_core_idle_cpu() 2023-07-13 15:21:50 +02:00
stackleak.c stackleak: allow to specify arch specific stackleak poison function 2023-04-20 11:36:35 +02:00
stacktrace.c stacktrace: fix kernel-doc typo 2023-12-29 12:22:29 -08:00
static_call.c
static_call_inline.c static_call: Add call depth tracking support 2022-10-17 16:41:16 +02:00
stop_machine.c
sys.c prctl: generalize PR_SET_MDWE support check to be per-arch 2024-04-03 15:32:37 +02:00
sys_ni.c lsm/stable-6.8 PR 20240105 2024-01-09 12:57:46 -08:00
sysctl-test.c kernel/sysctl-test: use SYSCTL_{ZERO/ONE_HUNDRED} instead of i_{zero/one_hundred} 2022-09-08 16:56:45 -07:00
sysctl.c asm-generic updates for v6.7 2023-11-01 15:28:33 -10:00
task_work.c task_work: add kerneldoc annotation for 'data' argument 2023-09-19 13:21:32 -07:00
taskstats.c taskstats: fill_stats_for_tgid: use for_each_thread() 2023-10-04 10:41:57 -07:00
torture.c torture: Print out torture module parameters 2023-09-24 17:24:01 +02:00
tracepoint.c tracepoint: Allow livepatch module add trace event 2023-02-18 14:34:36 -05:00
tsacct.c
ucount.c sysctl: Add size to register_sysctl 2023-08-15 15:26:17 -07:00
uid16.c
uid16.h
umh.c sysctl: fix unused proc_cap_handler() function warning 2023-06-29 15:19:43 -07:00
up.c smp: Change function signatures to use call_single_data_t 2023-09-13 14:59:24 +02:00
user-return-notifier.c
user.c binfmt_misc: enable sandboxed mounts 2023-10-11 08:46:01 -07:00
user_namespace.c mnt_idmapping: decouple from namespaces 2023-11-28 14:08:47 +01:00
usermode_driver.c
utsname.c
utsname_sysctl.c utsname: simplify one-level sysctl registration for uts_kern_table 2023-04-13 11:49:35 -07:00
vhost_task.c vhost: Fix worker hangs due to missed wake up calls 2023-06-08 15:43:09 -04:00
watch_queue.c watch_queue: fix kcalloc() arguments order 2023-12-21 13:17:54 +01:00
watchdog.c watchdog: if panicking and we dumped everything, don't re-enable dumping 2023-12-29 12:22:30 -08:00
watchdog_buddy.c watchdog/hardlockup: move SMP barriers from common code to buddy code 2023-06-19 16:25:28 -07:00
watchdog_perf.c watchdog/perf: add a weak function for an arch to detect if perf can use NMIs 2023-06-09 17:44:21 -07:00
workqueue.c Revert "workqueue.c: Increase workqueue name length" 2024-04-04 20:25:06 +02:00
workqueue_internal.h workqueue: Drop the special locking rule for worker->flags and worker_pool->flags 2023-08-07 15:57:22 -10:00