linux-stable/kernel
Thomas Gleixner 61fa9f167c futex: Prevent exit livelock
commit 3ef240eaff upstream.

Oleg provided the following test case:

int main(void)
{
	struct sched_param sp = {};

	sp.sched_priority = 2;
	assert(sched_setscheduler(0, SCHED_FIFO, &sp) == 0);

	int lock = vfork();
	if (!lock) {
		sp.sched_priority = 1;
		assert(sched_setscheduler(0, SCHED_FIFO, &sp) == 0);
		_exit(0);
	}

	syscall(__NR_futex, &lock, FUTEX_LOCK_PI, 0,0,0);
	return 0;
}

This creates an unkillable RT process spinning in futex_lock_pi() on a UP
machine or if the process is affine to a single CPU. The reason is:

 parent	    	    			child

  set FIFO prio 2

  vfork()			->	set FIFO prio 1
   implies wait_for_child()	 	sched_setscheduler(...)
 			   		exit()
					do_exit()
 					....
					mm_release()
					  tsk->futex_state = FUTEX_STATE_EXITING;
					  exit_futex(); (NOOP in this case)
					  complete() --> wakes parent
  sys_futex()
    loop infinite because
    tsk->futex_state == FUTEX_STATE_EXITING

The same problem can happen just by regular preemption as well:

  task holds futex
  ...
  do_exit()
    tsk->futex_state = FUTEX_STATE_EXITING;

  --> preemption (unrelated wakeup of some other higher prio task, e.g. timer)

  switch_to(other_task)

  return to user
  sys_futex()
	loop infinite as above

Just for the fun of it the futex exit cleanup could trigger the wakeup
itself before the task sets its futex state to DEAD.

To cure this, the handling of the exiting owner is changed so:

   - A refcount is held on the task

   - The task pointer is stored in a caller visible location

   - The caller drops all locks (hash bucket, mmap_sem) and blocks
     on task::futex_exit_mutex. When the mutex is acquired then
     the exiting task has completed the cleanup and the state
     is consistent and can be reevaluated.

This is not a pretty solution, but there is no choice other than returning
an error code to user space, which would break the state consistency
guarantee and open another can of problems including regressions.

For stable backports the preparatory commits ac31c7ff86 .. ba31c1a485
are required as well, but for anything older than 5.3.y the backports are
going to be provided when this hits mainline as the other dependencies for
those kernels are definitely not stable material.

Fixes: 778e9a9c3e ("pi-futex: fix exit races and locking problems")
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stable Team <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20191106224557.041676471@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05 15:38:28 +01:00
..
bpf bpf: drop refcount if bpf_map_new_fd() fails in map_create() 2019-12-05 15:38:00 +01:00
cgroup cgroup: Fix css_task_iter_advance_css_set() cset skip condition 2019-08-09 17:53:37 +02:00
configs ANDROID: binder: add hwbinder,vndbinder to BINDER_DEVICES. 2017-08-22 18:43:23 -07:00
debug kdb: Don't back trace on a cpu that didn't round up 2019-02-12 19:46:09 +01:00
events signal: Properly deliver SIGILL from uprobes 2019-11-20 17:59:52 +01:00
gcov License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
irq genirq: Prevent NULL pointer dereference in resend_irqs() 2019-09-19 09:08:04 +02:00
livepatch livepatch: Nullify obj->mod in klp_module_coming()'s error path 2019-10-07 18:55:09 +02:00
locking Revert "locking/pvqspinlock: Don't wait if vCPU is preempted" 2019-10-11 18:18:36 +02:00
power x86/power: Fix 'nosmt' vs hibernation triple fault during resume 2019-06-11 12:21:48 +02:00
printk printk: fix integer overflow in setup_log_buf() 2019-12-01 09:13:17 +01:00
rcu rcuperf: Fix cleanup path for invalid perf_type strings 2019-05-31 06:47:33 -07:00
sched sched/fair: Don't increase sd->balance_interval on newidle balance 2019-12-01 09:14:02 +01:00
time tick: broadcast-hrtimer: Fix a race in bc_set_next 2019-10-11 18:18:46 +02:00
trace tracing: Initialize iter->seq after zeroing in tracing_read_pipe() 2019-11-06 12:43:20 +01:00
.gitignore
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-10 09:31:17 +01:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-16 20:23:05 +01:00
audit.c audit: return on memory error to avoid null pointer dereference 2018-05-30 07:52:39 +02:00
audit.h ipc: mqueue: Replace timespec with timespec64 2017-09-03 20:21:24 -04:00
audit_fsnotify.c
audit_tree.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
audit_watch.c audit: fix use-after-free in audit_add_watch 2018-09-26 08:38:09 +02:00
auditfilter.c audit: fix a memory leak bug 2019-05-31 06:47:25 -07:00
auditsc.c audit: print empty EXECVE args 2019-12-01 09:14:03 +01:00
backtracetest.c
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-11-13 11:15:08 -08:00
capability.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat.c compat: fix 4-byte infoleak via uninitialized struct field 2018-05-16 10:10:26 +02:00
configs.c
context_tracking.c
cpu.c cpu/SMT: State SMT is disabled even with nosmt and without "=force" 2019-11-24 08:23:08 +01:00
cpu_pm.c PM / CPU: replace raw_notifier with atomic_notifier 2017-07-31 13:09:49 +02:00
crash_core.c kdump: write correct address of mem_section into vmcoreinfo 2018-01-17 09:45:27 +01:00
crash_dump.c
cred.c access: avoid the RCU grace period for the temporary subjective credentials 2019-07-31 07:28:58 +02:00
delayacct.c delayacct: Use raw_spinlocks 2018-08-03 07:50:38 +02:00
dma.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
elfcore.c kernel/elfcore.c: include proper prototypes 2019-10-11 18:18:42 +02:00
exec_domain.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
exit.c futex: Mark the begin of futex exit explicitly 2019-12-05 15:38:27 +01:00
extable.c extable: Enable RCU if it is not watching in kernel_text_address() 2017-09-23 16:50:20 -04:00
fork.c futex: Split futex_mm_release() for exit/exec 2019-12-05 15:38:26 +01:00
freezer.c
futex.c futex: Prevent exit livelock 2019-12-05 15:38:28 +01:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-20 10:10:18 +01:00
hung_task.c kernel: hung_task.c: disable on suspend 2019-04-20 09:15:05 +02:00
irq_work.c
jump_label.c sched/core: Fix cpu.max vs. cpuhotplug deadlock 2018-12-05 19:41:17 +01:00
kallsyms.c kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol 2019-09-21 07:15:38 +02:00
kcmp.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: ensure irq code sees a valid area 2018-08-03 07:50:22 +02:00
kexec.c
kexec_core.c kexec: Allocate decrypted control pages for kdump if SME is enabled 2019-11-24 08:23:15 +01:00
kexec_file.c
kexec_internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kmod.c kmod: move #ifdef CONFIG_MODULES wrapper to Makefile 2017-09-08 18:26:51 -07:00
kprobes.c kprobes: Don't call BUG_ON() if there is a kprobe in use on free list 2019-11-20 17:59:56 +01:00
ksysfs.c
kthread.c kthread, tracing: Don't expose half-written comm when creating kthreads 2018-08-03 07:50:21 +02:00
latencytop.c
Makefile y2038: futex: Move compat implementation into futex.c 2019-12-05 15:38:22 +01:00
memremap.c mm, devm_memremap_pages: kill mapping "System RAM" support 2019-01-13 10:01:02 +01:00
module-internal.h
module.c kernel/module: Fix mem leak in module_add_modinfo_attrs 2019-09-16 08:20:46 +02:00
module_signing.c
notifier.c
nsproxy.c
padata.c padata: use smp_mb in padata_reorder to avoid orphaned padata jobs 2019-07-31 07:28:39 +02:00
panic.c panic: ensure preemption is disabled during panic() 2019-10-17 13:43:19 -07:00
params.c kernel/params.c: improve STANDARD_PARAM_DEF readability 2017-10-03 17:54:26 -07:00
pid.c pids: make task_tgid_nr_ns() safe 2017-08-21 12:47:31 -07:00
pid_namespace.c signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig 2019-07-31 07:28:21 +02:00
profile.c
ptrace.c ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME 2019-07-10 09:54:38 +02:00
range.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
reboot.c
relay.c relay: check return of create_buf_file() properly 2019-03-13 14:03:20 -07:00
resource.c resource: fix integer overflow at reallocation 2018-04-24 09:36:22 +02:00
seccomp.c seccomp: Move speculation migitation control to arch code 2018-05-22 18:54:04 +02:00
signal.c signal: Always ignore SIGKILL and SIGSTOP sent to the global init 2019-11-20 17:59:52 +01:00
smp.c cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM 2019-02-12 19:46:13 +01:00
smpboot.c watchdog/core, powerpc: Lock cpus across reconfiguration 2017-10-04 10:53:54 +02:00
smpboot.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
softirq.c Mark HI and TASKLET softirq synchronous 2018-08-15 18:12:47 +02:00
stacktrace.c
stop_machine.c stop_machine: Atomically queue and wake stopper threads 2018-09-05 09:26:36 +02:00
sys.c kernel/sys.c: prctl: fix false positive in validate_prctl_map() 2019-06-15 11:54:52 +02:00
sys_ni.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sysctl.c sysctl: return -EINVAL if val violates minmax 2019-06-15 11:54:51 +02:00
sysctl_binary.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
task_work.c locking/barriers: Convert users of lockless_dereference() to READ_ONCE() 2017-12-25 14:26:21 +01:00
taskstats.c
test_kprobes.c
torture.c torture: Fix typo suppressing CPU-hotplug statistics 2017-07-25 13:04:45 -07:00
tracepoint.c tracepoint: Do not warn on ENOMEM 2018-05-09 09:51:50 +02:00
tsacct.c
ucount.c
uid16.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-20 10:10:18 +01:00
umh.c kmod: split out umh code into its own file 2017-09-08 18:26:50 -07:00
up.c smp: Avoid using two cache lines for struct call_single_data 2017-08-29 15:14:38 +02:00
user-return-notifier.c
user.c
user_namespace.c userns: move user access out of the mutex 2018-09-09 19:56:00 +02:00
utsname.c
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-09-09 19:56:00 +02:00
watchdog.c watchdog: Mark watchdog touch functions as notrace 2018-09-05 09:26:42 +02:00
watchdog_hld.c watchdog: Mark watchdog touch functions as notrace 2018-09-05 09:26:42 +02:00
workqueue.c watchdog: Mark watchdog touch functions as notrace 2018-09-05 09:26:42 +02:00
workqueue_internal.h Merge branch 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2017-11-06 12:26:49 -08:00