linux-stable/kernel/sched
Valentin Schneider feedbfeb22 sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
[ Upstream commit 49bef33e4b ]

John reported that push_rt_task() can end up invoking
find_lowest_rq(rq->curr) when curr is not an RT task (in this case a CFS
one), which causes mayhem further down in convert_prio().

This can happen when current gets demoted to e.g. CFS when releasing an
rt_mutex, and the local CPU gets hit with an rto_push_work irqwork before
getting the chance to reschedule. Exactly who triggers this work isn't
entirely clear to me - switched_from_rt() only invokes rt_queue_pull_task()
if there are no RT tasks on the local RQ, which means the local CPU can't
be in the rto_mask.

My current suspected sequence is something along the lines of the below,
with the demoted task being current.

  mark_wakeup_next_waiter()
    rt_mutex_adjust_prio()
      rt_mutex_setprio() // deboost originally-CFS task
        check_class_changed()
          switched_from_rt() // Only rt_queue_pull_task() if !rq->rt.rt_nr_running
          switched_to_fair() // Sets need_resched
      __balance_callbacks() // if pull_rt_task(), tell_cpu_to_push() can't select local CPU per the above
      raw_spin_rq_unlock(rq)

       // need_resched is set, so task_woken_rt() can't
       // invoke push_rt_tasks(). Best I can come up with is
       // local CPU has rt_nr_migratory >= 2 after the demotion, so stays
       // in the rto_mask, and then:

       <some other CPU running rto_push_irq_work_func() queues rto_push_work on this CPU>
         push_rt_task()
           // breakage follows here as rq->curr is CFS

Move an existing check to check rq->curr vs the next pushable task's
priority before getting anywhere near find_lowest_rq(). While at it, add an
explicit sched_class of rq->curr check prior to invoking
find_lowest_rq(rq->curr). Align the DL logic to also reschedule regardless
of next_task's migratability.
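
The reordering described above can be sketched as a small userspace model.
This is not the kernel code: every model_* name, the enum values and the
struct layouts are hypothetical stand-ins, and the pull and push_busy
conditions of the real push_rt_task() are left out. It only illustrates the
order of the two checks: compare priorities first, then refuse to look for a
lowest-priority runqueue on behalf of a non-RT rq->curr.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the scheduler classes. */
enum model_class { MODEL_CLASS_RT, MODEL_CLASS_FAIR, MODEL_CLASS_STOP };

struct model_task {
	int prio;			/* lower value means higher priority */
	enum model_class cls;
	bool migration_disabled;
};

struct model_rq {
	struct model_task *curr;
	bool need_resched;
	int find_lowest_calls;
};

enum push_result { PUSH_RESCHED, PUSH_BAIL, PUSH_FIND_LOWEST, PUSH_NORMAL };

/* Stand-in for find_lowest_rq(): only meaningful for an RT task. */
static void model_find_lowest_rq(struct model_rq *rq,
				 const struct model_task *task)
{
	assert(task->cls == MODEL_CLASS_RT);
	rq->find_lowest_calls++;
}

static enum push_result model_push_rt_task(struct model_rq *rq,
					   const struct model_task *next_task)
{
	/* Check 1: next_task outranks curr, so just reschedule curr. */
	if (next_task->prio < rq->curr->prio) {
		rq->need_resched = true;
		return PUSH_RESCHED;
	}

	if (next_task->migration_disabled) {
		/*
		 * Check 2: compare the class rather than the priority range.
		 * In this model the stop class carries an RT-range priority
		 * but is not an RT task.
		 */
		if (rq->curr->cls != MODEL_CLASS_RT)
			return PUSH_BAIL;

		model_find_lowest_rq(rq, rq->curr);
		return PUSH_FIND_LOWEST;
	}

	/* Regular push path; it operates on next_task, which is RT. */
	return PUSH_NORMAL;
}
```

In this model, a current task demoted to the fair class (say prio 120) with
an RT next_task at prio 50 takes the first branch, so the lookup helper is
never reached with a non-RT task.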

Fixes: a7c81556ec ("sched: Fix migrate_disable() vs rt/dl balancing")
Reported-by: John Keeping <john@metanate.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: John Keeping <john@metanate.com>
Link: https://lore.kernel.org/r/20220127154059.974729-1-valentin.schneider@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 13:57:40 +02:00
Makefile sched, kcsan: Enable memory barrier instrumentation 2021-12-09 16:42:28 -08:00
autogroup.c
autogroup.h
clock.c
completion.c
core.c sched/core: Export pelt_thermal_tp 2022-04-08 13:57:39 +02:00
core_sched.c sched/core: Accounting forceidle time for all tasks except idle task 2022-01-18 12:09:59 +01:00
cpuacct.c sched/cpuacct: Fix charge percpu cpuusage 2022-04-08 13:57:40 +02:00
cpudeadline.c
cpudeadline.h
cpufreq.c
cpufreq_schedutil.c sched/uclamp: Fix iowait boost escaping uclamp restriction 2022-04-08 13:57:39 +02:00
cpupri.c
cpupri.h
cputime.c Peter Zijlstra says: 2022-01-11 17:14:59 -08:00
deadline.c sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race 2022-04-08 13:57:40 +02:00
debug.c sched/debug: Remove mpol_get/put and task_lock/unlock from sched_show_numa 2022-04-08 13:57:39 +02:00
fair.c sched/fair: Improve consistency of allowed NUMA balance calculations 2022-04-08 13:57:39 +02:00
features.h
idle.c
isolation.c
loadavg.c
membarrier.c sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask 2022-01-25 22:30:25 +01:00
pelt.c
pelt.h sched/pelt: Relax the sync of util_sum with util_avg 2022-01-18 12:09:58 +01:00
psi.c psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n 2022-01-30 09:56:58 +02:00
rt.c sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race 2022-04-08 13:57:40 +02:00
sched-pelt.h
sched.h sched/sugov: Ignore 'busy' filter when rq is capped by uclamp_max 2022-04-08 13:57:39 +02:00
smp.h
stats.c
stats.h
stop_task.c
swait.c
topology.c
wait.c wait: add wake_up_pollfree() 2021-12-09 10:49:56 -08:00
wait_bit.c