linux-stable/kernel/sched
Valentin Schneider feedbfeb22 sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
[ Upstream commit 49bef33e4b ]

John reported that push_rt_task() can end up invoking
find_lowest_rq(rq->curr) when curr is not an RT task (in this case a CFS
one), which causes mayhem down convert_prio().

This can happen when current gets demoted to e.g. CFS when releasing an
rt_mutex, and the local CPU gets hit with an rto_push_work irqwork before
getting the chance to reschedule. Exactly who triggers this work isn't
entirely clear to me - switched_from_rt() only invokes rt_queue_pull_task()
if there are no RT tasks on the local RQ, which means the local CPU can't
be in the rto_mask.

My current suspected sequence is something along the lines of the below,
with the demoted task being current.

  mark_wakeup_next_waiter()
    rt_mutex_adjust_prio()
      rt_mutex_setprio() // deboost originally-CFS task
	check_class_changed()
	  switched_from_rt() // Only rt_queue_pull_task() if !rq->rt.rt_nr_running
	  switched_to_fair() // Sets need_resched
      __balance_callbacks() // if pull_rt_task(), tell_cpu_to_push() can't select local CPU per the above
      raw_spin_rq_unlock(rq)

       // need_resched is set, so task_woken_rt() can't
       // invoke push_rt_tasks(). Best I can come up with is
       // local CPU has rt_nr_migratory >= 2 after the demotion, so stays
       // in the rto_mask, and then:

       <some other CPU running rto_push_irq_work_func() queues rto_push_work on this CPU>
	 push_rt_task()
	   // breakage follows here as rq->curr is CFS

Move an existing check to check rq->curr vs the next pushable task's
priority before getting anywhere near find_lowest_rq(). While at it, add an
explicit sched_class of rq->curr check prior to invoking
find_lowest_rq(rq->curr). Align the DL logic to also reschedule regardless
of next_task's migratability.

Fixes: a7c81556ec ("sched: Fix migrate_disable() vs rt/dl balancing")
Reported-by: John Keeping <john@metanate.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: John Keeping <john@metanate.com>
Link: https://lore.kernel.org/r/20220127154059.974729-1-valentin.schneider@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 13:57:40 +02:00
..
autogroup.c sched/fair: Prevent dead task groups from regaining cfs_rq's 2021-11-11 13:09:33 +01:00
autogroup.h
clock.c sched: Fix various typos 2021-03-22 00:11:52 +01:00
completion.c completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all() 2020-03-23 18:40:25 +01:00
core.c sched/core: Export pelt_thermal_tp 2022-04-08 13:57:39 +02:00
core_sched.c sched/core: Accounting forceidle time for all tasks except idle task 2022-01-18 12:09:59 +01:00
cpuacct.c sched/cpuacct: Fix charge percpu cpuusage 2022-04-08 13:57:40 +02:00
cpudeadline.c sched,rt: Use the full cpumask for balancing 2020-11-10 18:39:00 +01:00
cpudeadline.h
cpufreq.c
cpufreq_schedutil.c sched/uclamp: Fix iowait boost escaping uclamp restriction 2022-04-08 13:57:39 +02:00
cpupri.c sched: Fix various typos 2021-03-22 00:11:52 +01:00
cpupri.h sched/cpupri: Add CPUPRI_HIGHER 2020-10-29 11:00:30 +01:00
cputime.c Peter Zijlstra says: 2022-01-11 17:14:59 -08:00
deadline.c sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race 2022-04-08 13:57:40 +02:00
debug.c sched/debug: Remove mpol_get/put and task_lock/unlock from sched_show_numa 2022-04-08 13:57:39 +02:00
fair.c sched/fair: Improve consistency of allowed NUMA balance calculations 2022-04-08 13:57:39 +02:00
features.h sched: Disable TTWU_QUEUE on RT 2021-10-05 15:52:12 +02:00
idle.c sched/idle: Make the idle timer expire in hard interrupt context 2021-09-09 10:36:16 +02:00
isolation.c sched/isolation: Reconcile rcu_nocbs= and nohz_full= 2021-05-13 14:12:47 +02:00
loadavg.c sched: Make multiple runqueue task counters 32-bit 2021-05-12 21:34:17 +02:00
Makefile sched, kcsan: Enable memory barrier instrumentation 2021-12-09 16:42:28 -08:00
membarrier.c sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask 2022-01-25 22:30:25 +01:00
pelt.c sched: Fix various typos 2021-03-22 00:11:52 +01:00
pelt.h sched/pelt: Relax the sync of util_sum with util_avg 2022-01-18 12:09:58 +01:00
psi.c psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n 2022-01-30 09:56:58 +02:00
rt.c sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race 2022-04-08 13:57:40 +02:00
sched-pelt.h
sched.h sched/sugov: Ignore 'busy' filter when rq is capped by uclamp_max 2022-04-08 13:57:39 +02:00
smp.h sched/headers: Split out open-coded prototypes into kernel/sched/smp.h 2020-05-28 11:03:20 +02:00
stats.c sched: Introduce task block time in schedstats 2021-10-05 15:51:48 +02:00
stats.h psi: Fix PSI_MEM_FULL state when tasks are in memstall and doing reclaim 2021-11-17 14:49:00 +01:00
stop_task.c sched: Make struct sched_statistics independent of fair sched class 2021-10-05 15:51:45 +02:00
swait.c sched/swait: Prepare usage in completions 2020-03-21 16:00:23 +01:00
topology.c Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
wait.c wait: add wake_up_pollfree() 2021-12-09 10:49:56 -08:00
wait_bit.c