linux-stable/kernel
Juri Lelli 3cc3d77dc5 sched/deadline: Fix priority inheritance with multiple scheduling classes
commit 2279f540ea upstream.

Glenn reported that "an application [he developed produces] a BUG in
deadline.c when a SCHED_DEADLINE task contends with CFS tasks on nested
PTHREAD_PRIO_INHERIT mutexes.  I believe the bug is triggered when a CFS
task that was boosted by a SCHED_DEADLINE task boosts another CFS task
(nested priority inheritance).

 ------------[ cut here ]------------
 kernel BUG at kernel/sched/deadline.c:1462!
 invalid opcode: 0000 [#1] PREEMPT SMP
 CPU: 12 PID: 19171 Comm: dl_boost_bug Tainted: ...
 Hardware name: ...
 RIP: 0010:enqueue_task_dl+0x335/0x910
 Code: ...
 RSP: 0018:ffffc9000c2bbc68 EFLAGS: 00010002
 RAX: 0000000000000009 RBX: ffff888c0af94c00 RCX: ffffffff81e12500
 RDX: 000000000000002e RSI: ffff888c0af94c00 RDI: ffff888c10b22600
 RBP: ffffc9000c2bbd08 R08: 0000000000000009 R09: 0000000000000078
 R10: ffffffff81e12440 R11: ffffffff81e1236c R12: ffff888bc8932600
 R13: ffff888c0af94eb8 R14: ffff888c10b22600 R15: ffff888bc8932600
 FS:  00007fa58ac55700(0000) GS:ffff888c10b00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fa58b523230 CR3: 0000000bf44ab003 CR4: 00000000007606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  ? intel_pstate_update_util_hwp+0x13/0x170
  rt_mutex_setprio+0x1cc/0x4b0
  task_blocks_on_rt_mutex+0x225/0x260
  rt_spin_lock_slowlock_locked+0xab/0x2d0
  rt_spin_lock_slowlock+0x50/0x80
  hrtimer_grab_expiry_lock+0x20/0x30
  hrtimer_cancel+0x13/0x30
  do_nanosleep+0xa0/0x150
  hrtimer_nanosleep+0xe1/0x230
  ? __hrtimer_init_sleeper+0x60/0x60
  __x64_sys_nanosleep+0x8d/0xa0
  do_syscall_64+0x4a/0x100
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x7fa58b52330d
 ...
 ---[ end trace 0000000000000002 ]—

He also provided a simple reproducer creating the situation below:

 So the execution order of locking steps are the following
 (N1 and N2 are non-deadline tasks. D1 is a deadline task. M1 and M2
 are mutexes that are enabled * with priority inheritance.)

 Time moves forward as this timeline goes down:

 N1              N2               D1
 |               |                |
 |               |                |
 Lock(M1)        |                |
 |               |                |
 |             Lock(M2)           |
 |               |                |
 |               |              Lock(M2)
 |               |                |
 |             Lock(M1)           |
 |             (!!bug triggered!) |

Daniel reported a similar situation as well, by just letting ksoftirqd
run with DEADLINE (and eventually block on a mutex).

Problem is that boosted entities (Priority Inheritance) use static
DEADLINE parameters of the top priority waiter. However, there might be
cases where top waiter could be a non-DEADLINE entity that is currently
boosted by a DEADLINE entity from a different lock chain (i.e., nested
priority chains involving entities of non-DEADLINE classes). In this
case, top waiter static DEADLINE parameters could be null (initialized
to 0 at fork()) and replenish_dl_entity() would hit a BUG().

Fix this by keeping track of the original donor and using its parameters
when a task is boosted.

Reported-by: Glenn Elliott <glenn@aurora.tech>
Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201117061432.517340-1-juri.lelli@redhat.com
[Ankit: Regenerated the patch for v4.19.y]
Signed-off-by: Ankit Jain <ankitja@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-05 10:26:28 +02:00
..
bpf bpf: Verifer, adjust_scalar_min_max_vals to always call update_reg_bounds() 2022-08-25 11:14:55 +02:00
cgroup cgroup: Use separate src/dst nodes when preloading css_sets for migration 2022-07-21 21:09:26 +02:00
configs
debug kdb: Make memory allocations more robust 2021-03-04 09:39:31 +01:00
dma swiotlb: skip swiotlb_bounce when orig_addr is zero 2022-07-02 16:27:40 +02:00
events perf/core: Fix data race between perf_event_set_output() and perf_mmap_close() 2022-07-29 17:10:31 +02:00
gcov gcov: add support for GCC 10.1 2020-09-17 13:45:31 +02:00
irq random: remove unused irq_flags argument from add_interrupt_randomness() 2022-06-25 11:49:01 +02:00
livepatch livepatch: Nullify obj->mod in klp_module_coming()'s error path 2019-10-07 18:57:10 +02:00
locking locking/lockdep: Avoid RCU-induced noinstr fail 2021-11-26 11:36:04 +01:00
power PM: hibernate: defer device probing when resuming from hibernation 2022-08-25 11:15:00 +02:00
printk printk: fix return value of printk.devkmsg __setup handler 2022-04-15 14:14:45 +02:00
rcu rcu: Fix missed wakeup of exp_wq waiters 2021-09-26 13:39:46 +02:00
sched sched/deadline: Fix priority inheritance with multiple scheduling classes 2022-09-05 10:26:28 +02:00
time timekeeping: Add raw clock fallback for random_get_entropy() 2022-06-25 11:49:10 +02:00
trace tracing/probes: Have kprobes and uprobes use $COMM too 2022-08-25 11:15:48 +02:00
.gitignore
acct.c acct_on(): don't mess with freeze protection 2019-05-31 06:46:05 -07:00
async.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2022-02-23 11:58:38 +01:00
audit.c audit: improve audit queue handling when "audit=1" on cmdline 2022-02-08 18:23:13 +01:00
audit.h audit: fix a net reference leak in audit_list_rules_send() 2020-06-22 09:05:13 +02:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-09-05 10:26:28 +02:00
audit_tree.c audit: Embed key into chunk 2019-12-13 08:51:11 +01:00
audit_watch.c audit: CONFIG_CHANGE don't log internal bookkeeping as an event 2020-10-01 13:14:33 +02:00
auditfilter.c audit: fix a net reference leak in audit_list_rules_send() 2020-06-22 09:05:13 +02:00
auditsc.c audit: print empty EXECVE args 2019-12-01 09:17:17 +01:00
backtracetest.c
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-11-13 11:08:47 -08:00
capability.c LSM: generalize flag passing to security_capable 2020-01-23 08:21:29 +01:00
compat.c make 'user_access_begin()' do 'access_ok()' 2020-06-22 09:04:58 +02:00
configs.c
context_tracking.c
cpu.c random: clear fast pool, crng, and batches in cpuhp bring up 2022-06-25 11:49:07 +02:00
cpu_pm.c kernel/cpu_pm: Fix uninitted local in cpu_pm 2020-06-22 09:05:28 +02:00
crash_core.c kernel/crash_core.c: print timestamp using time64_t 2018-08-22 10:52:47 -07:00
crash_dump.c
cred.c memcg: account security cred as well to kmemcg 2020-01-09 10:19:00 +01:00
delayacct.c
dma.c
exec_domain.c
exit.c futex: Mark the begin of futex exit explicitly 2021-01-30 13:32:11 +01:00
extable.c
fail_function.c fail_function: Remove a redundant mutex unlock 2020-11-24 13:27:23 +01:00
fork.c mm/hugetlb: initialize hugetlb_usage in mm_init 2021-09-22 11:48:09 +02:00
freezer.c PM / reboot: Eliminate race between reboot and suspend 2018-08-06 12:35:20 +02:00
futex.c mm, futex: fix shared futex pgoff on shmem huge page 2021-07-11 12:49:30 +02:00
groups.c
hung_task.c kernel: hung_task.c: disable on suspend 2019-04-20 09:16:02 +02:00
iomem.c
irq_work.c irq_work: Do not raise an IPI when queueing work on the local CPU 2019-05-31 06:46:19 -07:00
jump_label.c locking/static_key: Fix false positive warnings on concurrent dec/inc 2021-03-04 09:39:30 +01:00
kallsyms.c kallsyms: Refactor kallsyms_show_value() to take cred 2020-07-16 08:17:26 +02:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt kconfig: include kernel/Kconfig.preempt from init/Kconfig 2018-08-02 08:06:54 +09:00
kcov.c kernel/kcov.c: mark write_comp_data() as notrace 2019-02-12 19:47:20 +01:00
kexec.c
kexec_core.c kernel: kexec: remove the lock operation of system_transition_mutex 2021-02-03 23:23:23 +01:00
kexec_file.c kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add] 2022-07-02 16:27:39 +02:00
kexec_internal.h
kmod.c kmod: make request_module() return an error when autoloading is disabled 2020-04-17 10:48:52 +02:00
kprobes.c kprobes: Forbid probing on trampoline and BPF code areas 2022-08-25 11:15:25 +02:00
ksysfs.c
kthread.c kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() 2021-07-11 12:49:31 +02:00
latencytop.c
Makefile elfcore: fix building with clang 2021-02-10 09:21:06 +01:00
memremap.c mm/memory_hotplug: shrink zones when offlining memory 2020-01-29 16:43:27 +01:00
module-internal.h
module.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2022-02-23 11:58:38 +01:00
module_signing.c
notifier.c x86/mm: split vmalloc_sync_all() 2020-03-25 08:06:13 +01:00
nsproxy.c
padata.c padata: add separate cpuhp node for CPUHP_PADATA_DEAD 2021-08-08 08:54:30 +02:00
panic.c kernel/panic.c: do not append newline to the stack protector panic string 2019-12-01 09:17:10 +01:00
params.c
pid.c Fix failure path in alloc_pid() 2019-01-13 09:51:06 +01:00
pid_namespace.c memcg: enable accounting for pids in nested pid namespaces 2021-09-22 11:48:09 +02:00
profile.c profiling: fix shift too large makes kernel panic 2022-08-25 11:15:20 +02:00
ptrace.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2022-06-14 16:59:14 +02:00
range.c
reboot.c reboot: fix overflow parsing reboot cpu number 2020-11-18 19:18:52 +01:00
relay.c kernel/relay.c: fix memleak on destroy relay channel 2020-08-26 10:30:59 +02:00
resource.c resource: fix locking in find_next_iomem_res() 2019-09-16 08:22:20 +02:00
rseq.c
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2022-02-16 12:51:47 +01:00
signal.c signal handling: don't use BUG_ON() for debugging 2022-07-21 21:09:32 +02:00
smp.c smp: Fix offline cpu check in flush_smp_call_function_queue() 2022-04-20 09:12:50 +02:00
smpboot.c kthread: Extract KTHREAD_IS_PER_CPU 2021-02-07 14:48:38 +01:00
smpboot.h
softirq.c nohz: Fix missing tick reprogram when interrupting an inline softirq 2018-08-03 15:52:10 +02:00
stacktrace.c
stop_machine.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-08-13 11:25:07 -07:00
sys.c prctl: allow to setup brk for et_dyn executables 2021-09-26 13:39:47 +02:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2022-09-05 10:26:28 +02:00
sysctl.c x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting 2022-03-11 10:15:11 +01:00
sysctl_binary.c
task_work.c
taskstats.c taskstats: fix data-race 2020-01-09 10:18:59 +01:00
test_kprobes.c
torture.c
tracepoint.c tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing 2021-07-20 16:15:42 +02:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-02-23 11:58:39 +01:00
ucount.c
uid16.c
uid16.h
umh.c usermodehelper: reset umask to default before executing user process 2020-10-14 10:31:21 +02:00
up.c smp: Fix smp_call_function_single_async prototype 2021-05-22 10:59:39 +02:00
user-return-notifier.c
user.c userns: use irqsave variant of refcount_dec_and_lock() 2018-08-22 10:52:47 -07:00
user_namespace.c userns: also map extents in the reverse map to kernel IDs 2018-11-13 11:09:00 -08:00
utsname.c
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-08-11 02:05:53 -05:00
watchdog.c watchdog: export lockup_detector_reconfigure 2022-08-25 11:15:46 +02:00
watchdog_hld.c watchdog: Mark watchdog touch functions as notrace 2018-08-30 12:56:40 +02:00
workqueue.c workqueue: make sysfs of unbound kworker cpumask more clever 2021-11-26 11:36:06 +01:00
workqueue_internal.h