linux-stable/kernel
Mel Gorman cb75a0c5d3 sched/fair: Do not re-read ->h_load_next during hierarchical load calculation
commit 0e9f02450d upstream.

A NULL pointer dereference bug was reported on a distribution kernel but
the same issue should be present on mainline kernel. It occured on s390
but should not be arch-specific.  A partial oops looks like:

  Unable to handle kernel pointer dereference in virtual kernel address space
  ...
  Call Trace:
    ...
    try_to_wake_up+0xfc/0x450
    vhost_poll_wakeup+0x3a/0x50 [vhost]
    __wake_up_common+0xbc/0x178
    __wake_up_common_lock+0x9e/0x160
    __wake_up_sync_key+0x4e/0x60
    sock_def_readable+0x5e/0x98

The bug hits any time between 1 hour to 3 days. The dereference occurs
in update_cfs_rq_h_load when accumulating h_load. The problem is that
cfq_rq->h_load_next is not protected by any locking and can be updated
by parallel calls to task_h_load. Depending on the compiler, code may be
generated that re-reads cfq_rq->h_load_next after the check for NULL and
then oops when reading se->avg.load_avg. The dissassembly showed that it
was possible to reread h_load_next after the check for NULL.

While this does not appear to be an issue for later compilers, it's still
an accident if the correct code is generated. Full locking in this path
would have high overhead so this patch uses READ_ONCE to read h_load_next
only once and check for NULL before dereferencing. It was confirmed that
there were no further oops after 10 days of testing.

As Peter pointed out, it is also necessary to use WRITE_ONCE() to avoid any
potential problems with store tearing.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Fixes: 685207963b ("sched: Move h_load calculation to task_h_load()")
Link: https://lkml.kernel.org/r/20190319123610.nsivgf3mjbjjesxb@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-17 08:38:53 +02:00
..
bpf bpf: do not restore dst_reg when cur_state is freed 2019-04-03 06:26:30 +02:00
cgroup cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting 2019-04-05 22:33:13 +02:00
configs kconfig: tinyconfig: remove stale stack protector fixups 2018-06-15 07:15:28 +09:00
debug kdb: Don't back trace on a cpu that didn't round up 2019-02-12 19:47:19 +01:00
dma dma-direct: do not include SME mask in the DMA supported check 2019-01-13 09:51:05 +01:00
events perf/aux: Make perf_event accessible to setup_aux() 2019-04-05 22:33:11 +02:00
gcov gcov: remove CONFIG_GCOV_FORMAT_AUTODETECT 2018-06-08 18:56:02 +09:00
irq genirq: Initialize request_mutex if CONFIG_SPARSE_IRQ=n 2019-04-17 08:38:52 +02:00
livepatch Merge branch 'for-4.19/upstream' into for-linus 2018-08-20 18:33:50 +02:00
locking locking/lockdep: Add debug_locks check in __lock_downgrade() 2019-03-27 14:14:43 +09:00
power Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2018-10-12 12:35:02 +02:00
printk printk: Fix panic caused by passing log_buf_len to command line 2018-11-13 11:08:48 -08:00
rcu kprobes: Prohibit probing on RCU debug routine 2019-04-05 22:33:08 +02:00
sched sched/fair: Do not re-read ->h_load_next during hierarchical load calculation 2019-04-17 08:38:53 +02:00
time alarmtimer: Return correct remaining time 2019-04-17 08:38:50 +02:00
trace tracing: kdb: Fix ftdump to not sleep 2019-04-05 22:32:56 +02:00
.gitignore
acct.c
async.c
audit.c audit: use ktime_get_coarse_real_ts64() for timestamps 2018-07-17 14:45:08 -04:00
audit.h
audit_fsnotify.c fsnotify: add fsnotify_add_inode_mark() wrappers 2018-05-18 14:58:22 +02:00
audit_tree.c \n 2018-08-17 09:41:28 -07:00
audit_watch.c audit: fix use-after-free in audit_add_watch 2018-07-18 11:43:36 -04:00
auditfilter.c audit: rename FILTER_TYPE to FILTER_EXCLUDE 2018-06-19 10:39:54 -04:00
auditsc.c audit/stable-4.18 PR 20180814 2018-08-15 10:46:54 -07:00
backtracetest.c
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-11-13 11:08:47 -08:00
capability.c
compat.c time: Enable get/put_compat_itimerspec64 always 2018-06-24 14:39:47 +02:00
configs.c
context_tracking.c
cpu.c cpu/hotplug: Mute hotplug lockdep during init 2019-04-05 22:33:14 +02:00
cpu_pm.c
crash_core.c kernel/crash_core.c: print timestamp using time64_t 2018-08-22 10:52:47 -07:00
crash_dump.c
cred.c
delayacct.c delayacct: Use raw_spinlocks 2018-04-27 14:34:51 +02:00
dma.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
elfcore.c
exec_domain.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
exit.c cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting 2019-04-05 22:33:13 +02:00
extable.c
fail_function.c bpf/error-inject/kprobes: Clear current_kprobe and enable preempt in kprobe 2018-06-21 12:33:19 +02:00
fork.c fork: record start_time late 2019-01-13 09:51:04 +01:00
freezer.c PM / reboot: Eliminate race between reboot and suspend 2018-08-06 12:35:20 +02:00
futex.c futex: Ensure that futex address is aligned in handle_futex_death() 2019-03-27 14:14:40 +09:00
futex_compat.c
groups.c
hung_task.c kernel/hung_task.c: force console verbose before panic 2019-02-12 19:47:19 +01:00
iomem.c memremap: split devm_memremap_pages() and memremap() infrastructure 2018-05-15 23:08:33 -07:00
irq_work.c
jump_label.c jump_label: Fix typo in warning message 2018-09-10 10:15:48 +02:00
kallsyms.c kallsyms, x86: Export addresses of PTI entry trampolines 2018-08-14 19:12:29 -03:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt kconfig: include kernel/Kconfig.preempt from init/Kconfig 2018-08-02 08:06:54 +09:00
kcov.c kernel/kcov.c: mark write_comp_data() as notrace 2019-02-12 19:47:20 +01:00
kexec.c kexec: add call to LSM hook in original kexec_load syscall 2018-07-16 12:31:57 -07:00
kexec_core.c kexec: yield to scheduler when loading kimage segments 2018-06-15 07:55:24 +09:00
kexec_file.c treewide: Use array_size() in vzalloc() 2018-06-12 16:19:22 -07:00
kexec_internal.h
kmod.c
kprobes.c kprobes: Return error if we fail to reuse kprobe instead of BUG_ON() 2018-11-13 11:08:28 -08:00
ksysfs.c
kthread.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-08-13 11:25:07 -07:00
latencytop.c
Makefile kbuild: move bin2c back to scripts/ from scripts/basic/ 2018-07-18 01:18:05 +09:00
memremap.c mm, devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support 2019-01-13 09:51:04 +01:00
module-internal.h modsign: log module name in the event of an error 2018-07-02 11:36:17 +02:00
module.c kobject: return error code if writing /sys/.../uevent fails 2019-02-12 19:47:06 +01:00
module_signing.c modsign: log module name in the event of an error 2018-07-02 11:36:17 +02:00
notifier.c
nsproxy.c
padata.c
panic.c panic: avoid deadlocks in re-entrant console drivers 2018-12-29 13:37:57 +01:00
params.c kernel/params.c: downgrade warning for unsafe parameters 2018-04-11 10:28:37 -07:00
pid.c Fix failure path in alloc_pid() 2019-01-13 09:51:06 +01:00
pid_namespace.c
profile.c
ptrace.c ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS 2018-12-05 19:32:03 +01:00
range.c
reboot.c PM / reboot: Eliminate race between reboot and suspend 2018-08-06 12:35:20 +02:00
relay.c relay: check return of create_buf_file() properly 2019-03-13 14:02:35 -07:00
resource.c libnvdimm for 4.18 2018-06-08 17:21:52 -07:00
rseq.c rseq: uapi: Declare rseq_cs field as union, update includes 2018-07-10 22:18:52 +02:00
seccomp.c audit/stable-4.18 PR 20180605 2018-06-06 16:34:00 -07:00
signal.c signal: Restore the stop PTRACE_EVENT_EXIT 2019-02-20 10:25:48 +01:00
smp.c cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM 2019-02-12 19:47:25 +01:00
smpboot.c smpboot: Remove cpumask from the API 2018-07-03 09:20:44 +02:00
smpboot.h
softirq.c nohz: Fix missing tick reprogram when interrupting an inline softirq 2018-08-03 15:52:10 +02:00
stacktrace.c
stop_machine.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-08-13 11:25:07 -07:00
sys.c kernel/sys.c: remove duplicated include 2018-09-20 22:01:11 +02:00
sys_ni.c Merge branch 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-06-10 10:17:09 -07:00
sysctl.c sysctl: handle overflow for file-max 2019-04-05 22:32:57 +02:00
sysctl_binary.c staging: irda: remove remaining remants of irda code removal 2018-04-16 11:26:49 +02:00
task_work.c
taskstats.c
test_kprobes.c kprobes: Remove jprobe API implementation 2018-06-21 12:33:05 +02:00
torture.c torture: Keep old-school dmesg format 2018-06-25 11:30:10 -07:00
tracepoint.c tracepoint: Fix tracepoint array element size mismatch 2018-10-17 15:35:29 -04:00
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c umh: fix race condition 2018-06-07 16:56:28 -04:00
up.c
user-return-notifier.c
user.c userns: use irqsave variant of refcount_dec_and_lock() 2018-08-22 10:52:47 -07:00
user_namespace.c userns: also map extents in the reverse map to kernel IDs 2018-11-13 11:09:00 -08:00
utsname.c uts: create "struct uts_namespace" from kmem_cache 2018-04-11 10:28:35 -07:00
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-08-11 02:05:53 -05:00
watchdog.c watchdog: Respect watchdog cpumask on CPU hotplug 2019-04-03 06:26:29 +02:00
watchdog_hld.c watchdog: Mark watchdog touch functions as notrace 2018-08-30 12:56:40 +02:00
workqueue.c watchdog: Mark watchdog touch functions as notrace 2018-08-30 12:56:40 +02:00
workqueue_internal.h workqueue: Set worker->desc to workqueue name by default 2018-05-18 08:47:13 -07:00