linux-stable/kernel
Balasubramani Vivekanandan a990caf9f6 tick: broadcast-hrtimer: Fix a race in bc_set_next
[ Upstream commit b9023b91dd ]

When a cpu requests broadcasting, before starting the tick broadcast
hrtimer, bc_set_next() checks if the timer callback (bc_handler) is active
using hrtimer_try_to_cancel(). But hrtimer_try_to_cancel() does not provide
the required synchronization when the callback is active on other core.

The callback could have already executed tick_handle_oneshot_broadcast()
and could have also returned. But still there is a small time window where
the hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns
without doing anything, but the next_event of the tick broadcast clock
device is already set to a timeout value.

In the race condition diagram below, CPU #1 is running the timer callback
and CPU #2 is entering idle state and so calls bc_set_next().

In the worst case, the next_event will contain an expiry time, but the
hrtimer will not be started which happens when the racing callback returns
HRTIMER_NORESTART. The hrtimer might never recover if all further requests
from the CPUs to subscribe to tick broadcast have timeout greater than the
next_event of tick broadcast clock device. This leads to cascading of
failures and finally noticed as rcu stall warnings

Here is a depiction of the race condition

CPU #1 (Running timer callback)                   CPU #2 (Enter idle
                                                  and subscribe to
                                                  tick broadcast)
---------------------                             ---------------------

__run_hrtimer()                                   tick_broadcast_enter()

  bc_handler()                                      __tick_broadcast_oneshot_control()

    tick_handle_oneshot_broadcast()

      raw_spin_lock(&tick_broadcast_lock);

      dev->next_event = KTIME_MAX;                  //wait for tick_broadcast_lock
      //next_event for tick broadcast clock
      set to KTIME_MAX since no other cores
      subscribed to tick broadcasting

      raw_spin_unlock(&tick_broadcast_lock);

    if (dev->next_event == KTIME_MAX)
      return HRTIMER_NORESTART
    // callback function exits without
       restarting the hrtimer                      //tick_broadcast_lock acquired
                                                   raw_spin_lock(&tick_broadcast_lock);

                                                   tick_broadcast_set_event()

                                                     clockevents_program_event()

                                                       dev->next_event = expires;

                                                       bc_set_next()

                                                         hrtimer_try_to_cancel()
                                                         //returns -1 since the timer
                                                         callback is active. Exits without
                                                         restarting the timer
  cpu_base->running = NULL;

The comment that hrtimer cannot be armed from within the callback is
wrong. It is fine to start the hrtimer from within the callback. Also it is
safe to start the hrtimer from the enter/exit idle code while the broadcast
handler is active. The enter/exit idle code and the broadcast handler are
synchronized using tick_broadcast_lock. So there is no need for the
existing try to cancel logic. All this can be removed which will eliminate
the race condition as well.

Fixes: 5d1638acb9 ("tick: Introduce hrtimer based broadcast")
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balasubramani Vivekanandan <balasubramani_vivekanandan@mentor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190926135101.12102-2-balasubramani_vivekanandan@mentor.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-11 18:18:46 +02:00
..
bpf bpf: fix use after free in prog symbol exposure 2019-10-07 18:55:15 +02:00
cgroup cgroup: Fix css_task_iter_advance_css_set() cset skip condition 2019-08-09 17:53:37 +02:00
configs ANDROID: binder: add hwbinder,vndbinder to BINDER_DEVICES. 2017-08-22 18:43:23 -07:00
debug kdb: Don't back trace on a cpu that didn't round up 2019-02-12 19:46:09 +01:00
events perf/core: Fix creating kernel counters for PMUs that override event->cpu 2019-08-16 10:13:55 +02:00
gcov License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
irq genirq: Prevent NULL pointer dereference in resend_irqs() 2019-09-19 09:08:04 +02:00
livepatch livepatch: Nullify obj->mod in klp_module_coming()'s error path 2019-10-07 18:55:09 +02:00
locking Revert "locking/pvqspinlock: Don't wait if vCPU is preempted" 2019-10-11 18:18:36 +02:00
power x86/power: Fix 'nosmt' vs hibernation triple fault during resume 2019-06-11 12:21:48 +02:00
printk printk: Do not lose last line in kmsg buffer dump 2019-10-05 12:48:04 +02:00
rcu rcuperf: Fix cleanup path for invalid perf_type strings 2019-05-31 06:47:33 -07:00
sched sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr() 2019-10-11 18:18:41 +02:00
time tick: broadcast-hrtimer: Fix a race in bc_set_next 2019-10-11 18:18:46 +02:00
trace ftrace: Check for empty hash and comment the race with registering probes 2019-09-06 10:20:55 +02:00
.gitignore
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-10 09:31:17 +01:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-16 20:23:05 +01:00
audit.c audit: return on memory error to avoid null pointer dereference 2018-05-30 07:52:39 +02:00
audit.h ipc: mqueue: Replace timespec with timespec64 2017-09-03 20:21:24 -04:00
audit_fsnotify.c
audit_tree.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
audit_watch.c audit: fix use-after-free in audit_add_watch 2018-09-26 08:38:09 +02:00
auditfilter.c audit: fix a memory leak bug 2019-05-31 06:47:25 -07:00
auditsc.c audit: fix potential null dereference 'context->module.name' 2018-08-06 16:20:49 +02:00
backtracetest.c
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-11-13 11:15:08 -08:00
capability.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat.c compat: fix 4-byte infoleak via uninitialized struct field 2018-05-16 10:10:26 +02:00
configs.c
context_tracking.c
cpu.c cpu/hotplug: Fix out-of-bounds read when setting fail state 2019-07-21 09:04:40 +02:00
cpu_pm.c PM / CPU: replace raw_notifier with atomic_notifier 2017-07-31 13:09:49 +02:00
crash_core.c kdump: write correct address of mem_section into vmcoreinfo 2018-01-17 09:45:27 +01:00
crash_dump.c
cred.c access: avoid the RCU grace period for the temporary subjective credentials 2019-07-31 07:28:58 +02:00
delayacct.c delayacct: Use raw_spinlocks 2018-08-03 07:50:38 +02:00
dma.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
elfcore.c kernel/elfcore.c: include proper prototypes 2019-10-11 18:18:42 +02:00
exec_domain.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
exit.c cgroup: Call cgroup_release() before __exit_signal() 2019-08-09 17:53:36 +02:00
extable.c extable: Enable RCU if it is not watching in kernel_text_address() 2017-09-23 16:50:20 -04:00
fork.c sched/fair: Don't free p->numa_faults with concurrent readers 2019-08-04 09:32:03 +02:00
freezer.c
futex.c locking/futex: Allow low-level atomic operations to return -EAGAIN 2019-05-10 17:53:15 +02:00
futex_compat.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-20 10:10:18 +01:00
hung_task.c kernel: hung_task.c: disable on suspend 2019-04-20 09:15:05 +02:00
irq_work.c
jump_label.c sched/core: Fix cpu.max vs. cpuhotplug deadlock 2018-12-05 19:41:17 +01:00
kallsyms.c kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol 2019-09-21 07:15:38 +02:00
kcmp.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: ensure irq code sees a valid area 2018-08-03 07:50:22 +02:00
kexec.c
kexec_core.c kexec: bail out upon SIGKILL when allocating memory. 2019-10-07 18:55:23 +02:00
kexec_file.c
kexec_internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kmod.c kmod: move #ifdef CONFIG_MODULES wrapper to Makefile 2017-09-08 18:26:51 -07:00
kprobes.c kprobes: Prohibit probing on BUG() and WARN() address 2019-10-05 12:48:01 +02:00
ksysfs.c
kthread.c kthread, tracing: Don't expose half-written comm when creating kthreads 2018-08-03 07:50:21 +02:00
latencytop.c
Makefile x86/uaccess, kcov: Disable stack protector 2019-06-19 08:20:56 +02:00
memremap.c mm, devm_memremap_pages: kill mapping "System RAM" support 2019-01-13 10:01:02 +01:00
module-internal.h
module.c kernel/module: Fix mem leak in module_add_modinfo_attrs 2019-09-16 08:20:46 +02:00
module_signing.c
notifier.c
nsproxy.c
padata.c padata: use smp_mb in padata_reorder to avoid orphaned padata jobs 2019-07-31 07:28:39 +02:00
panic.c panic: avoid deadlocks in re-entrant console drivers 2018-12-29 13:39:10 +01:00
params.c kernel/params.c: improve STANDARD_PARAM_DEF readability 2017-10-03 17:54:26 -07:00
pid.c pids: make task_tgid_nr_ns() safe 2017-08-21 12:47:31 -07:00
pid_namespace.c signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig 2019-07-31 07:28:21 +02:00
profile.c
ptrace.c ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME 2019-07-10 09:54:38 +02:00
range.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
reboot.c
relay.c relay: check return of create_buf_file() properly 2019-03-13 14:03:20 -07:00
resource.c resource: fix integer overflow at reallocation 2018-04-24 09:36:22 +02:00
seccomp.c seccomp: Move speculation migitation control to arch code 2018-05-22 18:54:04 +02:00
signal.c kernel/signal.c: trace_signal_deliver when signal_group_exit 2019-06-09 09:18:17 +02:00
smp.c cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM 2019-02-12 19:46:13 +01:00
smpboot.c watchdog/core, powerpc: Lock cpus across reconfiguration 2017-10-04 10:53:54 +02:00
smpboot.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
softirq.c Mark HI and TASKLET softirq synchronous 2018-08-15 18:12:47 +02:00
stacktrace.c
stop_machine.c stop_machine: Atomically queue and wake stopper threads 2018-09-05 09:26:36 +02:00
sys.c kernel/sys.c: prctl: fix false positive in validate_prctl_map() 2019-06-15 11:54:52 +02:00
sys_ni.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sysctl.c sysctl: return -EINVAL if val violates minmax 2019-06-15 11:54:51 +02:00
sysctl_binary.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
task_work.c locking/barriers: Convert users of lockless_dereference() to READ_ONCE() 2017-12-25 14:26:21 +01:00
taskstats.c
test_kprobes.c
torture.c torture: Fix typo suppressing CPU-hotplug statistics 2017-07-25 13:04:45 -07:00
tracepoint.c tracepoint: Do not warn on ENOMEM 2018-05-09 09:51:50 +02:00
tsacct.c
ucount.c
uid16.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-20 10:10:18 +01:00
umh.c kmod: split out umh code into its own file 2017-09-08 18:26:50 -07:00
up.c smp: Avoid using two cache lines for struct call_single_data 2017-08-29 15:14:38 +02:00
user-return-notifier.c
user.c
user_namespace.c userns: move user access out of the mutex 2018-09-09 19:56:00 +02:00
utsname.c
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-09-09 19:56:00 +02:00
watchdog.c watchdog: Mark watchdog touch functions as notrace 2018-09-05 09:26:42 +02:00
watchdog_hld.c watchdog: Mark watchdog touch functions as notrace 2018-09-05 09:26:42 +02:00
workqueue.c watchdog: Mark watchdog touch functions as notrace 2018-09-05 09:26:42 +02:00
workqueue_internal.h Merge branch 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2017-11-06 12:26:49 -08:00