linux-stable/kernel/time
Balasubramani Vivekanandan e5331c37c0 tick: broadcast-hrtimer: Fix a race in bc_set_next
[ Upstream commit b9023b91dd ]

When a cpu requests broadcasting, before starting the tick broadcast
hrtimer, bc_set_next() checks if the timer callback (bc_handler) is active
using hrtimer_try_to_cancel(). But hrtimer_try_to_cancel() does not provide
the required synchronization when the callback is active on other core.

The callback could have already executed tick_handle_oneshot_broadcast()
and could have also returned. But still there is a small time window where
the hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns
without doing anything, but the next_event of the tick broadcast clock
device is already set to a timeout value.

In the race condition diagram below, CPU #1 is running the timer callback
and CPU #2 is entering idle state and so calls bc_set_next().

In the worst case, the next_event will contain an expiry time, but the
hrtimer will not be started which happens when the racing callback returns
HRTIMER_NORESTART. The hrtimer might never recover if all further requests
from the CPUs to subscribe to tick broadcast have timeout greater than the
next_event of tick broadcast clock device. This leads to cascading of
failures and finally noticed as rcu stall warnings

Here is a depiction of the race condition

CPU #1 (Running timer callback)                   CPU #2 (Enter idle
                                                  and subscribe to
                                                  tick broadcast)
---------------------                             ---------------------

__run_hrtimer()                                   tick_broadcast_enter()

  bc_handler()                                      __tick_broadcast_oneshot_control()

    tick_handle_oneshot_broadcast()

      raw_spin_lock(&tick_broadcast_lock);

      dev->next_event = KTIME_MAX;                  //wait for tick_broadcast_lock
      //next_event for tick broadcast clock
      set to KTIME_MAX since no other cores
      subscribed to tick broadcasting

      raw_spin_unlock(&tick_broadcast_lock);

    if (dev->next_event == KTIME_MAX)
      return HRTIMER_NORESTART
    // callback function exits without
       restarting the hrtimer                      //tick_broadcast_lock acquired
                                                   raw_spin_lock(&tick_broadcast_lock);

                                                   tick_broadcast_set_event()

                                                     clockevents_program_event()

                                                       dev->next_event = expires;

                                                       bc_set_next()

                                                         hrtimer_try_to_cancel()
                                                         //returns -1 since the timer
                                                         callback is active. Exits without
                                                         restarting the timer
  cpu_base->running = NULL;

The comment that hrtimer cannot be armed from within the callback is
wrong. It is fine to start the hrtimer from within the callback. Also it is
safe to start the hrtimer from the enter/exit idle code while the broadcast
handler is active. The enter/exit idle code and the broadcast handler are
synchronized using tick_broadcast_lock. So there is no need for the
existing try to cancel logic. All this can be removed which will eliminate
the race condition as well.

Fixes: 5d1638acb9 ("tick: Introduce hrtimer based broadcast")
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balasubramani Vivekanandan <balasubramani_vivekanandan@mentor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190926135101.12102-2-balasubramani_vivekanandan@mentor.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-11 18:21:28 +02:00
..
alarmtimer.c alarmtimer: Use EOPNOTSUPP instead of ENOTSUPP 2019-10-05 13:10:07 +02:00
clockevents.c clockevents: Warn if cpu_all_mask is used as cpumask 2018-08-02 14:55:53 +02:00
clocksource.c clocksource: Revert "Remove kthread" 2018-09-06 23:38:35 +02:00
hrtimer.c Merge branch 'fortglx/4.19/time' of https://git.linaro.org/people/john.stultz/linux into timers/core 2018-07-12 22:19:58 +02:00
itimer.c pid: Implement PIDTYPE_TGID 2018-07-21 10:43:12 -05:00
jiffies.c jiffies: Revert bogus conversion of NSEC_PER_SEC to TICK_NSEC 2017-03-07 11:03:28 +01:00
Kconfig sched/isolation: Eliminate NO_HZ_FULL_ALL 2018-02-15 15:40:37 -08:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ntp.c ntp: Limit TAI-UTC offset 2019-07-26 09:14:10 +02:00
ntp_internal.h timekeeping/ntp: Constify some function arguments 2018-07-19 17:08:05 -07:00
posix-clock.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
posix-cpu-timers.c posix-cpu-timers: Sanitize bogus WARNONS 2019-10-05 13:09:47 +02:00
posix-stubs.c posix-timers: Use new ktime_get_*_ts64() helpers 2018-06-19 09:56:27 +02:00
posix-timers.c posix-timers: Fix division by zero bug 2018-12-29 13:37:56 +01:00
posix-timers.h posix-timers: Make forward callback return s64 2018-07-02 11:33:25 +02:00
sched_clock.c timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze() 2019-04-27 09:36:38 +02:00
test_udelay.c
tick-broadcast-hrtimer.c tick: broadcast-hrtimer: Fix a race in bc_set_next 2019-10-11 18:21:28 +02:00
tick-broadcast.c tick/broadcast: Use for_each_cpu() specially on UP kernels 2018-05-15 22:45:54 +02:00
tick-common.c timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze() 2019-04-27 09:36:38 +02:00
tick-internal.h Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME 2018-04-26 14:53:32 +02:00
tick-oneshot.c clockevents: Fix kernel messages split across multiple lines 2018-04-17 17:18:04 +02:00
tick-sched.c nohz: Fix local_timer_softirq_pending() 2018-07-31 22:08:44 +02:00
tick-sched.h nohz: Gather tick_sched booleans under a common flag field 2018-04-09 11:54:57 +02:00
time.c timekeeping: Force upper bound for setting CLOCK_REALTIME 2019-05-31 06:46:29 -07:00
timeconst.bc time: Introduce jiffies64_to_nsecs() 2017-02-01 09:13:45 +01:00
timeconv.c
timecounter.c clocksource: Use a plain u64 instead of cycle_t 2016-12-25 11:04:12 +01:00
timekeeping.c timekeeping: Use proper ktime_add when adding nsecs in coarse offset 2019-09-16 08:21:42 +02:00
timekeeping.h timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze() 2019-04-27 09:36:38 +02:00
timekeeping_debug.c timekeeping/ntp: Constify some function arguments 2018-07-19 17:08:05 -07:00
timekeeping_internal.h timekeeping/ntp: Constify some function arguments 2018-07-19 17:08:05 -07:00
timer.c timer: Read jiffies once when forwarding base clk 2019-10-11 18:20:59 +02:00
timer_list.c timer_list: Guard procfs specific code 2019-07-26 09:14:10 +02:00