linux-stable/kernel/time
Waiman Long cf9b8de201 clocksource: Avoid accidental unstable marking of clocksources
[ Upstream commit c86ff8c55b ]

Since commit db3a34e174 ("clocksource: Retry clock read if long delays
detected") and commit 2e27e793e2 ("clocksource: Reduce clocksource-skew
threshold"), it is found that tsc clocksource fallback to hpet can
sometimes happen on both Intel and AMD systems especially when they are
running stressful benchmarking workloads. Of the 23 systems tested with
a v5.14 kernel, 10 of them have switched to hpet clock source during
the test run.

The result of falling back to hpet is a drastic reduction of performance
when running benchmarks. For example, the fio performance tests can
drop up to 70% whereas the iperf3 performance can drop up to 80%.

4 hpet fallbacks happened during bootup. They were:

  [    8.749399] clocksource: timekeeping watchdog on CPU13: hpet read-back delay of 263750ns, attempt 4, marking unstable
  [   12.044610] clocksource: timekeeping watchdog on CPU19: hpet read-back delay of 186166ns, attempt 4, marking unstable
  [   17.336941] clocksource: timekeeping watchdog on CPU28: hpet read-back delay of 182291ns, attempt 4, marking unstable
  [   17.518565] clocksource: timekeeping watchdog on CPU34: hpet read-back delay of 252196ns, attempt 4, marking unstable

Other fallbacks happen when the systems were running stressful
benchmarks. For example:

  [ 2685.867873] clocksource: timekeeping watchdog on CPU117: hpet read-back delay of 57269ns, attempt 4, marking unstable
  [46215.471228] clocksource: timekeeping watchdog on CPU8: hpet read-back delay of 61460ns, attempt 4, marking unstable

Commit 2e27e793e2 ("clocksource: Reduce clocksource-skew threshold"),
changed the skew margin from 100us to 50us. I think this is too small
and can easily be exceeded when running some stressful workloads on a
thermally stressed system.  So it is switched back to 100us.

Even a maximum skew margin of 100us may be too small in for some systems
when booting up especially if those systems are under thermal stress. To
eliminate the case that the large skew is due to the system being too
busy slowing down the reading of both the watchdog and the clocksource,
an extra consecutive read of watchdog clock is being done to check this.

The consecutive watchdog read delay is compared against
WATCHDOG_MAX_SKEW/2. If the delay exceeds the limit, we assume that
the system is just too busy. A warning will be printed to the console
and the clock skew check is skipped for this round.

Fixes: db3a34e174 ("clocksource: Retry clock read if long delays detected")
Fixes: 2e27e793e2 ("clocksource: Reduce clocksource-skew threshold")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27 11:04:08 +01:00
..
alarmtimer.c alarmtimer: Check RTC features instead of ops 2021-05-11 21:28:04 +02:00
clockevents.c clockevents: Use list_move() instead of list_del()/list_add() 2021-06-22 17:16:46 +02:00
clocksource-wdtest.c clocksource: Make clocksource watchdog test safe for slow-HZ systems 2021-08-28 17:01:32 +02:00
clocksource.c clocksource: Avoid accidental unstable marking of clocksources 2022-01-27 11:04:08 +01:00
hrtimer.c hrtimer: Unbreak hrtimer_force_reprogram() 2021-08-12 22:34:40 +02:00
itimer.c time: Prevent undefined behaviour in timespec64_to_ns() 2020-10-26 11:48:11 +01:00
jiffies.c clocksource: Make clocksource watchdog test safe for slow-HZ systems 2021-08-28 17:01:32 +02:00
Kconfig -----BEGIN PGP SIGNATURE----- 2021-06-29 12:31:16 -07:00
Makefile time: Improve performance of time64_to_tm() 2021-06-24 11:51:59 +02:00
namespace.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
ntp.c timekeeping, clocksource: Fix various typos in comments 2021-03-22 23:06:48 +01:00
ntp_internal.h ntp: Make the RTC synchronization more reliable 2020-12-11 10:40:52 +01:00
posix-clock.c posix-clocks: Rename the clock_get() callback to clock_get_timespec() 2020-01-14 12:20:49 +01:00
posix-cpu-timers.c posix-cpu-timers: Clear task::posix_cputimers_work in copy_process() 2021-11-18 19:17:14 +01:00
posix-stubs.c posix-timers: Make clock_nanosleep() time namespace aware 2020-01-14 12:20:55 +01:00
posix-timers.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
posix-timers.h posix-clocks: Introduce clock_get_ktime() callback 2020-01-14 12:20:51 +01:00
sched_clock.c time/sched_clock: Mark sched_clock_read_begin/retry() as notrace 2020-10-26 11:34:31 +01:00
test_udelay.c time/debug: Remove dentry pointer for debugfs 2021-03-18 11:20:26 +01:00
tick-broadcast-hrtimer.c timekeeping, clocksource: Fix various typos in comments 2021-03-22 23:06:48 +01:00
tick-broadcast.c timer_list: Print name of per-cpu wakeup device 2021-05-31 17:04:49 +02:00
tick-common.c timekeeping: Distangle resume and clock-was-set events 2021-08-10 17:57:23 +02:00
tick-internal.h clocksource: Make clocksource watchdog test safe for slow-HZ systems 2021-08-28 17:01:32 +02:00
tick-legacy.c timekeeping: remove xtime_update 2020-10-30 21:57:07 +01:00
tick-oneshot.c timekeeping, clocksource: Fix various typos in comments 2021-03-22 23:06:48 +01:00
tick-sched.c Updates to the tick/nohz code in this cycle: 2021-06-28 12:22:06 -07:00
tick-sched.h timekeeping, clocksource: Fix various typos in comments 2021-03-22 23:06:48 +01:00
time.c timekeeping, clocksource: Fix various typos in comments 2021-03-22 23:06:48 +01:00
time_test.c time/kunit: Add missing MODULE_LICENSE() 2021-06-28 07:40:23 +02:00
timeconst.bc time: Add SPDX license identifiers 2018-11-23 11:51:20 +01:00
timeconv.c time: Improve performance of time64_to_tm() 2021-06-24 11:51:59 +02:00
timecounter.c time/timecounter: Mark 1st argument of timecounter_cyc2time() as const 2021-04-16 21:03:50 +02:00
timekeeping.c timekeeping: Really make sure wall_to_monotonic isn't positive 2021-12-22 09:32:48 +01:00
timekeeping.h asm-generic: cross-architecture timer cleanup 2020-12-16 00:07:17 -08:00
timekeeping_debug.c timekeeping/debug: No need to check return value of debugfs_create functions 2019-01-29 20:08:41 +01:00
timekeeping_internal.h timekeeping/vsyscall: Provide vdso_update_begin/end() 2020-08-06 10:57:30 +02:00
timer.c timers: implement usleep_idle_range() 2021-12-14 10:57:11 +01:00
timer_list.c timer_list: Print name of per-cpu wakeup device 2021-05-31 17:04:49 +02:00
vsyscall.c timekeeping, clocksource: Fix various typos in comments 2021-03-22 23:06:48 +01:00