linux-stable/kernel
Thomas Gleixner 332962f2c8 clocksource: Reselect clocksource when watchdog validated high-res capability
Up to commit 5d33b883a (clocksource: Always verify highres capability)
we had no sanity check when selecting a clocksource, which prevented
that a non highres capable clocksource is used when the system already
switched to highres/nohz mode.

The new sanity check works as Alex and Tim found out. It prevents the
TSC from being used. This happens because on x86 the boot process
looks like this:

 tsc_start_freqency_validation(TSC);
 clocksource_register(HPET);
 clocksource_done_booting();
	clocksource_select()
		Selects HPET which is valid for high-res

 switch_to_highres();

 clocksource_register(TSC);
 	TSC is not selected, because it is not yet
	flagged as VALID_HIGH_RES

 clocksource_watchdog()
	Validates TSC for highres, but that does not make TSC
	the current clocksource.

Before the sanity check was added, we installed TSC unvalidated which
worked most of the time. If the TSC was really detected as unstable,
then the unstable logic removed it and installed HPET again.

The sanity check is correct and needed. So the watchdog needs to kick
a reselection of the clocksource, when it qualifies TSC as a valid
high res clocksource.

To solve this, we mark the clocksource which got the flag
CLOCK_SOURCE_VALID_FOR_HRES set by the watchdog with an new flag
CLOCK_SOURCE_RESELECT and trigger the watchdog thread. The watchdog
thread evaluates the flag and invokes clocksource_select() when set.

To avoid that the clocksource_done_booting() code, which is about to
install the first real clocksource anyway, needs to go through
clocksource_select and tick_oneshot_notify() pointlessly, split out
the clocksource_watchdog_kthread() list walk code and invoke the
select/notify only when called from clocksource_watchdog_kthread().

So clocksource_done_booting() can utilize the same splitout code
without the select/notify invocation and the clocksource_mutex
unlock/relock dance.

Reported-and-tested-by: Alex Shi <alex.shi@intel.com>
Cc: Hans Peter Anvin <hpa@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Tested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307042239150.11637@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-05 11:09:28 +02:00
..
cpu idle: Enable interrupts in the weak arch_cpu_idle() implementation 2013-06-14 23:01:05 +02:00
debug kgdb/sysrq: fix inconstistent help message of sysrq key 2013-04-30 17:04:10 -07:00
events hw_breakpoint: Use cpu_possible_mask in {reserve,release}_bp_slot() 2013-06-20 17:57:01 +02:00
gcov kernel/gcov: remove depends on CONFIG_EXPERIMENTAL 2013-01-11 11:39:33 -08:00
irq irqdomain: document the simple domain first_irq 2013-06-08 21:15:09 +01:00
power Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux 2013-05-02 19:40:34 -07:00
sched posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting 2013-07-04 18:02:30 +02:00
time clocksource: Reselect clocksource when watchdog validated high-res capability 2013-07-05 11:09:28 +02:00
trace tracing: Fix outputting formats of x86-tsc and counter when use trace_clock 2013-06-11 13:58:46 -04:00
.gitignore kernel/hz.bc: ignore. 2013-04-22 07:09:06 -07:00
acct.c fs: Fix hang with BSD accounting on frozen filesystem 2013-05-04 14:57:58 -04:00
async.c async: rename and redefine async_func_ptr 2013-03-12 13:59:14 -07:00
audit.c audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE 2013-06-12 16:29:45 -07:00
audit.h Merge git://git.infradead.org/users/eparis/audit 2013-05-11 14:29:11 -07:00
audit_tree.c kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules() 2013-06-12 16:29:46 -07:00
audit_watch.c audit: catch possible NULL audit buffers 2013-01-11 14:54:55 -08:00
auditfilter.c auditfilter.c: fix kernel-doc warnings 2013-05-24 16:22:52 -07:00
auditsc.c audit: Make testing for a valid loginuid explicit. 2013-05-07 22:27:15 -04:00
backtracetest.c
bounds.c
capability.c Add file_ns_capable() helper function for open-time capability checking 2013-04-14 10:06:31 -07:00
cgroup.c cgroup: warn about mismatching options of a new mount of an existing hierarchy 2013-05-29 07:59:39 +09:00
cgroup_freezer.c cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free() 2012-11-19 08:13:38 -08:00
compat.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-05-01 07:21:43 -07:00
configs.c proc: Supply PDE attribute setting accessor functions 2013-05-01 17:29:18 -04:00
context_tracking.c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-06-20 08:18:35 -10:00
cpu.c CPU hotplug: provide a generic helper to disable/enable CPU hotplug 2013-06-12 16:29:44 -07:00
cpu_pm.c
cpuset.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
crash_dump.c
cred.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-12-18 10:55:28 -08:00
delayacct.c cputime: Use accessors to read task cputime stats 2013-01-27 19:23:31 +01:00
dma.c
elfcore.c
exec_domain.c
exit.c move exit_task_namespaces() outside of exit_notify() 2013-06-15 05:39:08 +04:00
extable.c extable: Flip the sorting message 2013-04-15 13:25:16 +02:00
fork.c Merge branch 'for-3.10/drivers' of git://git.kernel.dk/linux-block 2013-05-08 11:51:05 -07:00
freezer.c freezer: change ptrace_stop/do_signal_stop to use freezable_schedule() 2012-10-26 14:27:49 -07:00
futex.c futex: fix kernel-doc notation and spello 2013-03-12 20:42:10 -07:00
futex_compat.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-02-23 18:50:11 -08:00
groups.c
hrtimer.c hrtimers: Support resuming with two or more CPUs online (but stopped) 2013-06-28 23:15:06 +02:00
hung_task.c
irq_work.c Merge branch 'nohz/printk-v8' into irq/core 2013-02-05 00:48:46 +01:00
itimer.c
jump_label.c jump_label: Export jump_label_rate_limit() 2012-08-06 19:00:35 +03:00
kallsyms.c kernel: kallsyms: memory override issue, need check destination buffer length 2013-04-15 15:17:26 +09:30
kcmp.c kcmp: include linux/ptrace.h 2012-12-20 17:40:19 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking: Adjust spin lock inlining Kconfig options 2012-09-13 17:56:13 +02:00
Kconfig.preempt
kexec.c kexec: Use min() and min_t() to simplify logic 2013-04-30 17:04:07 -07:00
kmod.c usermodehelper: check subprocess_info->path != NULL 2013-05-16 12:01:11 -07:00
kprobes.c kprobes: Fix to free gone and unused optprobes 2013-05-28 10:37:59 +02:00
ksysfs.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-12-11 18:10:49 -08:00
kthread.c kthread: implement probe_kthread_data() 2013-04-30 17:04:02 -07:00
latencytop.c
lglock.c
lockdep.c Merge branch 'for-3.10/drivers' of git://git.kernel.dk/linux-block 2013-05-08 11:51:05 -07:00
lockdep_internals.h
lockdep_proc.c lockdep: Use KSYM_NAME_LEN'ed buffer for __get_key_name() 2012-10-24 12:39:09 +02:00
lockdep_states.h
Makefile We get rid of the general module prefix confusion with a binary config option, 2013-05-05 10:58:06 -07:00
modsign_certificate.S CONFIG_SYMBOL_PREFIX: cleanup. 2013-03-15 15:09:43 +10:30
modsign_pubkey.c keys: use keyring_alloc() to create module signing keyring 2012-12-20 17:40:21 -08:00
module-internal.h MODSIGN: Move the magic string to the end of a module and eliminate the search 2012-10-19 17:30:40 -07:00
module.c kmemleak: No need for scanning specific module sections 2013-05-17 09:53:36 +01:00
module_signing.c MODSIGN: Don't use enum-type bitfields in module signature info block 2012-12-05 11:27:24 +10:30
mutex-debug.c
mutex-debug.h
mutex.c mutex: Back out architecture specific check for negative mutex count 2013-04-19 09:33:36 +02:00
mutex.h
notifier.c
nsproxy.c proc: Split the namespace stuff out into linux/proc_ns.h 2013-05-01 17:29:39 -04:00
padata.c padata: use __this_cpu_read per-cpu helper 2012-12-06 17:16:23 +08:00
panic.c dump_stack: implement arch-specific hardware description in task dumps 2013-04-30 17:04:02 -07:00
params.c params: Fix potential memory leak in add_sysfs_param() 2013-03-18 11:40:21 +00:00
pid.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
pid_namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
posix-cpu-timers.c posix_timers: fix racy timer delta caching on task exit 2013-07-03 16:54:42 +02:00
posix-timers.c posix-timers: Remove unused variable 2013-04-18 12:51:19 +02:00
printk.c kmsg: honor dmesg_restrict sysctl on /dev/kmsg 2013-06-12 16:29:44 -07:00
profile.c proc: Supply PDE attribute setting accessor functions 2013-05-01 17:29:18 -04:00
ptrace.c Fix: kernel/ptrace.c: ptrace_peek_siginfo() missing __put_user() validation 2013-06-29 11:29:08 -07:00
range.c range: Do not add new blank slot with add_range_with_merge 2013-06-18 11:32:10 -05:00
rcu.h rcu: Provide RCU CPU stall warnings for tiny RCU 2013-01-28 22:06:21 -08:00
rcupdate.c Merge branches 'doctorture.2013.01.29a', 'fixes.2013.01.26a', 'tagcb.2013.01.24a' and 'tiny.2013.01.29b' into HEAD 2013-01-28 22:25:21 -08:00
rcutiny.c Merge branches 'doctorture.2013.01.29a', 'fixes.2013.01.26a', 'tagcb.2013.01.24a' and 'tiny.2013.01.29b' into HEAD 2013-01-28 22:25:21 -08:00
rcutiny_plugin.h rcu: Provide RCU CPU stall warnings for tiny RCU 2013-01-28 22:06:21 -08:00
rcutorture.c rcu: Allow rcutorture to be built at low optimization levels 2013-02-04 12:18:20 -08:00
rcutree.c rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration 2013-06-10 13:37:12 -07:00
rcutree.h rcu: Don't call wakeup() with rcu_node structure ->lock held 2013-06-10 13:37:11 -07:00
rcutree_plugin.h rcu: Don't allocate bootmem from rcu_init() 2013-05-15 10:41:12 -07:00
rcutree_trace.c rcutrace: single_open() leaks 2013-05-05 00:16:35 -04:00
relay.c Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
res_counter.c res_counter: return amount of charges after res_counter_uncharge() 2012-12-18 15:02:12 -08:00
resource.c mem hotunplug: fix kfree() of bootmem memory 2013-04-29 15:54:40 -07:00
rtmutex-debug.c sched/rt: Move rt specific bits into new header file 2013-02-07 20:51:08 +01:00
rtmutex-debug.h
rtmutex-tester.c locking/rtmutex/tester: Set correct permissions on sysfs files 2013-04-10 14:48:37 +02:00
rtmutex.c sched/rt: Move rt specific bits into new header file 2013-02-07 20:51:08 +01:00
rtmutex.h
rtmutex_common.h
rwsem.c Revert "rw_semaphore: remove up/down_read_non_owner" 2013-03-23 15:53:52 -07:00
seccomp.c seccomp: allow BPF_XOR based ALU instructions. 2013-03-26 11:07:19 +11:00
semaphore.c semaphore: use `bool' type for semaphore_waiter's up 2013-04-30 17:04:08 -07:00
signal.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-05-01 14:08:52 -07:00
smp.c kernel/smp.c: cleanups 2013-04-30 17:04:03 -07:00
smpboot.c kthread: Prevent unpark race which puts threads on the wrong cpu 2013-04-12 14:18:43 +02:00
smpboot.h smpboot: Provide infrastructure for percpu hotplug threads 2012-08-13 17:01:07 +02:00
softirq.c Fix lockup related to stop_machine being stuck in __do_softirq. 2013-06-10 17:46:57 -07:00
spinlock.c
srcu.c srcu: use ACCESS_ONCE() to access sp->completed in srcu_read_lock() 2013-02-07 15:19:36 -08:00
stacktrace.c
stop_machine.c stop_machine: Mark per cpu stopper enabled early 2013-02-26 22:25:17 +01:00
sys.c reboot: rigrate shutdown/reboot to boot cpu 2013-06-12 16:29:44 -07:00
sys_ni.c unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE 2013-05-09 13:46:38 -04:00
sysctl.c mm: replace hardcoded 3% with admin_reserve_pages knob 2013-04-29 15:54:36 -07:00
sysctl_binary.c switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE 2013-05-09 14:53:20 -04:00
task_work.c task_work: task_work_add() should not succeed after exit_task_work() 2012-09-13 16:47:34 +02:00
taskstats.c taskstats: cgroupstats_user_cmd() may leak on error 2012-10-06 03:05:31 +09:00
test_kprobes.c kernel/: rename random32() to prandom_u32() 2013-04-29 18:28:42 -07:00
time.c timekeeping: Use inject_offset in warp_clock 2013-03-15 16:50:20 -07:00
timeconst.bc kernel: Replace timeconst.pl with a bc script 2013-02-16 23:17:25 +01:00
timer.c timer: Fix jiffies wrap behavior of round_jiffies_common() 2013-06-28 17:10:11 +02:00
tracepoint.c Tracing updates for Linux 3.10 2013-04-29 13:55:38 -07:00
tsacct.c cputime: Use accessors to read task cputime stats 2013-01-27 19:23:31 +01:00
uid16.c make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect 2013-03-03 22:58:33 -05:00
up.c
user-return-notifier.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
user.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
user_namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
utsname.c proc: Split the namespace stuff out into linux/proc_ns.h 2013-05-01 17:29:39 -04:00
utsname_sysctl.c kernel/utsname_sysctl.c: put get/get_uts() into CONFIG_PROC_SYSCTL code block 2013-02-27 19:10:22 -08:00
wait.c propagate name change to comments in kernel source 2012-12-06 10:39:54 +01:00
watchdog.c watchdog: Add comments to explain the watchdog_disabled variable 2013-03-14 08:24:05 +01:00
workqueue.c workqueue: don't perform NUMA-aware allocations on offline nodes in wq_numa_init() 2013-05-15 14:24:24 -07:00
workqueue_internal.h workqueue: include workqueue info when printing debug dump of a worker task 2013-04-30 17:04:02 -07:00