linux-stable/arch/x86/kernel/cpu/umwait.c

201 lines
5 KiB
C
Raw Normal View History

x86/umwait: Initialize umwait control values umwait or tpause allows the processor to enter a light-weight power/performance optimized state (C0.1 state) or an improved power/performance optimized state (C0.2 state) for a period specified by the instruction or until the system time limit or until a store to the monitored address range in umwait. IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on the processor and to set the maximum time the processor can reside in C0.1 or C0.2. By default C0.2 is enabled so the user wait instructions can enter the C0.2 state to save more power with slower wakeup time. Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by default. A quote from Andy: "What I want to avoid is the case where it works dramatically differently on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may behave a bit differently if the max timeout is hit, and I'd like that path to get exercised widely by making it happen even on default configs." A sysfs interface to adjust the time and the C0.2 enablement is provided in a follow up change. [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as mask throughout the code. Massaged comments and changelog ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "Borislav Petkov" <bp@alien8.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
2019-06-20 01:33:55 +00:00
// SPDX-License-Identifier: GPL-2.0
#include <linux/syscore_ops.h>
#include <linux/suspend.h>
#include <linux/cpu.h>
#include <asm/msr.h>
#define UMWAIT_C02_ENABLE 0
2019-06-20 01:33:56 +00:00
#define UMWAIT_CTRL_VAL(max_time, c02_disable) \
(((max_time) & MSR_IA32_UMWAIT_CONTROL_TIME_MASK) | \
x86/umwait: Initialize umwait control values umwait or tpause allows the processor to enter a light-weight power/performance optimized state (C0.1 state) or an improved power/performance optimized state (C0.2 state) for a period specified by the instruction or until the system time limit or until a store to the monitored address range in umwait. IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on the processor and to set the maximum time the processor can reside in C0.1 or C0.2. By default C0.2 is enabled so the user wait instructions can enter the C0.2 state to save more power with slower wakeup time. Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by default. A quote from Andy: "What I want to avoid is the case where it works dramatically differently on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may behave a bit differently if the max timeout is hit, and I'd like that path to get exercised widely by making it happen even on default configs." A sysfs interface to adjust the time and the C0.2 enablement is provided in a follow up change. [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as mask throughout the code. Massaged comments and changelog ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "Borislav Petkov" <bp@alien8.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
2019-06-20 01:33:55 +00:00
((c02_disable) & MSR_IA32_UMWAIT_CONTROL_C02_DISABLE))
/*
* Cache IA32_UMWAIT_CONTROL MSR. This is a systemwide control. By default,
* umwait max time is 100000 in TSC-quanta and C0.2 is enabled
*/
static u32 umwait_control_cached = UMWAIT_CTRL_VAL(100000, UMWAIT_C02_ENABLE);
2019-06-20 01:33:56 +00:00
/*
* Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR in
* the sysfs write functions.
*/
static DEFINE_MUTEX(umwait_lock);
static void umwait_update_control_msr(void * unused)
{
lockdep_assert_irqs_disabled();
wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
}
/*
* The CPU hotplug callback sets the control MSR to the global control
* value.
*
* Disable interrupts so the read of umwait_control_cached and the WRMSR
* are protected against a concurrent sysfs write. Otherwise the sysfs
* write could update the cached value after it had been read on this CPU
* and issue the IPI before the old value had been written. The IPI would
* interrupt, write the new value and after return from IPI the previous
* value would be written by this CPU.
*
* With interrupts disabled the upcoming CPU either sees the new control
* value or the IPI is updating this CPU to the new control value after
* interrupts have been reenabled.
*/
x86/umwait: Initialize umwait control values umwait or tpause allows the processor to enter a light-weight power/performance optimized state (C0.1 state) or an improved power/performance optimized state (C0.2 state) for a period specified by the instruction or until the system time limit or until a store to the monitored address range in umwait. IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on the processor and to set the maximum time the processor can reside in C0.1 or C0.2. By default C0.2 is enabled so the user wait instructions can enter the C0.2 state to save more power with slower wakeup time. Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by default. A quote from Andy: "What I want to avoid is the case where it works dramatically differently on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may behave a bit differently if the max timeout is hit, and I'd like that path to get exercised widely by making it happen even on default configs." A sysfs interface to adjust the time and the C0.2 enablement is provided in a follow up change. [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as mask throughout the code. Massaged comments and changelog ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "Borislav Petkov" <bp@alien8.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
2019-06-20 01:33:55 +00:00
static int umwait_cpu_online(unsigned int cpu)
{
2019-06-20 01:33:56 +00:00
local_irq_disable();
umwait_update_control_msr(NULL);
local_irq_enable();
x86/umwait: Initialize umwait control values umwait or tpause allows the processor to enter a light-weight power/performance optimized state (C0.1 state) or an improved power/performance optimized state (C0.2 state) for a period specified by the instruction or until the system time limit or until a store to the monitored address range in umwait. IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on the processor and to set the maximum time the processor can reside in C0.1 or C0.2. By default C0.2 is enabled so the user wait instructions can enter the C0.2 state to save more power with slower wakeup time. Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by default. A quote from Andy: "What I want to avoid is the case where it works dramatically differently on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may behave a bit differently if the max timeout is hit, and I'd like that path to get exercised widely by making it happen even on default configs." A sysfs interface to adjust the time and the C0.2 enablement is provided in a follow up change. [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as mask throughout the code. Massaged comments and changelog ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "Borislav Petkov" <bp@alien8.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
2019-06-20 01:33:55 +00:00
return 0;
}
/*
* On resume, restore IA32_UMWAIT_CONTROL MSR on the boot processor which
* is the only active CPU at this time. The MSR is set up on the APs via the
* CPU hotplug callback.
*
* This function is invoked on resume from suspend and hibernation. On
* resume from suspend the restore should be not required, but we neither
* trust the firmware nor does it matter if the same value is written
* again.
*/
static void umwait_syscore_resume(void)
{
2019-06-20 01:33:56 +00:00
umwait_update_control_msr(NULL);
x86/umwait: Initialize umwait control values umwait or tpause allows the processor to enter a light-weight power/performance optimized state (C0.1 state) or an improved power/performance optimized state (C0.2 state) for a period specified by the instruction or until the system time limit or until a store to the monitored address range in umwait. IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on the processor and to set the maximum time the processor can reside in C0.1 or C0.2. By default C0.2 is enabled so the user wait instructions can enter the C0.2 state to save more power with slower wakeup time. Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by default. A quote from Andy: "What I want to avoid is the case where it works dramatically differently on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may behave a bit differently if the max timeout is hit, and I'd like that path to get exercised widely by making it happen even on default configs." A sysfs interface to adjust the time and the C0.2 enablement is provided in a follow up change. [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as mask throughout the code. Massaged comments and changelog ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "Borislav Petkov" <bp@alien8.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
2019-06-20 01:33:55 +00:00
}
static struct syscore_ops umwait_syscore_ops = {
.resume = umwait_syscore_resume,
};
2019-06-20 01:33:56 +00:00
/* sysfs interface */
/*
* When bit 0 in IA32_UMWAIT_CONTROL MSR is 1, C0.2 is disabled.
* Otherwise, C0.2 is enabled.
*/
static inline bool umwait_ctrl_c02_enabled(u32 ctrl)
{
return !(ctrl & MSR_IA32_UMWAIT_CONTROL_C02_DISABLE);
}
static inline u32 umwait_ctrl_max_time(u32 ctrl)
{
return ctrl & MSR_IA32_UMWAIT_CONTROL_TIME_MASK;
}
static inline void umwait_update_control(u32 maxtime, bool c02_enable)
{
u32 ctrl = maxtime & MSR_IA32_UMWAIT_CONTROL_TIME_MASK;
if (!c02_enable)
ctrl |= MSR_IA32_UMWAIT_CONTROL_C02_DISABLE;
WRITE_ONCE(umwait_control_cached, ctrl);
/* Propagate to all CPUs */
on_each_cpu(umwait_update_control_msr, NULL, 1);
}
static ssize_t
enable_c02_show(struct device *dev, struct device_attribute *attr, char *buf)
{
u32 ctrl = READ_ONCE(umwait_control_cached);
return sprintf(buf, "%d\n", umwait_ctrl_c02_enabled(ctrl));
}
static ssize_t enable_c02_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
bool c02_enable;
u32 ctrl;
int ret;
ret = kstrtobool(buf, &c02_enable);
if (ret)
return ret;
mutex_lock(&umwait_lock);
ctrl = READ_ONCE(umwait_control_cached);
if (c02_enable != umwait_ctrl_c02_enabled(ctrl))
umwait_update_control(ctrl, c02_enable);
mutex_unlock(&umwait_lock);
return count;
}
static DEVICE_ATTR_RW(enable_c02);
static ssize_t
max_time_show(struct device *kobj, struct device_attribute *attr, char *buf)
{
u32 ctrl = READ_ONCE(umwait_control_cached);
return sprintf(buf, "%u\n", umwait_ctrl_max_time(ctrl));
}
static ssize_t max_time_store(struct device *kobj,
struct device_attribute *attr,
const char *buf, size_t count)
{
u32 max_time, ctrl;
int ret;
ret = kstrtou32(buf, 0, &max_time);
if (ret)
return ret;
/* bits[1:0] must be zero */
if (max_time & ~MSR_IA32_UMWAIT_CONTROL_TIME_MASK)
return -EINVAL;
mutex_lock(&umwait_lock);
ctrl = READ_ONCE(umwait_control_cached);
if (max_time != umwait_ctrl_max_time(ctrl))
umwait_update_control(max_time, umwait_ctrl_c02_enabled(ctrl));
mutex_unlock(&umwait_lock);
return count;
}
static DEVICE_ATTR_RW(max_time);
2019-06-20 01:33:56 +00:00
static struct attribute *umwait_attrs[] = {
&dev_attr_enable_c02.attr,
&dev_attr_max_time.attr,
2019-06-20 01:33:56 +00:00
NULL
};
static struct attribute_group umwait_attr_group = {
.attrs = umwait_attrs,
.name = "umwait_control",
};
x86/umwait: Initialize umwait control values umwait or tpause allows the processor to enter a light-weight power/performance optimized state (C0.1 state) or an improved power/performance optimized state (C0.2 state) for a period specified by the instruction or until the system time limit or until a store to the monitored address range in umwait. IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on the processor and to set the maximum time the processor can reside in C0.1 or C0.2. By default C0.2 is enabled so the user wait instructions can enter the C0.2 state to save more power with slower wakeup time. Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by default. A quote from Andy: "What I want to avoid is the case where it works dramatically differently on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may behave a bit differently if the max timeout is hit, and I'd like that path to get exercised widely by making it happen even on default configs." A sysfs interface to adjust the time and the C0.2 enablement is provided in a follow up change. [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as mask throughout the code. Massaged comments and changelog ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "Borislav Petkov" <bp@alien8.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
2019-06-20 01:33:55 +00:00
static int __init umwait_init(void)
{
2019-06-20 01:33:56 +00:00
struct device *dev;
x86/umwait: Initialize umwait control values umwait or tpause allows the processor to enter a light-weight power/performance optimized state (C0.1 state) or an improved power/performance optimized state (C0.2 state) for a period specified by the instruction or until the system time limit or until a store to the monitored address range in umwait. IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on the processor and to set the maximum time the processor can reside in C0.1 or C0.2. By default C0.2 is enabled so the user wait instructions can enter the C0.2 state to save more power with slower wakeup time. Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by default. A quote from Andy: "What I want to avoid is the case where it works dramatically differently on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may behave a bit differently if the max timeout is hit, and I'd like that path to get exercised widely by making it happen even on default configs." A sysfs interface to adjust the time and the C0.2 enablement is provided in a follow up change. [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as mask throughout the code. Massaged comments and changelog ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "Borislav Petkov" <bp@alien8.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
2019-06-20 01:33:55 +00:00
int ret;
if (!boot_cpu_has(X86_FEATURE_WAITPKG))
return -ENODEV;
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait:online",
umwait_cpu_online, NULL);
register_syscore_ops(&umwait_syscore_ops);
2019-06-20 01:33:56 +00:00
/*
* Add umwait control interface. Ignore failure, so at least the
* default values are set up in case the machine manages to boot.
*/
dev = cpu_subsys.dev_root;
return sysfs_create_group(&dev->kobj, &umwait_attr_group);
x86/umwait: Initialize umwait control values umwait or tpause allows the processor to enter a light-weight power/performance optimized state (C0.1 state) or an improved power/performance optimized state (C0.2 state) for a period specified by the instruction or until the system time limit or until a store to the monitored address range in umwait. IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on the processor and to set the maximum time the processor can reside in C0.1 or C0.2. By default C0.2 is enabled so the user wait instructions can enter the C0.2 state to save more power with slower wakeup time. Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by default. A quote from Andy: "What I want to avoid is the case where it works dramatically differently on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may behave a bit differently if the max timeout is hit, and I'd like that path to get exercised widely by making it happen even on default configs." A sysfs interface to adjust the time and the C0.2 enablement is provided in a follow up change. [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as mask throughout the code. Massaged comments and changelog ] Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "Borislav Petkov" <bp@alien8.de> Cc: "H Peter Anvin" <hpa@zytor.com> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: "Tony Luck" <tony.luck@intel.com> Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
2019-06-20 01:33:55 +00:00
}
device_initcall(umwait_init);