Power management updates for 5.18-rc1

Merge tag 'pm-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These are mostly fixes and cleanups all over the code and a new piece
  of documentation for Intel uncore frequency scaling.

  Functionality-wise, the intel_idle driver will support Sapphire Rapids
  Xeons natively now (with some extra facilities for controlling
  C-states more precisely on those systems), virtual guests will take
  the ACPI S4 hardware signature into account by default, the
  intel_pstate driver will take the default EPP value from the firmware,
  the cpupower utility will support the AMD P-state driver added in the
  previous cycle, and there is a new tracer utility for that driver.

  Specifics:

   - Allow device_pm_check_callbacks() to be called from interrupt
     context without issues (Dmitry Baryshkov).

   - Modify devm_pm_runtime_enable() to automatically handle
     pm_runtime_dont_use_autosuspend() at driver exit time (Douglas
     Anderson).

   - Make the schedutil cpufreq governor use to_gov_attr_set() instead
     of open coding it (Kevin Hao).

   - Replace acpi_bus_get_device() with acpi_fetch_acpi_dev() in the
     cpufreq longhaul driver (Rafael Wysocki).

   - Unify show() and store() naming in cpufreq and make it use
     __ATTR_XX (Lianjie Zhang).

   - Make the intel_pstate driver use the EPP value set by the firmware
     by default (Srinivas Pandruvada).

   - Re-order the init checks in the powernow-k8 cpufreq driver (Mario
     Limonciello).

   - Make the ACPI processor idle driver check for architectural support
     for LPI to avoid using it on x86 by mistake (Mario Limonciello).

   - Add Sapphire Rapids Xeon support to the intel_idle driver (Artem
     Bityutskiy).

   - Add 'preferred_cstates' module argument to the intel_idle driver to
     work around a C1 and C1E handling issue on Sapphire Rapids (Artem
     Bityutskiy).

   - Add core C6 optimization on Sapphire Rapids to the intel_idle
     driver (Artem Bityutskiy).

   - Optimize the haltpoll cpuidle driver a bit (Li RongQing).

   - Remove leftover text from intel_idle() kerneldoc comment and fix up
     white space in intel_idle (Rafael Wysocki).

   - Fix load_image_and_restore() error path (Ye Bin).

   - Fix typos in comments in the system wakeup handling code (Tom Rix).

   - Clean up non-kernel-doc comments in hibernation code (Jiapeng
     Chong).

   - Fix __setup handler error handling in system-wide suspend and
     hibernation core code (Randy Dunlap).

   - Add device name to suspend_report_result() (Youngjin Jang).

   - Make virtual guests honour ACPI S4 hardware signature by default
     (David Woodhouse).

   - Block power off of a parent PM domain unless child is in deepest
     state (Ulf Hansson).

   - Use dev_err_probe() to simplify error handling for generic PM
     domains (Ahmad Fatoum).

   - Fix sleep-in-atomic bug caused by genpd_debug_remove() (Shawn Guo).

   - Document Intel uncore frequency scaling (Srinivas Pandruvada).

   - Add DTPM hierarchy description (Daniel Lezcano).

   - Change the locking scheme in DTPM (Daniel Lezcano).

   - Fix dtpm_cpu cleanup at exit time and missing virtual DTPM pointer
     release (Daniel Lezcano).

   - Make dtpm_node_callback[] static (kernel test robot).

   - Fix spelling mistake "initialze" -> "initialize" in
     dtpm_create_hierarchy() (Colin Ian King).

   - Add tracer tool for the amd-pstate driver (Jinzhou Su).

   - Fix PC6 displaying in turbostat on some systems (Artem Bityutskiy).

   - Add AMD P-State support to the cpupower utility (Huang Rui)"

* tag 'pm-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (58 commits)
  cpufreq: powernow-k8: Re-order the init checks
  cpuidle: intel_idle: Drop redundant backslash at line end
  cpuidle: intel_idle: Update intel_idle() kerneldoc comment
  PM: hibernate: Honour ACPI hardware signature by default for virtual guests
  cpufreq: intel_pstate: Use firmware default EPP
  cpufreq: unify show() and store() naming and use __ATTR_XX
  PM: core: keep irq flags in device_pm_check_callbacks()
  cpuidle: haltpoll: Call cpuidle_poll_state_init() later
  Documentation: amd-pstate: add tracer tool introduction
  tools/power/x86/amd_pstate_tracer: Add tracer tool for AMD P-state
  tools/power/x86/intel_pstate_tracer: make tracer as a module
  cpufreq: amd-pstate: Add more tracepoint for AMD P-State module
  PM: sleep: Add device name to suspend_report_result()
  turbostat: fix PC6 displaying on some systems
  intel_idle: add core C6 optimization for SPR
  intel_idle: add 'preferred_cstates' module argument
  intel_idle: add SPR support
  PM: runtime: Have devm_pm_runtime_enable() handle pm_runtime_dont_use_autosuspend()
  ACPI: processor idle: Check for architectural support for LPI
  cpuidle: PSCI: Move the `has_lpi` check to the beginning of the function
  ...
Merged by Linus Torvalds on 2022-03-21 14:26:28 -07:00 in commit 02b82b02c3.
63 changed files with 1878 additions and 408 deletions.

View file

@ -369,6 +369,32 @@ governor (for the policies it is attached to), or by the ``CPUFreq`` core (for t
policies with other scaling governors).
Tracer Tool
-------------
``amd_pstate_tracer.py`` can record and parse ``amd-pstate`` trace log, then
generate performance plots. This utility can be used to debug and tune the
performance of ``amd-pstate`` driver. The tracer tool needs to import intel
pstate tracer.
Tracer tool located in ``linux/tools/power/x86/amd_pstate_tracer``. It can be
used in two ways. If trace file is available, then directly parse the file
with command ::
./amd_pstate_trace.py [-c cpus] -t <trace_file> -n <test_name>
Or generate trace file with root privilege, then parse and plot with command ::
sudo ./amd_pstate_trace.py [-c cpus] -n <test_name> -i <interval> [-m kbytes]
The test result can be found in ``results/test_name``. Following is the example
about part of the output. ::
common_cpu common_secs common_usecs min_perf des_perf max_perf freq mperf apef tsc load duration_ms sample_num elapsed_time common_comm
CPU_005 712 116384 39 49 166 0.7565 9645075 2214891 38431470 25.1 11.646 469 2.496 kworker/5:0-40
CPU_006 712 116408 39 49 166 0.6769 8950227 1839034 37192089 24.06 11.272 470 2.496 kworker/6:0-1264
Reference
===========

View file

@ -0,0 +1,60 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
==============================
Intel Uncore Frequency Scaling
==============================
:Copyright: |copy| 2022 Intel Corporation
:Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Introduction
------------
The uncore can consume significant amount of power in Intel's Xeon servers based
on the workload characteristics. To optimize the total power and improve overall
performance, SoCs have internal algorithms for scaling uncore frequency. These
algorithms monitor workload usage of uncore and set a desirable frequency.
It is possible that users have different expectations of uncore performance and
want to have control over it. The objective is similar to allowing users to set
the scaling min/max frequencies via cpufreq sysfs to improve CPU performance.
Users may have some latency sensitive workloads where they do not want any
change to uncore frequency. Also, users may have workloads which require
different core and uncore performance at distinct phases and they may want to
use both cpufreq and the uncore scaling interface to distribute power and
improve overall performance.
Sysfs Interface
---------------
To control uncore frequency, a sysfs interface is provided in the directory:
`/sys/devices/system/cpu/intel_uncore_frequency/`.
There is one directory for each package and die combination as the scope of
uncore scaling control is per die in multiple die/package SoCs or per
package for single die per package SoCs. The name represents the
scope of control. For example: 'package_00_die_00' is for package id 0 and
die 0.
Each package_*_die_* contains the following attributes:
``initial_max_freq_khz``
Out of reset, this attribute represent the maximum possible frequency.
This is a read-only attribute. If users adjust max_freq_khz,
they can always go back to maximum using the value from this attribute.
``initial_min_freq_khz``
Out of reset, this attribute represent the minimum possible frequency.
This is a read-only attribute. If users adjust min_freq_khz,
they can always go back to minimum using the value from this attribute.
``max_freq_khz``
This attribute is used to set the maximum uncore frequency.
``min_freq_khz``
This attribute is used to set the minimum uncore frequency.
``current_freq_khz``
This attribute is used to get the current uncore frequency.
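
As an aside, a minimal userspace sketch (not part of this series) shows how these attributes can be read and written from a C program; the package_00_die_00 path is an assumption that depends on the machine's package/die topology:

#include <stdio.h>

/* Hypothetical scope directory; the actual name depends on package/die ids. */
#define UNCORE_DIR \
	"/sys/devices/system/cpu/intel_uncore_frequency/package_00_die_00"

int main(void)
{
	unsigned int khz;
	FILE *f = fopen(UNCORE_DIR "/current_freq_khz", "r");

	if (!f || fscanf(f, "%u", &khz) != 1) {
		perror("current_freq_khz");
		return 1;
	}
	fclose(f);
	printf("uncore frequency: %u kHz\n", khz);

	/* Cap the maximum to the current value (requires root). */
	f = fopen(UNCORE_DIR "/max_freq_khz", "w");
	if (!f) {
		perror("max_freq_khz");
		return 1;
	}
	fprintf(f, "%u\n", khz);
	fclose(f);
	return 0;
}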

View file

@ -15,3 +15,4 @@ Working-State Power Management
cpufreq_drivers
intel_epb
intel-speed-select
intel_uncore_frequency_scaling

View file

@ -1002,6 +1002,7 @@ L: linux-pm@vger.kernel.org
S: Supported
F: Documentation/admin-guide/pm/amd-pstate.rst
F: drivers/cpufreq/amd-pstate*
F: tools/power/x86/amd_pstate_tracer/amd_pstate_trace.py
AMD PTDMA DRIVER
M: Sanjay R Mehta <sanju.mehta@amd.com>

View file

@ -54,6 +54,9 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
struct acpi_lpi_state *lpi;
struct acpi_processor *pr = per_cpu(processors, cpu);
if (unlikely(!pr || !pr->flags.has_lpi))
return -EINVAL;
/*
* If the PSCI cpu_suspend function hook has not been initialized
* idle states must not be enabled, so bail out
@ -61,9 +64,6 @@ static int psci_acpi_cpu_init_idle(unsigned int cpu)
if (!psci_ops.cpu_suspend)
return -EOPNOTSUPP;
if (unlikely(!pr || !pr->flags.has_lpi))
return -EINVAL;
count = pr->power.count - 1;
if (count <= 0)
return -ENODEV;

View file

@ -15,6 +15,7 @@
#include <asm/desc.h>
#include <asm/cacheflush.h>
#include <asm/realmode.h>
#include <asm/hypervisor.h>
#include <linux/ftrace.h>
#include "../../realmode/rm/wakeup.h"
@ -140,9 +141,9 @@ static int __init acpi_sleep_setup(char *str)
acpi_realmode_flags |= 4;
#ifdef CONFIG_HIBERNATION
if (strncmp(str, "s4_hwsig", 8) == 0)
acpi_check_s4_hw_signature(1);
acpi_check_s4_hw_signature = 1;
if (strncmp(str, "s4_nohwsig", 10) == 0)
acpi_check_s4_hw_signature(0);
acpi_check_s4_hw_signature = 0;
#endif
if (strncmp(str, "nonvs", 5) == 0)
acpi_nvs_nosave();
@ -160,3 +161,21 @@ static int __init acpi_sleep_setup(char *str)
}
__setup("acpi_sleep=", acpi_sleep_setup);
#if defined(CONFIG_HIBERNATION) && defined(CONFIG_HYPERVISOR_GUEST)
static int __init init_s4_sigcheck(void)
{
/*
* If running on a hypervisor, honour the ACPI specification
* by default and trigger a clean reboot when the hardware
* signature in FACS is changed after hibernation.
*/
if (acpi_check_s4_hw_signature == -1 &&
!hypervisor_is_type(X86_HYPER_NATIVE))
acpi_check_s4_hw_signature = 1;
return 0;
}
/* This must happen before acpi_init() which is a subsys initcall */
arch_initcall(init_s4_sigcheck);
#endif

View file

@ -1080,6 +1080,11 @@ static int flatten_lpi_states(struct acpi_processor *pr,
return 0;
}
int __weak acpi_processor_ffh_lpi_probe(unsigned int cpu)
{
return -EOPNOTSUPP;
}
static int acpi_processor_get_lpi_info(struct acpi_processor *pr)
{
int ret, i;
@ -1088,6 +1093,11 @@ static int acpi_processor_get_lpi_info(struct acpi_processor *pr)
struct acpi_device *d = NULL;
struct acpi_lpi_states_array info[2], *tmp, *prev, *curr;
/* make sure our architecture has support */
ret = acpi_processor_ffh_lpi_probe(pr->id);
if (ret == -EOPNOTSUPP)
return ret;
if (!osc_pc_lpi_support_confirmed)
return -EOPNOTSUPP;
@ -1139,11 +1149,6 @@ static int acpi_processor_get_lpi_info(struct acpi_processor *pr)
return 0;
}
int __weak acpi_processor_ffh_lpi_probe(unsigned int cpu)
{
return -ENODEV;
}
int __weak acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi)
{
return -ENODEV;

View file

@ -871,12 +871,7 @@ static inline void acpi_sleep_syscore_init(void) {}
#ifdef CONFIG_HIBERNATION
static unsigned long s4_hardware_signature;
static struct acpi_table_facs *facs;
static int sigcheck = -1; /* Default behaviour is just to warn */
void __init acpi_check_s4_hw_signature(int check)
{
sigcheck = check;
}
int acpi_check_s4_hw_signature = -1; /* Default behaviour is just to warn */
static int acpi_hibernation_begin(pm_message_t stage)
{
@ -1001,7 +996,7 @@ static void acpi_sleep_hibernate_setup(void)
hibernation_set_ops(old_suspend_ordering ?
&acpi_hibernation_ops_old : &acpi_hibernation_ops);
sleep_states[ACPI_STATE_S4] = 1;
if (!sigcheck)
if (!acpi_check_s4_hw_signature)
return;
acpi_get_table(ACPI_SIG_FACS, 1, (struct acpi_table_header **)&facs);
@ -1013,7 +1008,7 @@ static void acpi_sleep_hibernate_setup(void)
*/
s4_hardware_signature = facs->hardware_signature;
if (sigcheck > 0) {
if (acpi_check_s4_hw_signature > 0) {
/*
* If we're actually obeying the ACPI specification
* then the signature is written out as part of the

View file

@ -636,6 +636,18 @@ static int genpd_power_off(struct generic_pm_domain *genpd, bool one_dev_on,
atomic_read(&genpd->sd_count) > 0)
return -EBUSY;
/*
* The children must be in their deepest (powered-off) states to allow
* the parent to be powered off. Note that, there's no need for
* additional locking, as powering on a child, requires the parent's
* lock to be acquired first.
*/
list_for_each_entry(link, &genpd->parent_links, parent_node) {
struct generic_pm_domain *child = link->child;
if (child->state_idx < child->state_count - 1)
return -EBUSY;
}
list_for_each_entry(pdd, &genpd->dev_list, list_node) {
enum pm_qos_flags_status stat;
@ -1073,6 +1085,13 @@ static void genpd_sync_power_off(struct generic_pm_domain *genpd, bool use_lock,
|| atomic_read(&genpd->sd_count) > 0)
return;
/* Check that the children are in their deepest (powered-off) state. */
list_for_each_entry(link, &genpd->parent_links, parent_node) {
struct generic_pm_domain *child = link->child;
if (child->state_idx < child->state_count - 1)
return;
}
/* Choose the deepest state when suspending */
genpd->state_idx = genpd->state_count - 1;
if (_genpd_power_off(genpd, false))
@ -2058,9 +2077,9 @@ static int genpd_remove(struct generic_pm_domain *genpd)
kfree(link);
}
genpd_debug_remove(genpd);
list_del(&genpd->gpd_list_node);
genpd_unlock(genpd);
genpd_debug_remove(genpd);
cancel_work_sync(&genpd->power_off_work);
if (genpd_is_cpu_domain(genpd))
free_cpumask_var(genpd->cpus);
@ -2248,12 +2267,8 @@ int of_genpd_add_provider_simple(struct device_node *np,
/* Parse genpd OPP table */
if (genpd->set_performance_state) {
ret = dev_pm_opp_of_add_table(&genpd->dev);
if (ret) {
if (ret != -EPROBE_DEFER)
dev_err(&genpd->dev, "Failed to add OPP table: %d\n",
ret);
return ret;
}
if (ret)
return dev_err_probe(&genpd->dev, ret, "Failed to add OPP table\n");
/*
* Save table for faster processing while setting performance
@ -2312,9 +2327,8 @@ int of_genpd_add_provider_onecell(struct device_node *np,
if (genpd->set_performance_state) {
ret = dev_pm_opp_of_add_table_indexed(&genpd->dev, i);
if (ret) {
if (ret != -EPROBE_DEFER)
dev_err(&genpd->dev, "Failed to add OPP table for index %d: %d\n",
i, ret);
dev_err_probe(&genpd->dev, ret,
"Failed to add OPP table for index %d\n", i);
goto error;
}
@ -2672,12 +2686,8 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
ret = genpd_add_device(pd, dev, base_dev);
mutex_unlock(&gpd_list_lock);
if (ret < 0) {
if (ret != -EPROBE_DEFER)
dev_err(dev, "failed to add to PM domain %s: %d",
pd->name, ret);
return ret;
}
if (ret < 0)
return dev_err_probe(dev, ret, "failed to add to PM domain %s\n", pd->name);
dev->pm_domain->detach = genpd_dev_pm_detach;
dev->pm_domain->sync = genpd_dev_pm_sync;
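
The dev_err_probe() conversions above rely on the helper's contract: it returns the error passed in, logs it with dev_err() unless the error is -EPROBE_DEFER, and in the deferral case records the reason for later inspection instead. A hedged sketch of the resulting pattern, with a hypothetical foo_probe():

#include <linux/clk.h>
#include <linux/device.h>
#include <linux/platform_device.h>

static int foo_probe(struct platform_device *pdev)
{
	struct clk *clk = devm_clk_get(&pdev->dev, NULL);

	/*
	 * One line replaces the old "if (ret != -EPROBE_DEFER)
	 * dev_err(...); return ret;" dance: the helper filters the
	 * deferral case and passes the error through unchanged.
	 */
	if (IS_ERR(clk))
		return dev_err_probe(&pdev->dev, PTR_ERR(clk),
				     "failed to get clock\n");
	return 0;
}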

View file

@ -485,7 +485,7 @@ static int dpm_run_callback(pm_callback_t cb, struct device *dev,
trace_device_pm_callback_start(dev, info, state.event);
error = cb(dev);
trace_device_pm_callback_end(dev, error);
suspend_report_result(cb, error);
suspend_report_result(dev, cb, error);
initcall_debug_report(dev, calltime, cb, error);
@ -1568,7 +1568,7 @@ static int legacy_suspend(struct device *dev, pm_message_t state,
trace_device_pm_callback_start(dev, info, state.event);
error = cb(dev, state);
trace_device_pm_callback_end(dev, error);
suspend_report_result(cb, error);
suspend_report_result(dev, cb, error);
initcall_debug_report(dev, calltime, cb, error);
@ -1855,7 +1855,7 @@ static int device_prepare(struct device *dev, pm_message_t state)
device_unlock(dev);
if (ret < 0) {
suspend_report_result(callback, ret);
suspend_report_result(dev, callback, ret);
pm_runtime_put(dev);
return ret;
}
@ -1960,10 +1960,10 @@ int dpm_suspend_start(pm_message_t state)
}
EXPORT_SYMBOL_GPL(dpm_suspend_start);
void __suspend_report_result(const char *function, void *fn, int ret)
void __suspend_report_result(const char *function, struct device *dev, void *fn, int ret)
{
if (ret)
pr_err("%s(): %pS returns %d\n", function, fn, ret);
dev_err(dev, "%s(): %pS returns %d\n", function, fn, ret);
}
EXPORT_SYMBOL_GPL(__suspend_report_result);
@ -2018,7 +2018,9 @@ static bool pm_ops_is_empty(const struct dev_pm_ops *ops)
void device_pm_check_callbacks(struct device *dev)
{
spin_lock_irq(&dev->power.lock);
unsigned long flags;
spin_lock_irqsave(&dev->power.lock, flags);
dev->power.no_pm_callbacks =
(!dev->bus || (pm_ops_is_empty(dev->bus->pm) &&
!dev->bus->suspend && !dev->bus->resume)) &&
@ -2027,7 +2029,7 @@ void device_pm_check_callbacks(struct device *dev)
(!dev->pm_domain || pm_ops_is_empty(&dev->pm_domain->ops)) &&
(!dev->driver || (pm_ops_is_empty(dev->driver->pm) &&
!dev->driver->suspend && !dev->driver->resume));
spin_unlock_irq(&dev->power.lock);
spin_unlock_irqrestore(&dev->power.lock, flags);
}
bool dev_pm_skip_suspend(struct device *dev)

View file

@ -1476,11 +1476,16 @@ EXPORT_SYMBOL_GPL(pm_runtime_enable);
static void pm_runtime_disable_action(void *data)
{
pm_runtime_dont_use_autosuspend(data);
pm_runtime_disable(data);
}
/**
* devm_pm_runtime_enable - devres-enabled version of pm_runtime_enable.
*
* NOTE: this will also handle calling pm_runtime_dont_use_autosuspend() for
* you at driver exit time if needed.
*
* @dev: Device to handle.
*/
int devm_pm_runtime_enable(struct device *dev)
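
With this in place, a driver that opts into autosuspend can leave all of the teardown to the devres action; a minimal sketch with a hypothetical foo_probe() and a 1-second autosuspend delay:

#include <linux/device.h>
#include <linux/pm_runtime.h>

static int foo_probe(struct device *dev)
{
	/* Configure autosuspend before enabling runtime PM. */
	pm_runtime_set_autosuspend_delay(dev, 1000);
	pm_runtime_use_autosuspend(dev);

	/*
	 * On driver detach the devres action now calls both
	 * pm_runtime_dont_use_autosuspend() and pm_runtime_disable(),
	 * so neither an error path nor a .remove() callback has to
	 * undo the setup above.
	 */
	return devm_pm_runtime_enable(dev);
}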

View file

@ -289,7 +289,7 @@ EXPORT_SYMBOL_GPL(dev_pm_disable_wake_irq);
*
* Enables wakeirq conditionally. We need to enable wake-up interrupt
* lazily on the first rpm_suspend(). This is needed as the consumer device
* starts in RPM_SUSPENDED state, and the the first pm_runtime_get() would
* starts in RPM_SUSPENDED state, and the first pm_runtime_get() would
* otherwise try to disable already disabled wakeirq. The wake-up interrupt
* starts disabled with IRQ_NOAUTOEN set.
*

View file

@ -587,7 +587,7 @@ static bool wakeup_source_not_registered(struct wakeup_source *ws)
* @ws: Wakeup source to handle.
*
* Update the @ws' statistics and, if @ws has just been activated, notify the PM
* core of the event by incrementing the counter of of wakeup events being
* core of the event by incrementing the counter of the wakeup events being
* processed.
*/
static void wakeup_source_activate(struct wakeup_source *ws)
@ -733,7 +733,7 @@ static void wakeup_source_deactivate(struct wakeup_source *ws)
/*
* Increment the counter of registered wakeup events and decrement the
* couter of wakeup events in progress simultaneously.
* counter of wakeup events in progress simultaneously.
*/
cec = atomic_add_return(MAX_IN_PROGRESS, &combined_event_count);
trace_wakeup_source_deactivate(ws->name, cec);

View file

@ -27,6 +27,10 @@ TRACE_EVENT(amd_pstate_perf,
TP_PROTO(unsigned long min_perf,
unsigned long target_perf,
unsigned long capacity,
u64 freq,
u64 mperf,
u64 aperf,
u64 tsc,
unsigned int cpu_id,
bool changed,
bool fast_switch
@ -35,6 +39,10 @@ TRACE_EVENT(amd_pstate_perf,
TP_ARGS(min_perf,
target_perf,
capacity,
freq,
mperf,
aperf,
tsc,
cpu_id,
changed,
fast_switch
@ -44,6 +52,10 @@ TRACE_EVENT(amd_pstate_perf,
__field(unsigned long, min_perf)
__field(unsigned long, target_perf)
__field(unsigned long, capacity)
__field(unsigned long long, freq)
__field(unsigned long long, mperf)
__field(unsigned long long, aperf)
__field(unsigned long long, tsc)
__field(unsigned int, cpu_id)
__field(bool, changed)
__field(bool, fast_switch)
@ -53,15 +65,23 @@ TRACE_EVENT(amd_pstate_perf,
__entry->min_perf = min_perf;
__entry->target_perf = target_perf;
__entry->capacity = capacity;
__entry->freq = freq;
__entry->mperf = mperf;
__entry->aperf = aperf;
__entry->tsc = tsc;
__entry->cpu_id = cpu_id;
__entry->changed = changed;
__entry->fast_switch = fast_switch;
),
TP_printk("amd_min_perf=%lu amd_des_perf=%lu amd_max_perf=%lu cpu_id=%u changed=%s fast_switch=%s",
TP_printk("amd_min_perf=%lu amd_des_perf=%lu amd_max_perf=%lu freq=%llu mperf=%llu aperf=%llu tsc=%llu cpu_id=%u changed=%s fast_switch=%s",
(unsigned long)__entry->min_perf,
(unsigned long)__entry->target_perf,
(unsigned long)__entry->capacity,
(unsigned long long)__entry->freq,
(unsigned long long)__entry->mperf,
(unsigned long long)__entry->aperf,
(unsigned long long)__entry->tsc,
(unsigned int)__entry->cpu_id,
(__entry->changed) ? "true" : "false",
(__entry->fast_switch) ? "true" : "false"

View file

@ -65,6 +65,18 @@ MODULE_PARM_DESC(shared_mem,
static struct cpufreq_driver amd_pstate_driver;
/**
* struct amd_aperf_mperf
* @aperf: actual performance frequency clock count
* @mperf: maximum performance frequency clock count
* @tsc: time stamp counter
*/
struct amd_aperf_mperf {
u64 aperf;
u64 mperf;
u64 tsc;
};
/**
* struct amd_cpudata - private CPU data for AMD P-State
* @cpu: CPU number
@ -81,6 +93,9 @@ static struct cpufreq_driver amd_pstate_driver;
* @min_freq: the frequency that mapped to lowest_perf
* @nominal_freq: the frequency that mapped to nominal_perf
* @lowest_nonlinear_freq: the frequency that mapped to lowest_nonlinear_perf
* @cur: Difference of Aperf/Mperf/tsc count between last and current sample
* @prev: Last Aperf/Mperf/tsc count value read from register
* @freq: current cpu frequency value
* @boost_supported: check whether the Processor or SBIOS supports boost mode
*
* The amd_cpudata is key private data for each CPU thread in AMD P-State, and
@ -102,6 +117,10 @@ struct amd_cpudata {
u32 nominal_freq;
u32 lowest_nonlinear_freq;
struct amd_aperf_mperf cur;
struct amd_aperf_mperf prev;
u64 freq;
bool boost_supported;
};
@ -211,6 +230,39 @@ static inline void amd_pstate_update_perf(struct amd_cpudata *cpudata,
max_perf, fast_switch);
}
static inline bool amd_pstate_sample(struct amd_cpudata *cpudata)
{
u64 aperf, mperf, tsc;
unsigned long flags;
local_irq_save(flags);
rdmsrl(MSR_IA32_APERF, aperf);
rdmsrl(MSR_IA32_MPERF, mperf);
tsc = rdtsc();
if (cpudata->prev.mperf == mperf || cpudata->prev.tsc == tsc) {
local_irq_restore(flags);
return false;
}
local_irq_restore(flags);
cpudata->cur.aperf = aperf;
cpudata->cur.mperf = mperf;
cpudata->cur.tsc = tsc;
cpudata->cur.aperf -= cpudata->prev.aperf;
cpudata->cur.mperf -= cpudata->prev.mperf;
cpudata->cur.tsc -= cpudata->prev.tsc;
cpudata->prev.aperf = aperf;
cpudata->prev.mperf = mperf;
cpudata->prev.tsc = tsc;
cpudata->freq = div64_u64((cpudata->cur.aperf * cpu_khz), cpudata->cur.mperf);
return true;
}
static void amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
u32 des_perf, u32 max_perf, bool fast_switch)
{
@ -226,8 +278,11 @@ static void amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf,
value &= ~AMD_CPPC_MAX_PERF(~0L);
value |= AMD_CPPC_MAX_PERF(max_perf);
trace_amd_pstate_perf(min_perf, des_perf, max_perf,
cpudata->cpu, (value != prev), fast_switch);
if (trace_amd_pstate_perf_enabled() && amd_pstate_sample(cpudata)) {
trace_amd_pstate_perf(min_perf, des_perf, max_perf, cpudata->freq,
cpudata->cur.mperf, cpudata->cur.aperf, cpudata->cur.tsc,
cpudata->cpu, (value != prev), fast_switch);
}
if (value == prev)
return;
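
For reference, the freq value fed to the enlarged tracepoint comes from amd_pstate_sample() above and follows the usual APERF/MPERF relation, with cpu_khz being the kernel's boot-time CPU frequency estimate in kHz:

	freq_khz = cpu_khz * delta_aperf / delta_mperf

so a core running in boost shows delta_aperf growing faster than delta_mperf.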

View file

@ -146,7 +146,7 @@ static unsigned int cs_dbs_update(struct cpufreq_policy *policy)
/************************** sysfs interface ************************/
static ssize_t store_sampling_down_factor(struct gov_attr_set *attr_set,
static ssize_t sampling_down_factor_store(struct gov_attr_set *attr_set,
const char *buf, size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
@ -161,7 +161,7 @@ static ssize_t store_sampling_down_factor(struct gov_attr_set *attr_set,
return count;
}
static ssize_t store_up_threshold(struct gov_attr_set *attr_set,
static ssize_t up_threshold_store(struct gov_attr_set *attr_set,
const char *buf, size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
@ -177,7 +177,7 @@ static ssize_t store_up_threshold(struct gov_attr_set *attr_set,
return count;
}
static ssize_t store_down_threshold(struct gov_attr_set *attr_set,
static ssize_t down_threshold_store(struct gov_attr_set *attr_set,
const char *buf, size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
@ -195,7 +195,7 @@ static ssize_t store_down_threshold(struct gov_attr_set *attr_set,
return count;
}
static ssize_t store_ignore_nice_load(struct gov_attr_set *attr_set,
static ssize_t ignore_nice_load_store(struct gov_attr_set *attr_set,
const char *buf, size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
@ -220,7 +220,7 @@ static ssize_t store_ignore_nice_load(struct gov_attr_set *attr_set,
return count;
}
static ssize_t store_freq_step(struct gov_attr_set *attr_set, const char *buf,
static ssize_t freq_step_store(struct gov_attr_set *attr_set, const char *buf,
size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);

View file

@ -27,7 +27,7 @@ static DEFINE_MUTEX(gov_dbs_data_mutex);
/* Common sysfs tunables */
/*
* store_sampling_rate - update sampling rate effective immediately if needed.
* sampling_rate_store - update sampling rate effective immediately if needed.
*
* If new rate is smaller than the old, simply updating
* dbs.sampling_rate might not be appropriate. For example, if the
@ -41,7 +41,7 @@ static DEFINE_MUTEX(gov_dbs_data_mutex);
* This must be called with dbs_data->mutex held, otherwise traversing
* policy_dbs_list isn't safe.
*/
ssize_t store_sampling_rate(struct gov_attr_set *attr_set, const char *buf,
ssize_t sampling_rate_store(struct gov_attr_set *attr_set, const char *buf,
size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
@ -80,7 +80,7 @@ ssize_t store_sampling_rate(struct gov_attr_set *attr_set, const char *buf,
return count;
}
EXPORT_SYMBOL_GPL(store_sampling_rate);
EXPORT_SYMBOL_GPL(sampling_rate_store);
/**
* gov_update_cpu_data - Update CPU load data.

View file

@ -51,7 +51,7 @@ static inline struct dbs_data *to_dbs_data(struct gov_attr_set *attr_set)
}
#define gov_show_one(_gov, file_name) \
static ssize_t show_##file_name \
static ssize_t file_name##_show \
(struct gov_attr_set *attr_set, char *buf) \
{ \
struct dbs_data *dbs_data = to_dbs_data(attr_set); \
@ -60,7 +60,7 @@ static ssize_t show_##file_name \
}
#define gov_show_one_common(file_name) \
static ssize_t show_##file_name \
static ssize_t file_name##_show \
(struct gov_attr_set *attr_set, char *buf) \
{ \
struct dbs_data *dbs_data = to_dbs_data(attr_set); \
@ -68,12 +68,10 @@ static ssize_t show_##file_name \
}
#define gov_attr_ro(_name) \
static struct governor_attr _name = \
__ATTR(_name, 0444, show_##_name, NULL)
static struct governor_attr _name = __ATTR_RO(_name)
#define gov_attr_rw(_name) \
static struct governor_attr _name = \
__ATTR(_name, 0644, show_##_name, store_##_name)
static struct governor_attr _name = __ATTR_RW(_name)
/* Common to all CPUs of a policy */
struct policy_dbs_info {
@ -176,7 +174,7 @@ void od_register_powersave_bias_handler(unsigned int (*f)
(struct cpufreq_policy *, unsigned int, unsigned int),
unsigned int powersave_bias);
void od_unregister_powersave_bias_handler(void);
ssize_t store_sampling_rate(struct gov_attr_set *attr_set, const char *buf,
ssize_t sampling_rate_store(struct gov_attr_set *attr_set, const char *buf,
size_t count);
void gov_update_cpu_data(struct dbs_data *dbs_data);
#endif /* _CPUFREQ_GOVERNOR_H */
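
The renames in this and the following governor patches exist because __ATTR_RO()/__ATTR_RW() paste the attribute name into the callback names. For a tunable such as sampling_rate, gov_attr_rw(sampling_rate) now expands to roughly:

static struct governor_attr sampling_rate =
	__ATTR(sampling_rate, 0644, sampling_rate_show, sampling_rate_store);

which is why every show_foo()/store_foo() pair had to become foo_show()/foo_store().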

View file

@ -8,11 +8,6 @@
#include "cpufreq_governor.h"
static inline struct gov_attr_set *to_gov_attr_set(struct kobject *kobj)
{
return container_of(kobj, struct gov_attr_set, kobj);
}
static inline struct governor_attr *to_gov_attr(struct attribute *attr)
{
return container_of(attr, struct governor_attr, attr);

View file

@ -202,7 +202,7 @@ static unsigned int od_dbs_update(struct cpufreq_policy *policy)
/************************** sysfs interface ************************/
static struct dbs_governor od_dbs_gov;
static ssize_t store_io_is_busy(struct gov_attr_set *attr_set, const char *buf,
static ssize_t io_is_busy_store(struct gov_attr_set *attr_set, const char *buf,
size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
@ -220,7 +220,7 @@ static ssize_t store_io_is_busy(struct gov_attr_set *attr_set, const char *buf,
return count;
}
static ssize_t store_up_threshold(struct gov_attr_set *attr_set,
static ssize_t up_threshold_store(struct gov_attr_set *attr_set,
const char *buf, size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
@ -237,7 +237,7 @@ static ssize_t store_up_threshold(struct gov_attr_set *attr_set,
return count;
}
static ssize_t store_sampling_down_factor(struct gov_attr_set *attr_set,
static ssize_t sampling_down_factor_store(struct gov_attr_set *attr_set,
const char *buf, size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
@ -265,7 +265,7 @@ static ssize_t store_sampling_down_factor(struct gov_attr_set *attr_set,
return count;
}
static ssize_t store_ignore_nice_load(struct gov_attr_set *attr_set,
static ssize_t ignore_nice_load_store(struct gov_attr_set *attr_set,
const char *buf, size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
@ -290,7 +290,7 @@ static ssize_t store_ignore_nice_load(struct gov_attr_set *attr_set,
return count;
}
static ssize_t store_powersave_bias(struct gov_attr_set *attr_set,
static ssize_t powersave_bias_store(struct gov_attr_set *attr_set,
const char *buf, size_t count)
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);

View file

@ -1692,6 +1692,37 @@ static void intel_pstate_enable_hwp_interrupt(struct cpudata *cpudata)
}
}
static void intel_pstate_update_epp_defaults(struct cpudata *cpudata)
{
cpudata->epp_default = intel_pstate_get_epp(cpudata, 0);
/*
* If this CPU gen doesn't call for change in balance_perf
* EPP return.
*/
if (epp_values[EPP_INDEX_BALANCE_PERFORMANCE] == HWP_EPP_BALANCE_PERFORMANCE)
return;
/*
* If powerup EPP is something other than chipset default 0x80 and
* - is more performance oriented than 0x80 (default balance_perf EPP)
* - But less performance oriented than performance EPP
* then use this as new balance_perf EPP.
*/
if (cpudata->epp_default < HWP_EPP_BALANCE_PERFORMANCE &&
cpudata->epp_default > HWP_EPP_PERFORMANCE) {
epp_values[EPP_INDEX_BALANCE_PERFORMANCE] = cpudata->epp_default;
return;
}
/*
* Use hard coded value per gen to update the balance_perf
* and default EPP.
*/
cpudata->epp_default = epp_values[EPP_INDEX_BALANCE_PERFORMANCE];
intel_pstate_set_epp(cpudata, cpudata->epp_default);
}
static void intel_pstate_hwp_enable(struct cpudata *cpudata)
{
/* First disable HWP notification interrupt till we activate again */
@ -1705,12 +1736,7 @@ static void intel_pstate_hwp_enable(struct cpudata *cpudata)
if (cpudata->epp_default >= 0)
return;
if (epp_values[EPP_INDEX_BALANCE_PERFORMANCE] == HWP_EPP_BALANCE_PERFORMANCE) {
cpudata->epp_default = intel_pstate_get_epp(cpudata, 0);
} else {
cpudata->epp_default = epp_values[EPP_INDEX_BALANCE_PERFORMANCE];
intel_pstate_set_epp(cpudata, cpudata->epp_default);
}
intel_pstate_update_epp_defaults(cpudata);
}
static int atom_get_min_pstate(void)

View file

@ -668,9 +668,9 @@ static acpi_status longhaul_walk_callback(acpi_handle obj_handle,
u32 nesting_level,
void *context, void **return_value)
{
struct acpi_device *d;
struct acpi_device *d = acpi_fetch_acpi_dev(obj_handle);
if (acpi_bus_get_device(obj_handle, &d))
if (!d)
return 0;
*return_value = acpi_driver_data(d);

View file

@ -1172,14 +1172,14 @@ static int powernowk8_init(void)
unsigned int i, supported_cpus = 0;
int ret;
if (!x86_match_cpu(powernow_k8_ids))
return -ENODEV;
if (boot_cpu_has(X86_FEATURE_HW_PSTATE)) {
__request_acpi_cpufreq();
return -ENODEV;
}
if (!x86_match_cpu(powernow_k8_ids))
return -ENODEV;
cpus_read_lock();
for_each_online_cpu(i) {
smp_call_function_single(i, check_supported_cpu, &ret, 1);

View file

@ -108,11 +108,11 @@ static int __init haltpoll_init(void)
if (boot_option_idle_override != IDLE_NO_OVERRIDE)
return -ENODEV;
cpuidle_poll_state_init(drv);
if (!kvm_para_available() || !haltpoll_want())
return -ENODEV;
cpuidle_poll_state_init(drv);
ret = cpuidle_register_driver(drv);
if (ret < 0)
return ret;

View file

@ -64,6 +64,7 @@ static struct cpuidle_driver intel_idle_driver = {
/* intel_idle.max_cstate=0 disables driver */
static int max_cstate = CPUIDLE_STATE_MAX - 1;
static unsigned int disabled_states_mask;
static unsigned int preferred_states_mask;
static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
@ -121,9 +122,6 @@ static unsigned int mwait_substates __initdata;
* If the local APIC timer is not known to be reliable in the target idle state,
* enable one-shot tick broadcasting for the target CPU before executing MWAIT.
*
* Optionally call leave_mm() for the target CPU upfront to avoid wakeups due to
* flushing user TLBs.
*
* Must be called under local_irq_disable().
*/
static __cpuidle int intel_idle(struct cpuidle_device *dev,
@ -761,6 +759,46 @@ static struct cpuidle_state icx_cstates[] __initdata = {
.enter = NULL }
};
/*
* On Sapphire Rapids Xeon C1 has to be disabled if C1E is enabled, and vice
* versa. On SPR C1E is enabled only if "C1E promotion" bit is set in
* MSR_IA32_POWER_CTL. But in this case there effectively no C1, because C1
* requests are promoted to C1E. If the "C1E promotion" bit is cleared, then
* both C1 and C1E requests end up with C1, so there is effectively no C1E.
*
* By default we enable C1 and disable C1E by marking it with
* 'CPUIDLE_FLAG_UNUSABLE'.
*/
static struct cpuidle_state spr_cstates[] __initdata = {
{
.name = "C1",
.desc = "MWAIT 0x00",
.flags = MWAIT2flg(0x00),
.exit_latency = 1,
.target_residency = 1,
.enter = &intel_idle,
.enter_s2idle = intel_idle_s2idle, },
{
.name = "C1E",
.desc = "MWAIT 0x01",
.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_ALWAYS_ENABLE |
CPUIDLE_FLAG_UNUSABLE,
.exit_latency = 2,
.target_residency = 4,
.enter = &intel_idle,
.enter_s2idle = intel_idle_s2idle, },
{
.name = "C6",
.desc = "MWAIT 0x20",
.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 290,
.target_residency = 800,
.enter = &intel_idle,
.enter_s2idle = intel_idle_s2idle, },
{
.enter = NULL }
};
static struct cpuidle_state atom_cstates[] __initdata = {
{
.name = "C1E",
@ -1104,6 +1142,12 @@ static const struct idle_cpu idle_cpu_icx __initconst = {
.use_acpi = true,
};
static const struct idle_cpu idle_cpu_spr __initconst = {
.state_table = spr_cstates,
.disable_promotion_to_c1e = true,
.use_acpi = true,
};
static const struct idle_cpu idle_cpu_avn __initconst = {
.state_table = avn_cstates,
.disable_promotion_to_c1e = true,
@ -1166,6 +1210,7 @@ static const struct x86_cpu_id intel_idle_ids[] __initconst = {
X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, &idle_cpu_skx),
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, &idle_cpu_icx),
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D, &idle_cpu_icx),
X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &idle_cpu_spr),
X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL, &idle_cpu_knl),
X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM, &idle_cpu_knl),
X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, &idle_cpu_bxt),
@ -1353,6 +1398,8 @@ static inline void intel_idle_init_cstates_acpi(struct cpuidle_driver *drv) { }
static inline bool intel_idle_off_by_default(u32 mwait_hint) { return false; }
#endif /* !CONFIG_ACPI_PROCESSOR_CSTATE */
static void c1e_promotion_enable(void);
/**
* ivt_idle_state_table_update - Tune the idle states table for Ivy Town.
*
@ -1523,6 +1570,41 @@ static void __init skx_idle_state_table_update(void)
}
}
/**
* spr_idle_state_table_update - Adjust Sapphire Rapids idle states table.
*/
static void __init spr_idle_state_table_update(void)
{
unsigned long long msr;
/* Check if user prefers C1E over C1. */
if (preferred_states_mask & BIT(2)) {
if (preferred_states_mask & BIT(1))
/* Both can't be enabled, stick to the defaults. */
return;
spr_cstates[0].flags |= CPUIDLE_FLAG_UNUSABLE;
spr_cstates[1].flags &= ~CPUIDLE_FLAG_UNUSABLE;
/* Enable C1E using the "C1E promotion" bit. */
c1e_promotion_enable();
disable_promotion_to_c1e = false;
}
/*
* By default, the C6 state assumes the worst-case scenario of package
* C6. However, if PC6 is disabled, we update the numbers to match
* core C6.
*/
rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr);
/* Limit value 2 and above allow for PC6. */
if ((msr & 0x7) < 2) {
spr_cstates[2].exit_latency = 190;
spr_cstates[2].target_residency = 600;
}
}
static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
{
unsigned int mwait_cstate = MWAIT_HINT2CSTATE(mwait_hint) + 1;
@ -1557,6 +1639,9 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
case INTEL_FAM6_SKYLAKE_X:
skx_idle_state_table_update();
break;
case INTEL_FAM6_SAPPHIRERAPIDS_X:
spr_idle_state_table_update();
break;
}
for (cstate = 0; cstate < CPUIDLE_STATE_MAX; ++cstate) {
@ -1629,6 +1714,15 @@ static void auto_demotion_disable(void)
wrmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr_bits);
}
static void c1e_promotion_enable(void)
{
unsigned long long msr_bits;
rdmsrl(MSR_IA32_POWER_CTL, msr_bits);
msr_bits |= 0x2;
wrmsrl(MSR_IA32_POWER_CTL, msr_bits);
}
static void c1e_promotion_disable(void)
{
unsigned long long msr_bits;
@ -1798,3 +1892,14 @@ module_param(max_cstate, int, 0444);
*/
module_param_named(states_off, disabled_states_mask, uint, 0444);
MODULE_PARM_DESC(states_off, "Mask of disabled idle states");
/*
* Some platforms come with mutually exclusive C-states, so that if one is
* enabled, the other C-states must not be used. Example: C1 and C1E on
* Sapphire Rapids platform. This parameter allows for selecting the
* preferred C-states among the groups of mutually exclusive C-states - the
* selected C-states will be registered, the other C-states from the mutually
* exclusive group won't be registered. If the platform has no mutually
* exclusive C-states, this parameter has no effect.
*/
module_param_named(preferred_cstates, preferred_states_mask, uint, 0444);
MODULE_PARM_DESC(preferred_cstates, "Mask of preferred idle states");
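
Given spr_idle_state_table_update() above, BIT(1) of the mask stands for C1 and BIT(2) for C1E on Sapphire Rapids (setting both falls back to the defaults). Booting with, for example:

	intel_idle.preferred_cstates=4

therefore enables C1E (by setting the "C1E promotion" bit) and leaves C1 unregistered.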

View file

@ -596,7 +596,7 @@ static int pci_legacy_suspend(struct device *dev, pm_message_t state)
int error;
error = drv->suspend(pci_dev, state);
suspend_report_result(drv->suspend, error);
suspend_report_result(dev, drv->suspend, error);
if (error)
return error;
@ -775,7 +775,7 @@ static int pci_pm_suspend(struct device *dev)
int error;
error = pm->suspend(dev);
suspend_report_result(pm->suspend, error);
suspend_report_result(dev, pm->suspend, error);
if (error)
return error;
@ -821,7 +821,7 @@ static int pci_pm_suspend_noirq(struct device *dev)
int error;
error = pm->suspend_noirq(dev);
suspend_report_result(pm->suspend_noirq, error);
suspend_report_result(dev, pm->suspend_noirq, error);
if (error)
return error;
@ -1010,7 +1010,7 @@ static int pci_pm_freeze(struct device *dev)
int error;
error = pm->freeze(dev);
suspend_report_result(pm->freeze, error);
suspend_report_result(dev, pm->freeze, error);
if (error)
return error;
}
@ -1030,7 +1030,7 @@ static int pci_pm_freeze_noirq(struct device *dev)
int error;
error = pm->freeze_noirq(dev);
suspend_report_result(pm->freeze_noirq, error);
suspend_report_result(dev, pm->freeze_noirq, error);
if (error)
return error;
}
@ -1116,7 +1116,7 @@ static int pci_pm_poweroff(struct device *dev)
int error;
error = pm->poweroff(dev);
suspend_report_result(pm->poweroff, error);
suspend_report_result(dev, pm->poweroff, error);
if (error)
return error;
}
@ -1154,7 +1154,7 @@ static int pci_pm_poweroff_noirq(struct device *dev)
int error;
error = pm->poweroff_noirq(dev);
suspend_report_result(pm->poweroff_noirq, error);
suspend_report_result(dev, pm->poweroff_noirq, error);
if (error)
return error;
}

View file

@ -171,7 +171,7 @@ static int __pnp_bus_suspend(struct device *dev, pm_message_t state)
if (pnp_drv->driver.pm && pnp_drv->driver.pm->suspend) {
error = pnp_drv->driver.pm->suspend(dev);
suspend_report_result(pnp_drv->driver.pm->suspend, error);
suspend_report_result(dev, pnp_drv->driver.pm->suspend, error);
if (error)
return error;
}

View file

@ -46,6 +46,7 @@ config IDLE_INJECT
config DTPM
bool "Power capping for Dynamic Thermal Power Management (EXPERIMENTAL)"
depends on OF
help
This enables support for the power capping for the dynamic
thermal power management userspace engine.
@ -56,4 +57,11 @@ config DTPM_CPU
help
This enables support for CPU power limitation based on
energy model.
config DTPM_DEVFREQ
bool "Add device power capping based on the energy model"
depends on DTPM && ENERGY_MODEL
help
This enables support for device power limitation based on
energy model.
endif

View file

@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_DTPM) += dtpm.o
obj-$(CONFIG_DTPM_CPU) += dtpm_cpu.o
obj-$(CONFIG_DTPM_DEVFREQ) += dtpm_devfreq.o
obj-$(CONFIG_POWERCAP) += powercap_sys.o
obj-$(CONFIG_INTEL_RAPL_CORE) += intel_rapl_common.o
obj-$(CONFIG_INTEL_RAPL) += intel_rapl_msr.o

View file

@ -23,6 +23,9 @@
#include <linux/powercap.h>
#include <linux/slab.h>
#include <linux/mutex.h>
#include <linux/of.h>
#include "dtpm_subsys.h"
#define DTPM_POWER_LIMIT_FLAG 0
@ -48,9 +51,7 @@ static int get_max_power_range_uw(struct powercap_zone *pcz, u64 *max_power_uw)
{
struct dtpm *dtpm = to_dtpm(pcz);
mutex_lock(&dtpm_lock);
*max_power_uw = dtpm->power_max - dtpm->power_min;
mutex_unlock(&dtpm_lock);
return 0;
}
@ -80,14 +81,7 @@ static int __get_power_uw(struct dtpm *dtpm, u64 *power_uw)
static int get_power_uw(struct powercap_zone *pcz, u64 *power_uw)
{
struct dtpm *dtpm = to_dtpm(pcz);
int ret;
mutex_lock(&dtpm_lock);
ret = __get_power_uw(dtpm, power_uw);
mutex_unlock(&dtpm_lock);
return ret;
return __get_power_uw(to_dtpm(pcz), power_uw);
}
static void __dtpm_rebalance_weight(struct dtpm *dtpm)
@ -130,7 +124,16 @@ static void __dtpm_add_power(struct dtpm *dtpm)
}
}
static int __dtpm_update_power(struct dtpm *dtpm)
/**
* dtpm_update_power - Update the power on the dtpm
* @dtpm: a pointer to a dtpm structure to update
*
* Function to update the power values of the dtpm node specified in
* parameter. These new values will be propagated to the tree.
*
* Return: zero on success, -EINVAL if the values are inconsistent
*/
int dtpm_update_power(struct dtpm *dtpm)
{
int ret;
@ -152,26 +155,6 @@ static int __dtpm_update_power(struct dtpm *dtpm)
return ret;
}
/**
* dtpm_update_power - Update the power on the dtpm
* @dtpm: a pointer to a dtpm structure to update
*
* Function to update the power values of the dtpm node specified in
* parameter. These new values will be propagated to the tree.
*
* Return: zero on success, -EINVAL if the values are inconsistent
*/
int dtpm_update_power(struct dtpm *dtpm)
{
int ret;
mutex_lock(&dtpm_lock);
ret = __dtpm_update_power(dtpm);
mutex_unlock(&dtpm_lock);
return ret;
}
/**
* dtpm_release_zone - Cleanup when the node is released
* @pcz: a pointer to a powercap_zone structure
@ -188,48 +171,28 @@ int dtpm_release_zone(struct powercap_zone *pcz)
struct dtpm *dtpm = to_dtpm(pcz);
struct dtpm *parent = dtpm->parent;
mutex_lock(&dtpm_lock);
if (!list_empty(&dtpm->children)) {
mutex_unlock(&dtpm_lock);
if (!list_empty(&dtpm->children))
return -EBUSY;
}
if (parent)
list_del(&dtpm->sibling);
__dtpm_sub_power(dtpm);
mutex_unlock(&dtpm_lock);
if (dtpm->ops)
dtpm->ops->release(dtpm);
else
kfree(dtpm);
if (root == dtpm)
root = NULL;
kfree(dtpm);
return 0;
}
static int __get_power_limit_uw(struct dtpm *dtpm, int cid, u64 *power_limit)
{
*power_limit = dtpm->power_limit;
return 0;
}
static int get_power_limit_uw(struct powercap_zone *pcz,
int cid, u64 *power_limit)
{
struct dtpm *dtpm = to_dtpm(pcz);
int ret;
mutex_lock(&dtpm_lock);
ret = __get_power_limit_uw(dtpm, cid, power_limit);
mutex_unlock(&dtpm_lock);
return ret;
*power_limit = to_dtpm(pcz)->power_limit;
return 0;
}
/*
@ -289,7 +252,7 @@ static int __set_power_limit_uw(struct dtpm *dtpm, int cid, u64 power_limit)
ret = __set_power_limit_uw(child, cid, power);
if (!ret)
ret = __get_power_limit_uw(child, cid, &power);
ret = get_power_limit_uw(&child->zone, cid, &power);
if (ret)
break;
@ -307,8 +270,6 @@ static int set_power_limit_uw(struct powercap_zone *pcz,
struct dtpm *dtpm = to_dtpm(pcz);
int ret;
mutex_lock(&dtpm_lock);
/*
* Don't allow values outside of the power range previously
* set when initializing the power numbers.
@ -320,8 +281,6 @@ static int set_power_limit_uw(struct powercap_zone *pcz,
pr_debug("%s: power limit: %llu uW, power max: %llu uW\n",
dtpm->zone.name, dtpm->power_limit, dtpm->power_max);
mutex_unlock(&dtpm_lock);
return ret;
}
@ -332,11 +291,7 @@ static const char *get_constraint_name(struct powercap_zone *pcz, int cid)
static int get_max_power_uw(struct powercap_zone *pcz, int id, u64 *max_power)
{
struct dtpm *dtpm = to_dtpm(pcz);
mutex_lock(&dtpm_lock);
*max_power = dtpm->power_max;
mutex_unlock(&dtpm_lock);
*max_power = to_dtpm(pcz)->power_max;
return 0;
}
@ -439,8 +394,6 @@ int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent)
if (IS_ERR(pcz))
return PTR_ERR(pcz);
mutex_lock(&dtpm_lock);
if (parent) {
list_add_tail(&dtpm->sibling, &parent->children);
dtpm->parent = parent;
@ -456,19 +409,253 @@ int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent)
pr_debug("Registered dtpm node '%s' / %llu-%llu uW, \n",
dtpm->zone.name, dtpm->power_min, dtpm->power_max);
mutex_unlock(&dtpm_lock);
return 0;
}
static int __init init_dtpm(void)
static struct dtpm *dtpm_setup_virtual(const struct dtpm_node *hierarchy,
struct dtpm *parent)
{
pct = powercap_register_control_type(NULL, "dtpm", NULL);
if (IS_ERR(pct)) {
pr_err("Failed to register control type\n");
return PTR_ERR(pct);
struct dtpm *dtpm;
int ret;
dtpm = kzalloc(sizeof(*dtpm), GFP_KERNEL);
if (!dtpm)
return ERR_PTR(-ENOMEM);
dtpm_init(dtpm, NULL);
ret = dtpm_register(hierarchy->name, dtpm, parent);
if (ret) {
pr_err("Failed to register dtpm node '%s': %d\n",
hierarchy->name, ret);
kfree(dtpm);
return ERR_PTR(ret);
}
return dtpm;
}
static struct dtpm *dtpm_setup_dt(const struct dtpm_node *hierarchy,
struct dtpm *parent)
{
struct device_node *np;
int i, ret;
np = of_find_node_by_path(hierarchy->name);
if (!np) {
pr_err("Failed to find '%s'\n", hierarchy->name);
return ERR_PTR(-ENXIO);
}
for (i = 0; i < ARRAY_SIZE(dtpm_subsys); i++) {
if (!dtpm_subsys[i]->setup)
continue;
ret = dtpm_subsys[i]->setup(parent, np);
if (ret) {
pr_err("Failed to setup '%s': %d\n", dtpm_subsys[i]->name, ret);
of_node_put(np);
return ERR_PTR(ret);
}
}
of_node_put(np);
/*
* By returning a NULL pointer, we let know the caller there
* is no child for us as we are a leaf of the tree
*/
return NULL;
}
typedef struct dtpm * (*dtpm_node_callback_t)(const struct dtpm_node *, struct dtpm *);
static dtpm_node_callback_t dtpm_node_callback[] = {
[DTPM_NODE_VIRTUAL] = dtpm_setup_virtual,
[DTPM_NODE_DT] = dtpm_setup_dt,
};
static int dtpm_for_each_child(const struct dtpm_node *hierarchy,
const struct dtpm_node *it, struct dtpm *parent)
{
struct dtpm *dtpm;
int i, ret;
for (i = 0; hierarchy[i].name; i++) {
if (hierarchy[i].parent != it)
continue;
dtpm = dtpm_node_callback[hierarchy[i].type](&hierarchy[i], parent);
/*
* A NULL pointer means there is no children, hence we
* continue without going deeper in the recursivity.
*/
if (!dtpm)
continue;
/*
* There are multiple reasons why the callback could
* fail. The generic glue is abstracting the backend
* and therefore it is not possible to report back or
* take a decision based on the error. In any case,
* if this call fails, it is not critical in the
* hierarchy creation, we can assume the underlying
* service is not found, so we continue without this
* branch in the tree but with a warning to log the
* information the node was not created.
*/
if (IS_ERR(dtpm)) {
pr_warn("Failed to create '%s' in the hierarchy\n",
hierarchy[i].name);
continue;
}
ret = dtpm_for_each_child(hierarchy, &hierarchy[i], dtpm);
if (ret)
return ret;
}
return 0;
}
late_initcall(init_dtpm);
/**
* dtpm_create_hierarchy - Create the dtpm hierarchy
* @hierarchy: An array of struct dtpm_node describing the hierarchy
*
* The function is called by the platform specific code with the
* description of the different node in the hierarchy. It creates the
* tree in the sysfs filesystem under the powercap dtpm entry.
*
* The expected tree has the format:
*
* struct dtpm_node hierarchy[] = {
* [0] { .name = "topmost", type = DTPM_NODE_VIRTUAL },
* [1] { .name = "package", .type = DTPM_NODE_VIRTUAL, .parent = &hierarchy[0] },
* [2] { .name = "/cpus/cpu0", .type = DTPM_NODE_DT, .parent = &hierarchy[1] },
* [3] { .name = "/cpus/cpu1", .type = DTPM_NODE_DT, .parent = &hierarchy[1] },
* [4] { .name = "/cpus/cpu2", .type = DTPM_NODE_DT, .parent = &hierarchy[1] },
* [5] { .name = "/cpus/cpu3", .type = DTPM_NODE_DT, .parent = &hierarchy[1] },
* [6] { }
* };
*
* The last element is always an empty one and marks the end of the
* array.
*
* Return: zero on success, a negative value in case of error. Errors
* are reported back from the underlying functions.
*/
int dtpm_create_hierarchy(struct of_device_id *dtpm_match_table)
{
const struct of_device_id *match;
const struct dtpm_node *hierarchy;
struct device_node *np;
int i, ret;
mutex_lock(&dtpm_lock);
if (pct) {
ret = -EBUSY;
goto out_unlock;
}
pct = powercap_register_control_type(NULL, "dtpm", NULL);
if (IS_ERR(pct)) {
pr_err("Failed to register control type\n");
ret = PTR_ERR(pct);
goto out_pct;
}
ret = -ENODEV;
np = of_find_node_by_path("/");
if (!np)
goto out_err;
match = of_match_node(dtpm_match_table, np);
of_node_put(np);
if (!match)
goto out_err;
hierarchy = match->data;
if (!hierarchy) {
ret = -EFAULT;
goto out_err;
}
ret = dtpm_for_each_child(hierarchy, NULL, NULL);
if (ret)
goto out_err;
for (i = 0; i < ARRAY_SIZE(dtpm_subsys); i++) {
if (!dtpm_subsys[i]->init)
continue;
ret = dtpm_subsys[i]->init();
if (ret)
pr_info("Failed to initialize '%s': %d",
dtpm_subsys[i]->name, ret);
}
mutex_unlock(&dtpm_lock);
return 0;
out_err:
powercap_unregister_control_type(pct);
out_pct:
pct = NULL;
out_unlock:
mutex_unlock(&dtpm_lock);
return ret;
}
EXPORT_SYMBOL_GPL(dtpm_create_hierarchy);
static void __dtpm_destroy_hierarchy(struct dtpm *dtpm)
{
struct dtpm *child, *aux;
list_for_each_entry_safe(child, aux, &dtpm->children, sibling)
__dtpm_destroy_hierarchy(child);
/*
* At this point, we know all children were removed from the
* recursive call before
*/
dtpm_unregister(dtpm);
}
void dtpm_destroy_hierarchy(void)
{
int i;
mutex_lock(&dtpm_lock);
if (!pct)
goto out_unlock;
__dtpm_destroy_hierarchy(root);
for (i = 0; i < ARRAY_SIZE(dtpm_subsys); i++) {
if (!dtpm_subsys[i]->exit)
continue;
dtpm_subsys[i]->exit();
}
powercap_unregister_control_type(pct);
pct = NULL;
root = NULL;
out_unlock:
mutex_unlock(&dtpm_lock);
}
EXPORT_SYMBOL_GPL(dtpm_destroy_hierarchy);
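
Putting dtpm_create_hierarchy() to use, platform code describes its topology with a dtpm_node array, as in the kerneldoc example above, and exposes it through an OF match table. A hedged sketch with a made-up compatible string and hierarchy:

#include <linux/dtpm.h>
#include <linux/init.h>
#include <linux/of.h>

/* Hypothetical hierarchy: a virtual root with two DT CPU leaves. */
static struct dtpm_node example_hierarchy[] = {
	{ .name = "soc", .type = DTPM_NODE_VIRTUAL },
	{ .name = "/cpus/cpu@0", .type = DTPM_NODE_DT,
	  .parent = &example_hierarchy[0] },
	{ .name = "/cpus/cpu@1", .type = DTPM_NODE_DT,
	  .parent = &example_hierarchy[0] },
	{ },
};

static struct of_device_id example_dtpm_match_table[] = {
	{ .compatible = "vendor,example-board", .data = example_hierarchy },
	{},
};

static int __init example_dtpm_init(void)
{
	/* Builds the dtpm tree under the powercap sysfs entry. */
	return dtpm_create_hierarchy(example_dtpm_match_table);
}
late_initcall(example_dtpm_init);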

View file

@ -21,6 +21,7 @@
#include <linux/cpuhotplug.h>
#include <linux/dtpm.h>
#include <linux/energy_model.h>
#include <linux/of.h>
#include <linux/pm_qos.h>
#include <linux/slab.h>
#include <linux/units.h>
@ -150,10 +151,17 @@ static int update_pd_power_uw(struct dtpm *dtpm)
static void pd_release(struct dtpm *dtpm)
{
struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
struct cpufreq_policy *policy;
if (freq_qos_request_active(&dtpm_cpu->qos_req))
freq_qos_remove_request(&dtpm_cpu->qos_req);
policy = cpufreq_cpu_get(dtpm_cpu->cpu);
if (policy) {
for_each_cpu(dtpm_cpu->cpu, policy->related_cpus)
per_cpu(dtpm_per_cpu, dtpm_cpu->cpu) = NULL;
}
kfree(dtpm_cpu);
}
@ -176,6 +184,17 @@ static int cpuhp_dtpm_cpu_offline(unsigned int cpu)
}
static int cpuhp_dtpm_cpu_online(unsigned int cpu)
{
struct dtpm_cpu *dtpm_cpu;
dtpm_cpu = per_cpu(dtpm_per_cpu, cpu);
if (dtpm_cpu)
return dtpm_update_power(&dtpm_cpu->dtpm);
return 0;
}
static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
{
struct dtpm_cpu *dtpm_cpu;
struct cpufreq_policy *policy;
@ -183,6 +202,10 @@ static int cpuhp_dtpm_cpu_online(unsigned int cpu)
char name[CPUFREQ_NAME_LEN];
int ret = -ENOMEM;
dtpm_cpu = per_cpu(dtpm_per_cpu, cpu);
if (dtpm_cpu)
return 0;
policy = cpufreq_cpu_get(cpu);
if (!policy)
return 0;
@ -191,10 +214,6 @@ static int cpuhp_dtpm_cpu_online(unsigned int cpu)
if (!pd)
return -EINVAL;
dtpm_cpu = per_cpu(dtpm_per_cpu, cpu);
if (dtpm_cpu)
return dtpm_update_power(&dtpm_cpu->dtpm);
dtpm_cpu = kzalloc(sizeof(*dtpm_cpu), GFP_KERNEL);
if (!dtpm_cpu)
return -ENOMEM;
@ -207,7 +226,7 @@ static int cpuhp_dtpm_cpu_online(unsigned int cpu)
snprintf(name, sizeof(name), "cpu%d-cpufreq", dtpm_cpu->cpu);
ret = dtpm_register(name, &dtpm_cpu->dtpm, NULL);
ret = dtpm_register(name, &dtpm_cpu->dtpm, parent);
if (ret)
goto out_kfree_dtpm_cpu;
@ -231,7 +250,18 @@ static int cpuhp_dtpm_cpu_online(unsigned int cpu)
return ret;
}
static int __init dtpm_cpu_init(void)
static int dtpm_cpu_setup(struct dtpm *dtpm, struct device_node *np)
{
int cpu;
cpu = of_cpu_node_to_id(np);
if (cpu < 0)
return 0;
return __dtpm_cpu_setup(cpu, dtpm);
}
static int dtpm_cpu_init(void)
{
int ret;
@@ -269,4 +299,15 @@ static int __init dtpm_cpu_init(void)
return 0;
}
DTPM_DECLARE(dtpm_cpu, dtpm_cpu_init);
static void dtpm_cpu_exit(void)
{
cpuhp_remove_state_nocalls(CPUHP_AP_ONLINE_DYN);
cpuhp_remove_state_nocalls(CPUHP_AP_DTPM_CPU_DEAD);
}
struct dtpm_subsys_ops dtpm_cpu_ops = {
.name = KBUILD_MODNAME,
.init = dtpm_cpu_init,
.exit = dtpm_cpu_exit,
.setup = dtpm_cpu_setup,
};


@@ -0,0 +1,203 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright 2021 Linaro Limited
*
* Author: Daniel Lezcano <daniel.lezcano@linaro.org>
*
* The devfreq device combined with the energy model and the load can
* give an estimation of the power consumption as well as limiting the
* power.
*
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/cpumask.h>
#include <linux/devfreq.h>
#include <linux/dtpm.h>
#include <linux/energy_model.h>
#include <linux/of.h>
#include <linux/pm_qos.h>
#include <linux/slab.h>
#include <linux/units.h>
struct dtpm_devfreq {
struct dtpm dtpm;
struct dev_pm_qos_request qos_req;
struct devfreq *devfreq;
};
static struct dtpm_devfreq *to_dtpm_devfreq(struct dtpm *dtpm)
{
return container_of(dtpm, struct dtpm_devfreq, dtpm);
}
static int update_pd_power_uw(struct dtpm *dtpm)
{
struct dtpm_devfreq *dtpm_devfreq = to_dtpm_devfreq(dtpm);
struct devfreq *devfreq = dtpm_devfreq->devfreq;
struct device *dev = devfreq->dev.parent;
struct em_perf_domain *pd = em_pd_get(dev);
dtpm->power_min = pd->table[0].power;
dtpm->power_min *= MICROWATT_PER_MILLIWATT;
dtpm->power_max = pd->table[pd->nr_perf_states - 1].power;
dtpm->power_max *= MICROWATT_PER_MILLIWATT;
return 0;
}
static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
{
struct dtpm_devfreq *dtpm_devfreq = to_dtpm_devfreq(dtpm);
struct devfreq *devfreq = dtpm_devfreq->devfreq;
struct device *dev = devfreq->dev.parent;
struct em_perf_domain *pd = em_pd_get(dev);
unsigned long freq;
u64 power;
int i;
for (i = 0; i < pd->nr_perf_states; i++) {
power = pd->table[i].power * MICROWATT_PER_MILLIWATT;
if (power > power_limit)
break;
}
freq = pd->table[i - 1].frequency;
dev_pm_qos_update_request(&dtpm_devfreq->qos_req, freq);
power_limit = pd->table[i - 1].power * MICROWATT_PER_MILLIWATT;
return power_limit;
}
static void _normalize_load(struct devfreq_dev_status *status)
{
if (status->total_time > 0xfffff) {
status->total_time >>= 10;
status->busy_time >>= 10;
}
status->busy_time <<= 10;
status->busy_time /= status->total_time ? : 1;
status->busy_time = status->busy_time ? : 1;
status->total_time = 1024;
}
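As a worked illustration of the scaling above (hypothetical sample values, not part of the patch):

/*
 * Example: busy_time = 25, total_time = 100
 *   busy_time  = (25 << 10) / 100 = 256   (25% on a 0..1024 scale)
 *   total_time = 1024
 * The initial >>= 10 only triggers when total_time exceeds 0xfffff,
 * trading a little precision for headroom before the left shift.
 */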
static u64 get_pd_power_uw(struct dtpm *dtpm)
{
struct dtpm_devfreq *dtpm_devfreq = to_dtpm_devfreq(dtpm);
struct devfreq *devfreq = dtpm_devfreq->devfreq;
struct device *dev = devfreq->dev.parent;
struct em_perf_domain *pd = em_pd_get(dev);
struct devfreq_dev_status status;
unsigned long freq;
u64 power;
int i;
mutex_lock(&devfreq->lock);
status = devfreq->last_status;
mutex_unlock(&devfreq->lock);
freq = DIV_ROUND_UP(status.current_frequency, HZ_PER_KHZ);
_normalize_load(&status);
for (i = 0; i < pd->nr_perf_states; i++) {
if (pd->table[i].frequency < freq)
continue;
power = pd->table[i].power * MICROWATT_PER_MILLIWATT;
power *= status.busy_time;
power >>= 10;
return power;
}
return 0;
}
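To make the estimation described in the file header concrete, here is a minimal userspace sketch of the same arithmetic; the two-entry table and its values are illustrative, not the kernel's em_perf_domain:

#include <stdio.h>

/* One hypothetical energy-model entry: frequency (kHz) and power (mW). */
struct em_state { unsigned long freq_khz; unsigned long power_mw; };

static unsigned long long estimate_power_uw(const struct em_state *table,
					    int nr_states,
					    unsigned long cur_freq_khz,
					    unsigned long long busy_time,
					    unsigned long long total_time)
{
	unsigned long long load_1024;
	int i;

	/* Same normalization as _normalize_load(): load on a 0..1024 scale. */
	load_1024 = (busy_time << 10) / (total_time ? total_time : 1);

	/* Pick the first state running at least as fast as the current one. */
	for (i = 0; i < nr_states; i++) {
		if (table[i].freq_khz < cur_freq_khz)
			continue;
		/* mW -> uW, then scale by the load. */
		return (table[i].power_mw * 1000ULL * load_1024) >> 10;
	}
	return 0;
}

int main(void)
{
	const struct em_state tbl[] = { { 200000, 50 }, { 400000, 120 } };

	/* 400 MHz state at 50% load: 120 mW * 0.5 = 60000 uW. */
	printf("%llu uW\n", estimate_power_uw(tbl, 2, 400000, 50, 100));
	return 0;
}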
static void pd_release(struct dtpm *dtpm)
{
struct dtpm_devfreq *dtpm_devfreq = to_dtpm_devfreq(dtpm);
if (dev_pm_qos_request_active(&dtpm_devfreq->qos_req))
dev_pm_qos_remove_request(&dtpm_devfreq->qos_req);
kfree(dtpm_devfreq);
}
static struct dtpm_ops dtpm_ops = {
.set_power_uw = set_pd_power_limit,
.get_power_uw = get_pd_power_uw,
.update_power_uw = update_pd_power_uw,
.release = pd_release,
};
static int __dtpm_devfreq_setup(struct devfreq *devfreq, struct dtpm *parent)
{
struct device *dev = devfreq->dev.parent;
struct dtpm_devfreq *dtpm_devfreq;
struct em_perf_domain *pd;
int ret = -ENOMEM;
pd = em_pd_get(dev);
if (!pd) {
ret = dev_pm_opp_of_register_em(dev, NULL);
if (ret) {
pr_err("No energy model available for '%s'\n", dev_name(dev));
return -EINVAL;
}
}
dtpm_devfreq = kzalloc(sizeof(*dtpm_devfreq), GFP_KERNEL);
if (!dtpm_devfreq)
return -ENOMEM;
dtpm_init(&dtpm_devfreq->dtpm, &dtpm_ops);
dtpm_devfreq->devfreq = devfreq;
ret = dtpm_register(dev_name(dev), &dtpm_devfreq->dtpm, parent);
if (ret) {
pr_err("Failed to register '%s': %d\n", dev_name(dev), ret);
kfree(dtpm_devfreq);
return ret;
}
ret = dev_pm_qos_add_request(dev, &dtpm_devfreq->qos_req,
DEV_PM_QOS_MAX_FREQUENCY,
PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE);
if (ret) {
pr_err("Failed to add QoS request: %d\n", ret);
goto out_dtpm_unregister;
}
dtpm_update_power(&dtpm_devfreq->dtpm);
return 0;
out_dtpm_unregister:
dtpm_unregister(&dtpm_devfreq->dtpm);
return ret;
}
static int dtpm_devfreq_setup(struct dtpm *dtpm, struct device_node *np)
{
struct devfreq *devfreq;
devfreq = devfreq_get_devfreq_by_node(np);
if (IS_ERR(devfreq))
return 0;
return __dtpm_devfreq_setup(devfreq, dtpm);
}
struct dtpm_subsys_ops dtpm_devfreq_ops = {
.name = KBUILD_MODNAME,
.setup = dtpm_devfreq_setup,
};


@@ -0,0 +1,22 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright (C) 2022 Linaro Ltd
*
* Author: Daniel Lezcano <daniel.lezcano@linaro.org>
*/
#ifndef ___DTPM_SUBSYS_H__
#define ___DTPM_SUBSYS_H__
extern struct dtpm_subsys_ops dtpm_cpu_ops;
extern struct dtpm_subsys_ops dtpm_devfreq_ops;
struct dtpm_subsys_ops *dtpm_subsys[] = {
#ifdef CONFIG_DTPM_CPU
&dtpm_cpu_ops,
#endif
#ifdef CONFIG_DTPM_DEVFREQ
&dtpm_devfreq_ops,
#endif
};
#endif


@@ -34,4 +34,12 @@ config ROCKCHIP_PM_DOMAINS
If unsure, say N.
config ROCKCHIP_DTPM
tristate "Rockchip DTPM hierarchy"
depends on DTPM && m
help
Describe the hierarchy of the Dynamic Thermal Power
Management tree for this platform. This will create all the
power-capping-capable devices.
endif


@@ -5,3 +5,4 @@
obj-$(CONFIG_ROCKCHIP_GRF) += grf.o
obj-$(CONFIG_ROCKCHIP_IODOMAIN) += io-domain.o
obj-$(CONFIG_ROCKCHIP_PM_DOMAINS) += pm_domains.o
obj-$(CONFIG_ROCKCHIP_DTPM) += dtpm.o


@@ -0,0 +1,65 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright 2021 Linaro Limited
*
* Author: Daniel Lezcano <daniel.lezcano@linaro.org>
*
* DTPM hierarchy description
*/
#include <linux/dtpm.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>
static struct dtpm_node __initdata rk3399_hierarchy[] = {
[0]{ .name = "rk3399",
.type = DTPM_NODE_VIRTUAL },
[1]{ .name = "package",
.type = DTPM_NODE_VIRTUAL,
.parent = &rk3399_hierarchy[0] },
[2]{ .name = "/cpus/cpu@0",
.type = DTPM_NODE_DT,
.parent = &rk3399_hierarchy[1] },
[3]{ .name = "/cpus/cpu@1",
.type = DTPM_NODE_DT,
.parent = &rk3399_hierarchy[1] },
[4]{ .name = "/cpus/cpu@2",
.type = DTPM_NODE_DT,
.parent = &rk3399_hierarchy[1] },
[5]{ .name = "/cpus/cpu@3",
.type = DTPM_NODE_DT,
.parent = &rk3399_hierarchy[1] },
[6]{ .name = "/cpus/cpu@100",
.type = DTPM_NODE_DT,
.parent = &rk3399_hierarchy[1] },
[7]{ .name = "/cpus/cpu@101",
.type = DTPM_NODE_DT,
.parent = &rk3399_hierarchy[1] },
[8]{ .name = "/gpu@ff9a0000",
.type = DTPM_NODE_DT,
.parent = &rk3399_hierarchy[1] },
[9]{ /* sentinel */ }
};
static struct of_device_id __initdata rockchip_dtpm_match_table[] = {
{ .compatible = "rockchip,rk3399", .data = rk3399_hierarchy },
{},
};
static int __init rockchip_dtpm_init(void)
{
return dtpm_create_hierarchy(rockchip_dtpm_match_table);
}
module_init(rockchip_dtpm_init);
static void __exit rockchip_dtpm_exit(void)
{
return dtpm_destroy_hierarchy();
}
module_exit(rockchip_dtpm_exit);
MODULE_SOFTDEP("pre: panfrost cpufreq-dt");
MODULE_DESCRIPTION("Rockchip DTPM driver");
MODULE_LICENSE("GPL");
MODULE_ALIAS("platform:dtpm");
MODULE_AUTHOR("Daniel Lezcano <daniel.lezcano@kernel.org>");


@@ -446,7 +446,7 @@ static int suspend_common(struct device *dev, bool do_wakeup)
HCD_WAKEUP_PENDING(hcd->shared_hcd))
return -EBUSY;
retval = hcd->driver->pci_suspend(hcd, do_wakeup);
suspend_report_result(hcd->driver->pci_suspend, retval);
suspend_report_result(dev, hcd->driver->pci_suspend, retval);
/* Check again in case wakeup raced with pci_suspend */
if ((retval == 0 && do_wakeup && HCD_WAKEUP_PENDING(hcd)) ||
@@ -556,7 +556,7 @@ static int hcd_pci_suspend_noirq(struct device *dev)
dev_dbg(dev, "--> PCI %s\n",
pci_power_name(pci_dev->current_state));
} else {
suspend_report_result(pci_prepare_to_sleep, retval);
suspend_report_result(dev, pci_prepare_to_sleep, retval);
return retval;
}


@@ -321,16 +321,6 @@
#define THERMAL_TABLE(name)
#endif
#ifdef CONFIG_DTPM
#define DTPM_TABLE() \
. = ALIGN(8); \
__dtpm_table = .; \
KEEP(*(__dtpm_table)) \
__dtpm_table_end = .;
#else
#define DTPM_TABLE()
#endif
#define KERNEL_DTB() \
STRUCT_ALIGN(); \
__dtb_start = .; \
@@ -723,7 +713,6 @@
ACPI_PROBE_TABLE(irqchip) \
ACPI_PROBE_TABLE(timer) \
THERMAL_TABLE(governor) \
DTPM_TABLE() \
EARLYCON_TABLE() \
LSM_TABLE() \
EARLY_LSM_TABLE() \


@@ -526,7 +526,7 @@ acpi_status acpi_release_memory(acpi_handle handle, struct resource *res,
int acpi_resources_are_enforced(void);
#ifdef CONFIG_HIBERNATION
void __init acpi_check_s4_hw_signature(int check);
extern int acpi_check_s4_hw_signature;
#endif
#ifdef CONFIG_PM_SLEEP


@@ -661,6 +661,11 @@ struct gov_attr_set {
/* sysfs ops for cpufreq governors */
extern const struct sysfs_ops governor_sysfs_ops;
static inline struct gov_attr_set *to_gov_attr_set(struct kobject *kobj)
{
return container_of(kobj, struct gov_attr_set, kobj);
}
void gov_attr_set_init(struct gov_attr_set *attr_set, struct list_head *list_node);
void gov_attr_set_get(struct gov_attr_set *attr_set, struct list_head *list_node);
unsigned int gov_attr_set_put(struct gov_attr_set *attr_set, struct list_head *list_node);


@@ -32,28 +32,25 @@ struct dtpm_ops {
void (*release)(struct dtpm *);
};
typedef int (*dtpm_init_t)(void);
struct device_node;
struct dtpm_descr {
dtpm_init_t init;
struct dtpm_subsys_ops {
const char *name;
int (*init)(void);
void (*exit)(void);
int (*setup)(struct dtpm *, struct device_node *);
};
/* Init section thermal table */
extern struct dtpm_descr __dtpm_table[];
extern struct dtpm_descr __dtpm_table_end[];
enum DTPM_NODE_TYPE {
DTPM_NODE_VIRTUAL = 0,
DTPM_NODE_DT,
};
#define DTPM_TABLE_ENTRY(name, __init) \
static struct dtpm_descr __dtpm_table_entry_##name \
__used __section("__dtpm_table") = { \
.init = __init, \
}
#define DTPM_DECLARE(name, init) DTPM_TABLE_ENTRY(name, init)
#define for_each_dtpm_table(__dtpm) \
for (__dtpm = __dtpm_table; \
__dtpm < __dtpm_table_end; \
__dtpm++)
struct dtpm_node {
enum DTPM_NODE_TYPE type;
const char *name;
struct dtpm_node *parent;
};
static inline struct dtpm *to_dtpm(struct powercap_zone *zone)
{
@@ -70,4 +67,7 @@ void dtpm_unregister(struct dtpm *dtpm);
int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent);
int dtpm_create_hierarchy(struct of_device_id *dtpm_match_table);
void dtpm_destroy_hierarchy(void);
#endif
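For reference, a minimal sketch of how a platform might describe its tree with these structures; the node names and compatible string are hypothetical (the Rockchip dtpm.c above is the real in-tree user):

static struct dtpm_node example_hierarchy[] = {
	[0] = { .name = "soc", .type = DTPM_NODE_VIRTUAL },
	[1] = { .name = "/cpus/cpu@0",
		.type = DTPM_NODE_DT,
		.parent = &example_hierarchy[0] },
	[2] = { /* sentinel */ },
};

static struct of_device_id example_match_table[] = {
	{ .compatible = "vendor,example-board", .data = example_hierarchy },
	{ },
};

/*
 * dtpm_create_hierarchy(example_match_table) matches the root DT node's
 * compatible string, walks the table and lets each registered subsystem's
 * ->setup() back the DTPM_NODE_DT entries; dtpm_destroy_hierarchy() tears
 * the whole tree down again.
 */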


@@ -770,11 +770,11 @@ extern int dpm_suspend_late(pm_message_t state);
extern int dpm_suspend(pm_message_t state);
extern int dpm_prepare(pm_message_t state);
extern void __suspend_report_result(const char *function, void *fn, int ret);
extern void __suspend_report_result(const char *function, struct device *dev, void *fn, int ret);
#define suspend_report_result(fn, ret) \
#define suspend_report_result(dev, fn, ret) \
do { \
__suspend_report_result(__func__, fn, ret); \
__suspend_report_result(__func__, dev, fn, ret); \
} while (0)
extern int device_pm_wait_for_dev(struct device *sub, struct device *dev);
@@ -814,7 +814,7 @@ static inline int dpm_suspend_start(pm_message_t state)
return 0;
}
#define suspend_report_result(fn, ret) do {} while (0)
#define suspend_report_result(dev, fn, ret) do {} while (0)
static inline int device_pm_wait_for_dev(struct device *a, struct device *b)
{


@@ -567,6 +567,10 @@ static inline void pm_runtime_disable(struct device *dev)
* Allow the runtime PM autosuspend mechanism to be used for @dev whenever
* requested (or "autosuspend" will be handled as direct runtime-suspend for
* it).
*
* NOTE: It's important to undo this with pm_runtime_dont_use_autosuspend()
* at driver exit time unless your driver initially enabled pm_runtime
* with devm_pm_runtime_enable() (which handles it for you).
*/
static inline void pm_runtime_use_autosuspend(struct device *dev)
{
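A minimal sketch of the pairing the note above describes, assuming a hypothetical platform driver (the foo_ names and the 1000 ms delay are illustrative):

#include <linux/platform_device.h>
#include <linux/pm_runtime.h>

static int foo_probe(struct platform_device *pdev)
{
	pm_runtime_set_autosuspend_delay(&pdev->dev, 1000);
	pm_runtime_use_autosuspend(&pdev->dev);

	/*
	 * devm_pm_runtime_enable() now also undoes the autosuspend
	 * setting on driver detach, so no explicit
	 * pm_runtime_dont_use_autosuspend() call is needed in the
	 * remove path.
	 */
	return devm_pm_runtime_enable(&pdev->dev);
}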


@@ -689,8 +689,10 @@ static int load_image_and_restore(void)
lock_device_hotplug();
error = create_basic_memory_bitmaps();
if (error)
if (error) {
swsusp_close(FMODE_READ | FMODE_EXCL);
goto Unlock;
}
error = swsusp_read(&flags);
swsusp_close(FMODE_READ | FMODE_EXCL);
@@ -1328,7 +1330,7 @@ static int __init resumedelay_setup(char *str)
int rc = kstrtouint(str, 0, &resume_delay);
if (rc)
return rc;
pr_warn("resumedelay: bad option string '%s'\n", str);
return 1;
}


@@ -157,22 +157,22 @@ static int __init setup_test_suspend(char *value)
value++;
suspend_type = strsep(&value, ",");
if (!suspend_type)
return 0;
return 1;
repeat = strsep(&value, ",");
if (repeat) {
if (kstrtou32(repeat, 0, &test_repeat_count_max))
return 0;
return 1;
}
for (i = PM_SUSPEND_MIN; i < PM_SUSPEND_MAX; i++)
if (!strcmp(pm_labels[i], suspend_type)) {
test_state_label = pm_labels[i];
return 0;
return 1;
}
printk(warn_bad_state, suspend_type);
return 0;
return 1;
}
__setup("test_suspend", setup_test_suspend);


@@ -89,7 +89,7 @@ struct swap_map_page_list {
struct swap_map_page_list *next;
};
/**
/*
* The swap_map_handle structure is used for handling swap in
* a file-alike way
*/
@@ -117,7 +117,7 @@ struct swsusp_header {
static struct swsusp_header *swsusp_header;
/**
/*
* The following functions are used for tracing the allocated
* swap pages, so that they can be freed in case of an error.
*/
@@ -171,7 +171,7 @@ static int swsusp_extents_insert(unsigned long swap_offset)
return 0;
}
/**
/*
* alloc_swapdev_block - allocate a swap page and register that it has
* been allocated, so that it can be freed in case of an error.
*/
@@ -190,7 +190,7 @@ sector_t alloc_swapdev_block(int swap)
return 0;
}
/**
/*
* free_all_swap_pages - free swap pages allocated for saving image data.
* It also frees the extents used to register which swap entries had been
* allocated.


@@ -539,7 +539,7 @@ ATTRIBUTE_GROUPS(sugov);
static void sugov_tunables_free(struct kobject *kobj)
{
struct gov_attr_set *attr_set = container_of(kobj, struct gov_attr_set, kobj);
struct gov_attr_set *attr_set = to_gov_attr_set(kobj);
kfree(to_sugov_tunables(attr_set));
}


@@ -143,9 +143,9 @@ UTIL_HEADERS = utils/helpers/helpers.h utils/idle_monitor/cpupower-monitor.h \
utils/helpers/bitmask.h \
utils/idle_monitor/idle_monitors.h utils/idle_monitor/idle_monitors.def
LIB_HEADERS = lib/cpufreq.h lib/cpupower.h lib/cpuidle.h
LIB_SRC = lib/cpufreq.c lib/cpupower.c lib/cpuidle.c
LIB_OBJS = lib/cpufreq.o lib/cpupower.o lib/cpuidle.o
LIB_HEADERS = lib/cpufreq.h lib/cpupower.h lib/cpuidle.h lib/acpi_cppc.h
LIB_SRC = lib/cpufreq.c lib/cpupower.c lib/cpuidle.c lib/acpi_cppc.c
LIB_OBJS = lib/cpufreq.o lib/cpupower.o lib/cpuidle.o lib/acpi_cppc.o
LIB_OBJS := $(addprefix $(OUTPUT),$(LIB_OBJS))
override CFLAGS += -pipe


@@ -0,0 +1,59 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include "cpupower_intern.h"
#include "acpi_cppc.h"
/* ACPI CPPC sysfs access ***********************************************/
static int acpi_cppc_read_file(unsigned int cpu, const char *fname,
char *buf, size_t buflen)
{
char path[SYSFS_PATH_MAX];
snprintf(path, sizeof(path), PATH_TO_CPU "cpu%u/acpi_cppc/%s",
cpu, fname);
return cpupower_read_sysfs(path, buf, buflen);
}
static const char * const acpi_cppc_value_files[] = {
[HIGHEST_PERF] = "highest_perf",
[LOWEST_PERF] = "lowest_perf",
[NOMINAL_PERF] = "nominal_perf",
[LOWEST_NONLINEAR_PERF] = "lowest_nonlinear_perf",
[LOWEST_FREQ] = "lowest_freq",
[NOMINAL_FREQ] = "nominal_freq",
[REFERENCE_PERF] = "reference_perf",
[WRAPAROUND_TIME] = "wraparound_time"
};
unsigned long acpi_cppc_get_data(unsigned int cpu, enum acpi_cppc_value which)
{
unsigned long long value;
unsigned int len;
char linebuf[MAX_LINE_LEN];
char *endp;
if (which >= MAX_CPPC_VALUE_FILES)
return 0;
len = acpi_cppc_read_file(cpu, acpi_cppc_value_files[which],
linebuf, sizeof(linebuf));
if (len == 0)
return 0;
value = strtoull(linebuf, &endp, 0);
if (endp == linebuf || errno == ERANGE)
return 0;
return value;
}


@@ -0,0 +1,21 @@
/* SPDX-License-Identifier: GPL-2.0-only */
#ifndef __ACPI_CPPC_H__
#define __ACPI_CPPC_H__
enum acpi_cppc_value {
HIGHEST_PERF,
LOWEST_PERF,
NOMINAL_PERF,
LOWEST_NONLINEAR_PERF,
LOWEST_FREQ,
NOMINAL_FREQ,
REFERENCE_PERF,
WRAPAROUND_TIME,
MAX_CPPC_VALUE_FILES
};
unsigned long acpi_cppc_get_data(unsigned int cpu,
enum acpi_cppc_value which);
#endif /* __ACPI_CPPC_H__ */
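A small hypothetical caller, to show the intended use of the accessor (the CPU number and output format are illustrative):

#include <stdio.h>
#include "acpi_cppc.h"

int main(void)
{
	/* Returns 0 when the sysfs attribute is missing or unreadable. */
	unsigned long nominal = acpi_cppc_get_data(0, NOMINAL_PERF);

	printf("cpu0 nominal_perf: %lu\n", nominal);
	return 0;
}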


@@ -83,20 +83,21 @@ static const char *cpufreq_value_files[MAX_CPUFREQ_VALUE_READ_FILES] = {
[STATS_NUM_TRANSITIONS] = "stats/total_trans"
};
static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu,
enum cpufreq_value which)
unsigned long cpufreq_get_sysfs_value_from_table(unsigned int cpu,
const char **table,
unsigned int index,
unsigned int size)
{
unsigned long value;
unsigned int len;
char linebuf[MAX_LINE_LEN];
char *endp;
if (which >= MAX_CPUFREQ_VALUE_READ_FILES)
if (!table || index >= size || !table[index])
return 0;
len = sysfs_cpufreq_read_file(cpu, cpufreq_value_files[which],
linebuf, sizeof(linebuf));
len = sysfs_cpufreq_read_file(cpu, table[index], linebuf,
sizeof(linebuf));
if (len == 0)
return 0;
@@ -109,6 +110,14 @@ static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu,
return value;
}
static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu,
enum cpufreq_value which)
{
return cpufreq_get_sysfs_value_from_table(cpu, cpufreq_value_files,
which,
MAX_CPUFREQ_VALUE_READ_FILES);
}
/* read access to files which contain one string */
enum cpufreq_string {
@@ -124,7 +133,7 @@ static const char *cpufreq_string_files[MAX_CPUFREQ_STRING_FILES] = {
static char *sysfs_cpufreq_get_one_string(unsigned int cpu,
enum cpufreq_string which)
enum cpufreq_string which)
{
char linebuf[MAX_LINE_LEN];
char *result;


@@ -203,6 +203,18 @@ int cpufreq_modify_policy_governor(unsigned int cpu, char *governor);
int cpufreq_set_frequency(unsigned int cpu,
unsigned long target_frequency);
/*
 * get the sysfs value from a specific table
 *
 * Read the value from the sysfs file named by the given table entry.
 * This only works if the cpufreq driver exposes the corresponding
 * sysfs interface.
 */
unsigned long cpufreq_get_sysfs_value_from_table(unsigned int cpu,
const char **table,
unsigned int index,
unsigned int size);
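A hypothetical caller-side sketch (the one-entry table and file name are illustrative; helpers/amd.c below uses the same pattern for the amd-pstate attributes):

static const char *example_files[] = { "scaling_cur_freq" };

static unsigned long example_read_cur_freq(unsigned int cpu)
{
	/* Index 0 into a table of size 1; returns 0 on any failure. */
	return cpufreq_get_sysfs_value_from_table(cpu, example_files, 0, 1);
}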
#ifdef __cplusplus
}
#endif


@@ -53,6 +53,9 @@ human\-readable output for the \-f, \-w, \-s and \-y parameters.
\fB\-n\fR \fB\-\-no-rounding\fR
Output frequencies and latencies without rounding off values.
.TP
\fB\-c\fR \fB\-\-perf\fR
Get the performance and frequency capabilities of CPPC by reading them from the hardware (only available on hardware with CPPC support).
.TP
.SH "REMARKS"
.LP
By default only values of core zero are displayed. How to display settings of


@@ -4,7 +4,7 @@
cpupower\-idle\-set \- Utility to set cpu idle state specific kernel options
.SH "SYNTAX"
.LP
cpupower [ \-c cpulist ] idle\-info [\fIoptions\fP]
cpupower [ \-c cpulist ] idle\-set [\fIoptions\fP]
.SH "DESCRIPTION"
.LP
The cpupower idle\-set subcommand allows setting cpu idle, also called cpu


@@ -84,43 +84,6 @@ static void proc_cpufreq_output(void)
}
static int no_rounding;
static void print_speed(unsigned long speed)
{
unsigned long tmp;
if (no_rounding) {
if (speed > 1000000)
printf("%u.%06u GHz", ((unsigned int) speed/1000000),
((unsigned int) speed%1000000));
else if (speed > 1000)
printf("%u.%03u MHz", ((unsigned int) speed/1000),
(unsigned int) (speed%1000));
else
printf("%lu kHz", speed);
} else {
if (speed > 1000000) {
tmp = speed%10000;
if (tmp >= 5000)
speed += 10000;
printf("%u.%02u GHz", ((unsigned int) speed/1000000),
((unsigned int) (speed%1000000)/10000));
} else if (speed > 100000) {
tmp = speed%1000;
if (tmp >= 500)
speed += 1000;
printf("%u MHz", ((unsigned int) speed/1000));
} else if (speed > 1000) {
tmp = speed%100;
if (tmp >= 50)
speed += 100;
printf("%u.%01u MHz", ((unsigned int) speed/1000),
((unsigned int) (speed%1000)/100));
}
}
return;
}
static void print_duration(unsigned long duration)
{
unsigned long tmp;
@@ -183,9 +146,12 @@ static int get_boost_mode_x86(unsigned int cpu)
printf(_(" Supported: %s\n"), support ? _("yes") : _("no"));
printf(_(" Active: %s\n"), active ? _("yes") : _("no"));
if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
cpupower_cpu_info.family >= 0x10) ||
cpupower_cpu_info.vendor == X86_VENDOR_HYGON) {
if (cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) {
return 0;
} else if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
cpupower_cpu_info.family >= 0x10) ||
cpupower_cpu_info.vendor == X86_VENDOR_HYGON) {
ret = decode_pstates(cpu, b_states, pstates, &pstate_no);
if (ret)
return ret;
@@ -254,11 +220,11 @@ static int get_boost_mode(unsigned int cpu)
if (freqs) {
printf(_(" boost frequency steps: "));
while (freqs->next) {
print_speed(freqs->frequency);
print_speed(freqs->frequency, no_rounding);
printf(", ");
freqs = freqs->next;
}
print_speed(freqs->frequency);
print_speed(freqs->frequency, no_rounding);
printf("\n");
cpufreq_put_available_frequencies(freqs);
}
@@ -277,7 +243,7 @@ static int get_freq_kernel(unsigned int cpu, unsigned int human)
return -EINVAL;
}
if (human) {
print_speed(freq);
print_speed(freq, no_rounding);
} else
printf("%lu", freq);
printf(_(" (asserted by call to kernel)\n"));
@@ -296,7 +262,7 @@ static int get_freq_hardware(unsigned int cpu, unsigned int human)
return -EINVAL;
}
if (human) {
print_speed(freq);
print_speed(freq, no_rounding);
} else
printf("%lu", freq);
printf(_(" (asserted by call to hardware)\n"));
@@ -316,9 +282,9 @@ static int get_hardware_limits(unsigned int cpu, unsigned int human)
if (human) {
printf(_(" hardware limits: "));
print_speed(min);
print_speed(min, no_rounding);
printf(" - ");
print_speed(max);
print_speed(max, no_rounding);
printf("\n");
} else {
printf("%lu %lu\n", min, max);
@@ -350,9 +316,9 @@ static int get_policy(unsigned int cpu)
return -EINVAL;
}
printf(_(" current policy: frequency should be within "));
print_speed(policy->min);
print_speed(policy->min, no_rounding);
printf(_(" and "));
print_speed(policy->max);
print_speed(policy->max, no_rounding);
printf(".\n ");
printf(_("The governor \"%s\" may decide which speed to use\n"
@@ -436,7 +402,7 @@ static int get_freq_stats(unsigned int cpu, unsigned int human)
struct cpufreq_stats *stats = cpufreq_get_stats(cpu, &total_time);
while (stats) {
if (human) {
print_speed(stats->frequency);
print_speed(stats->frequency, no_rounding);
printf(":%.2f%%",
(100.0 * stats->time_in_state) / total_time);
} else
@@ -472,6 +438,17 @@ static int get_latency(unsigned int cpu, unsigned int human)
return 0;
}
/* --performance / -c */
static int get_perf_cap(unsigned int cpu)
{
if (cpupower_cpu_info.vendor == X86_VENDOR_AMD &&
cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE)
amd_pstate_show_perf_and_freq(cpu, no_rounding);
return 0;
}
static void debug_output_one(unsigned int cpu)
{
struct cpufreq_available_frequencies *freqs;
@@ -486,11 +463,11 @@ static void debug_output_one(unsigned int cpu)
if (freqs) {
printf(_(" available frequency steps: "));
while (freqs->next) {
print_speed(freqs->frequency);
print_speed(freqs->frequency, no_rounding);
printf(", ");
freqs = freqs->next;
}
print_speed(freqs->frequency);
print_speed(freqs->frequency, no_rounding);
printf("\n");
cpufreq_put_available_frequencies(freqs);
}
@@ -500,6 +477,7 @@ static void debug_output_one(unsigned int cpu)
if (get_freq_hardware(cpu, 1) < 0)
get_freq_kernel(cpu, 1);
get_boost_mode(cpu);
get_perf_cap(cpu);
}
static struct option info_opts[] = {
@@ -518,6 +496,7 @@ static struct option info_opts[] = {
{"proc", no_argument, NULL, 'o'},
{"human", no_argument, NULL, 'm'},
{"no-rounding", no_argument, NULL, 'n'},
{"performance", no_argument, NULL, 'c'},
{ },
};
@@ -531,7 +510,7 @@ int cmd_freq_info(int argc, char **argv)
int output_param = 0;
do {
ret = getopt_long(argc, argv, "oefwldpgrasmybn", info_opts,
ret = getopt_long(argc, argv, "oefwldpgrasmybnc", info_opts,
NULL);
switch (ret) {
case '?':
@@ -554,6 +533,7 @@
case 'e':
case 's':
case 'y':
case 'c':
if (output_param) {
output_param = -1;
cont = 0;
@@ -660,6 +640,9 @@
case 'y':
ret = get_latency(cpu, human);
break;
case 'c':
ret = get_perf_cap(cpu);
break;
}
if (ret)
return ret;


@@ -8,7 +8,10 @@
#include <pci/pci.h>
#include "helpers/helpers.h"
#include "cpufreq.h"
#include "acpi_cppc.h"
/* ACPI P-States Helper Functions for AMD Processors ***************/
#define MSR_AMD_PSTATE_STATUS 0xc0010063
#define MSR_AMD_PSTATE 0xc0010064
#define MSR_AMD_PSTATE_LIMIT 0xc0010061
@@ -146,4 +149,78 @@ int amd_pci_get_num_boost_states(int *active, int *states)
pci_cleanup(pci_acc);
return 0;
}
/* ACPI P-States Helper Functions for AMD Processors ***************/
/* AMD P-State Helper Functions ************************************/
enum amd_pstate_value {
AMD_PSTATE_HIGHEST_PERF,
AMD_PSTATE_MAX_FREQ,
AMD_PSTATE_LOWEST_NONLINEAR_FREQ,
MAX_AMD_PSTATE_VALUE_READ_FILES,
};
static const char *amd_pstate_value_files[MAX_AMD_PSTATE_VALUE_READ_FILES] = {
[AMD_PSTATE_HIGHEST_PERF] = "amd_pstate_highest_perf",
[AMD_PSTATE_MAX_FREQ] = "amd_pstate_max_freq",
[AMD_PSTATE_LOWEST_NONLINEAR_FREQ] = "amd_pstate_lowest_nonlinear_freq",
};
static unsigned long amd_pstate_get_data(unsigned int cpu,
enum amd_pstate_value value)
{
return cpufreq_get_sysfs_value_from_table(cpu,
amd_pstate_value_files,
value,
MAX_AMD_PSTATE_VALUE_READ_FILES);
}
void amd_pstate_boost_init(unsigned int cpu, int *support, int *active)
{
unsigned long highest_perf, nominal_perf, cpuinfo_min,
cpuinfo_max, amd_pstate_max;
highest_perf = amd_pstate_get_data(cpu, AMD_PSTATE_HIGHEST_PERF);
nominal_perf = acpi_cppc_get_data(cpu, NOMINAL_PERF);
*support = highest_perf > nominal_perf ? 1 : 0;
if (!(*support))
return;
cpufreq_get_hardware_limits(cpu, &cpuinfo_min, &cpuinfo_max);
amd_pstate_max = amd_pstate_get_data(cpu, AMD_PSTATE_MAX_FREQ);
*active = cpuinfo_max == amd_pstate_max ? 1 : 0;
}
void amd_pstate_show_perf_and_freq(unsigned int cpu, int no_rounding)
{
printf(_(" AMD PSTATE Highest Performance: %lu. Maximum Frequency: "),
amd_pstate_get_data(cpu, AMD_PSTATE_HIGHEST_PERF));
/*
* If boost isn't active, cpuinfo_max doesn't reflect the real maximum
* frequency, so read it back from the amd-pstate sysfs entry.
*/
print_speed(amd_pstate_get_data(cpu, AMD_PSTATE_MAX_FREQ), no_rounding);
printf(".\n");
printf(_(" AMD PSTATE Nominal Performance: %lu. Nominal Frequency: "),
acpi_cppc_get_data(cpu, NOMINAL_PERF));
print_speed(acpi_cppc_get_data(cpu, NOMINAL_FREQ) * 1000,
no_rounding);
printf(".\n");
printf(_(" AMD PSTATE Lowest Non-linear Performance: %lu. Lowest Non-linear Frequency: "),
acpi_cppc_get_data(cpu, LOWEST_NONLINEAR_PERF));
print_speed(amd_pstate_get_data(cpu, AMD_PSTATE_LOWEST_NONLINEAR_FREQ),
no_rounding);
printf(".\n");
printf(_(" AMD PSTATE Lowest Performance: %lu. Lowest Frequency: "),
acpi_cppc_get_data(cpu, LOWEST_PERF));
print_speed(acpi_cppc_get_data(cpu, LOWEST_FREQ) * 1000, no_rounding);
printf(".\n");
}
/* AMD P-State Helper Functions ************************************/
#endif /* defined(__i386__) || defined(__x86_64__) */


@@ -149,6 +149,19 @@ int get_cpu_info(struct cpupower_cpu_info *cpu_info)
if (ext_cpuid_level >= 0x80000008 &&
cpuid_ebx(0x80000008) & (1 << 4))
cpu_info->caps |= CPUPOWER_CAP_AMD_RDPRU;
if (cpupower_amd_pstate_enabled()) {
cpu_info->caps |= CPUPOWER_CAP_AMD_PSTATE;
/*
* If AMD P-State is enabled, the firmware treats the AMD P-State
* interface as the high-priority one, so drop the legacy P-state
* capability flags.
*/
cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB;
cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB_MSR;
cpu_info->caps &= ~CPUPOWER_CAP_AMD_HW_PSTATE;
cpu_info->caps &= ~CPUPOWER_CAP_AMD_PSTATEDEF;
}
}
if (cpu_info->vendor == X86_VENDOR_INTEL) {


@@ -11,6 +11,7 @@
#include <libintl.h>
#include <locale.h>
#include <stdbool.h>
#include "helpers/bitmask.h"
#include <cpupower.h>
@@ -73,6 +74,7 @@ enum cpupower_cpu_vendor {X86_VENDOR_UNKNOWN = 0, X86_VENDOR_INTEL,
#define CPUPOWER_CAP_AMD_HW_PSTATE 0x00000100
#define CPUPOWER_CAP_AMD_PSTATEDEF 0x00000200
#define CPUPOWER_CAP_AMD_CPB_MSR 0x00000400
#define CPUPOWER_CAP_AMD_PSTATE 0x00000800
#define CPUPOWER_AMD_CPBDIS 0x02000000
@@ -135,6 +137,16 @@ extern int decode_pstates(unsigned int cpu, int boost_states,
extern int cpufreq_has_boost_support(unsigned int cpu, int *support,
int *active, int * states);
/* AMD P-State stuff **************************/
bool cpupower_amd_pstate_enabled(void);
void amd_pstate_boost_init(unsigned int cpu,
int *support, int *active);
void amd_pstate_show_perf_and_freq(unsigned int cpu,
int no_rounding);
/* AMD P-State stuff **************************/
/*
* CPUID functions returning a single datum
*/
@@ -167,6 +179,15 @@ static inline int cpufreq_has_boost_support(unsigned int cpu, int *support,
int *active, int * states)
{ return -1; }
static inline bool cpupower_amd_pstate_enabled(void)
{ return false; }
static inline void amd_pstate_boost_init(unsigned int cpu, int *support,
int *active)
{}
static inline void amd_pstate_show_perf_and_freq(unsigned int cpu,
int no_rounding)
{}
/* cpuid and cpuinfo helpers **************************/
static inline unsigned int cpuid_eax(unsigned int op) { return 0; };
@@ -184,5 +205,6 @@ extern struct bitmask *offline_cpus;
void get_cpustate(void);
void print_online_cpus(void);
void print_offline_cpus(void);
void print_speed(unsigned long speed, int no_rounding);
#endif /* __CPUPOWERUTILS_HELPERS__ */


@@ -3,9 +3,11 @@
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include "helpers/helpers.h"
#include "helpers/sysfs.h"
#include "cpufreq.h"
#if defined(__i386__) || defined(__x86_64__)
@@ -39,6 +41,8 @@ if (ret)
if (ret)
return ret;
}
} else if (cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) {
amd_pstate_boost_init(cpu, support, active);
} else if (cpupower_cpu_info.caps & CPUPOWER_CAP_INTEL_IDA)
*support = *active = 1;
return 0;
@@ -83,6 +87,22 @@ int cpupower_intel_set_perf_bias(unsigned int cpu, unsigned int val)
return 0;
}
bool cpupower_amd_pstate_enabled(void)
{
char *driver = cpufreq_get_driver(0);
bool ret = false;
if (!driver)
return ret;
if (!strcmp(driver, "amd-pstate"))
ret = true;
cpufreq_put_driver(driver);
return ret;
}
#endif /* #if defined(__i386__) || defined(__x86_64__) */
/* get_cpustate
@@ -144,3 +164,43 @@ void print_offline_cpus(void)
printf(_("cpupower set operation was not performed on them\n"));
}
}
/*
* print_speed
*
* Print the exact CPU frequency with appropriate unit
*/
void print_speed(unsigned long speed, int no_rounding)
{
unsigned long tmp;
if (no_rounding) {
if (speed > 1000000)
printf("%u.%06u GHz", ((unsigned int)speed / 1000000),
((unsigned int)speed % 1000000));
else if (speed > 1000)
printf("%u.%03u MHz", ((unsigned int)speed / 1000),
(unsigned int)(speed % 1000));
else
printf("%lu kHz", speed);
} else {
if (speed > 1000000) {
tmp = speed % 10000;
if (tmp >= 5000)
speed += 10000;
printf("%u.%02u GHz", ((unsigned int)speed / 1000000),
((unsigned int)(speed % 1000000) / 10000));
} else if (speed > 100000) {
tmp = speed % 1000;
if (tmp >= 500)
speed += 1000;
printf("%u MHz", ((unsigned int)speed / 1000));
} else if (speed > 1000) {
tmp = speed % 100;
if (tmp >= 50)
speed += 100;
printf("%u.%01u MHz", ((unsigned int)speed / 1000),
((unsigned int)(speed % 1000) / 100));
}
}
}
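For illustration, the two modes on a hypothetical 2457600 kHz reading:

/*
 * print_speed(2457600, 0) -> "2.46 GHz"      (rounded to the nearest 10 MHz)
 * print_speed(2457600, 1) -> "2.457600 GHz"  (exact)
 */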


@@ -0,0 +1,354 @@
#!/usr/bin/env python3
# SPDX-License-Identifier: GPL-2.0-only
# -*- coding: utf-8 -*-
#
""" This utility can be used to debug and tune the performance of the
AMD P-State driver. It imports intel_pstate_tracer to analyze AMD P-State
trace event.
Prerequisites:
Python version 2.7.x or higher
gnuplot 5.0 or higher
gnuplot-py 1.8 or higher
(Most of the distributions have these required packages. They may be called
gnuplot-py, phython-gnuplot or phython3-gnuplot, gnuplot-nox, ... )
Kernel config for Linux trace is enabled
see print_help(): for Usage and Output details
"""
from __future__ import print_function
from datetime import datetime
import subprocess
import os
import time
import re
import signal
import sys
import getopt
import Gnuplot
from numpy import *
from decimal import *
sys.path.append('../intel_pstate_tracer')
#import intel_pstate_tracer
import intel_pstate_tracer as ipt
__license__ = "GPL version 2"
MAX_CPUS = 256
# Define the csv file columns
C_COMM = 15
C_ELAPSED = 14
C_SAMPLE = 13
C_DURATION = 12
C_LOAD = 11
C_TSC = 10
C_APERF = 9
C_MPERF = 8
C_FREQ = 7
C_MAX_PERF = 6
C_DES_PERF = 5
C_MIN_PERF = 4
C_USEC = 3
C_SEC = 2
C_CPU = 1
global sample_num, last_sec_cpu, last_usec_cpu, start_time, test_name, trace_file
getcontext().prec = 11
sample_num =0
last_sec_cpu = [0] * MAX_CPUS
last_usec_cpu = [0] * MAX_CPUS
def plot_per_cpu_freq(cpu_index):
""" Plot per cpu frequency """
file_name = 'cpu{:0>3}.csv'.format(cpu_index)
if os.path.exists(file_name):
output_png = "cpu%03d_frequency.png" % cpu_index
g_plot = ipt.common_gnuplot_settings()
g_plot('set output "' + output_png + '"')
g_plot('set yrange [0:7]')
g_plot('set ytics 0, 1')
g_plot('set ylabel "CPU Frequency (GHz)"')
g_plot('set title "{} : frequency : CPU {:0>3} : {:%F %H:%M}"'.format(test_name, cpu_index, datetime.now()))
g_plot('set ylabel "CPU frequency"')
g_plot('set key off')
ipt.set_4_plot_linestyles(g_plot)
g_plot('plot "' + file_name + '" using {:d}:{:d} with linespoints linestyle 1 axis x1y1'.format(C_ELAPSED, C_FREQ))
def plot_per_cpu_des_perf(cpu_index):
""" Plot per cpu desired perf """
file_name = 'cpu{:0>3}.csv'.format(cpu_index)
if os.path.exists(file_name):
output_png = "cpu%03d_des_perf.png" % cpu_index
g_plot = ipt.common_gnuplot_settings()
g_plot('set output "' + output_png + '"')
g_plot('set yrange [0:255]')
g_plot('set ylabel "des perf"')
g_plot('set title "{} : cpu des perf : CPU {:0>3} : {:%F %H:%M}"'.format(test_name, cpu_index, datetime.now()))
g_plot('set key off')
ipt.set_4_plot_linestyles(g_plot)
g_plot('plot "' + file_name + '" using {:d}:{:d} with linespoints linestyle 1 axis x1y1'.format(C_ELAPSED, C_DES_PERF))
def plot_per_cpu_load(cpu_index):
""" Plot per cpu load """
file_name = 'cpu{:0>3}.csv'.format(cpu_index)
if os.path.exists(file_name):
output_png = "cpu%03d_load.png" % cpu_index
g_plot = ipt.common_gnuplot_settings()
g_plot('set output "' + output_png + '"')
g_plot('set yrange [0:100]')
g_plot('set ytics 0, 10')
g_plot('set ylabel "CPU load (percent)"')
g_plot('set title "{} : cpu load : CPU {:0>3} : {:%F %H:%M}"'.format(test_name, cpu_index, datetime.now()))
g_plot('set key off')
ipt.set_4_plot_linestyles(g_plot)
g_plot('plot "' + file_name + '" using {:d}:{:d} with linespoints linestyle 1 axis x1y1'.format(C_ELAPSED, C_LOAD))
def plot_all_cpu_frequency():
""" Plot all cpu frequencies """
output_png = 'all_cpu_frequencies.png'
g_plot = ipt.common_gnuplot_settings()
g_plot('set output "' + output_png + '"')
g_plot('set ylabel "CPU Frequency (GHz)"')
g_plot('set title "{} : cpu frequencies : {:%F %H:%M}"'.format(test_name, datetime.now()))
title_list = subprocess.check_output('ls cpu???.csv | sed -e \'s/.csv//\'',shell=True).decode('utf-8').replace('\n', ' ')
plot_str = "plot for [i in title_list] i.'.csv' using {:d}:{:d} pt 7 ps 1 title i".format(C_ELAPSED, C_FREQ)
g_plot('title_list = "{}"'.format(title_list))
g_plot(plot_str)
def plot_all_cpu_des_perf():
""" Plot all cpu desired perf """
output_png = 'all_cpu_des_perf.png'
g_plot = ipt.common_gnuplot_settings()
g_plot('set output "' + output_png + '"')
g_plot('set ylabel "des perf"')
g_plot('set title "{} : cpu des perf : {:%F %H:%M}"'.format(test_name, datetime.now()))
title_list = subprocess.check_output('ls cpu???.csv | sed -e \'s/.csv//\'',shell=True).decode('utf-8').replace('\n', ' ')
plot_str = "plot for [i in title_list] i.'.csv' using {:d}:{:d} pt 255 ps 1 title i".format(C_ELAPSED, C_DES_PERF)
g_plot('title_list = "{}"'.format(title_list))
g_plot(plot_str)
def plot_all_cpu_load():
""" Plot all cpu load """
output_png = 'all_cpu_load.png'
g_plot = ipt.common_gnuplot_settings()
g_plot('set output "' + output_png + '"')
g_plot('set yrange [0:100]')
g_plot('set ylabel "CPU load (percent)"')
g_plot('set title "{} : cpu load : {:%F %H:%M}"'.format(test_name, datetime.now()))
title_list = subprocess.check_output('ls cpu???.csv | sed -e \'s/.csv//\'',shell=True).decode('utf-8').replace('\n', ' ')
plot_str = "plot for [i in title_list] i.'.csv' using {:d}:{:d} pt 255 ps 1 title i".format(C_ELAPSED, C_LOAD)
g_plot('title_list = "{}"'.format(title_list))
g_plot(plot_str)
def store_csv(cpu_int, time_pre_dec, time_post_dec, min_perf, des_perf, max_perf, freq_ghz, mperf, aperf, tsc, common_comm, load, duration_ms, sample_num, elapsed_time, cpu_mask):
""" Store master csv file information """
global graph_data_present
if cpu_mask[cpu_int] == 0:
return
try:
f_handle = open('cpu.csv', 'a')
string_buffer = "CPU_%03u, %05u, %06u, %u, %u, %u, %.4f, %u, %u, %u, %.2f, %.3f, %u, %.3f, %s\n" % (cpu_int, int(time_pre_dec), int(time_post_dec), int(min_perf), int(des_perf), int(max_perf), freq_ghz, int(mperf), int(aperf), int(tsc), load, duration_ms, sample_num, elapsed_time, common_comm)
f_handle.write(string_buffer)
f_handle.close()
except:
print('IO error cpu.csv')
return
graph_data_present = True;
def cleanup_data_files():
""" clean up existing data files """
if os.path.exists('cpu.csv'):
os.remove('cpu.csv')
f_handle = open('cpu.csv', 'a')
f_handle.write('common_cpu, common_secs, common_usecs, min_perf, des_perf, max_perf, freq, mperf, aperf, tsc, load, duration_ms, sample_num, elapsed_time, common_comm')
f_handle.write('\n')
f_handle.close()
def read_trace_data(file_name, cpu_mask):
""" Read and parse trace data """
global current_max_cpu
global sample_num, last_sec_cpu, last_usec_cpu, start_time
try:
data = open(file_name, 'r').read()
except:
print('Error opening ', file_name)
sys.exit(2)
for line in data.splitlines():
search_obj = \
re.search(r'(^(.*?)\[)((\d+)[^\]])(.*?)(\d+)([.])(\d+)(.*?amd_min_perf=)(\d+)(.*?amd_des_perf=)(\d+)(.*?amd_max_perf=)(\d+)(.*?freq=)(\d+)(.*?mperf=)(\d+)(.*?aperf=)(\d+)(.*?tsc=)(\d+)'
, line)
if search_obj:
cpu = search_obj.group(3)
cpu_int = int(cpu)
cpu = str(cpu_int)
time_pre_dec = search_obj.group(6)
time_post_dec = search_obj.group(8)
min_perf = search_obj.group(10)
des_perf = search_obj.group(12)
max_perf = search_obj.group(14)
freq = search_obj.group(16)
mperf = search_obj.group(18)
aperf = search_obj.group(20)
tsc = search_obj.group(22)
common_comm = search_obj.group(2).replace(' ', '')
if sample_num == 0 :
start_time = Decimal(time_pre_dec) + Decimal(time_post_dec) / Decimal(1000000)
sample_num += 1
if last_sec_cpu[cpu_int] == 0 :
last_sec_cpu[cpu_int] = time_pre_dec
last_usec_cpu[cpu_int] = time_post_dec
else :
duration_us = (int(time_pre_dec) - int(last_sec_cpu[cpu_int])) * 1000000 + (int(time_post_dec) - int(last_usec_cpu[cpu_int]))
duration_ms = Decimal(duration_us) / Decimal(1000)
last_sec_cpu[cpu_int] = time_pre_dec
last_usec_cpu[cpu_int] = time_post_dec
elapsed_time = Decimal(time_pre_dec) + Decimal(time_post_dec) / Decimal(1000000) - start_time
load = Decimal(int(mperf)*100)/ Decimal(tsc)
freq_ghz = Decimal(freq)/Decimal(1000000)
store_csv(cpu_int, time_pre_dec, time_post_dec, min_perf, des_perf, max_perf, freq_ghz, mperf, aperf, tsc, common_comm, load, duration_ms, sample_num, elapsed_time, cpu_mask)
if cpu_int > current_max_cpu:
current_max_cpu = cpu_int
# Now separate the main overall csv file into per CPU csv files.
ipt.split_csv(current_max_cpu, cpu_mask)
def signal_handler(signal, frame):
print(' SIGINT: Forcing cleanup before exit.')
if interval:
ipt.disable_trace(trace_file)
ipt.clear_trace_file()
ipt.free_trace_buffer()
sys.exit(0)
trace_file = "/sys/kernel/debug/tracing/events/amd_cpu/enable"
signal.signal(signal.SIGINT, signal_handler)
interval = ""
file_name = ""
cpu_list = ""
test_name = ""
memory = "10240"
graph_data_present = False;
valid1 = False
valid2 = False
cpu_mask = zeros((MAX_CPUS,), dtype=int)
try:
opts, args = getopt.getopt(sys.argv[1:],"ht:i:c:n:m:",["help","trace_file=","interval=","cpu=","name=","memory="])
except getopt.GetoptError:
ipt.print_help('amd_pstate')
sys.exit(2)
for opt, arg in opts:
if opt == '-h':
print()
sys.exit()
elif opt in ("-t", "--trace_file"):
valid1 = True
location = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))
file_name = os.path.join(location, arg)
elif opt in ("-i", "--interval"):
valid1 = True
interval = arg
elif opt in ("-c", "--cpu"):
cpu_list = arg
elif opt in ("-n", "--name"):
valid2 = True
test_name = arg
elif opt in ("-m", "--memory"):
memory = arg
if not (valid1 and valid2):
ipt.print_help('amd_pstate')
sys.exit()
if cpu_list:
for p in re.split("[,]", cpu_list):
if int(p) < MAX_CPUS :
cpu_mask[int(p)] = 1
else:
for i in range (0, MAX_CPUS):
cpu_mask[i] = 1
if not os.path.exists('results'):
os.mkdir('results')
ipt.fix_ownership('results')
os.chdir('results')
if os.path.exists(test_name):
print('The test name directory already exists. Please provide a unique test name. Test re-run not supported, yet.')
sys.exit()
os.mkdir(test_name)
ipt.fix_ownership(test_name)
os.chdir(test_name)
cur_version = sys.version_info
print('python version (should be >= 2.7):')
print(cur_version)
cleanup_data_files()
if interval:
file_name = "/sys/kernel/debug/tracing/trace"
ipt.clear_trace_file()
ipt.set_trace_buffer_size(memory)
ipt.enable_trace(trace_file)
time.sleep(int(interval))
ipt.disable_trace(trace_file)
current_max_cpu = 0
read_trace_data(file_name, cpu_mask)
if interval:
ipt.clear_trace_file()
ipt.free_trace_buffer()
if graph_data_present == False:
print('No valid data to plot')
sys.exit(2)
for cpu_no in range(0, current_max_cpu + 1):
plot_per_cpu_freq(cpu_no)
plot_per_cpu_des_perf(cpu_no)
plot_per_cpu_load(cpu_no)
plot_all_cpu_des_perf()
plot_all_cpu_frequency()
plot_all_cpu_load()
for root, dirs, files in os.walk('.'):
for f in files:
ipt.fix_ownership(f)
os.chdir('../../')


@@ -63,7 +63,7 @@ C_USEC = 3
C_SEC = 2
C_CPU = 1
global sample_num, last_sec_cpu, last_usec_cpu, start_time, testname
global sample_num, last_sec_cpu, last_usec_cpu, start_time, testname, trace_file
# 11 digits covers uptime to 115 days
getcontext().prec = 11
@@ -72,17 +72,17 @@ sample_num =0
last_sec_cpu = [0] * MAX_CPUS
last_usec_cpu = [0] * MAX_CPUS
def print_help():
print('intel_pstate_tracer.py:')
def print_help(driver_name):
print('%s_tracer.py:'%driver_name)
print(' Usage:')
print(' If the trace file is available, then to simply parse and plot, use (sudo not required):')
print(' ./intel_pstate_tracer.py [-c cpus] -t <trace_file> -n <test_name>')
print(' ./%s_tracer.py [-c cpus] -t <trace_file> -n <test_name>'%driver_name)
print(' Or')
print(' ./intel_pstate_tracer.py [--cpu cpus] ---trace_file <trace_file> --name <test_name>')
print(' ./%s_tracer.py [--cpu cpus] ---trace_file <trace_file> --name <test_name>'%driver_name)
print(' To generate trace file, parse and plot, use (sudo required):')
print(' sudo ./intel_pstate_tracer.py [-c cpus] -i <interval> -n <test_name> -m <kbytes>')
print(' sudo ./%s_tracer.py [-c cpus] -i <interval> -n <test_name> -m <kbytes>'%driver_name)
print(' Or')
print(' sudo ./intel_pstate_tracer.py [--cpu cpus] --interval <interval> --name <test_name> --memory <kbytes>')
print(' sudo ./%s_tracer.py [--cpu cpus] --interval <interval> --name <test_name> --memory <kbytes>'%driver_name)
print(' Optional argument:')
print(' cpus: comma separated list of CPUs')
print(' kbytes: Kilo bytes of memory per CPU to allocate to the trace buffer. Default: 10240')
@@ -323,7 +323,7 @@ def set_4_plot_linestyles(g_plot):
g_plot('set style line 3 linetype 1 linecolor rgb "purple" pointtype -1')
g_plot('set style line 4 linetype 1 linecolor rgb "blue" pointtype -1')
def store_csv(cpu_int, time_pre_dec, time_post_dec, core_busy, scaled, _from, _to, mperf, aperf, tsc, freq_ghz, io_boost, common_comm, load, duration_ms, sample_num, elapsed_time, tsc_ghz):
def store_csv(cpu_int, time_pre_dec, time_post_dec, core_busy, scaled, _from, _to, mperf, aperf, tsc, freq_ghz, io_boost, common_comm, load, duration_ms, sample_num, elapsed_time, tsc_ghz, cpu_mask):
""" Store master csv file information """
global graph_data_present
@@ -342,11 +342,9 @@ def store_csv(cpu_int, time_pre_dec, time_post_dec, core_busy, scaled, _from, _t
graph_data_present = True;
def split_csv():
def split_csv(current_max_cpu, cpu_mask):
""" seperate the all csv file into per CPU csv files. """
global current_max_cpu
if os.path.exists('cpu.csv'):
for index in range(0, current_max_cpu + 1):
if cpu_mask[int(index)] != 0:
@@ -381,27 +379,25 @@ def clear_trace_file():
print('IO error clearing trace file ')
sys.exit(2)
def enable_trace():
def enable_trace(trace_file):
""" Enable trace """
try:
open('/sys/kernel/debug/tracing/events/power/pstate_sample/enable'
, 'w').write("1")
open(trace_file,'w').write("1")
except:
print('IO error enabling trace ')
sys.exit(2)
def disable_trace():
def disable_trace(trace_file):
""" Disable trace """
try:
open('/sys/kernel/debug/tracing/events/power/pstate_sample/enable'
, 'w').write("0")
open(trace_file, 'w').write("0")
except:
print('IO error disabling trace ')
sys.exit(2)
def set_trace_buffer_size():
def set_trace_buffer_size(memory):
""" Set trace buffer size """
try:
@@ -421,7 +417,7 @@ def free_trace_buffer():
print('IO error freeing trace buffer ')
sys.exit(2)
def read_trace_data(filename):
def read_trace_data(filename, cpu_mask):
""" Read and parse trace data """
global current_max_cpu
@@ -481,135 +477,137 @@ def read_trace_data(filename):
tsc_ghz = Decimal(0)
if duration_ms != Decimal(0) :
tsc_ghz = Decimal(tsc)/duration_ms/Decimal(1000000)
store_csv(cpu_int, time_pre_dec, time_post_dec, core_busy, scaled, _from, _to, mperf, aperf, tsc, freq_ghz, io_boost, common_comm, load, duration_ms, sample_num, elapsed_time, tsc_ghz)
store_csv(cpu_int, time_pre_dec, time_post_dec, core_busy, scaled, _from, _to, mperf, aperf, tsc, freq_ghz, io_boost, common_comm, load, duration_ms, sample_num, elapsed_time, tsc_ghz, cpu_mask)
if cpu_int > current_max_cpu:
current_max_cpu = cpu_int
# End of for each trace line loop
# Now seperate the main overall csv file into per CPU csv files.
split_csv()
split_csv(current_max_cpu, cpu_mask)
def signal_handler(signal, frame):
print(' SIGINT: Forcing cleanup before exit.')
if interval:
disable_trace()
disable_trace(trace_file)
clear_trace_file()
# Free the memory
free_trace_buffer()
sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)
if __name__ == "__main__":
trace_file = "/sys/kernel/debug/tracing/events/power/pstate_sample/enable"
signal.signal(signal.SIGINT, signal_handler)
interval = ""
filename = ""
cpu_list = ""
testname = ""
memory = "10240"
graph_data_present = False;
interval = ""
filename = ""
cpu_list = ""
testname = ""
memory = "10240"
graph_data_present = False;
valid1 = False
valid2 = False
valid1 = False
valid2 = False
cpu_mask = zeros((MAX_CPUS,), dtype=int)
cpu_mask = zeros((MAX_CPUS,), dtype=int)
try:
opts, args = getopt.getopt(sys.argv[1:],"ht:i:c:n:m:",["help","trace_file=","interval=","cpu=","name=","memory="])
except getopt.GetoptError:
print_help()
sys.exit(2)
for opt, arg in opts:
if opt == '-h':
print()
try:
opts, args = getopt.getopt(sys.argv[1:],"ht:i:c:n:m:",["help","trace_file=","interval=","cpu=","name=","memory="])
except getopt.GetoptError:
print_help('intel_pstate')
sys.exit(2)
for opt, arg in opts:
if opt == '-h':
print_help('intel_pstate')
sys.exit()
elif opt in ("-t", "--trace_file"):
valid1 = True
location = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))
filename = os.path.join(location, arg)
elif opt in ("-i", "--interval"):
valid1 = True
interval = arg
elif opt in ("-c", "--cpu"):
cpu_list = arg
elif opt in ("-n", "--name"):
valid2 = True
testname = arg
elif opt in ("-m", "--memory"):
memory = arg
if not (valid1 and valid2):
print_help('intel_pstate')
sys.exit()
elif opt in ("-t", "--trace_file"):
valid1 = True
location = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))
filename = os.path.join(location, arg)
elif opt in ("-i", "--interval"):
valid1 = True
interval = arg
elif opt in ("-c", "--cpu"):
cpu_list = arg
elif opt in ("-n", "--name"):
valid2 = True
testname = arg
elif opt in ("-m", "--memory"):
memory = arg
if not (valid1 and valid2):
print_help()
sys.exit()
if cpu_list:
for p in re.split("[,]", cpu_list):
if int(p) < MAX_CPUS :
cpu_mask[int(p)] = 1
else:
for i in range (0, MAX_CPUS):
cpu_mask[i] = 1
if cpu_list:
for p in re.split("[,]", cpu_list):
if int(p) < MAX_CPUS :
cpu_mask[int(p)] = 1
else:
for i in range (0, MAX_CPUS):
cpu_mask[i] = 1
if not os.path.exists('results'):
os.mkdir('results')
# The regular user needs to own the directory, not root.
fix_ownership('results')
if not os.path.exists('results'):
os.mkdir('results')
os.chdir('results')
if os.path.exists(testname):
print('The test name directory already exists. Please provide a unique test name. Test re-run not supported, yet.')
sys.exit()
os.mkdir(testname)
# The regular user needs to own the directory, not root.
fix_ownership('results')
fix_ownership(testname)
os.chdir(testname)
os.chdir('results')
if os.path.exists(testname):
print('The test name directory already exists. Please provide a unique test name. Test re-run not supported, yet.')
sys.exit()
os.mkdir(testname)
# The regular user needs to own the directory, not root.
fix_ownership(testname)
os.chdir(testname)
# Temporary (or perhaps not)
cur_version = sys.version_info
print('python version (should be >= 2.7):')
print(cur_version)
# Temporary (or perhaps not)
cur_version = sys.version_info
print('python version (should be >= 2.7):')
print(cur_version)
# Left as "cleanup" for potential future re-run ability.
cleanup_data_files()
# Left as "cleanup" for potential future re-run ability.
cleanup_data_files()
if interval:
filename = "/sys/kernel/debug/tracing/trace"
clear_trace_file()
set_trace_buffer_size(memory)
enable_trace(trace_file)
print('Sleeping for ', interval, 'seconds')
time.sleep(int(interval))
disable_trace(trace_file)
if interval:
filename = "/sys/kernel/debug/tracing/trace"
clear_trace_file()
set_trace_buffer_size()
enable_trace()
print('Sleeping for ', interval, 'seconds')
time.sleep(int(interval))
disable_trace()
current_max_cpu = 0
current_max_cpu = 0
read_trace_data(filename, cpu_mask)
read_trace_data(filename)
if interval:
clear_trace_file()
# Free the memory
free_trace_buffer()
if interval:
clear_trace_file()
# Free the memory
free_trace_buffer()
if graph_data_present == False:
print('No valid data to plot')
sys.exit(2)
if graph_data_present == False:
print('No valid data to plot')
sys.exit(2)
for cpu_no in range(0, current_max_cpu + 1):
plot_perf_busy_with_sample(cpu_no)
plot_perf_busy(cpu_no)
plot_durations(cpu_no)
plot_loads(cpu_no)
for cpu_no in range(0, current_max_cpu + 1):
plot_perf_busy_with_sample(cpu_no)
plot_perf_busy(cpu_no)
plot_durations(cpu_no)
plot_loads(cpu_no)
plot_pstate_cpu_with_sample()
plot_pstate_cpu()
plot_load_cpu()
plot_frequency_cpu()
plot_duration_cpu()
plot_scaled_cpu()
plot_boost_cpu()
plot_ghz_cpu()
plot_pstate_cpu_with_sample()
plot_pstate_cpu()
plot_load_cpu()
plot_frequency_cpu()
plot_duration_cpu()
plot_scaled_cpu()
plot_boost_cpu()
plot_ghz_cpu()
# It is preferable, but not necessary, that the regular user owns the files, not root.
for root, dirs, files in os.walk('.'):
for f in files:
fix_ownership(f)
# It is preferable, but not necessary, that the regular user owns the files, not root.
for root, dirs, files in os.walk('.'):
for f in files:
fix_ownership(f)
os.chdir('../../')
os.chdir('../../')


@@ -2323,7 +2323,7 @@ int skx_pkg_cstate_limits[16] =
};
int icx_pkg_cstate_limits[16] =
{ PCL__0, PCL__2, PCL__6, PCLRSV, PCLRSV, PCLRSV, PCLRSV, PCLUNL, PCLRSV, PCLRSV, PCLRSV, PCLRSV, PCLRSV, PCLRSV,
{ PCL__0, PCL__2, PCL__6, PCL__6, PCLRSV, PCLRSV, PCLRSV, PCLUNL, PCLRSV, PCLRSV, PCLRSV, PCLRSV, PCLRSV, PCLRSV,
PCLRSV, PCLRSV
};