Commit Graph

17 Commits

Author SHA1 Message Date
Lukasz Luba bdefd9913b powercap: DTPM: Fix missing cpufreq_cpu_put() calls
The policy returned by cpufreq_cpu_get() has to be released with
the help of cpufreq_cpu_put() to balance its kobject reference counter
properly.

Add the missing calls to cpufreq_cpu_put() in the code.

Fixes: 0aea2e4ec2 ("powercap/dtpm_cpu: Reset per_cpu variable in the release function")
Fixes: 0e8f68d7f0 ("powercap/drivers/dtpm: Add CPU energy model based support")
Cc: v5.16+ <stable@vger.kernel.org> # v5.16+
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-05 20:51:24 +01:00
Lukasz Luba b817f1488f powercap: DTPM: Fix unneeded conversions to micro-Watts
The power values coming from the Energy Model are already in uW.

The PowerCap and DTPM frameworks operate on uW, so all places should
just use the values from the EM.

Fix the code by removing all of the conversion to uW still present in it.

Fixes: ae6ccaa650 (PM: EM: convert power field to micro-Watts precision and align drivers)
Cc: 5.19+ <stable@vger.kernel.org> # v5.19+
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-28 15:15:14 +01:00
Linus Torvalds a771ea6413 Power management updates for 5.20-rc1
- Make cpufreq_show_cpus() more straightforward (Viresh Kumar).
 
  - Drop unnecessary CPU hotplug locking from store() used by cpufreq
    sysfs attributes (Viresh Kumar).
 
  - Make the ACPI cpufreq driver support the boost control interface on
    Zhaoxin/Centaur processors (Tony W Wang-oc).
 
  - Print a warning message on attempts to free an active cpufreq policy
    which should never happen (Viresh Kumar).
 
  - Fix grammar in the Kconfig help text for the loongson2 cpufreq
    driver (Randy Dunlap).
 
  - Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq
    governor (Zhao Liu).
 
  - Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll
    cpuidle driver (Eiichi Tsukata).
 
  - Modify intel_idle to treat C1 and C1E as independent idle states on
    Sapphire Rapids (Artem Bityutskiy).
 
  - Extend support for wakeirq to callback wrappers used during system
    suspend and resume (Ulf Hansson).
 
  - Defer waiting for device probe before loading a hibernation image
    till the first actual device access to avoid possible deadlocks
    reported by syzbot (Tetsuo Handa).
 
  - Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn
    Helgaas).
 
  - Add Raptor Lake-P to the list of processors supported by the Intel
    RAPL driver (George D Sworo).
 
  - Add Alder Lake-N and Raptor Lake-P to the list of processors for
    which Power Limit4 is supported in the Intel RAPL driver (Sumeet
    Pawnikar).
 
  - Make pm_genpd_remove() check genpd_debugfs_dir against NULL before
    attempting to remove it (Hsin-Yi Wang).
 
  - Change the Energy Model code to represent power in micro-Watts and
    adjust its users accordingly (Lukasz Luba).
 
  - Add new devfreq driver for Mediatek CCI (Cache Coherent
    Interconnect) (Johnson Wang).
 
  - Convert the Samsung Exynos SoC Bus bindings to DT schema of
    exynos-bus.c (Krzysztof Kozlowski).
 
  - Address kernel-doc warnings by adding the description for unused
    fucntion parameters in devfreq core (Mauro Carvalho Chehab).
 
  - Use NULL to pass a null pointer rather than zero according to the
    function propotype in imx-bus.c (Colin Ian King).
 
  - Print error message instead of error interger value in
    tegra30-devfreq.c (Dmitry Osipenko).
 
  - Add checks to prevent setting negative frequency QoS limits for
    CPUs (Shivnandan Kumar).
 
  - Update the pm-graph suite of utilities to the latest revision 5.9
    including multiple improvements (Todd Brandt).
 
  - Drop pme_interrupt reference from the PCI power management
    documentation (Mario Limonciello).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmLoKy8SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx3+oQAJNVU+W14EaRPWXQRMuwBC5zk3hb6T9q
 JqmMd8coEd+9/4ABAeRAWso1B26rUzB6JyBvw3lGH9OXInpYmvnJEhEPrTpK2h0D
 U9HxEARuGJolrDm0X9NAkn7tKKMC9GnvPS9z2s7s+N97VFFWC/QiU+PHB0SypGNb
 JxRfbVJZQCuxmNG9UeK+xeHFQ9lM2Z9ZdTxR71G0n7nQPPR+sUvnFufFby3Aogf3
 XnBYfia+YNqkUlefxxwB5a0cFwPXOUGsQkIf4d64gZnq1TgZ+71kht1GEF08PDFl
 wV8v1rOWuXEae8dozuf5xszp/eVyAqzgB+IShT9APREOO3Wg6I16XdBm8R1TGwCK
 JTdZqnm6HVKBNqchEwYViJILX69rrNUT+AwHBWhtKKDNh3qeTuwi/JGTeDVN++en
 xf3TNKx3LV31Nq6nWJFzDGLehfZMnAPkhfYohUBI7FNyblpk4mJRVcZ0bYI7UNnS
 als77uoipvb5KdFCtdhxYBHd/y867NvXKa1qsAuDxusAsfJHf4SnlMdbgOepBH2y
 jJg06CGrMDU3TZ8BL+WpqUYk4irQnAMs/159Txh7A6/dOnTjE7S9NHrENCwmt2og
 FrHSLH1eLX6Sa4RSibiGHPC7mNULP2/TOtryf3zFdlIVcjm3NEU3bnfzx7nlJn05
 8t6ObMxgMhWT
 =XeLV
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These are mostly minor improvements all over including new CPU IDs for
  the Intel RAPL driver, an Energy Model rework to use micro-Watt as the
  power unit, cpufreq fixes and cleanus, cpuidle updates, devfreq
  updates, documentation cleanups and a new version of the pm-graph
  suite of utilities.

  Specifics:

   - Make cpufreq_show_cpus() more straightforward (Viresh Kumar).

   - Drop unnecessary CPU hotplug locking from store() used by cpufreq
     sysfs attributes (Viresh Kumar).

   - Make the ACPI cpufreq driver support the boost control interface on
     Zhaoxin/Centaur processors (Tony W Wang-oc).

   - Print a warning message on attempts to free an active cpufreq
     policy which should never happen (Viresh Kumar).

   - Fix grammar in the Kconfig help text for the loongson2 cpufreq
     driver (Randy Dunlap).

   - Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq
     governor (Zhao Liu).

   - Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll
     cpuidle driver (Eiichi Tsukata).

   - Modify intel_idle to treat C1 and C1E as independent idle states on
     Sapphire Rapids (Artem Bityutskiy).

   - Extend support for wakeirq to callback wrappers used during system
     suspend and resume (Ulf Hansson).

   - Defer waiting for device probe before loading a hibernation image
     till the first actual device access to avoid possible deadlocks
     reported by syzbot (Tetsuo Handa).

   - Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn
     Helgaas).

   - Add Raptor Lake-P to the list of processors supported by the Intel
     RAPL driver (George D Sworo).

   - Add Alder Lake-N and Raptor Lake-P to the list of processors for
     which Power Limit4 is supported in the Intel RAPL driver (Sumeet
     Pawnikar).

   - Make pm_genpd_remove() check genpd_debugfs_dir against NULL before
     attempting to remove it (Hsin-Yi Wang).

   - Change the Energy Model code to represent power in micro-Watts and
     adjust its users accordingly (Lukasz Luba).

   - Add new devfreq driver for Mediatek CCI (Cache Coherent
     Interconnect) (Johnson Wang).

   - Convert the Samsung Exynos SoC Bus bindings to DT schema of
     exynos-bus.c (Krzysztof Kozlowski).

   - Address kernel-doc warnings by adding the description for unused
     function parameters in devfreq core (Mauro Carvalho Chehab).

   - Use NULL to pass a null pointer rather than zero according to the
     function propotype in imx-bus.c (Colin Ian King).

   - Print error message instead of error interger value in
     tegra30-devfreq.c (Dmitry Osipenko).

   - Add checks to prevent setting negative frequency QoS limits for
     CPUs (Shivnandan Kumar).

   - Update the pm-graph suite of utilities to the latest revision 5.9
     including multiple improvements (Todd Brandt).

   - Drop pme_interrupt reference from the PCI power management
     documentation (Mario Limonciello)"

* tag 'pm-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (27 commits)
  powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P
  PM: QoS: Add check to make sure CPU freq is non-negative
  PM: hibernate: defer device probing when resuming from hibernation
  intel_idle: make SPR C1 and C1E be independent
  cpufreq: ondemand: Use cpumask_var_t for on-stack cpu mask
  cpufreq: loongson2: fix Kconfig "its" grammar
  pm-graph v5.9
  cpufreq: Warn users while freeing active policy
  cpufreq: scmi: Support the power scale in micro-Watts in SCMI v3.1
  firmware: arm_scmi: Get detailed power scale from perf
  Documentation: EM: Switch to micro-Watts scale
  PM: EM: convert power field to micro-Watts precision and align drivers
  PM / devfreq: tegra30: Add error message for devm_devfreq_add_device()
  PM / devfreq: imx-bus: use NULL to pass a null pointer rather than zero
  PM / devfreq: shut up kernel-doc warnings
  dt-bindings: interconnect: samsung,exynos-bus: convert to dtschema
  PM / devfreq: mediatek: Introduce MediaTek CCI devfreq driver
  dt-bindings: interconnect: Add MediaTek CCI dt-bindings
  PM: domains: Ensure genpd_debugfs_dir exists before remove
  PM: runtime: Extend support for wakeirq for force_suspend|resume
  ...
2022-08-02 11:17:00 -07:00
Lukasz Luba ae6ccaa650 PM: EM: convert power field to micro-Watts precision and align drivers
The milli-Watts precision causes rounding errors while calculating
efficiency cost for each OPP. This is especially visible in the 'simple'
Energy Model (EM), where the power for each OPP is provided from OPP
framework. This can cause some OPPs to be marked inefficient, while
using micro-Watts precision that might not happen.

Update all EM users which access 'power' field and assume the value is
in milli-Watts.

Solve also an issue with potential overflow in calculation of energy
estimation on 32bit machine. It's needed now since the power value
(thus the 'cost' as well) are higher.

Example calculation which shows the rounding error and impact:

power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz

power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000
power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18

power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961
power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21

max_freq = 2000MHz

cost_a_mW = 18 * 2000MHz/500MHz = 72
cost_a_uW = 18000 * 2000MHz/500MHz = 72000

cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better
cost_b_uW = 21961 * 2000MHz/600MHz = 73203

The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly
better that the 'cost_b_uW' (this patch uses micro-Watts) and such
would have impact on the 'inefficient OPPs' information in the Cpufreq
framework. This patch set removes the rounding issue.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-15 19:17:30 +02:00
Dietmar Eggemann bb44799949 sched, drivers: Remove max param from effective_cpu_util()/sched_cpu_util()
effective_cpu_util() already has a `int cpu' parameter which allows to
retrieve the CPU capacity scale factor (or maximum CPU capacity) inside
this function via an arch_scale_cpu_capacity(cpu).

A lot of code calling effective_cpu_util() (or the shim
sched_cpu_util()) needs the maximum CPU capacity, i.e. it will call
arch_scale_cpu_capacity() already.
But not having to pass it into effective_cpu_util() will make the EAS
wake-up code easier, especially when the maximum CPU capacity reduced
by the thermal pressure is passed through the EAS wake-up functions.

Due to the asymmetric CPU capacity support of arm/arm64 architectures,
arch_scale_cpu_capacity(int cpu) is a per-CPU variable read access via
per_cpu(cpu_scale, cpu) on such a system.
On all other architectures it is a a compile-time constant
(SCHED_CAPACITY_SCALE).

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lkml.kernel.org/r/20220621090414.433602-4-vdonnefort@google.com
2022-06-28 09:17:46 +02:00
Lukasz Luba 985a67709a powercap: DTPM: Check for Energy Model type
The Energy Model power values might be artificial. In such case
it's safe to bail out during the registration, since the PowerCap
framework supports only micro-Watts.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-13 16:26:18 +02:00
Daniel Lezcano bfded2ca8f powercap/dtpm_cpu: Add exit function
Now that we can destroy the hierarchy, the code must remove what it
had put in place at the creation. In our case, the cpu hotplug
callbacks.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20220130210210.549877-6-daniel.lezcano@linaro.org
2022-02-23 19:46:25 +01:00
Daniel Lezcano 0aea2e4ec2 powercap/dtpm_cpu: Reset per_cpu variable in the release function
The release function does not reset the per cpu variable when it is
called. That will prevent creation again as the variable will be
already from the previous creation.

Fix it by resetting them.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20220130210210.549877-2-daniel.lezcano@linaro.org
2022-02-23 19:45:22 +01:00
Daniel Lezcano 73dbcb6e37 powercap/drivers/dtpm: Add CPU DT initialization support
Based on the previous DT changes in the core code, use the 'setup'
callback to initialize the CPU DTPM backend.

Code is reorganized to stick to the DTPM table description. No
functional changes.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20220128163537.212248-4-daniel.lezcano@linaro.org
2022-02-04 17:38:09 +01:00
Daniel Lezcano b9794a8222 powercap/drivers/dtpm: Convert the init table section to a simple array
The init table section is freed after the system booted. However the
next changes will make per module the DTPM description, so the table
won't be accessible when the module is loaded.

In order to fix that, we should move the table to the data section
where there are very few entries and that makes strange to add it
there.

The main goal of the table was to keep self-encapsulated code and we
can keep it almost as it by using an array instead.

Suggested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20220128163537.212248-2-daniel.lezcano@linaro.org
2022-02-04 17:38:09 +01:00
Daniel Lezcano 4d1cd1443d powercap: DTPM: Fix suspend failure and kernel warning
When the ENERGY_MODEL and DTPM_CPU are enabled but actually without
any energy model, at cpu hotplug time, the dead cpuhp callback fails
leading to the warning.

Actually, the check could be simplified and we only do an action if
the dtpm cpu is enabled, otherwise we bail out without error.

Fixes: 7a89d7eacf ("powercap/drivers/dtpm: Simplify the dtpm table")
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Tested-by: Kenneth R. Crudup <kenny@panix.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-10 21:18:14 +01:00
Daniel Lezcano eb82bace89 powercap/drivers/dtpm: Scale the power with the load
Currently the power consumption is based on the current OPP power
assuming the entire performance domain is fully loaded.

That gives very gross power estimation and we can do much better by
using the load to scale the power consumption.

Use the utilization to normalize and scale the power usage over the
max possible power.

Tested on a rock960 with 2 big CPUS, the power consumption estimation
conforms with the expected one.

Before this change:

~$ ~/dhrystone -t 1 -l 10000&
~$ cat /sys/devices/virtual/powercap/dtpm/dtpm:0/dtpm:0:1/constraint_0_max_power_uw
2260000

After this change:

~$ ~/dhrystone -t 1 -l 10000&
~$ cat /sys/devices/virtual/powercap/dtpm/dtpm:0/dtpm:0:1/constraint_0_max_power_uw
1130000

~$ ~/dhrystone -t 2 -l 10000&
~$ cat /sys/devices/virtual/powercap/dtpm/dtpm:0/dtpm:0:1/constraint_0_max_power_uw
2260000

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20210312130411.29833-5-daniel.lezcano@linaro.org
2021-10-21 16:03:31 +02:00
Daniel Lezcano d2cdc6adc3 powercap/drivers/dtpm: Use container_of instead of a private data field
The dtpm framework provides an API to allocate a dtpm node. However
when a backend dtpm driver needs to allocate a dtpm node it must
define its own structure and store the pointer of this structure in
the private field of the dtpm structure.

It is more elegant to use the container_of macro and add the dtpm
structure inside the dtpm backend specific structure. The code will be
able to deal properly with the dtpm structure as a generic entity,
making all this even more self-encapsulated.

The dtpm_alloc() function does no longer make sense as the dtpm
structure will be allocated when allocating the device specific dtpm
structure. The dtpm_init() is provided instead.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20210312130411.29833-4-daniel.lezcano@linaro.org
2021-10-21 16:03:31 +02:00
Daniel Lezcano 7a89d7eacf powercap/drivers/dtpm: Simplify the dtpm table
The dtpm table is an array of pointers, that forces the user of the
table to define initdata along with the declaration of the table
entry. It is more efficient to create an array of dtpm structure, so
the declaration of the table entry can be done by initializing the
different fields.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20210312130411.29833-3-daniel.lezcano@linaro.org
2021-10-21 16:03:31 +02:00
Daniel Lezcano 4570ddda43 powercap/drivers/dtpm: Encapsulate even more the code
In order to increase the self-encapsulation of the dtpm generic code,
the following changes are adding a power update ops to the dtpm
ops. That allows the generic code to call directly the dtpm backend
function to update the power values.

The power update function does compute the power characteristics when
the function is invoked. In the case of the CPUs, the power
consumption depends on the number of online CPUs. The online CPUs mask
is not up to date at CPUHP_AP_ONLINE_DYN state in the tear down
callback. That is the reason why the online / offline are at separate
state. As there is already an existing state for DTPM, this one is
only moved to the DEAD state, so there is no addition of new state
with these changes. The dtpm node is not removed when the cpu is
unplugged.

That simplifies the code for the next changes and results in a more
self-encapsulated code.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20210312130411.29833-1-daniel.lezcano@linaro.org
2021-10-21 16:03:31 +02:00
Colin Ian King 66e713fbbb powercap/drivers/dtpm: Fix size of object being allocated
The kzalloc allocation for dtpm_cpu is currently allocating the size
of the pointer and not the size of the structure. Fix this by using
the correct sizeof argument.

Addresses-Coverity: ("Wrong sizeof argument")
Fixes: 0e8f68d7f0 ("powercap/drivers/dtpm: Add CPU energy model based support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-01-07 18:16:03 +01:00
Daniel Lezcano 0e8f68d7f0 powercap/drivers/dtpm: Add CPU energy model based support
With the powercap dtpm controller, we are able to plug devices with
power limitation features in the tree.

The following patch introduces the CPU power limitation based on the
energy model and the performance states.

The power limitation is done at the performance domain level. If some
CPUs are unplugged, the corresponding power will be subtracted from
the performance domain total power.

It is up to the platform to initialize the dtpm tree and add the CPU.

Here is an example to create a simple tree with one root node called
"pkg" and the CPU's performance domains.

static int dtpm_register_pkg(struct dtpm_descr *descr)
{
	struct dtpm *pkg;
	int ret;

	pkg = dtpm_alloc(NULL);
	if (!pkg)
		return -ENOMEM;

	ret = dtpm_register(descr->name, pkg, descr->parent);
	if (ret)
		return ret;

	return dtpm_register_cpu(pkg);
}

static struct dtpm_descr descr = {
	.name = "pkg",
	.init = dtpm_register_pkg,
};
DTPM_DECLARE(descr);

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-22 19:50:40 +01:00