Commit Graph

16 Commits

Author SHA1 Message Date
Peter Zijlstra e6a15fa9ea cpuidle: Use local_clock_noinstr()
With the introduction of local_clock_noinstr(), local_clock() itself
is no longer marked noinstr, use the correct function.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V
Link: https://lore.kernel.org/r/20230519102716.045980863@infradead.org
2023-06-05 21:11:09 +02:00
Peter Zijlstra 4d627628d7 cpuidle: Fix poll_idle() noinstr annotation
The instrumentation_begin()/end() annotations in poll_idle() were
complete nonsense. Specifically they caused tracing to happen in the
middle of noinstr code, resulting in RCU splats.

Now that local_clock() is noinstr, mark up the rest and let it rip.

Fixes: 00717eb8c9 ("cpuidle: Annotate poll_idle()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/oe-lkp/202301192148.58ece903-oliver.sang@intel.com
Link: https://lore.kernel.org/r/20230126151323.819534689@infradead.org
2023-01-31 15:01:47 +01:00
Peter Zijlstra 00717eb8c9 cpuidle: Annotate poll_idle()
The __cpuidle functions will become a noinstr class, as such they need
explicit annotations.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.312601331@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra 5e26aa9339 cpuidle/poll: Ensure IRQs stay disabled after cpuidle_state::enter() calls
Make cpuidle_state::enter() methods IRQ state invariant on exit.

Additionally make sure to use raw_local_irq_*() methods since this
cpuidle callback will be called with RCU already disabled.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195539.515253662@infradead.org
2023-01-13 11:03:21 +01:00
Rafael J. Wysocki ba1e78a1dc cpuidle: Drop disabled field from struct cpuidle_state
After recent cpuidle updates the "disabled" field in struct
cpuidle_state is only used by two drivers (intel_idle and shmobile
cpuidle) for marking unusable idle states, but that may as well be
achieved with the help of a state flag, so define an "unusable" idle
state flag, CPUIDLE_FLAG_UNUSABLE, make the drivers in question use
it instead of the "disabled" field and make the core set
CPUIDLE_STATE_DISABLED_BY_DRIVER for the idle states with that flag
set.

After the above changes, the "disabled" field in struct cpuidle_state
is not used any more, so drop it.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-29 11:48:39 +01:00
Rafael J. Wysocki c1d51f684c cpuidle: Use nanoseconds as the unit of time
Currently, the cpuidle subsystem uses microseconds as the unit of
time which (among other things) causes the idle loop to incur some
integer division overhead for no clear benefit.

In order to allow cpuidle to measure time in nanoseconds, add two
new fields, exit_latency_ns and target_residency_ns, to represent the
exit latency and target residency of an idle state in nanoseconds,
respectively, to struct cpuidle_state and initialize them with the
help of the corresponding values in microseconds provided by drivers.
Additionally, change cpuidle_governor_latency_req() to return the
idle state exit latency constraint in nanoseconds.

Also meeasure idle state residency (last_residency_ns in struct
cpuidle_device and time_ns in struct cpuidle_driver) in nanoseconds
and update the cpuidle core and governors accordingly.

However, the menu governor still computes typical intervals in
microseconds to avoid integer overflows.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
2019-11-11 21:56:07 +01:00
Marcelo Tosatti 259231a045 cpuidle: add poll_limit_ns to cpuidle_device structure
Add a poll_limit_ns variable to cpuidle_device structure.

Calculate and configure it in the new cpuidle_poll_time
function, in case its zero.

Individual governors are allowed to override this value.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-07-30 17:27:37 +02:00
Thomas Gleixner 55716d2643 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428
Based on 1 normalized pattern(s):

  this file is released under the gplv2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 68 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05 17:37:16 +02:00
Doug Smythies 1617971c66 cpuidle: poll_state: Fix default time limit
The default time is declared in units of microsecnds,
but is used as nanoseconds, resulting in significant
accounting errors for idle state 0 time when all idle
states deeper than 0 are disabled.

Under these unusual conditions, we don't really care
about the poll time limit anyhow.

Fixes: 800fb34a99 ("cpuidle: poll_state: Disregard disable idle states")
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-01-30 22:57:42 +01:00
Rafael J. Wysocki 800fb34a99 cpuidle: poll_state: Disregard disable idle states
When computing the limit of time to spend in the loop in poll_idle(),
use the target residency of the first enabled idle state deeper than
state 0 instead of always using the target residency of state 1.

This helps when state 1 is disabled for diagnostics, for instance.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-12-11 12:07:07 +01:00
Rafael J. Wysocki 01bad1c689 cpuidle: poll_state: Revise loop termination condition
If need_resched() returns "false", breaking out of the loop in
poll_idle() will cause a new idle state to be selected, so in fact
it usually doesn't make sense to spin in it longer than the target
residency of the second state.  [Note that the "polling" state is
used only if there is at least one "real" state defined in addition
to it, so the second state is always there.]  On the other hand,
breaking out of it early (say in case the next state is disabled)
shouldn't hurt as it is polling anyway.

For this reason, make the loop in poll_idle() break if the CPU has
been spinning longer than the target residency of the second state
(the "polling" state can only be state[0]).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-10-04 19:27:27 +02:00
Rafael J. Wysocki 5f26bdceb9 cpuidle: menu: Fix wakeup statistics updates for polling state
If the CPU exits the "polling" state due to the time limit in the
loop in poll_idle(), this is not a real wakeup and it just means
that the "polling" state selection was not adequate.  The governor
mispredicted short idle duration, but had a more suitable state been
selected, the CPU might have spent more time in it.  In fact, there
is no reason to expect that there would have been a wakeup event
earlier than the next timer in that case.

Handling such cases as regular wakeups in menu_update() may cause the
menu governor to make suboptimal decisions going forward, but ignoring
them altogether would not be correct either, because every time
menu_select() is invoked, it makes a separate new attempt to predict
the idle duration taking distinct time to the closest timer event as
input and the outcomes of all those attempts should be recorded.

For this reason, make menu_update() always assume that if the
"polling" state was exited due to the time limit, the next proper
wakeup event for the CPU would be the next timer event (not
including the tick).

Fixes: a37b969a61 "cpuidle: poll_state: Add time limit to poll_idle()"
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-10-04 10:23:37 +02:00
Rafael J. Wysocki 4dc2375c1a cpuidle: poll_state: Avoid invoking local_clock() too often
Rik reports that he sees an increase in CPU use in one benchmark
due to commit 612f1a22f067 "cpuidle: poll_state: Add time limit to
poll_idle()" that caused poll_idle() to call local_clock() in every
iteration of the loop.  Utilization increase generally means more
non-idle time with respect to total CPU time (on the average) which
implies reduced CPU frequency.

Doug reports that limiting the rate of local_clock() invocations
in there causes much less power to be drawn during a CPU-intensive
parallel workload (with idle states 1 and 2 disabled to enforce more
state 0 residency).

These two reports together suggest that executing local_clock() on
multiple CPUs in parallel at a high rate may cause chips to get hot
and trigger thermal/power limits on them to kick in, so reduce the
rate of local_clock() invocations in poll_idle() to avoid that issue.

Fixes: 612f1a22f067 "cpuidle: poll_state: Add time limit to poll_idle()"
Reported-by: Rik van Riel <riel@surriel.com>
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
2018-03-29 13:06:08 +02:00
Rafael J. Wysocki a37b969a61 cpuidle: poll_state: Add time limit to poll_idle()
If poll_idle() is allowed to spin until need_resched() returns 'true',
it may actually spin for a much longer time than expected by the idle
governor, since set_tsk_need_resched() is not always called by the
timer interrupt handler.  If that happens, the CPU may spend much
more time than anticipated in the "polling" state.

To prevent that from happening, limit the time of the spinning loop
in poll_idle().

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Doug Smythies <dsmythies@telus.net>
2018-03-29 13:06:07 +02:00
Rafael J. Wysocki 1b39e3f813 cpuidle: Make drivers initialize polling state
Make the drivers that want to include the polling state into their
states table initialize it explicitly and drop the initialization of
it (which in fact is conditional, but that is not obvious from the
code) from the core.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-30 03:06:33 +02:00
Rafael J. Wysocki 34c2f65b71 cpuidle: Move polling state initialization code to separate file
Move the polling state initialization code to a separate file built
conditionally on CONFIG_ARCH_HAS_CPU_RELAX to get rid of the #ifdef
in driver.c.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-30 03:06:27 +02:00