Commit graph

1153449 commits

Author SHA1 Message Date
Peter Zijlstra
89b3098703 arch/idle: Change arch_cpu_idle() behavior: always exit with IRQs disabled
Current arch_cpu_idle() is called with IRQs disabled, but will return
with IRQs enabled.

However, the very first thing the generic code does after calling
arch_cpu_idle() is raw_local_irq_disable(). This means that
architectures that can idle with IRQs disabled end up doing a
pointless 'enable-disable' dance.

Therefore, push this IRQ disabling into the idle function, meaning
that those architectures can avoid the pointless IRQ state flipping.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Acked-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.618076436@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
9b461a6faa cpuidle, intel_idle: Fix CPUIDLE_FLAG_IBRS
objtool to the rescue:

  vmlinux.o: warning: objtool: intel_idle_ibrs+0x17: call to spec_ctrl_current() leaves .noinstr.text section
  vmlinux.o: warning: objtool: intel_idle_ibrs+0x27: call to wrmsrl.constprop.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.556912863@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
821ad23d0e cpuidle, intel_idle: Fix CPUIDLE_FLAG_INIT_XSTATE
Fix instrumentation bugs objtool found:

  vmlinux.o: warning: objtool: intel_idle_s2idle+0xd5: call to fpu_idle_fpregs() leaves .noinstr.text section
  vmlinux.o: warning: objtool: intel_idle_xstate+0x11: call to fpu_idle_fpregs() leaves .noinstr.text section
  vmlinux.o: warning: objtool: fpu_idle_fpregs+0x9: call to xfeatures_in_use() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.494977795@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
6d9c7f51b1 cpuidle, intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE *again*
So objtool found this bug:

  vmlinux.o: warning: objtool: intel_idle_irq+0x10c: call to trace_hardirqs_off() leaves .noinstr.text section

As per commit 32d4fd5751 ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE"):

  "must not have tracing in idle functions"

Clearly people can't read and tinker along until splat dissapears.
This straight up reverts commit d295ad34f2 ("intel_idle: Fix false
positive RCU splats due to incorrect hardirqs state").

It doesn't re-introduce the problem because preceding patches fixed it
properly.

Fixes: d295ad34f2 ("intel_idle: Fix false positive RCU splats due to incorrect hardirqs state")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.434302128@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
2b5a0e425e objtool/idle: Validate __cpuidle code as noinstr
Idle code is very like entry code in that RCU isn't available. As
such, add a little validation.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.373461409@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
00717eb8c9 cpuidle: Annotate poll_idle()
The __cpuidle functions will become a noinstr class, as such they need
explicit annotations.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.312601331@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
8ce78470bf acpi_idle: Remove tracing
All the idle routines are called with RCU disabled, as such there must
not be any tracing inside.

While there; clean-up the io-port idle thing.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.251666856@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
924aed1646 cpuidle, cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()
All callers should still have RCU enabled.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.190860672@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
a01353cf18 cpuidle: Fix ct_idle_*() usage
The whole disable-RCU, enable-IRQS dance is very intricate since
changing IRQ state is traced, which depends on RCU.

Add two helpers for the cpuidle case that mirror the entry code:

  ct_cpuidle_enter()
  ct_cpuidle_exit()

And fix all the cases where the enter/exit dance was buggy.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.130014793@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
0c5ffc3d7b cpuidle, dt: Push RCU-idle into driver
Doing RCU-idle outside the driver, only to then temporarily enable it
again before going idle is suboptimal.

Notably: this converts all dt_init_idle_driver() and
__CPU_PM_CPU_IDLE_ENTER() users for they are inextrably intertwined.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.068981667@infradead.org
2023-01-13 11:48:00 +01:00
Peter Zijlstra
c3d42418dc cpuidle, OMAP4: Push RCU-idle into driver
Doing RCU-idle outside the driver, only to then temporarily enable it
again, some *four* times, before going idle is suboptimal.

Notably three times explicitly using RCU_NONIDLE() and once implicitly
through cpu_pm_*().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230112195540.007918454@infradead.org
2023-01-13 11:47:49 +01:00
Peter Zijlstra
4ce40e9dbe cpuidle, armada: Push RCU-idle into driver
Doing RCU-idle outside the driver, only to then temporarily enable it
again before going idle is suboptimal.

Notably the cpu_pm_*() calls implicitly re-enable RCU for a bit.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230112195539.946630819@infradead.org
2023-01-13 11:47:27 +01:00
Peter Zijlstra
4d1be9e745 cpuidle, OMAP3: Push RCU-idle into driver
Doing RCU-idle outside the driver, only to then teporarily enable it
again before going idle is suboptimal.

Notably the cpu_pm_*() calls implicitly re-enable RCU for a bit.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230112195539.883561913@infradead.org
2023-01-13 11:47:19 +01:00
Peter Zijlstra
b3f46658ce cpuidle, ARM/imx6: Push RCU-idle into driver
Doing RCU-idle outside the driver, only to then temporarily enable it
again, at least twice, before going idle is suboptimal.

Notably both cpu_pm_enter() and cpu_cluster_pm_enter() implicity
re-enable RCU.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230112195539.821714572@infradead.org
2023-01-13 11:46:40 +01:00
Peter Zijlstra
e038f7b802 cpuidle, psci: Push RCU-idle into driver
Doing RCU-idle outside the driver, only to then temporarily enable it
again, at least twice, before going idle is suboptimal.

Notably once implicitly through the cpu_pm_*() calls and once
explicitly doing ct_irq_*_irqon().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230112195539.760296658@infradead.org
2023-01-13 11:03:22 +01:00
Peter Zijlstra
5fca0d9f5d cpuidle, tegra: Push RCU-idle into driver
Doing RCU-idle outside the driver, only to then temporarily enable it
again, at least twice, before going idle is suboptimal.

Notably once implicitly through the cpu_pm_*() calls and once
explicitly doing RCU_NONIDLE().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230112195539.699546331@infradead.org
2023-01-13 11:03:22 +01:00
Peter Zijlstra
8e9ab9e8da cpuidle, riscv: Push RCU-idle into driver
Doing RCU-idle outside the driver, only to then temporarily enable it
again, at least twice, before going idle is suboptimal.

That is, once implicitly through the cpu_pm_*() calls and once
explicitly doing ct_irq_*_irqon().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230112195539.637185846@infradead.org
2023-01-13 11:03:22 +01:00
Peter Zijlstra
bb7b112585 cpuidle: Move IRQ state validation
Make cpuidle_enter_state() consistent with the s2idle variant and
verify ->enter() always returns with interrupts disabled.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195539.576412812@infradead.org
2023-01-13 11:03:22 +01:00
Peter Zijlstra
5e26aa9339 cpuidle/poll: Ensure IRQs stay disabled after cpuidle_state::enter() calls
Make cpuidle_state::enter() methods IRQ state invariant on exit.

Additionally make sure to use raw_local_irq_*() methods since this
cpuidle callback will be called with RCU already disabled.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195539.515253662@infradead.org
2023-01-13 11:03:21 +01:00
Peter Zijlstra
aaa3896b96 x86/idle: Replace 'x86_idle' function pointer with a static_call
Typical boot time setup; no need to suffer an indirect call for that.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230112195539.453613251@infradead.org
2023-01-13 11:03:21 +01:00
Peter Zijlstra
1f7c232ee0 x86/perf/amd: Remove tracing from perf_lopwr_cb()
The perf_lopwr_cb() function is called from the idle routines; there
is no RCU there, we must not enter tracing.

Use __always_inline, noidle annotations and existing no-trace methods.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195539.392862891@infradead.org
2023-01-13 11:03:21 +01:00
Mathieu Desnoyers
28c8e08842 rseq: Increase AT_VECTOR_SIZE_BASE to match rseq auxvec entries
Two new auxiliary vector entries are introduced for rseq without
matching increment of the AT_VECTOR_SIZE_BASE, which causes failures
with CONFIG_HARDENED_USERCOPY=y.

Fixes: 317c8194e6 ("rseq: Introduce feature size and alignment ELF auxiliary vector entries")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230104192054.34046-1-mathieu.desnoyers@efficios.com
2023-01-12 11:22:22 +01:00
Mathieu Desnoyers
c91b0bf3e5 selftests/rseq: Revert "selftests/rseq: Add mm_numa_cid to test script"
The mm_numa_cid related rseq patches from the series were not picked up
into the tip tree, so enabling the mm_numa_cid test needs to be
reverted.

This reverts commit b344b8f2d8.

Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/oe-lkp/202301040903.2dd1e25b-oliver.sang@intel.com
2023-01-12 11:22:22 +01:00
Ingo Molnar
cd9f5c3d30 sched/cputime: Fix IA64 build error of missing arch_vtime_task_switch() prototype
The following commit:

  c89970202a ("cputime: remove cputime_to_nsecs fallback")

Removed an <asm/cputime.h> inclusion from <linux/sched/cputime.h>, but this
broke the IA64 build:

    arch/ia64/kernel/time.c:110:6: warning: no previous prototype for 'arch_vtime_task_switch' [-Wmissing-prototypes]

Add in the missing <asm/cputime.h> header to fix it.

Fixes: c89970202a ("cputime: remove cputime_to_nsecs fallback")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-11 10:31:57 +01:00
Michal Clapinski
d74f87f376 selftests/membarrier: Test MEMBARRIER_CMD_GET_REGISTRATIONS
Keep track of previously issued registrations and compare the result
with MEMBARRIER_CMD_GET_REGISTRATIONS return value.

Signed-off-by: Michal Clapinski <mclapinski@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20221207164338.1535591-3-mclapinski@google.com
2023-01-07 11:29:29 +01:00
Michal Clapinski
544a4f2ecd sched/membarrier: Introduce MEMBARRIER_CMD_GET_REGISTRATIONS
Provide a method to query previously issued registrations.

Signed-off-by: Michal Clapinski <mclapinski@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20221207164338.1535591-2-mclapinski@google.com
2023-01-07 11:29:29 +01:00
Lukasz Luba
948fb4c4e9 cpufreq, sched/util: Optimize operations with single CPU capacity lookup
The max CPU capacity is the same for all CPUs sharing frequency domain.
There is a way to avoid heavy operations in a loop for each CPU by
leveraging this knowledge. Thus, simplify the looping code in the
sugov_next_freq_shared() and drop heavy multiplications. Instead, use
simple max() to get the highest utilization from these CPUs.

This is useful for platforms with many (4 or 6) little CPUs. We avoid
heavy 2*PD_CPU_NUM multiplications in that loop, which is called billions
of times, since it's not limited by the schedutil time delta filter in
sugov_should_update_freq(). When there was no need to change frequency
the code bailed out, not updating the sg_policy::last_freq_update_time.
Then every visit after delta_ns time longer than the
sg_policy::freq_update_delay_ns goes through and triggers the next
frequency calculation code. Although, if the next frequency, as outcome
of that, would be the same as current frequency, we won't update the
sg_policy::last_freq_update_time and the story will be repeated (in
a very short period, sometimes a few microseconds).

The max CPU capacity must be fetched every time we are called, due to
difficulties during the policy setup, where we are not able to get the
normalized CPU capacity at the right time.

The fetched CPU capacity value is than used in sugov_iowait_apply() to
calculate the right boost. This required a few changes in the local
functions and arguments. The capacity value should hopefully be fetched
once when needed and then passed over CPU registers to those functions.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20221208160256.859-2-lukasz.luba@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
2023-01-07 11:25:38 +01:00
Chengming Zhou
160fb0d83f sched/core: Reorganize ttwu_do_wakeup() and ttwu_do_activate()
ttwu_do_activate() is used for a complete wakeup, in which we will
activate_task() and use ttwu_do_wakeup() to mark the task runnable
and perform wakeup-preemption, also call class->task_woken() callback
and update the rq->idle_stamp.

Since ttwu_runnable() is not a complete wakeup, don't need all those
done in ttwu_do_wakeup(), so we can move those to ttwu_do_activate()
to simplify ttwu_do_wakeup(), making it only mark the task runnable
to be reused in ttwu_runnable() and try_to_wake_up().

This patch should not have any functional changes.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20221223103257.4962-2-zhouchengming@bytedance.com
2023-01-07 10:48:38 +01:00
Chengming Zhou
efe0938586 sched/core: Micro-optimize ttwu_runnable()
ttwu_runnable() is used as a fast wakeup path when the wakee task
is running on CPU or runnable on RQ, in both cases we can just
set its state to TASK_RUNNING to prevent a sleep.

If the wakee task is on_cpu running, we don't need to update_rq_clock()
or check_preempt_curr().

But if the wakee task is on_rq && !on_cpu (e.g. an IRQ hit before
the task got to schedule() and the task been preempted), we should
check_preempt_curr() to see if it can preempt the current running.

This also removes the class->task_woken() callback from ttwu_runnable(),
which wasn't required per the RT/DL implementations: any required push
operation would have been queued during class->set_next_task() when p
got preempted.

ttwu_runnable() also loses the update to rq->idle_stamp, as by definition
the rq cannot be idle in this scenario.

Suggested-by: Valentin Schneider <vschneid@redhat.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221223103257.4962-1-zhouchengming@bytedance.com
2023-01-07 10:48:38 +01:00
Qais Yousef
acbee592f1 sched/documentation: Document the util clamp feature
Add a document explaining the util clamp feature: what it is and
how to use it. The new document hopefully covers everything one needs to
know about uclamp.

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20221216235716.201923-1-qyousef@layalina.io
Cc: Jonathan Corbet <corbet@lwn.net>
2023-01-05 12:08:34 +01:00
Bing Huang
ef90cf2281 sched/topology: Add __init for sched_init_domains()
sched_init_domains() is only used in initialization

Signed-off-by: Bing Huang <huangbing@kylinos.cn>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230105014943.9857-1-huangbing775@126.com
2023-01-05 11:42:13 +01:00
Mathieu Desnoyers
bbd0b03150 sched/rseq: Fix concurrency ID handling of usermodehelper kthreads
sched_mm_cid_after_execve() does not expect NULL t->mm, but it may happen
if a usermodehelper kthread fails when attempting to execute a binary.

sched_mm_cid_fork() can be issued from a usermodehelper kthread, which
has t->flags PF_KTHREAD set.

Fixes: af7f588d8f ("sched: Introduce per-memory-map concurrency ID")
Reported-by: kernel test robot <yujie.liu@intel.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/oe-lkp/202212301353.5c959d72-yujie.liu@intel.com
2023-01-02 16:34:12 +01:00
Nicholas Piggin
c89970202a cputime: remove cputime_to_nsecs fallback
The archs that use cputime_to_nsecs() internally provide their own
definition and don't need the fallback. cputime_to_usecs() unused except
in this fallback, and is not defined anywhere.

This removes the final remnant of the cputime_t code from the kernel.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20221220070705.2958959-1-npiggin@gmail.com
2022-12-27 12:52:17 +01:00
Hao Jia
8589018acc sched/core: Adjusting the order of scanning CPU
When select_idle_capacity() starts scanning for an idle CPU, it starts
with target CPU that has already been checked in select_idle_sibling().
So we start checking from the next CPU and try the target CPU at the end.
Similarly for task_numa_assign(), we have just checked numa_migrate_on
of dst_cpu, so start from the next CPU. This also works for
steal_cookie_task(), the first scan must fail and start directly
from the next one.

Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20221216062406.7812-3-jiahao.os@bytedance.com
2022-12-27 12:52:17 +01:00
Hao Jia
feaed76376 sched/numa: Stop an exhastive search if an idle core is found
In update_numa_stats() we try to find an idle cpu on the NUMA node,
preferably an idle core. we can stop looking for the next idle core
or idle cpu after finding an idle core. But we can't stop the
whole loop of scanning the CPU, because we need to calculate
approximate NUMA stats at a point in time. For example,
the src and dst nr_running is needed by task_numa_find_cpu().

Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20221216062406.7812-2-jiahao.os@bytedance.com
2022-12-27 12:52:16 +01:00
Matthew Wilcox (Oracle)
904cbab71d sched: Make const-safe
With a modified container_of() that preserves constness, the compiler
finds some pointers which should have been marked as const.  task_of()
also needs to become const-preserving for the !FAIR_GROUP_SCHED case so
that cfs_rq_of() can take a const argument.  No change to generated code.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221212144946.2657785-1-willy@infradead.org
2022-12-27 12:52:16 +01:00
Mathieu Desnoyers
b344b8f2d8 selftests/rseq: Add mm_numa_cid to test script
Add mm_numa_cid tests to the run_param_test.sh test script.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221216145332.205095-1-mathieu.desnoyers@efficios.com
2022-12-27 12:52:16 +01:00
Mathieu Desnoyers
14a8644d4f tracing/rseq: Add mm_cid field to rseq_update
Add the mm_cid field to the rseq_update event, allowing tracers to
follow which mm_cid is observed by user-space, and whether negative
mm_cid values are visible in case of internal scheduler implementation
issues.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-22-mathieu.desnoyers@efficios.com
2022-12-27 12:52:15 +01:00
Mathieu Desnoyers
a3798e6ffb selftests/rseq: parametrized test: Report/abort on negative concurrency ID
Report and abort when a negative concurrency ID value is observed by the
spinlock test.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-21-mathieu.desnoyers@efficios.com
2022-12-27 12:52:15 +01:00
Mathieu Desnoyers
ee31fff091 selftests/rseq: Implement parametrized mm_cid test
Adapt to the rseq.h API changes introduced by commits
"selftests/rseq: <arch>: Template memory ordering and percpu access mode".

Build a new param_test_mm_cid, param_test_mm_cid_benchmark, and
param_test_mm_cid_compare_twice executables to test the new "mm_cid"
rseq field.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-20-mathieu.desnoyers@efficios.com
2022-12-27 12:52:15 +01:00
Mathieu Desnoyers
cead720627 selftests/rseq: Implement basic percpu ops mm_cid test
Adapt to the rseq.h API changes introduced by commits
"selftests/rseq: <arch>: Template memory ordering and percpu access mode".

Build a new basic_percpu_ops_mm_cid_test to test the new "mm_cid" rseq
field.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-19-mathieu.desnoyers@efficios.com
2022-12-27 12:52:15 +01:00
Mathieu Desnoyers
171586a6ab selftests/rseq: riscv: Template memory ordering and percpu access mode
Introduce a rseq-riscv-bits.h template header which is internally included
to generate the static inline functions covering:

- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-18-mathieu.desnoyers@efficios.com
2022-12-27 12:52:14 +01:00
Mathieu Desnoyers
a94af3c584 selftests/rseq: s390: Template memory ordering and percpu access mode
Introduce a rseq-s390-bits.h template header which is internally included
to generate the static inline functions covering:

- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-17-mathieu.desnoyers@efficios.com
2022-12-27 12:52:14 +01:00
Mathieu Desnoyers
e61bd94c2b selftests/rseq: ppc: Template memory ordering and percpu access mode
Introduce a rseq-ppc-bits.h template header which is internally included
to generate the static inline functions covering:

- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-16-mathieu.desnoyers@efficios.com
2022-12-27 12:52:14 +01:00
Mathieu Desnoyers
431b63285a selftests/rseq: mips: Template memory ordering and percpu access mode
Introduce a rseq-mips-bits.h template header which is internally
included to generate the static inline functions covering:

- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-15-mathieu.desnoyers@efficios.com
2022-12-27 12:52:13 +01:00
Mathieu Desnoyers
8d4eeb8bb4 selftests/rseq: arm64: Template memory ordering and percpu access mode
Introduce a rseq-arm64-bits.h template header which is internally
included to generate the static inline functions covering:

- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-14-mathieu.desnoyers@efficios.com
2022-12-27 12:52:13 +01:00
Mathieu Desnoyers
5bf4aba38a selftests/rseq: arm: Template memory ordering and percpu access mode
Introduce a rseq-arm-bits.h template header which is internally included
to generate the static inline functions covering:

- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-13-mathieu.desnoyers@efficios.com
2022-12-27 12:52:13 +01:00
Mathieu Desnoyers
ae31573843 selftests/rseq: x86: Template memory ordering and percpu access mode
Introduce a rseq-x86-bits.h template header which is internally included
to generate the static inline functions covering:

- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.

This introduces changes to the rseq.h selftests API which require to
update the rseq selftest programs. Similar API/templating changes need
to be done for other architectures.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-12-mathieu.desnoyers@efficios.com
2022-12-27 12:52:12 +01:00
Mathieu Desnoyers
18c2355838 selftests/rseq: Implement rseq mm_cid field support
Add support for the mm_cid field (per-memory-map concurrency ID) of
struct rseq to rseq selftests.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-11-mathieu.desnoyers@efficios.com
2022-12-27 12:52:12 +01:00
Mathieu Desnoyers
72cb1d7f2f selftests/rseq: Remove RSEQ_SKIP_FASTPATH code
This code is not currently build by the test Makefile, adds complexity,
and is not overall useful considering that the abort handling loops to
retry the fast-path.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-10-mathieu.desnoyers@efficios.com
2022-12-27 12:52:12 +01:00