linux-stable/include/linux/sched/isolation.h

74 lines
1.7 KiB
C
Raw Normal View History

#ifndef _LINUX_SCHED_ISOLATION_H
#define _LINUX_SCHED_ISOLATION_H
#include <linux/cpumask.h>
#include <linux/init.h>
#include <linux/tick.h>
enum hk_type {
HK_TYPE_TIMER,
HK_TYPE_RCU,
HK_TYPE_MISC,
HK_TYPE_SCHED,
HK_TYPE_TICK,
HK_TYPE_DOMAIN,
HK_TYPE_WQ,
HK_TYPE_MANAGED_IRQ,
HK_TYPE_KTHREAD,
HK_TYPE_MAX
};
#ifdef CONFIG_CPU_ISOLATION
DECLARE_STATIC_KEY_FALSE(housekeeping_overridden);
extern int housekeeping_any_cpu(enum hk_type type);
extern const struct cpumask *housekeeping_cpumask(enum hk_type type);
extern bool housekeeping_enabled(enum hk_type type);
extern void housekeeping_affine(struct task_struct *t, enum hk_type type);
extern bool housekeeping_test_cpu(int cpu, enum hk_type type);
extern void __init housekeeping_init(void);
#else
static inline int housekeeping_any_cpu(enum hk_type type)
{
return smp_processor_id();
}
static inline const struct cpumask *housekeeping_cpumask(enum hk_type type)
{
return cpu_possible_mask;
}
static inline bool housekeeping_enabled(enum hk_type type)
KVM: LAPIC: Inject timer interrupt via posted interrupt Dedicated instances are currently disturbed by unnecessary jitter due to the emulated lapic timers firing on the same pCPUs where the vCPUs reside. There is no hardware virtual timer on Intel for guest like ARM, so both programming timer in guest and the emulated timer fires incur vmexits. This patch tries to avoid vmexit when the emulated timer fires, at least in dedicated instance scenario when nohz_full is enabled. In that case, the emulated timers can be offload to the nearest busy housekeeping cpus since APICv has been found for several years in server processors. The guest timer interrupt can then be injected via posted interrupts, which are delivered by the housekeeping cpu once the emulated timer fires. The host should tuned so that vCPUs are placed on isolated physical processors, and with several pCPUs surplus for busy housekeeping. If disabled mwait/hlt/pause vmexits keep the vCPUs in non-root mode, ~3% redis performance benefit can be observed on Skylake server, and the number of external interrupt vmexits drops substantially. Without patch VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% ) While with patch: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% ) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-06 01:26:51 +00:00
{
return false;
}
static inline void housekeeping_affine(struct task_struct *t,
enum hk_type type) { }
sched/isolation: add cpu_is_isolated() API Patch series "memcg, cpuisol: do not interfere pcp cache charges draining with cpuisol workloads". Leonardo has reported [1] that pcp memcg charge draining can interfere with cpu isolated workloads. The said draining is done from a WQ context with a pcp worker scheduled on each CPU which holds any cached charges for a specific memcg hierarchy. Operation is not really a common operation [2]. It can be triggered from the userspace though so some care is definitely due. Leonardo has tried to address the issue by allowing remote charge draining [3]. This approach requires an additional locking to synchronize pcp caches sync from a remote cpu from local pcp consumers. Even though the proposed lock was per-cpu there is still potential for contention and less predictable behavior. This patchset addresses the issue from a different angle. Rather than dealing with a potential synchronization, cpus which are isolated are simply never scheduled to be drained. This means that a small amount of charges could be laying around and waiting for a later use or they are flushed when a different memcg is charged from the same cpu. More details are in patch 2. The first patch from Frederic is implementing an abstraction to tell whether a specific cpu has been isolated and therefore require a special treatment. This patch (of 2): Provide this new API to check if a CPU has been isolated either through isolcpus= or nohz_full= kernel parameter. It aims at avoiding kernel load deemed to be safely spared on CPUs running sensitive workload that can't bear any disturbance, such as pcp cache draining. Link: https://lkml.kernel.org/r/20230317134448.11082-1-mhocko@kernel.org Link: https://lkml.kernel.org/r/20230317134448.11082-2-mhocko@kernel.org Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Leonardo Bras <leobras@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-17 13:44:47 +00:00
static inline bool housekeeping_test_cpu(int cpu, enum hk_type type)
{
return true;
}
static inline void housekeeping_init(void) { }
#endif /* CONFIG_CPU_ISOLATION */
static inline bool housekeeping_cpu(int cpu, enum hk_type type)
{
#ifdef CONFIG_CPU_ISOLATION
if (static_branch_unlikely(&housekeeping_overridden))
return housekeeping_test_cpu(cpu, type);
#endif
return true;
}
sched/isolation: add cpu_is_isolated() API Patch series "memcg, cpuisol: do not interfere pcp cache charges draining with cpuisol workloads". Leonardo has reported [1] that pcp memcg charge draining can interfere with cpu isolated workloads. The said draining is done from a WQ context with a pcp worker scheduled on each CPU which holds any cached charges for a specific memcg hierarchy. Operation is not really a common operation [2]. It can be triggered from the userspace though so some care is definitely due. Leonardo has tried to address the issue by allowing remote charge draining [3]. This approach requires an additional locking to synchronize pcp caches sync from a remote cpu from local pcp consumers. Even though the proposed lock was per-cpu there is still potential for contention and less predictable behavior. This patchset addresses the issue from a different angle. Rather than dealing with a potential synchronization, cpus which are isolated are simply never scheduled to be drained. This means that a small amount of charges could be laying around and waiting for a later use or they are flushed when a different memcg is charged from the same cpu. More details are in patch 2. The first patch from Frederic is implementing an abstraction to tell whether a specific cpu has been isolated and therefore require a special treatment. This patch (of 2): Provide this new API to check if a CPU has been isolated either through isolcpus= or nohz_full= kernel parameter. It aims at avoiding kernel load deemed to be safely spared on CPUs running sensitive workload that can't bear any disturbance, such as pcp cache draining. Link: https://lkml.kernel.org/r/20230317134448.11082-1-mhocko@kernel.org Link: https://lkml.kernel.org/r/20230317134448.11082-2-mhocko@kernel.org Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Leonardo Bras <leobras@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-17 13:44:47 +00:00
static inline bool cpu_is_isolated(int cpu)
{
return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN) ||
!housekeeping_test_cpu(cpu, HK_TYPE_TICK);
}
#endif /* _LINUX_SCHED_ISOLATION_H */