linux-stable/include/linux/cpu.h
Linus Torvalds 9244724fbf A large update for SMP management:
- Parallel CPU bringup
 
     The reason why people are interested in parallel bringup is to shorten
     the (kexec) reboot time of cloud servers to reduce the downtime of the
     VM tenants.
 
     The current fully serialized bringup does the following per AP:
 
       1) Prepare callbacks (allocate, intialize, create threads)
       2) Kick the AP alive (e.g. INIT/SIPI on x86)
       3) Wait for the AP to report alive state
       4) Let the AP continue through the atomic bringup
       5) Let the AP run the threaded bringup to full online state
 
     There are two significant delays:
 
       #3 The time for an AP to report alive state in start_secondary() on
          x86 has been measured in the range between 350us and 3.5ms
          depending on vendor and CPU type, BIOS microcode size etc.
 
       #4 The atomic bringup does the microcode update. This has been
          measured to take up to ~8ms on the primary threads depending on
          the microcode patch size to apply.
 
     On a two socket SKL server with 56 cores (112 threads) the boot CPU
     spends on current mainline about 800ms busy waiting for the APs to come
     up and apply microcode. That's more than 80% of the actual onlining
     procedure.
 
     This can be reduced significantly by splitting the bringup mechanism
     into two parts:
 
       1) Run the prepare callbacks and kick the AP alive for each AP which
       	 needs to be brought up.
 
 	 The APs wake up, do their firmware initialization and run the low
       	 level kernel startup code including microcode loading in parallel
       	 up to the first synchronization point. (#1 and #2 above)
 
       2) Run the rest of the bringup code strictly serialized per CPU
       	 (#3 - #5 above) as it's done today.
 
 	 Parallelizing that stage of the CPU bringup might be possible in
 	 theory, but it's questionable whether required surgery would be
 	 justified for a pretty small gain.
 
     If the system is large enough the first AP is already waiting at the
     first synchronization point when the boot CPU finished the wake-up of
     the last AP. That reduces the AP bringup time on that SKL from ~800ms
     to ~80ms, i.e. by a factor ~10x.
 
     The actual gain varies wildly depending on the system, CPU, microcode
     patch size and other factors. There are some opportunities to reduce
     the overhead further, but that needs some deep surgery in the x86 CPU
     bringup code.
 
     For now this is only enabled on x86, but the core functionality
     obviously works for all SMP capable architectures.
 
   - Enhancements for SMP function call tracing so it is possible to locate
     the scheduling and the actual execution points. That allows to measure
     IPI delivery time precisely.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmSZb/YTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRoOD/9vAiGI3IhGyZcX/RjXxauSHf8Pmqll
 05jUubFi5Vi3tKI1ubMOsnMmJTw2yy5xDyS/iGj7AcbRLq9uQd3iMtsXXHNBzo/X
 FNxnuWTXYUj0vcOYJ+j4puBumFzzpRCprqccMInH0kUnSWzbnaQCeelicZORAf+w
 zUYrswK4HpBXHDOnvPw6Z7MYQe+zyDQSwjSftstLyROzu+lCEw/9KUaysY2epShJ
 wHClxS2XqMnpY4rJ/CmJAlRhD0Plb89zXyo6k9YZYVDWoAcmBZy6vaTO4qoR171L
 37ApqrgsksMkjFycCMnmrFIlkeb7bkrYDQ5y+xqC3JPTlYDKOYmITV5fZ83HD77o
 K7FAhl/CgkPq2Ec+d82GFLVBKR1rijbwHf7a0nhfUy0yMeaJCxGp4uQ45uQ09asi
 a/VG2T38EgxVdseC92HRhcdd3pipwCb5wqjCH/XdhdlQrk9NfeIeP+TxF4QhADhg
 dApp3ifhHSnuEul7+HNUkC6U+Zc8UeDPdu5lvxSTp2ooQ0JwaGgC5PJq3nI9RUi2
 Vv826NHOknEjFInOQcwvp6SJPfcuSTF75Yx6xKz8EZ3HHxpvlolxZLq+3ohSfOKn
 2efOuZO5bEu4S/G2tRDYcy+CBvNVSrtZmCVqSOS039c8quBWQV7cj0334cjzf+5T
 TRiSzvssbYYmaw==
 =Y8if
 -----END PGP SIGNATURE-----

Merge tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull SMP updates from Thomas Gleixner:
 "A large update for SMP management:

   - Parallel CPU bringup

     The reason why people are interested in parallel bringup is to
     shorten the (kexec) reboot time of cloud servers to reduce the
     downtime of the VM tenants.

     The current fully serialized bringup does the following per AP:

       1) Prepare callbacks (allocate, intialize, create threads)
       2) Kick the AP alive (e.g. INIT/SIPI on x86)
       3) Wait for the AP to report alive state
       4) Let the AP continue through the atomic bringup
       5) Let the AP run the threaded bringup to full online state

     There are two significant delays:

       #3 The time for an AP to report alive state in start_secondary()
          on x86 has been measured in the range between 350us and 3.5ms
          depending on vendor and CPU type, BIOS microcode size etc.

       #4 The atomic bringup does the microcode update. This has been
          measured to take up to ~8ms on the primary threads depending
          on the microcode patch size to apply.

     On a two socket SKL server with 56 cores (112 threads) the boot CPU
     spends on current mainline about 800ms busy waiting for the APs to
     come up and apply microcode. That's more than 80% of the actual
     onlining procedure.

     This can be reduced significantly by splitting the bringup
     mechanism into two parts:

       1) Run the prepare callbacks and kick the AP alive for each AP
          which needs to be brought up.

          The APs wake up, do their firmware initialization and run the
          low level kernel startup code including microcode loading in
          parallel up to the first synchronization point. (#1 and #2
          above)

       2) Run the rest of the bringup code strictly serialized per CPU
          (#3 - #5 above) as it's done today.

          Parallelizing that stage of the CPU bringup might be possible
          in theory, but it's questionable whether required surgery
          would be justified for a pretty small gain.

     If the system is large enough the first AP is already waiting at
     the first synchronization point when the boot CPU finished the
     wake-up of the last AP. That reduces the AP bringup time on that
     SKL from ~800ms to ~80ms, i.e. by a factor ~10x.

     The actual gain varies wildly depending on the system, CPU,
     microcode patch size and other factors. There are some
     opportunities to reduce the overhead further, but that needs some
     deep surgery in the x86 CPU bringup code.

     For now this is only enabled on x86, but the core functionality
     obviously works for all SMP capable architectures.

   - Enhancements for SMP function call tracing so it is possible to
     locate the scheduling and the actual execution points. That allows
     to measure IPI delivery time precisely"

* tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  trace,smp: Add tracepoints for scheduling remotelly called functions
  trace,smp: Add tracepoints around remotelly called functions
  MAINTAINERS: Add CPU HOTPLUG entry
  x86/smpboot: Fix the parallel bringup decision
  x86/realmode: Make stack lock work in trampoline_compat()
  x86/smp: Initialize cpu_primary_thread_mask late
  cpu/hotplug: Fix off by one in cpuhp_bringup_mask()
  x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils
  x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
  x86/smpboot: Support parallel startup of secondary CPUs
  x86/smpboot: Implement a bit spinlock to protect the realmode stack
  x86/apic: Save the APIC virtual base address
  cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
  x86/apic: Provide cpu_primary_thread mask
  x86/smpboot: Enable split CPU startup
  cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
  cpu/hotplug: Reset task stack state in _cpu_up()
  cpu/hotplug: Remove unused state functions
  riscv: Switch to hotplug core state synchronization
  parisc: Switch to hotplug core state synchronization
  ...
2023-06-26 13:59:56 -07:00

234 lines
7.4 KiB
C

/* SPDX-License-Identifier: GPL-2.0 */
/*
* include/linux/cpu.h - generic cpu definition
*
* This is mainly for topological representation. We define the
* basic 'struct cpu' here, which can be embedded in per-arch
* definitions of processors.
*
* Basic handling of the devices is done in drivers/base/cpu.c
*
* CPUs are exported via sysfs in the devices/system/cpu
* directory.
*/
#ifndef _LINUX_CPU_H_
#define _LINUX_CPU_H_
#include <linux/node.h>
#include <linux/compiler.h>
#include <linux/cpumask.h>
#include <linux/cpuhotplug.h>
struct device;
struct device_node;
struct attribute_group;
struct cpu {
int node_id; /* The node which contains the CPU */
int hotpluggable; /* creates sysfs control file if hotpluggable */
struct device dev;
};
extern void boot_cpu_init(void);
extern void boot_cpu_hotplug_init(void);
extern void cpu_init(void);
extern void trap_init(void);
extern int register_cpu(struct cpu *cpu, int num);
extern struct device *get_cpu_device(unsigned cpu);
extern bool cpu_is_hotpluggable(unsigned cpu);
extern bool arch_match_cpu_phys_id(int cpu, u64 phys_id);
extern bool arch_find_n_match_cpu_physical_id(struct device_node *cpun,
int cpu, unsigned int *thread);
extern int cpu_add_dev_attr(struct device_attribute *attr);
extern void cpu_remove_dev_attr(struct device_attribute *attr);
extern int cpu_add_dev_attr_group(struct attribute_group *attrs);
extern void cpu_remove_dev_attr_group(struct attribute_group *attrs);
extern ssize_t cpu_show_meltdown(struct device *dev,
struct device_attribute *attr, char *buf);
extern ssize_t cpu_show_spectre_v1(struct device *dev,
struct device_attribute *attr, char *buf);
extern ssize_t cpu_show_spectre_v2(struct device *dev,
struct device_attribute *attr, char *buf);
extern ssize_t cpu_show_spec_store_bypass(struct device *dev,
struct device_attribute *attr, char *buf);
extern ssize_t cpu_show_l1tf(struct device *dev,
struct device_attribute *attr, char *buf);
extern ssize_t cpu_show_mds(struct device *dev,
struct device_attribute *attr, char *buf);
extern ssize_t cpu_show_tsx_async_abort(struct device *dev,
struct device_attribute *attr,
char *buf);
extern ssize_t cpu_show_itlb_multihit(struct device *dev,
struct device_attribute *attr, char *buf);
extern ssize_t cpu_show_srbds(struct device *dev, struct device_attribute *attr, char *buf);
extern ssize_t cpu_show_mmio_stale_data(struct device *dev,
struct device_attribute *attr,
char *buf);
extern ssize_t cpu_show_retbleed(struct device *dev,
struct device_attribute *attr, char *buf);
extern __printf(4, 5)
struct device *cpu_device_create(struct device *parent, void *drvdata,
const struct attribute_group **groups,
const char *fmt, ...);
#ifdef CONFIG_HOTPLUG_CPU
extern void unregister_cpu(struct cpu *cpu);
extern ssize_t arch_cpu_probe(const char *, size_t);
extern ssize_t arch_cpu_release(const char *, size_t);
#endif
/*
* These states are not related to the core CPU hotplug mechanism. They are
* used by various (sub)architectures to track internal state
*/
#define CPU_ONLINE 0x0002 /* CPU is up */
#define CPU_UP_PREPARE 0x0003 /* CPU coming up */
#define CPU_DEAD 0x0007 /* CPU dead */
#define CPU_DEAD_FROZEN 0x0008 /* CPU timed out on unplug */
#define CPU_POST_DEAD 0x0009 /* CPU successfully unplugged */
#define CPU_BROKEN 0x000B /* CPU did not die properly */
#ifdef CONFIG_SMP
extern bool cpuhp_tasks_frozen;
int add_cpu(unsigned int cpu);
int cpu_device_up(struct device *dev);
void notify_cpu_starting(unsigned int cpu);
extern void cpu_maps_update_begin(void);
extern void cpu_maps_update_done(void);
int bringup_hibernate_cpu(unsigned int sleep_cpu);
void bringup_nonboot_cpus(unsigned int setup_max_cpus);
#else /* CONFIG_SMP */
#define cpuhp_tasks_frozen 0
static inline void cpu_maps_update_begin(void)
{
}
static inline void cpu_maps_update_done(void)
{
}
static inline int add_cpu(unsigned int cpu) { return 0;}
#endif /* CONFIG_SMP */
extern struct bus_type cpu_subsys;
extern int lockdep_is_cpus_held(void);
#ifdef CONFIG_HOTPLUG_CPU
extern void cpus_write_lock(void);
extern void cpus_write_unlock(void);
extern void cpus_read_lock(void);
extern void cpus_read_unlock(void);
extern int cpus_read_trylock(void);
extern void lockdep_assert_cpus_held(void);
extern void cpu_hotplug_disable(void);
extern void cpu_hotplug_enable(void);
void clear_tasks_mm_cpumask(int cpu);
int remove_cpu(unsigned int cpu);
int cpu_device_down(struct device *dev);
extern void smp_shutdown_nonboot_cpus(unsigned int primary_cpu);
#else /* CONFIG_HOTPLUG_CPU */
static inline void cpus_write_lock(void) { }
static inline void cpus_write_unlock(void) { }
static inline void cpus_read_lock(void) { }
static inline void cpus_read_unlock(void) { }
static inline int cpus_read_trylock(void) { return true; }
static inline void lockdep_assert_cpus_held(void) { }
static inline void cpu_hotplug_disable(void) { }
static inline void cpu_hotplug_enable(void) { }
static inline int remove_cpu(unsigned int cpu) { return -EPERM; }
static inline void smp_shutdown_nonboot_cpus(unsigned int primary_cpu) { }
#endif /* !CONFIG_HOTPLUG_CPU */
#ifdef CONFIG_PM_SLEEP_SMP
extern int freeze_secondary_cpus(int primary);
extern void thaw_secondary_cpus(void);
static inline int suspend_disable_secondary_cpus(void)
{
int cpu = 0;
if (IS_ENABLED(CONFIG_PM_SLEEP_SMP_NONZERO_CPU))
cpu = -1;
return freeze_secondary_cpus(cpu);
}
static inline void suspend_enable_secondary_cpus(void)
{
return thaw_secondary_cpus();
}
#else /* !CONFIG_PM_SLEEP_SMP */
static inline void thaw_secondary_cpus(void) {}
static inline int suspend_disable_secondary_cpus(void) { return 0; }
static inline void suspend_enable_secondary_cpus(void) { }
#endif /* !CONFIG_PM_SLEEP_SMP */
void __noreturn cpu_startup_entry(enum cpuhp_state state);
void cpu_idle_poll_ctrl(bool enable);
bool cpu_in_idle(unsigned long pc);
void arch_cpu_idle(void);
void arch_cpu_idle_prepare(void);
void arch_cpu_idle_enter(void);
void arch_cpu_idle_exit(void);
void __noreturn arch_cpu_idle_dead(void);
#ifdef CONFIG_ARCH_HAS_CPU_FINALIZE_INIT
void arch_cpu_finalize_init(void);
#else
static inline void arch_cpu_finalize_init(void) { }
#endif
void cpu_set_state_online(int cpu);
void play_idle_precise(u64 duration_ns, u64 latency_ns);
static inline void play_idle(unsigned long duration_us)
{
play_idle_precise(duration_us * NSEC_PER_USEC, U64_MAX);
}
#ifdef CONFIG_HOTPLUG_CPU
void cpuhp_report_idle_dead(void);
#else
static inline void cpuhp_report_idle_dead(void) { }
#endif /* #ifdef CONFIG_HOTPLUG_CPU */
enum cpuhp_smt_control {
CPU_SMT_ENABLED,
CPU_SMT_DISABLED,
CPU_SMT_FORCE_DISABLED,
CPU_SMT_NOT_SUPPORTED,
CPU_SMT_NOT_IMPLEMENTED,
};
#if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT)
extern enum cpuhp_smt_control cpu_smt_control;
extern void cpu_smt_disable(bool force);
extern void cpu_smt_check_topology(void);
extern bool cpu_smt_possible(void);
extern int cpuhp_smt_enable(void);
extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
#else
# define cpu_smt_control (CPU_SMT_NOT_IMPLEMENTED)
static inline void cpu_smt_disable(bool force) { }
static inline void cpu_smt_check_topology(void) { }
static inline bool cpu_smt_possible(void) { return false; }
static inline int cpuhp_smt_enable(void) { return 0; }
static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; }
#endif
extern bool cpu_mitigations_off(void);
extern bool cpu_mitigations_auto_nosmt(void);
#endif /* _LINUX_CPU_H_ */