Commit Graph

279 Commits

Author SHA1 Message Date
Valentin Schneider c0cdc89072 irqchip/gic-v3-its: Give the percpu rdist struct its own flags field
Later patches will require tracking some per-rdist status. Reuse the bytes
"lost" to padding within the __percpu rdist struct as a flags field, and
re-encode ->lpi_enabled within said flags.

No change in functionality intended.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211027151506.2085066-2-valentin.schneider@arm.com
2021-12-16 13:21:11 +00:00
Kaige Fu 280bef5129 irqchip/gic-v3-its: Fix potential VPE leak on error
In its_vpe_irq_domain_alloc, when its_vpe_init() returns an error,
there is an off-by-one in the number of VPEs to be freed.

Fix it by simply passing the number of VPEs allocated, which is the
index of the loop iterating over the VPEs.

Fixes: 7d75bbb4bc ("irqchip/gic-v3-its: Add VPE irq domain allocation/teardown")
Signed-off-by: Kaige Fu <kaige.fu@linux.alibaba.com>
[maz: fixed commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/d9e36dee512e63670287ed9eff884a5d8d6d27f2.1631672311.git.kaige.fu@linux.alibaba.com
2021-09-22 14:37:04 +01:00
Andy Shevchenko ff5fe8867a irqchip/gic-v3: Switch to bitmap_zalloc()
Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that it returns pointer of bitmap type instead of opaque void *.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210618151657.65305-4-andriy.shevchenko@linux.intel.com
2021-07-26 18:04:01 +01:00
Zhen Lei 944a1a17d3 irqchip/gic-v3-its: Remove unnecessary oom message
Fixes scripts/checkpatch.pl warning:
WARNING: Possible unnecessary 'out of memory' message

Remove it can help us save a bit of memory.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210609140643.14531-1-thunder.leizhen@huawei.com
2021-06-11 14:19:39 +01:00
Linus Torvalds 152d32aa84 ARM:
- Stage-2 isolation for the host kernel when running in protected mode
 
 - Guest SVE support when running in nVHE mode
 
 - Force W^X hypervisor mappings in nVHE mode
 
 - ITS save/restore for guests using direct injection with GICv4.1
 
 - nVHE panics now produce readable backtraces
 
 - Guest support for PTP using the ptp_kvm driver
 
 - Performance improvements in the S2 fault handler
 
 x86:
 
 - Optimizations and cleanup of nested SVM code
 
 - AMD: Support for virtual SPEC_CTRL
 
 - Optimizations of the new MMU code: fast invalidation,
   zap under read lock, enable/disably dirty page logging under
   read lock
 
 - /dev/kvm API for AMD SEV live migration (guest API coming soon)
 
 - support SEV virtual machines sharing the same encryption context
 
 - support SGX in virtual machines
 
 - add a few more statistics
 
 - improved directed yield heuristics
 
 - Lots and lots of cleanups
 
 Generic:
 
 - Rework of MMU notifier interface, simplifying and optimizing
 the architecture-specific code
 
 - Some selftests improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCJ13kUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroM1HAgAqzPxEtiTPTFeFJV5cnPPJ3dFoFDK
 y/juZJUQ1AOtvuWzzwuf175ewkv9vfmtG6rVohpNSkUlJYeoc6tw7n8BTTzCVC1b
 c/4Dnrjeycr6cskYlzaPyV6MSgjSv5gfyj1LA5UEM16LDyekmaynosVWY5wJhju+
 Bnyid8l8Utgz+TLLYogfQJQECCrsU0Wm//n+8TWQgLf1uuiwshU5JJe7b43diJrY
 +2DX+8p9yWXCTz62sCeDWNahUv8AbXpMeJ8uqZPYcN1P0gSEUGu8xKmLOFf9kR7b
 M4U1Gyz8QQbjd2lqnwiWIkvRLX6gyGVbq2zH0QbhUe5gg3qGUX7JjrhdDQ==
 =AXUi
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "This is a large update by KVM standards, including AMD PSP (Platform
  Security Processor, aka "AMD Secure Technology") and ARM CoreSight
  (debug and trace) changes.

  ARM:

   - CoreSight: Add support for ETE and TRBE

   - Stage-2 isolation for the host kernel when running in protected
     mode

   - Guest SVE support when running in nVHE mode

   - Force W^X hypervisor mappings in nVHE mode

   - ITS save/restore for guests using direct injection with GICv4.1

   - nVHE panics now produce readable backtraces

   - Guest support for PTP using the ptp_kvm driver

   - Performance improvements in the S2 fault handler

  x86:

   - AMD PSP driver changes

   - Optimizations and cleanup of nested SVM code

   - AMD: Support for virtual SPEC_CTRL

   - Optimizations of the new MMU code: fast invalidation, zap under
     read lock, enable/disably dirty page logging under read lock

   - /dev/kvm API for AMD SEV live migration (guest API coming soon)

   - support SEV virtual machines sharing the same encryption context

   - support SGX in virtual machines

   - add a few more statistics

   - improved directed yield heuristics

   - Lots and lots of cleanups

  Generic:

   - Rework of MMU notifier interface, simplifying and optimizing the
     architecture-specific code

   - a handful of "Get rid of oprofile leftovers" patches

   - Some selftests improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits)
  KVM: selftests: Speed up set_memory_region_test
  selftests: kvm: Fix the check of return value
  KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt()
  KVM: SVM: Skip SEV cache flush if no ASIDs have been used
  KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()
  KVM: SVM: Drop redundant svm_sev_enabled() helper
  KVM: SVM: Move SEV VMCB tracking allocation to sev.c
  KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()
  KVM: SVM: Unconditionally invoke sev_hardware_teardown()
  KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)
  KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y
  KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables
  KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features
  KVM: SVM: Move SEV module params/variables to sev.c
  KVM: SVM: Disable SEV/SEV-ES if NPT is disabled
  KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails
  KVM: SVM: Zero out the VMCB array used to track SEV ASID association
  x86/sev: Drop redundant and potentially misleading 'sev_enabled'
  KVM: x86: Move reverse CPUID helpers to separate header file
  KVM: x86: Rename GPR accessors to make mode-aware variants the defaults
  ...
2021-05-01 10:14:08 -07:00
Shenming Lu c21bc068cd irqchip/gic-v3-its: Drop the setting of PTZ altogether
GICv4.1 gives a way to get the VLPI state, which needs to map the
vPE first, and after the state read, we may remap the vPE back while
the VPT is not empty. So we can't assume that the VPT is empty at
the first map. Besides, the optimization of PTZ is probably limited
since the HW should be fairly efficient to parse the empty VPT. Let's
drop the setting of PTZ altogether.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210322060158.1584-3-lushenming@huawei.com
2021-03-24 18:12:20 +00:00
Marc Zyngier 301beaf197 irqchip/gic-v3-its: Add a cache invalidation right after vPE unmapping
In order to be able to manipulate the VPT once a vPE has been
unmapped, perform the required CMO to invalidate the CPU view
of the VPT.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Shenming Lu <lushenming@huawei.com>
Link: https://lore.kernel.org/r/20210322060158.1584-2-lushenming@huawei.com
2021-03-24 18:05:52 +00:00
Ingo Molnar a359f75796 irq: Fix typos in comments
Fix ~36 single-word typos in the IRQ, irqchip and irqdomain code comments.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-22 04:23:14 +01:00
Linus Torvalds 6a447b0e31 ARM:
* PSCI relay at EL2 when "protected KVM" is enabled
 * New exception injection code
 * Simplification of AArch32 system register handling
 * Fix PMU accesses when no PMU is enabled
 * Expose CSV3 on non-Meltdown hosts
 * Cache hierarchy discovery fixes
 * PV steal-time cleanups
 * Allow function pointers at EL2
 * Various host EL2 entry cleanups
 * Simplification of the EL2 vector allocation
 
 s390:
 * memcg accouting for s390 specific parts of kvm and gmap
 * selftest for diag318
 * new kvm_stat for when async_pf falls back to sync
 
 x86:
 * Tracepoints for the new pagetable code from 5.10
 * Catch VFIO and KVM irqfd events before userspace
 * Reporting dirty pages to userspace with a ring buffer
 * SEV-ES host support
 * Nested VMX support for wait-for-SIPI activity state
 * New feature flag (AVX512 FP16)
 * New system ioctl to report Hyper-V-compatible paravirtualization features
 
 Generic:
 * Selftest improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl/bdL4UHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNgQQgAnTH6rhXa++Zd5F0EM2NwXwz3iEGb
 lOq1DZSGjs6Eekjn8AnrWbmVQr+CBCuGU9MrxpSSzNDK/awryo3NwepOWAZw9eqk
 BBCVwGBbJQx5YrdgkGC0pDq2sNzcpW/VVB3vFsmOxd9eHblnuKSIxEsCCXTtyqIt
 XrLpQ1UhvI4yu102fDNhuFw2EfpzXm+K0Lc0x6idSkdM/p7SyeOxiv8hD4aMr6+G
 bGUQuMl4edKZFOWFigzr8NovQAvDHZGrwfihu2cLRYKLhV97QuWVmafv/yYfXcz2
 drr+wQCDNzDOXyANnssmviazrhOX0QmTAhbIXGGX/kTxYKcfPi83ZLoI3A==
 =ISud
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Much x86 work was pushed out to 5.12, but ARM more than made up for it.

  ARM:
   - PSCI relay at EL2 when "protected KVM" is enabled
   - New exception injection code
   - Simplification of AArch32 system register handling
   - Fix PMU accesses when no PMU is enabled
   - Expose CSV3 on non-Meltdown hosts
   - Cache hierarchy discovery fixes
   - PV steal-time cleanups
   - Allow function pointers at EL2
   - Various host EL2 entry cleanups
   - Simplification of the EL2 vector allocation

  s390:
   - memcg accouting for s390 specific parts of kvm and gmap
   - selftest for diag318
   - new kvm_stat for when async_pf falls back to sync

  x86:
   - Tracepoints for the new pagetable code from 5.10
   - Catch VFIO and KVM irqfd events before userspace
   - Reporting dirty pages to userspace with a ring buffer
   - SEV-ES host support
   - Nested VMX support for wait-for-SIPI activity state
   - New feature flag (AVX512 FP16)
   - New system ioctl to report Hyper-V-compatible paravirtualization features

  Generic:
   - Selftest improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
  KVM: SVM: fix 32-bit compilation
  KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting
  KVM: SVM: Provide support to launch and run an SEV-ES guest
  KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
  KVM: SVM: Provide support for SEV-ES vCPU loading
  KVM: SVM: Provide support for SEV-ES vCPU creation/loading
  KVM: SVM: Update ASID allocation to support SEV-ES guests
  KVM: SVM: Set the encryption mask for the SVM host save area
  KVM: SVM: Add NMI support for an SEV-ES guest
  KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
  KVM: SVM: Do not report support for SMM for an SEV-ES guest
  KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
  KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
  KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
  KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
  KVM: SVM: Add support for EFER write traps for an SEV-ES guest
  KVM: SVM: Support string IO operations for an SEV-ES guest
  KVM: SVM: Support MMIO for an SEV-ES guest
  KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
  KVM: SVM: Create trace events for VMGEXIT processing
  ...
2020-12-20 10:44:05 -08:00
Marc Zyngier 5fe71d271d irqchip/gic-v3-its: Tag ITS device as shared if allocating for a proxy device
The ITS already has some notion of "shared" devices. Let's map the
MSI_ALLOC_FLAGS_PROXY_DEVICE flag onto this internal property.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20201129135208.680293-3-maz@kernel.org
2020-12-11 14:47:50 +00:00
Shenming Lu 0b39498230 irqchip/gic-v4.1: Reduce the delay when polling GICR_VPENDBASER.Dirty
The 10us delay of the poll on the GICR_VPENDBASER.Dirty bit is too
high, which might greatly affect the total scheduling latency of a
vCPU in our measurement. So we reduce it to 1 to lessen the impact.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201128141857.983-2-lushenming@huawei.com
2020-12-11 14:47:10 +00:00
Shenming Lu 57e3cebd02 KVM: arm64: Delay the polling of the GICR_VPENDBASER.Dirty bit
In order to reduce the impact of the VPT parsing happening on the GIC,
we can split the vcpu reseidency in two phases:

- programming GICR_VPENDBASER: this still happens in vcpu_load()
- checking for the VPT parsing to be complete: this can happen
  on vcpu entry (in kvm_vgic_flush_hwstate())

This allows the GIC and the CPU to work in parallel, rewmoving some
of the entry overhead.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Shenming Lu <lushenming@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201128141857.983-3-lushenming@huawei.com
2020-11-30 11:18:29 +00:00
Thomas Gleixner 7032908cd5 irqchip fixes for Linux 5.10, take #2
- Fix Exiu driver trigger type when using ACPI
 - Fix GICv3 ITS suspend/resume to use the in-kernel path
   at all times, sidestepping braindead firmware support
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl+6Yf8PHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpD/+QP/R6DppA5eVK98xAt/qXia0eUbady8pib/a+8
 UGKFJ8d5CPhaP/EuuUk1zcN4ry2WnRSXVxesVkEQMLu8nKLsgG8lEgojJAel+w92
 gEbLuVnWJz6kiW/P7NUR/B1mYCacbi5IY+v3hFcoa6XzHbNHulzidUIkAiVcq9E/
 aqr08O62BFcBWzxRaaK6om2gTI3oZvD+v/oHTfMzIA59oAQGmgUYnc0CUOjVR5Bl
 aGYsdq2RqkvSdQbb2oVMPW6Tb22cfUQT1TnY1JR+Q5a5vEqTu67bTs6rpviR2Fj8
 CepcmoYUFC1tDW/tfbj+PgdDoyE2rbFUYyYiPX74caMi6rgLvSf8t1zK+sh/Q6x7
 Lnriw6xRbCMtkIczM5A3OrMQ/PWdcxHkyNC/3x2+qkkZVOBbZJ+cSWsJm1d3yky+
 IuaOTP2UYnToc+/uGHLysqUAehnMsSZnOvuYUVE4RVgMfbJvOX7rZdV1m+pic1GE
 Rm4TudcWni9b+14rFae/lOXXGDmY725898TI3Nud8bjKvyYcTMDO+R201VNFgNcB
 m/cWvPhbYP5OLR2cnvbWx0bg4WeBOY3m4RFIPHb2gCiPcWECLSCBaDfm7BG0eQvN
 DMLFIDePsj74vXejQbe5Txo8+dCe5vZsuzLRdFYElq6pOXVMjBtxF4s7In7RnqNB
 NPNENkqY
 =IjDX
 -----END PGP SIGNATURE-----

Merge tag 'irqchip-fixes-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent

Pull irqchip fixes from Marc Zyngier:

 - Fix Exiu driver trigger type when using ACPI

 - Fix GICv3 ITS suspend/resume to use the in-kernel path
   at all times, sidestepping braindead firmware support

Link: https://lore.kernel.org/r/20201122184752.553990-1-maz@kernel.org
2020-11-25 00:56:28 +01:00
Xu Qiang 74cde1a533 irqchip/gic-v3-its: Unconditionally save/restore the ITS state on suspend
On systems without HW-based collections (i.e. anything except GIC-500),
we rely on firmware to perform the ITS save/restore. This doesn't
really work, as although FW can properly save everything, it cannot
fully restore the state of the command queue (the read-side is reset
to the head of the queue). This results in the ITS consuming previously
processed commands, potentially corrupting the state.

Instead, let's always save the ITS state on suspend, disabling it in the
process, and restore the full state on resume. This saves us from broken
FW as long as it doesn't enable the ITS by itself (for which we can't do
anything).

This amounts to simply dropping the ITS_FLAGS_SAVE_SUSPEND_STATE.

Signed-off-by: Xu Qiang <xuqiang36@huawei.com>
[maz: added warning on resume, rewrote commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201107104226.14282-1-xuqiang36@huawei.com
2020-11-22 12:58:35 +00:00
Linus Torvalds cf1d2b44f6 ACPI updates for 5.10-rc1
- Add support for generic initiator-only proximity domains to
    the ACPI NUMA code and the architectures using it (Jonathan
    Cameron).
 
  - Clean up some non-ACPICA code referring to debug facilities from
    ACPICA that are not actually used in there (Hanjun Guo).
 
  - Add new DPTF driver for the PCH FIVR participant (Srinivas
    Pandruvada).
 
  - Reduce overhead related to accessing GPE registers in ACPICA and
    the OS interface layer and make it possible to access GPE registers
    using logical addresses if they are memory-mapped (Rafael Wysocki).
 
  - Update the ACPICA code in the kernel to upstream revision 20200925
    including changes as follows:
    * Add predefined names from the SMBus sepcification (Bob Moore).
    * Update acpi_help UUID list (Bob Moore).
    * Return exceptions for string-to-integer conversions in iASL (Bob
      Moore).
    * Add a new "ALL <NameSeg>" debugger command (Bob Moore).
    * Add support for 64 bit risc-v compilation (Colin Ian King).
    * Do assorted cleanups (Bob Moore, Colin Ian King, Randy Dunlap).
 
  - Add new ACPI backlight whitelist entry for HP 635 Notebook (Alex
    Hung).
 
  - Move TPS68470 OpRegion driver to drivers/acpi/pmic/ and split out
    Kconfig and Makefile specific for ACPI PMIC (Andy Shevchenko).
 
  - Clean up the ACPI SoC driver for AMD SoCs (Hanjun Guo).
 
  - Add missing config_item_put() to fix refcount leak (Hanjun Guo).
 
  - Drop lefrover field from struct acpi_memory_device (Hanjun Guo).
 
  - Make the ACPI extlog driver check for RDMSR failures (Ben
    Hutchings).
 
  - Fix handling of lid state changes in the ACPI button driver when
    input device is closed (Dmitry Torokhov).
 
  - Fix several assorted build issues (Barnabás Pőcze, John Garry,
    Nathan Chancellor, Tian Tao).
 
  - Drop unused inline functions and reduce code duplication by using
    kobj_to_dev() in the NFIT parsing code (YueHaibing, Wang Qing).
 
  - Serialize tools/power/acpi Makefile (Thomas Renninger).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl+F4IkSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx1gIQAIZrt09fquEIZhYulGZAkuYhSX2U/DZt
 poow5+TiGk36JNHlbZS19kZ3F0tJ1wA6CKSfF/bYyULxL+gYaUjdLXzv2kArTSAj
 nzDXQ2CystpySZI/sEkl4QjsMg0xuZlBhlnCfNHzJw049TgdsJHnxMkJXb8T90A+
 l2JKm2OpBkNvQGNpwd3djLg8xSDnHUmuevsWZPHDp92/fLMF9DUBk8dVuEwa0ndF
 hAUpWm+EL1tJQnhNwtfV/Akd9Ypqgk/7ROFWFHGDtHMZGnBjpyXZw68vHMX7SL6N
 Ej90GWGPHSJs/7Fsg4Hiaxxcph9WFNLPcpck5lVAMIrNHMKANjqQzCsmHavV/WTG
 STC9/qwJauA1EOjovlmlCFHctjKE/ya6Hm299WTlfBqB+Lu1L3oMR2CC+Uj0YfyG
 sv3264rJCsaSw610iwQOG807qHENopASO2q5DuKG0E9JpcaBUwn1N4qP5svvQciq
 4aA8Ma6xM/QHCO4CS0Se9C0+WSVtxWwOUichRqQmU4E6u1sXvKJxTeWo79rV7PAh
 L6BwoOxBLabEiyzpi6HPGs6DoKj/N6tOQenBh4ibdwpAwMtq7hIlBFa0bp19c2wT
 vx8F2Raa8vbQ2zZ1QEiPZnPLJUoy2DgaCtKJ6E0FTDXNs3VFlWgyhIUlIRqk5BS9
 OnAwVAUrTMkJ
 =feLU
 -----END PGP SIGNATURE-----

Merge tag 'acpi-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
 "These add support for generic initiator-only proximity domains to the
  ACPI NUMA code and the architectures using it, clean up some
  non-ACPICA code referring to debug facilities from ACPICA, reduce the
  overhead related to accessing GPE registers, add a new DPTF (Dynamic
  Power and Thermal Framework) participant driver, update the ACPICA
  code in the kernel to upstream revision 20200925, add a new ACPI
  backlight whitelist entry, fix a few assorted issues and clean up some
  code.

  Specifics:

   - Add support for generic initiator-only proximity domains to the
     ACPI NUMA code and the architectures using it (Jonathan Cameron)

   - Clean up some non-ACPICA code referring to debug facilities from
     ACPICA that are not actually used in there (Hanjun Guo)

   - Add new DPTF driver for the PCH FIVR participant (Srinivas
     Pandruvada)

   - Reduce overhead related to accessing GPE registers in ACPICA and
     the OS interface layer and make it possible to access GPE registers
     using logical addresses if they are memory-mapped (Rafael Wysocki)

   - Update the ACPICA code in the kernel to upstream revision 20200925
     including changes as follows:
      + Add predefined names from the SMBus sepcification (Bob Moore)
      + Update acpi_help UUID list (Bob Moore)
      + Return exceptions for string-to-integer conversions in iASL (Bob
        Moore)
      + Add a new "ALL <NameSeg>" debugger command (Bob Moore)
      + Add support for 64 bit risc-v compilation (Colin Ian King)
      + Do assorted cleanups (Bob Moore, Colin Ian King, Randy Dunlap)

   - Add new ACPI backlight whitelist entry for HP 635 Notebook (Alex
     Hung)

   - Move TPS68470 OpRegion driver to drivers/acpi/pmic/ and split out
     Kconfig and Makefile specific for ACPI PMIC (Andy Shevchenko)

   - Clean up the ACPI SoC driver for AMD SoCs (Hanjun Guo)

   - Add missing config_item_put() to fix refcount leak (Hanjun Guo)

   - Drop lefrover field from struct acpi_memory_device (Hanjun Guo)

   - Make the ACPI extlog driver check for RDMSR failures (Ben
     Hutchings)

   - Fix handling of lid state changes in the ACPI button driver when
     input device is closed (Dmitry Torokhov)

   - Fix several assorted build issues (Barnabás Pőcze, John Garry,
     Nathan Chancellor, Tian Tao)

   - Drop unused inline functions and reduce code duplication by using
     kobj_to_dev() in the NFIT parsing code (YueHaibing, Wang Qing)

   - Serialize tools/power/acpi Makefile (Thomas Renninger)"

* tag 'acpi-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits)
  ACPICA: Update version to 20200925 Version 20200925
  ACPICA: Remove unnecessary semicolon
  ACPICA: Debugger: Add a new command: "ALL <NameSeg>"
  ACPICA: iASL: Return exceptions for string-to-integer conversions
  ACPICA: acpi_help: Update UUID list
  ACPICA: Add predefined names found in the SMBus sepcification
  ACPICA: Tree-wide: fix various typos and spelling mistakes
  ACPICA: Drop the repeated word "an" in a comment
  ACPICA: Add support for 64 bit risc-v compilation
  ACPI: button: fix handling lid state changes when input device closed
  tools/power/acpi: Serialize Makefile
  ACPI: scan: Replace ACPI_DEBUG_PRINT() with pr_debug()
  ACPI: memhotplug: Remove 'state' from struct acpi_memory_device
  ACPI / extlog: Check for RDMSR failure
  ACPI: Make acpi_evaluate_dsm() prototype consistent
  docs: mm: numaperf.rst Add brief description for access class 1.
  node: Add access1 class to represent CPU to memory characteristics
  ACPI: HMAT: Fix handling of changes from ACPI 6.2 to ACPI 6.3
  ACPI: Let ACPI know we support Generic Initiator Affinity Structures
  x86: Support Generic Initiator only proximity domains
  ...
2020-10-14 11:42:04 -07:00
Mike Rapoport 9f3d5eaa3c memblock: implement for_each_reserved_mem_region() using __next_mem_region()
Iteration over memblock.reserved with for_each_reserved_mem_region() used
__next_reserved_mem_region() that implemented a subset of
__next_mem_region().

Use __for_each_mem_range() and, essentially, __next_mem_region() with
appropriate parameters to reduce code duplication.

While on it, rename for_each_reserved_mem_region() to
for_each_reserved_mem_range() for consistency.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>	[.clang-format]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20200818151634.14343-17-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:35 -07:00
Jonathan Cameron 95ac5bf4e4 irq-chip/gic-v3-its: Fix crash if ITS is in a proximity domain without processor or memory
Note this crash is present before any of the patches in this series, but
as explained below it is highly unlikely anyone is shipping a firmware that
causes it. Tests were done using an overriden SRAT.

On ARM64, the gic-v3 driver directly parses SRAT to locate GIC Interrupt
Translation Service (ITS) Affinity Structures. This is done much later
in the boot than the parses of SRAT which identify proximity domains.

As a result, an ITS placed in a proximity domain that is not defined by
another SRAT structure will result in a NUMA node that is not completely
configured and a crash.

ITS [mem 0x202100000-0x20211ffff]
ITS@0x0000000202100000: Using ITS number 0
Unable to handle kernel paging request at virtual address 0000000000001a08
...

Call trace:
  __alloc_pages_nodemask+0xe8/0x338
  alloc_pages_node.constprop.0+0x34/0x40
  its_probe_one+0x2f8/0xb18
  gic_acpi_parse_madt_its+0x108/0x150
  acpi_table_parse_entries_array+0x17c/0x264
  acpi_table_parse_entries+0x48/0x6c
  acpi_table_parse_madt+0x30/0x3c
  its_init+0x1c4/0x644
  gic_init_bases+0x4b8/0x4ec
  gic_acpi_init+0x134/0x264
  acpi_match_madt+0x4c/0x84
  acpi_table_parse_entries_array+0x17c/0x264
  acpi_table_parse_entries+0x48/0x6c
  acpi_table_parse_madt+0x30/0x3c
  __acpi_probe_device_table+0x8c/0xe8
  irqchip_init+0x3c/0x48
  init_IRQ+0xcc/0x100
  start_kernel+0x33c/0x548

ACPI 6.3 allows any set of Affinity Structures in SRAT to define a proximity
domain.  However, as we do not see this crash, we can conclude that no
firmware is currently placing an ITS in a node that is separate from
those containing memory and / or processors.

We could modify the SRAT parsing behavior to identify the existence
of Proximity Domains unique to the ITS structures, and handle them as
a special case of a generic initiator (once support for those merges).

This patch avoids the complexity that would be needed to handle this corner
case, by not allowing the ITS entry parsing code to instantiate new NUMA
Nodes.  If one is encountered that does not already exist, then NO_NUMA_NODE
is assigned and a warning printed just as if the value had been greater than
allowed NUMA Nodes.

"SRAT: Invalid NUMA node -1 in ITS affinity"

Whilst this does not provide the full flexibility allowed by ACPI,
it does fix the problem.  We can revisit a more sophisticated solution if
needed by future platforms.

Change is simply to replace acpi_map_pxm_to_node with pxm_to_node reflecting
the fact a new mapping is not created.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-24 12:57:38 +02:00
Marc Zyngier eff65bd439 Merge remote-tracking branch 'origin/irq/gic-retrigger' into irq/irqchip-next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-17 16:50:02 +01:00
Marc Zyngier 5f774f5e12 irqchip/git-v3-its: Implement irq_retrigger callback for device-triggered LPIs
It is pretty easy to provide a retrigger callback for the ITS,
as it we already have the required support in terms of
irq_set_irqchip_state().

Note that this only works for device-generated LPIs, and not
the GICv4 doorbells, which should never have to be retriggered
anyway.

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-06 18:26:13 +01:00
Gustavo A. R. Silva df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Linus Torvalds f8b036a7fc The usual boring updates from the interrupt subsystem:
- Infrastructure to allow building irqchip drivers as modules
 
  - Consolidation of irqchip ACPI probing
 
  - Removal of the EOI-preflow interrupt handler which was required for
    SPARC support and became obsolete after SPARC was converted to
    use sparse interrupts.
 
  - Cleanups, fixes and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8pDL0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRTFEACYvH2LnSu1GlXB0XtL3+XyV8bWN3Yr
 Qfcp9JbIibx65YkJjcyvfBNA6GjXoogMr9vOHeRVnPtOwzl/7n/lnh/43d6+YPot
 7UvIjGtpH3E/lF0kJKfuEsM8CX8DcVhn6dV/T+dJ00m69dAVQHNRsVqAi1/iWEeT
 9vBBELoJL79BU2g83NQZ7V0UrqiA5QlPYLpbSffliE6UWjG6XTH2CPM5XucuySNQ
 es3szxQ55rtPEzqCHVL0YW75vV39bmKZPqoApA/XQDJrp3bgftjdldoTe7YPQfSG
 MXAvB+6axPD+mdeag7/XZFC1DcMx8CnistZSJKpdYZe7mQ7iunfeJRhkEzb+DrO1
 WdcDcYOm0rLHhPrUZItJdACjuPNmN9pMaK1PbabsivnHVWzMYYKmMwbW+AEsygGW
 nnlsZP1Nr61Mo7O8+EKmxDdox4Qjk3lmQl4SdQgUKNKsI5yFYjvt2CfCjWLQJNBa
 w7YiLnL9IChXwrvdGqMIoEueUi0pC3gGbZ/bjDbxI4NJxJgEEav49m/prxM2A2Pl
 gfNdwlM1xgNydIBgt/jij/a8Lmv555RuZmvDV7QV7fFwaIqt3Qb5cs0Roq+GlzZR
 e0wuikGl0r/Bdow62rle7EysbBBGosAYf6K/kaGhd8v/kx2ByDnPPWzOqtxc+K+i
 Iw/daEQRsSnWuw==
 =KA8b
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "The usual boring updates from the interrupt subsystem:

   - Infrastructure to allow building irqchip drivers as modules

   - Consolidation of irqchip ACPI probing

   - Removal of the EOI-preflow interrupt handler which was required for
     SPARC support and became obsolete after SPARC was converted to use
     sparse interrupts.

   - Cleanups, fixes and improvements all over the place"

* tag 'irq-core-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  irqchip/loongson-pch-pic: Fix the misused irq flow handler
  irqchip/loongson-htvec: Support 8 groups of HT vectors
  irqchip/loongson-liointc: Fix misuse of gc->mask_cache
  dt-bindings: interrupt-controller: Update Loongson HTVEC description
  irqchip/imx-intmux: Fix irqdata regs save in imx_intmux_runtime_suspend()
  irqchip/imx-intmux: Implement intmux runtime power management
  irqchip/gic-v4.1: Use GFP_ATOMIC flag in allocate_vpe_l1_table()
  irqchip: Fix IRQCHIP_PLATFORM_DRIVER_* compilation by including module.h
  irqchip/stm32-exti: Map direct event to irq parent
  irqchip/mtk-cirq: Convert to a platform driver
  irqchip/mtk-sysirq: Convert to a platform driver
  irqchip/qcom-pdc: Switch to using IRQCHIP_PLATFORM_DRIVER helper macros
  irqchip: Add IRQCHIP_PLATFORM_DRIVER_BEGIN/END and IRQCHIP_MATCH helper macros
  irqchip: irq-bcm2836.h: drop a duplicated word
  irqchip/gic-v4.1: Ensure accessing the correct RD when writing INVALLR
  irqchip/irq-bcm7038-l1: Guard uses of cpu_logical_map
  irqchip/gic-v3: Remove unused register definition
  irqchip/qcom-pdc: Allow QCOM_PDC to be loadable as a permanent module
  genirq: Export irq_chip_retrigger_hierarchy and irq_chip_set_vcpu_affinity_parent
  irqdomain: Export irq_domain_update_bus_token
  ...
2020-08-04 18:11:58 -07:00
Thomas Gleixner f0c7baca18 genirq/affinity: Make affinity setting if activated opt-in
John reported that on a RK3288 system the perf per CPU interrupts are all
affine to CPU0 and provided the analysis:

 "It looks like what happens is that because the interrupts are not per-CPU
  in the hardware, armpmu_request_irq() calls irq_force_affinity() while
  the interrupt is deactivated and then request_irq() with IRQF_PERCPU |
  IRQF_NOBALANCING.  

  Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls
  irq_setup_affinity() which returns early because IRQF_PERCPU and
  IRQF_NOBALANCING are set, leaving the interrupt on its original CPU."

This was broken by the recent commit which blocked interrupt affinity
setting in hardware before activation of the interrupt. While this works in
general, it does not work for this particular case. As contrary to the
initial analysis not all interrupt chip drivers implement an activate
callback, the safe cure is to make the deferred interrupt affinity setting
at activation time opt-in.

Implement the necessary core logic and make the two irqchip implementations
for which this is required opt-in. In hindsight this would have been the
right thing to do, but ...

Fixes: baedb87d1b ("genirq/affinity: Handle affinity setting on inactive interrupts correctly")
Reported-by: John Keeping <john@metanate.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de
2020-07-27 16:20:40 +02:00
Zenghui Yu d1bd7e0ba5 irqchip/gic-v4.1: Use GFP_ATOMIC flag in allocate_vpe_l1_table()
Booting the latest kernel with DEBUG_ATOMIC_SLEEP=y on a GICv4.1 enabled
box, I get the following kernel splat:

[    0.053766] BUG: sleeping function called from invalid context at mm/slab.h:567
[    0.053767] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
[    0.053769] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.8.0-rc3+ #23
[    0.053770] Call trace:
[    0.053774]  dump_backtrace+0x0/0x218
[    0.053775]  show_stack+0x2c/0x38
[    0.053777]  dump_stack+0xc4/0x10c
[    0.053779]  ___might_sleep+0xfc/0x140
[    0.053780]  __might_sleep+0x58/0x90
[    0.053782]  slab_pre_alloc_hook+0x7c/0x90
[    0.053783]  kmem_cache_alloc_trace+0x60/0x2f0
[    0.053785]  its_cpu_init+0x6f4/0xe40
[    0.053786]  gic_starting_cpu+0x24/0x38
[    0.053788]  cpuhp_invoke_callback+0xa0/0x710
[    0.053789]  notify_cpu_starting+0xcc/0xd8
[    0.053790]  secondary_start_kernel+0x148/0x200

 # ./scripts/faddr2line vmlinux its_cpu_init+0x6f4/0xe40
its_cpu_init+0x6f4/0xe40:
allocate_vpe_l1_table at drivers/irqchip/irq-gic-v3-its.c:2818
(inlined by) its_cpu_init_lpis at drivers/irqchip/irq-gic-v3-its.c:3138
(inlined by) its_cpu_init at drivers/irqchip/irq-gic-v3-its.c:5166

It turned out that we're allocating memory using GFP_KERNEL (may sleep)
within the CPU hotplug notifier, which is indeed an atomic context. Bad
thing may happen if we're playing on a system with more than a single
CommonLPIAff group. Avoid it by turning this into an atomic allocation.

Fixes: 5e5168461c ("irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200630133746.816-1-yuzenghui@huawei.com
2020-07-27 08:55:04 +01:00
Zenghui Yu 3af9571cd5 irqchip/gic-v4.1: Ensure accessing the correct RD when writing INVALLR
The GICv4.1 spec tells us that it's CONSTRAINED UNPREDICTABLE to issue a
register-based invalidation operation for a vPEID not mapped to that RD,
or another RD within the same CommonLPIAff group.

To follow this rule, commit f3a059219b ("irqchip/gic-v4.1: Ensure mutual
exclusion between vPE affinity change and RD access") tried to address the
race between the RD accesses and the vPE affinity change, but somehow
forgot to take GICR_INVALLR into account. Let's take the vpe_lock before
evaluating vpe->col_idx to fix it.

Fixes: f3a059219b ("irqchip/gic-v4.1: Ensure mutual exclusion between vPE affinity change and RD access")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200720092328.708-1-yuzenghui@huawei.com
2020-07-27 08:55:03 +01:00
Linus Torvalds bfe91da29b Bugfixes and a one-liner patch to silence sparse.
-----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl8DWosUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroO8cAf/UskNg8qoLGG17rQwhFpmigSllbiJ
 TAyi3tpb1Y0Z2MfYeGkeiEb1L34bS28Cxl929DoqI3hrXy1wDCmsHPB5c3URXrzd
 aswvr7pJtQV9iH1ykaS2woFJnOUovMFsFYMhj46yUPoAvdKOZKvuqcduxbogYHFw
 YeRhS+1lGfiP2A0j3O/nnNJ0wq+FxKO46G3CgWeqG75+FSL6y/tl0bZJUMKKajQZ
 GNaOv/CYCHAfUdvgy0ZitRD8lV8yxng3dYGjm+a52Kmn2ZWiFlxNrnxzHySk16Rn
 Lq6MfFOqgrYpoZv7SnsFYnRE05U5bEFQ8BGr22fImQ+ktKDgq+9gv6cKwA==
 =+DN/
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Bugfixes and a one-liner patch to silence a sparse warning"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: arm64: Stop clobbering x0 for HVC_SOFT_RESTART
  KVM: arm64: PMU: Fix per-CPU access in preemptible context
  KVM: VMX: Use KVM_POSSIBLE_CR*_GUEST_BITS to initialize guest/host masks
  KVM: x86: Mark CR4.TSD as being possibly owned by the guest
  KVM: x86: Inject #GP if guest attempts to toggle CR4.LA57 in 64-bit mode
  kvm: use more precise cast and do not drop __user
  KVM: x86: bit 8 of non-leaf PDPEs is not reserved
  KVM: X86: Fix async pf caused null-ptr-deref
  KVM: arm64: vgic-v4: Plug race between non-residency and v4.1 doorbell
  KVM: arm64: pvtime: Ensure task delay accounting is enabled
  KVM: arm64: Fix kvm_reset_vcpu() return code being incorrect with SVE
  KVM: arm64: Annotate hyp NMI-related functions as __always_inline
  KVM: s390: reduce number of IO pins to 1
2020-07-06 12:48:04 -07:00
Marc Zyngier a3f574cd65 KVM: arm64: vgic-v4: Plug race between non-residency and v4.1 doorbell
When making a vPE non-resident because it has hit a blocking WFI,
the doorbell can fire at any time after the write to the RD.
Crucially, it can fire right between the write to GICR_VPENDBASER
and the write to the pending_last field in the its_vpe structure.

This means that we would overwrite pending_last with stale data,
and potentially not wakeup until some unrelated event (such as
a timer interrupt) puts the vPE back on the CPU.

GICv4 isn't affected by this as we actively mask the doorbell on
entering the guest, while GICv4.1 automatically manages doorbell
delivery without any hypervisor-driven masking.

Use the vpe_lock to synchronize such update, which solves the
problem altogether.

Fixes: ae699ad348 ("irqchip/gic-v4.1: Move doorbell management to the GICv4 abstraction layer")
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-06-23 11:24:39 +01:00
Zenghui Yu 31dbb6b1d0 irqchip/gic-v4.1: Use readx_poll_timeout_atomic() to fix sleep in atomic
readx_poll_timeout() can sleep if @sleep_us is specified by the caller,
and is therefore unsafe to be used inside the atomic context, which is
this case when we use it to poll the GICR_VPENDBASER.Dirty bit in
irq_set_vcpu_affinity() callback.

Let's convert to its atomic version instead which helps to get the v4.1
board back to life!

Fixes: 96806229ca ("irqchip/gic-v4.1: Add support for VPENDBASER's Dirty+Valid signaling")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200605052345.1494-1-yuzenghui@huawei.com
2020-06-21 15:13:11 +01:00
Marc Zyngier c5d6082d35 irqchip/gic-v3-its: Balance initial LPI affinity across CPUs
When mapping a LPI, the ITS driver picks the first possible
affinity, which is in most cases CPU0, assuming that if
that's not suitable, someone will come and set the affinity
to something more interesting.

It apparently isn't the case, and people complain of poor
performance when many interrupts are glued to the same CPU.
So let's place the interrupts by finding the "least loaded"
CPU (that is, the one that has the fewer LPIs mapped to it).
So called 'managed' interrupts are an interesting case where
the affinity is actually dictated by the kernel itself, and
we should honor this.

Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1575642904-58295-1-git-send-email-john.garry@huawei.com
Link: https://lore.kernel.org/r/20200515165752.121296-3-maz@kernel.org
2020-05-20 11:00:00 +01:00
Marc Zyngier 2f13ff1d1d irqchip/gic-v3-its: Track LPI distribution on a per CPU basis
In order to improve the distribution of LPIs among CPUs, let start by
tracking the number of LPIs assigned to CPUs, both for managed and
non-managed interrupts (as separate counters).

Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20200515165752.121296-2-maz@kernel.org
2020-05-18 10:30:42 +01:00
Marc Zyngier 4b2dfe1e77 irqchip/gic-v4.1: Update effective affinity of virtual SGIs
Although the vSGIs are not directly visible to the host, they still
get moved around by the CPU hotplug, for example. This results in
the kernel moaning on the console, such as:

  genirq: irq_chip GICv4.1-sgi did not update eff. affinity mask of irq 38

Updating the effective affinity on set_affinity() fixes it.

Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-04-16 10:28:34 +01:00
Marc Zyngier 96806229ca irqchip/gic-v4.1: Add support for VPENDBASER's Dirty+Valid signaling
When a vPE is made resident, the GIC starts parsing the virtual pending
table to deliver pending interrupts. This takes place asynchronously,
and can at times take a long while. Long enough that the vcpu enters
the guest and hits WFI before any interrupt has been signaled yet.
The vcpu then exits, blocks, and now gets a doorbell. Rince, repeat.

In order to avoid the above, a (optional on GICv4, mandatory on v4.1)
feature allows the GIC to feedback to the hypervisor whether it is
done parsing the VPT by clearing the GICR_VPENDBASER.Dirty bit.
The hypervisor can then wait until the GIC is ready before actually
running the vPE.

Plug the detection code as well as polling on vPE schedule. While
at it, tidy-up the kernel message that displays the GICv4 optional
features.

Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-04-16 10:28:12 +01:00
Marc Zyngier 771df8cf0b Merge branch 'irq/gic-v4.1' into irq/irqchip-next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-03-24 12:43:47 +00:00
Marc Zyngier 009384b380 irqchip/gic-v4.1: Eagerly vmap vPEs
Now that we have HW-accelerated SGIs being delivered to VPEs, it
becomes required to map the VPEs on all ITSs instead of relying
on the lazy approach that we would use when using the ITS-list
mechanism.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-17-maz@kernel.org
2020-03-24 12:15:51 +00:00
Marc Zyngier 05d32df13c irqchip/gic-v4.1: Plumb set_vcpu_affinity SGI callbacks
Just like for vLPIs, there is some configuration information that cannot
be directly communicated through the normal irqchip API, and we have to
use our good old friend set_vcpu_affinity as a side-band communication
mechanism.

This is used to configure group and priority for a given vSGI.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-13-maz@kernel.org
2020-03-24 12:15:51 +00:00
Marc Zyngier 7017ff0ee1 irqchip/gic-v4.1: Plumb get/set_irqchip_state SGI callbacks
To implement the get/set_irqchip_state callbacks (limited to the
PENDING state), we have to use a particular set of hacks:

- Reading the pending state is done by using a pair of new redistributor
  registers (GICR_VSGIR, GICR_VSGIPENDR), which allow the 16 interrupts
  state to be retrieved.
- Setting the pending state is done by generating it as we'd otherwise do
  for a guest (writing to GITS_SGIR).
- Clearing the pending state is done by emitting a VSGI command with the
  "clear" bit set.

This requires some interesting locking though:
- When talking to the redistributor, we must make sure that the VPE
  affinity doesn't change, hence taking the VPE lock.
- At the same time, we must ensure that nobody accesses the same
  redistributor's GICR_VSGIR registers for a different VPE, which
  would corrupt the reading of the pending bits. We thus take the
  per-RD spinlock. Much fun.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-12-maz@kernel.org
2020-03-24 12:05:09 +00:00
Marc Zyngier b4e8d644ec irqchip/gic-v4.1: Plumb mask/unmask SGI callbacks
Implement mask/unmask for virtual SGIs by calling into the
configuration helper.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-11-maz@kernel.org
2020-03-24 12:05:09 +00:00
Marc Zyngier e252cf8a34 irqchip/gic-v4.1: Add initial SGI configuration
The GICv4.1 ITS has yet another new command (VSGI) which allows
a VPE-targeted SGI to be configured (or have its pending state
cleared). Add support for this command and plumb it into the
activate irqdomain callback so that it is ready to be used.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-10-maz@kernel.org
2020-03-24 12:05:08 +00:00
Marc Zyngier 166cba7181 irqchip/gic-v4.1: Plumb skeletal VSGI irqchip
Since GICv4.1 has the capability to inject 16 SGIs into each VPE,
and that I'm keen not to invent too many specific interfaces to
manipulate these interrupts, let's pretend that each of these SGIs
is an actual Linux interrupt.

For that matter, let's introduce a minimal irqchip and irqdomain
setup that will get fleshed up in the following patches.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-9-maz@kernel.org
2020-03-24 12:05:04 +00:00
Heyi Guo b2cb11f4f7 irqchip/gic-v4: Use Inner-Shareable attributes for virtual pending tables
There is no special reason to set virtual LPI pending table as
non-shareable. If we choose to hard code the shareability without
probing, Inner-Shareable is likely to be a better choice, as the
VPEs can move around and benefit from having the redistributors
snooping each other's cache, if that's something they can do.

Furthermore, Hisilicon hip08 ends up with unspecified errors when
mixing shareability attributes. So let's move to IS attributes for
the VPT. This has also been tested on D05 and didn't show any
regression.

Signed-off-by: Heyi Guo <guoheyi@huawei.com>
[maz: rewrote commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20191130073849.38378-1-guoheyi@huawei.com
2020-03-21 09:40:47 +00:00
Marc Zyngier 5e46a48413 irqchip/gic-v4.1: Map the ITS SGIR register page
One of the new features of GICv4.1 is to allow virtual SGIs to be
directly signaled to a VPE. For that, the ITS has grown a new
64kB page containing only a single register that is used to
signal a SGI to a given VPE.

Add a second mapping covering this new 64kB range, and take this
opportunity to limit the original mapping to 64kB, which is enough
to cover the span of the ITS registers.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-8-maz@kernel.org
2020-03-20 17:48:38 +00:00
Marc Zyngier 3c40706d05 irqchip/gic-v4.1: Advertise support v4.1 to KVM
Tell KVM that we support v4.1. Nothing uses this information so far.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-7-maz@kernel.org
2020-03-20 17:48:38 +00:00
Marc Zyngier 9058a4e980 irqchip/gic-v4.1: Ensure mutual exclusion betwen invalidations on the same RD
The GICv4.1 spec says that it is CONTRAINED UNPREDICTABLE to write to
any of the GICR_INV{LPI,ALL}R registers if GICR_SYNCR.Busy == 1.

To deal with it, we must ensure that only a single invalidation can
happen at a time for a given redistributor. Add a per-RD lock to that
effect and take it around the invalidation/syncr-read to deal with this.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-6-maz@kernel.org
2020-03-20 17:48:21 +00:00
Zenghui Yu b978c25f6e irqchip/gic-v4.1: Wait for completion of redistributor's INVALL operation
In GICv4.1, we emulate a guest-issued INVALL command by a direct write
to GICR_INVALLR.  Before we finish the emulation and go back to guest,
let's make sure the physical invalidate operation is actually completed
and no stale data will be left in redistributor. Per the specification,
this can be achieved by polling the GICR_SYNCR.Busy bit (to zero).

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200302092145.899-1-yuzenghui@huawei.com
Link: https://lore.kernel.org/r/20200304203330.4967-5-maz@kernel.org
2020-03-20 17:48:09 +00:00
Marc Zyngier f3a059219b irqchip/gic-v4.1: Ensure mutual exclusion between vPE affinity change and RD access
Before GICv4.1, all operations would be serialized with the affinity
changes by virtue of using the same ITS command queue. With v4.1, things
change, as invalidations (and a number of other operations) are issued
using the redistributor MMIO frame.

We must thus make sure that these redistributor accesses cannot race
against aginst the affinity change, or we may end-up talking to the
wrong redistributor.

To ensure this, we expand the irq_to_cpuid() helper to take a spinlock
when the LPI is mapped to a vLPI (a new per-VPE lock) on each operation
that requires mutual exclusion.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20200304203330.4967-4-maz@kernel.org
2020-03-19 11:21:58 +00:00
Marc Zyngier 28d160de51 irqchip/gic-v4.1: Skip absent CPUs while iterating over redistributors
In a system that is only sparsly populated with CPUs, we can end-up with
redistributors structures that are not initialized. Let's make sure we
don't try and access those when iterating over them (in this case when
checking we have a L2 VPE table).

Fixes: 4e6437f12d ("irqchip/gic-v4.1: Ensure L2 vPE table is allocated at RD level")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20200304203330.4967-3-maz@kernel.org
2020-03-19 11:21:58 +00:00
Marc Zyngier 7809f7011c irqchip/gic-v4: Provide irq_retrigger to avoid circular locking dependency
On a very heavily loaded D05 with GICv4, I managed to trigger the
following lockdep splat:

[ 6022.598864] ======================================================
[ 6022.605031] WARNING: possible circular locking dependency detected
[ 6022.611200] 5.6.0-rc4-00026-geee7c7b0f498 #680 Tainted: G            E
[ 6022.618061] ------------------------------------------------------
[ 6022.624227] qemu-system-aar/7569 is trying to acquire lock:
[ 6022.629789] ffff042f97606808 (&p->pi_lock){-.-.}, at: try_to_wake_up+0x54/0x7a0
[ 6022.637102]
[ 6022.637102] but task is already holding lock:
[ 6022.642921] ffff002fae424cf0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x5c/0x98
[ 6022.651350]
[ 6022.651350] which lock already depends on the new lock.
[ 6022.651350]
[ 6022.659512]
[ 6022.659512] the existing dependency chain (in reverse order) is:
[ 6022.666980]
[ 6022.666980] -> #2 (&irq_desc_lock_class){-.-.}:
[ 6022.672983]        _raw_spin_lock_irqsave+0x50/0x78
[ 6022.677848]        __irq_get_desc_lock+0x5c/0x98
[ 6022.682453]        irq_set_vcpu_affinity+0x40/0xc0
[ 6022.687236]        its_make_vpe_non_resident+0x6c/0xb8
[ 6022.692364]        vgic_v4_put+0x54/0x70
[ 6022.696273]        vgic_v3_put+0x20/0xd8
[ 6022.700183]        kvm_vgic_put+0x30/0x48
[ 6022.704182]        kvm_arch_vcpu_put+0x34/0x50
[ 6022.708614]        kvm_sched_out+0x34/0x50
[ 6022.712700]        __schedule+0x4bc/0x7f8
[ 6022.716697]        schedule+0x50/0xd8
[ 6022.720347]        kvm_arch_vcpu_ioctl_run+0x5f0/0x978
[ 6022.725473]        kvm_vcpu_ioctl+0x3d4/0x8f8
[ 6022.729820]        ksys_ioctl+0x90/0xd0
[ 6022.733642]        __arm64_sys_ioctl+0x24/0x30
[ 6022.738074]        el0_svc_common.constprop.3+0xa8/0x1e8
[ 6022.743373]        do_el0_svc+0x28/0x88
[ 6022.747198]        el0_svc+0x14/0x40
[ 6022.750761]        el0_sync_handler+0x124/0x2b8
[ 6022.755278]        el0_sync+0x140/0x180
[ 6022.759100]
[ 6022.759100] -> #1 (&rq->lock){-.-.}:
[ 6022.764143]        _raw_spin_lock+0x38/0x50
[ 6022.768314]        task_fork_fair+0x40/0x128
[ 6022.772572]        sched_fork+0xe0/0x210
[ 6022.776484]        copy_process+0x8c4/0x18d8
[ 6022.780742]        _do_fork+0x88/0x6d8
[ 6022.784478]        kernel_thread+0x64/0x88
[ 6022.788563]        rest_init+0x30/0x270
[ 6022.792390]        arch_call_rest_init+0x14/0x1c
[ 6022.796995]        start_kernel+0x498/0x4c4
[ 6022.801164]
[ 6022.801164] -> #0 (&p->pi_lock){-.-.}:
[ 6022.806382]        __lock_acquire+0xdd8/0x15c8
[ 6022.810813]        lock_acquire+0xd0/0x218
[ 6022.814896]        _raw_spin_lock_irqsave+0x50/0x78
[ 6022.819761]        try_to_wake_up+0x54/0x7a0
[ 6022.824018]        wake_up_process+0x1c/0x28
[ 6022.828276]        wakeup_softirqd+0x38/0x40
[ 6022.832533]        __tasklet_schedule_common+0xc4/0xf0
[ 6022.837658]        __tasklet_schedule+0x24/0x30
[ 6022.842176]        check_irq_resend+0xc8/0x158
[ 6022.846609]        irq_startup+0x74/0x128
[ 6022.850606]        __enable_irq+0x6c/0x78
[ 6022.854602]        enable_irq+0x54/0xa0
[ 6022.858431]        its_make_vpe_non_resident+0xa4/0xb8
[ 6022.863557]        vgic_v4_put+0x54/0x70
[ 6022.867469]        kvm_arch_vcpu_blocking+0x28/0x38
[ 6022.872336]        kvm_vcpu_block+0x48/0x490
[ 6022.876594]        kvm_handle_wfx+0x18c/0x310
[ 6022.880938]        handle_exit+0x138/0x198
[ 6022.885022]        kvm_arch_vcpu_ioctl_run+0x4d4/0x978
[ 6022.890148]        kvm_vcpu_ioctl+0x3d4/0x8f8
[ 6022.894494]        ksys_ioctl+0x90/0xd0
[ 6022.898317]        __arm64_sys_ioctl+0x24/0x30
[ 6022.902748]        el0_svc_common.constprop.3+0xa8/0x1e8
[ 6022.908046]        do_el0_svc+0x28/0x88
[ 6022.911871]        el0_svc+0x14/0x40
[ 6022.915434]        el0_sync_handler+0x124/0x2b8
[ 6022.919951]        el0_sync+0x140/0x180
[ 6022.923773]
[ 6022.923773] other info that might help us debug this:
[ 6022.923773]
[ 6022.931762] Chain exists of:
[ 6022.931762]   &p->pi_lock --> &rq->lock --> &irq_desc_lock_class
[ 6022.931762]
[ 6022.942101]  Possible unsafe locking scenario:
[ 6022.942101]
[ 6022.948007]        CPU0                    CPU1
[ 6022.952523]        ----                    ----
[ 6022.957039]   lock(&irq_desc_lock_class);
[ 6022.961036]                                lock(&rq->lock);
[ 6022.966595]                                lock(&irq_desc_lock_class);
[ 6022.973109]   lock(&p->pi_lock);
[ 6022.976324]
[ 6022.976324]  *** DEADLOCK ***

This is happening because we have a pending doorbell that requires
retrigger. As SW retriggering is done in a tasklet, we trigger the
circular dependency above.

The easy cop-out is to provide a retrigger callback that doesn't
require acquiring any extra lock.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200310184921.23552-5-maz@kernel.org
2020-03-16 15:48:55 +00:00
Marc Zyngier d5df9dc96e irqchip/gic-v3-its: Probe ITS page size for all GITS_BASERn registers
The GICv3 ITS driver assumes that once it has latched on a page size for
a given BASER register, it can use the same page size as the maximum
page size for all subsequent BASER registers.

Although it worked so far, nothing in the architecture guarantees this,
and Nianyao Tang hit this problem on some undisclosed implementation.

Let's bite the bullet and probe the the supported page size on all BASER
registers before starting to populate the tables. This simplifies the
setup a bit, at the expense of a few additional MMIO accesses.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reported-by: Nianyao Tang <tangnianyao@huawei.com>
Tested-by: Nianyao Tang <tangnianyao@huawei.com>
Link: https://lore.kernel.org/r/1584089195-63897-1-git-send-email-zhangshaokun@hisilicon.com
2020-03-16 15:48:54 +00:00
Heyi Guo 04d80dbe85 irqchip/gic-v3-its: Fix access width for gicr_syncr
GICR_SYNCR is a 32bit register, so it is better to access it with
32bit access width, though we have not seen any real problem.

Signed-off-by: Heyi Guo <guoheyi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200225090023.28020-1-guoheyi@huawei.com
2020-03-08 14:25:46 +00:00
Marc Zyngier 490d332ea4 irqchip/gic-v4.1: Avoid 64bit division for the sake of 32bit ARM
In order to allow the GICv4 code to link properly on 32bit ARM,
make sure we don't use 64bit divisions when it isn't strictly
necessary.

Fixes: 4e6437f12d ("irqchip/gic-v4.1: Ensure L2 vPE table is allocated at RD level")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-09 15:47:37 -08:00
Zenghui Yu 5186a6cc3e irqchip/gic-v3-its: Rename VPENDBASER/VPROPBASER accessors
V{PEND,PROP}BASER registers are actually located in VLPI_base frame
of the *redistributor*. Rename their accessors to reflect this fact.

No functional changes.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200206075711.1275-7-yuzenghui@huawei.com
2020-02-08 10:01:33 +00:00