linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-02 23:27:06 +00:00

Author	SHA1	Message	Date
Alex Williamson	c9082e6565	PCI/PM: Extend D3hot delay for NVIDIA HDA controllers [ Upstream commit `a5a6dd2624` ] Assignment of NVIDIA Ampere-based GPUs have seen a regression since the below referenced commit, where the reduced D3hot transition delay appears to introduce a small window where a D3hot->D0 transition followed by a bus reset can wedge the device. The entire device is subsequently unavailable, returning -1 on config space read and is unrecoverable without a host reset. This has been observed with RTX A2000 and A5000 GPU and audio functions assigned to a Windows VM, where shutdown of the VM places the devices in D3hot prior to vfio-pci performing a bus reset when userspace releases the devices. The issue has roughly a 2-3% chance of occurring per shutdown. Restoring the HDA controller d3hot_delay to the effective value before the below commit has been shown to resolve the issue. NVIDIA confirms this change should be safe for all of their HDA controllers. Fixes: `3e347969a5` ("PCI/PM: Reduce D3hot delay with usleep_range()") Link: https://lore.kernel.org/r/20230413194042.605768-1-alex.williamson@redhat.com Reported-by: Zhiyi Guo <zhguo@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Tarun Gupta <targupta@nvidia.com> Cc: Abhishek Sahu <abhsahu@nvidia.com> Cc: Tarun Gupta <targupta@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-05-11 23:11:23 +09:00
Kuppuswamy Sathyanarayanan	7cfae90e5c	PCI/EDR: Clear Device Status after EDR error recovery [ Upstream commit `c441b1e03d` ] During EDR recovery, the OS must clear error status of the port that triggered DPC even if firmware retains control of DPC and AER (see the implementation note in the PCI Firmware spec r3.3, sec 4.6.12). Prior to `068c29a248` ("PCI/ERR: Clear PCIe Device Status errors only if OS owns AER"), the port Device Status was cleared in this path: edr_handle_event dpc_process_error(dev) # "dev" triggered DPC pcie_do_recovery(dev, dpc_reset_link) dpc_reset_link # exit DPC pcie_clear_device_status(dev) # clear Device Status After `068c29a248`, pcie_do_recovery() no longer clears Device Status when firmware controls AER, so the error bit remains set even after recovery. Per the "Downstream Port Containment configuration control" bit in the returned _OSC Control Field (sec 4.5.1), the OS is allowed to clear error status until it evaluates _OST, so clear Device Status in edr_handle_event() if the error recovery was successful. [bhelgaas: commit log] Fixes: `068c29a248` ("PCI/ERR: Clear PCIe Device Status errors only if OS owns AER") Link: https://lore.kernel.org/r/20230315235449.1279209-1-sathyanarayanan.kuppuswamy@linux.intel.com Reported-by: Tsaur Erwin <erwin.tsaur@intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-05-11 23:11:22 +09:00
H. Nikolaus Schaller	cd090996c7	PCI: imx6: Install the fault handler only on compatible match [ Upstream commit `5f5ac460df` ] commit `bb38919ec5` ("PCI: imx6: Add support for i.MX6 PCIe controller") added a fault hook to this driver in the probe function. So it was only installed if needed. commit `bde4a5a00e` ("PCI: imx6: Allow probe deferral by reset GPIO") moved it from probe to driver init which installs the hook unconditionally as soon as the driver is compiled into a kernel. When this driver is compiled as a module, the hook is not registered until after the driver has been matched with a .compatible and loaded. commit `415b6185c5` ("PCI: imx6: Fix config read timeout handling") extended the fault handling code. commit `2d8ed461db` ("PCI: imx6: Add support for i.MX8MQ") added some protection for non-ARM architectures, but this does not protect non-i.MX ARM architectures. Since fault handlers can be triggered on any architecture for different reasons, there is no guarantee that they will be triggered only for the assumed situation, leading to improper error handling (i.MX6-specific imx6q_pcie_abort_handler) on foreign systems. I had seen strange L3 imprecise external abort messages several times on OMAP4 and OMAP5 devices and couldn't make sense of them until I realized they were related to this unused imx6q driver because I had CONFIG_PCI_IMX6=y. Note that CONFIG_PCI_IMX6=y is useful for kernel binaries that are designed to run on different ARM SoC and be differentiated only by device tree binaries. So turning off CONFIG_PCI_IMX6 is not a solution. Therefore we check the compatible in the init function before registering the fault handler. Link: https://lore.kernel.org/r/e1bcfc3078c82b53aa9b78077a89955abe4ea009.1678380991.git.hns@goldelico.com Fixes: `bde4a5a00e` ("PCI: imx6: Allow probe deferral by reset GPIO") Fixes: `415b6185c5` ("PCI: imx6: Fix config read timeout handling") Fixes: `2d8ed461db` ("PCI: imx6: Add support for i.MX8MQ") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Richard Zhu <hongxing.zhu@nxp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-05-11 23:11:21 +09:00
Manivannan Sadhasivam	d34761c4d6	PCI: qcom: Fix the incorrect register usage in v2.7.0 config commit `2542e16c39` upstream. Qcom PCIe IP version v2.7.0 and its derivatives don't contain the PCIE20_PARF_AXI_MSTR_WR_ADDR_HALT register. Instead, they have the new PCIE20_PARF_AXI_MSTR_WR_ADDR_HALT_V2 register. So fix the incorrect register usage which is modifying a different register. Also in this IP version, this register change doesn't depend on MSI being enabled. So remove that check also. Link: https://lore.kernel.org/r/20230316081117.14288-2-manivannan.sadhasivam@linaro.org Fixes: `ed8cc3b1fc` ("PCI: qcom: Add support for SDM845 PCIe controller") Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: <stable@vger.kernel.org> # 5.6+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-11 23:10:49 +09:00
Lukas Wunner	25fd557e75	PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock commit `f5eff5591b` upstream. In 2013, commits `2e35afaefe` ("PCI: pciehp: Add reset_slot() method") `608c388122` ("PCI: Add slot reset option to pci_dev_reset()") amended PCIe hotplug to mask Presence Detect Changed events during a Secondary Bus Reset. The reset thus no longer causes gratuitous slot bringdown and bringup. However the commits neglected to serialize reset with code paths reading slot registers. For instance, a slot bringup due to an earlier hotplug event may see the Presence Detect State bit cleared during a concurrent Secondary Bus Reset. In 2018, commit `5b3f7b7d06` ("PCI: pciehp: Avoid slot access during reset") retrofitted the missing locking. It introduced a reset_lock which serializes a Secondary Bus Reset with other parts of pciehp. Unfortunately the locking turns out to be overzealous: reset_lock is held for the entire enumeration and de-enumeration of hotplugged devices, including driver binding and unbinding. Driver binding and unbinding acquires device_lock while the reset_lock of the ancestral hotplug port is held. A concurrent Secondary Bus Reset acquires the ancestral reset_lock while already holding the device_lock. The asymmetric locking order in the two code paths can lead to AB-BA deadlocks. Michael Haeuptle reports such deadlocks on simultaneous hot-removal and vfio release (the latter implies a Secondary Bus Reset): pciehp_ist() # down_read(reset_lock) pciehp_handle_presence_or_link_change() pciehp_disable_slot() __pciehp_disable_slot() remove_board() pciehp_unconfigure_device() pci_stop_and_remove_bus_device() pci_stop_bus_device() pci_stop_dev() device_release_driver() device_release_driver_internal() __device_driver_lock() # device_lock() SYS_munmap() vfio_device_fops_release() vfio_device_group_close() vfio_device_close() vfio_device_last_close() vfio_pci_core_close_device() vfio_pci_core_disable() # device_lock() __pci_reset_function_locked() pci_reset_bus_function() pci_dev_reset_slot_function() pci_reset_hotplug_slot() pciehp_reset_slot() # down_write(reset_lock) Ian May reports the same deadlock on simultaneous hot-removal and an AER-induced Secondary Bus Reset: aer_recover_work_func() pcie_do_recovery() aer_root_reset() pci_bus_error_reset() pci_slot_reset() pci_slot_lock() # device_lock() pci_reset_hotplug_slot() pciehp_reset_slot() # down_write(reset_lock) Fix by releasing the reset_lock during driver binding and unbinding, thereby splitting and shrinking the critical section. Driver binding and unbinding is protected by the device_lock() and thus serialized with a Secondary Bus Reset. There's no need to additionally protect it with the reset_lock. However, pciehp does not bind and unbind devices directly, but rather invokes PCI core functions which also perform certain enumeration and de-enumeration steps. The reset_lock's purpose is to protect slot registers, not enumeration and de-enumeration of hotplugged devices. That would arguably be the job of the PCI core, not the PCIe hotplug driver. After all, an AER-induced Secondary Bus Reset may as well happen during boot-time enumeration of the PCI hierarchy and there's no locking to prevent that either. Exempting de-enumeration from the reset_lock is relatively harmless: A concurrent Secondary Bus Reset may foil config space accesses such as PME interrupt disablement. But if the device is physically gone, those accesses are pointless anyway. If the device is physically present and only logically removed through an Attention Button press or the sysfs "power" attribute, PME interrupts as well as DMA cannot come through because pciehp_unconfigure_device() disables INTx and Bus Master bits. That's still protected by the reset_lock in the present commit. Exempting enumeration from the reset_lock also has limited impact: The exempted call to pci_bus_add_device() may perform device accesses through pcibios_bus_add_device() and pci_fixup_device() which are now no longer protected from a concurrent Secondary Bus Reset. Otherwise there should be no impact. In essence, the present commit seeks to fix the AB-BA deadlocks while still retaining a best-effort reset protection for enumeration and de-enumeration of hotplugged devices -- until a general solution is implemented in the PCI core. Link: https://lore.kernel.org/linux-pci/CS1PR8401MB0728FC6FDAB8A35C22BD90EC95F10@CS1PR8401MB0728.NAMPRD84.PROD.OUTLOOK.COM Link: https://lore.kernel.org/linux-pci/20200615143250.438252-1-ian.may@canonical.com Link: https://lore.kernel.org/linux-pci/ce878dab-c0c4-5bd0-a725-9805a075682d@amd.com Link: https://lore.kernel.org/linux-pci/ed831249-384a-6d35-0831-70af191e9bce@huawei.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=215590 Fixes: `5b3f7b7d06` ("PCI: pciehp: Avoid slot access during reset") Link: https://lore.kernel.org/r/fef2b2e9edf245c049a8c5b94743c0f74ff5008a.1681191902.git.lukas@wunner.de Reported-by: Michael Haeuptle <michael.haeuptle@hpe.com> Reported-by: Ian May <ian.may@canonical.com> Reported-by: Andrey Grodzovsky <andrey2805@gmail.com> Reported-by: Rahul Kumar <rahul.kumar1@amd.com> Reported-by: Jialin Zhang <zhangjialin11@huawei.com> Tested-by: Anatoli Antonovitch <Anatoli.Antonovitch@amd.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v4.19+ Cc: Dan Stein <dstein@hpe.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Alex Michon <amichon@kalrayinc.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-11 23:10:49 +09:00
Josh Triplett	03c4c656a5	PCI: kirin: Select REGMAP_MMIO commit `3a2776e8a0` upstream. pcie-kirin uses regmaps, and needs to pull them in; otherwise, with CONFIG_PCIE_KIRIN=y and without CONFIG_REGMAP_MMIO pcie-kirin produces a linker failure looking for __devm_regmap_init_mmio_clk(). Fixes: `d19afe7be1` ("PCI: kirin: Use regmap for APB registers") Link: https://lore.kernel.org/r/04636141da1d6d592174eefb56760511468d035d.1668410580.git.josh@joshtriplett.org Signed-off-by: Josh Triplett <josh@joshtriplett.org> [lpieralisi@kernel.org: commit log and removed REGMAP select] Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: stable@vger.kernel.org # 5.16+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-05-11 23:10:48 +09:00
Thomas Gleixner	89da310571	PCI/MSI: Remove over-zealous hardware size check in pci_msix_validate_entries() commit `e3c026be4d` upstream. pci_msix_validate_entries() validates the entries array which is handed in by the caller for a MSI-X interrupt allocation. Aside of consistency failures it also detects a failure when the size of the MSI-X hardware table in the device is smaller than the size of the entries array. That's wrong for the case of range allocations where the caller provides the minimum and the maximum number of vectors to allocate, when the hardware size is greater or equal than the mininum, but smaller than the maximum. Remove the hardware size check completely from that function and just ensure that the entires array up to the maximum size is consistent. The limitation and range checking versus the hardware size happens independently of that afterwards anyway because the entries array is optional. Fixes: `4644d22eb6` ("PCI/MSI: Validate MSI-X contiguous restriction early") Reported-by: David Laight <David.Laight@aculab.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87v8i3sg62.ffs@tglx Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-04-26 14:30:08 +02:00
Rob Herring	07a75c0050	PCI: Fix use-after-free in pci_bus_release_domain_nr() commit `30ba2d09ed` upstream. Commit `c14f7ccc9f` ("PCI: Assign PCI domain IDs by ida_alloc()") introduced a use-after-free bug in the bus removal cleanup. The issue was found with kfence: [ 19.293351] BUG: KFENCE: use-after-free read in pci_bus_release_domain_nr+0x10/0x70 [ 19.302817] Use-after-free read at 0x000000007f3b80eb (in kfence-#115): [ 19.309677] pci_bus_release_domain_nr+0x10/0x70 [ 19.309691] dw_pcie_host_deinit+0x28/0x78 [ 19.309702] tegra_pcie_deinit_controller+0x1c/0x38 [pcie_tegra194] [ 19.309734] tegra_pcie_dw_probe+0x648/0xb28 [pcie_tegra194] [ 19.309752] platform_probe+0x90/0xd8 ... [ 19.311457] kfence-#115: 0x00000000063a155a-0x00000000ba698da8, size=1072, cache=kmalloc-2k [ 19.311469] allocated by task 96 on cpu 10 at 19.279323s: [ 19.311562] __kmem_cache_alloc_node+0x260/0x278 [ 19.311571] kmalloc_trace+0x24/0x30 [ 19.311580] pci_alloc_bus+0x24/0xa0 [ 19.311590] pci_register_host_bridge+0x48/0x4b8 [ 19.311601] pci_scan_root_bus_bridge+0xc0/0xe8 [ 19.311613] pci_host_probe+0x18/0xc0 [ 19.311623] dw_pcie_host_init+0x2c0/0x568 [ 19.311630] tegra_pcie_dw_probe+0x610/0xb28 [pcie_tegra194] [ 19.311647] platform_probe+0x90/0xd8 ... [ 19.311782] freed by task 96 on cpu 10 at 19.285833s: [ 19.311799] release_pcibus_dev+0x30/0x40 [ 19.311808] device_release+0x30/0x90 [ 19.311814] kobject_put+0xa8/0x120 [ 19.311832] device_unregister+0x20/0x30 [ 19.311839] pci_remove_bus+0x78/0x88 [ 19.311850] pci_remove_root_bus+0x5c/0x98 [ 19.311860] dw_pcie_host_deinit+0x28/0x78 [ 19.311866] tegra_pcie_deinit_controller+0x1c/0x38 [pcie_tegra194] [ 19.311883] tegra_pcie_dw_probe+0x648/0xb28 [pcie_tegra194] [ 19.311900] platform_probe+0x90/0xd8 ... [ 19.313579] CPU: 10 PID: 96 Comm: kworker/u24:2 Not tainted 6.2.0 #4 [ 19.320171] Hardware name: /, BIOS 1.0-d7fb19b 08/10/2022 [ 19.325852] Workqueue: events_unbound deferred_probe_work_func The stack trace is a bit misleading as dw_pcie_host_deinit() doesn't directly call pci_bus_release_domain_nr(). The issue turns out to be in pci_remove_root_bus() which first calls pci_remove_bus() which frees the struct pci_bus when its struct device is released. Then pci_bus_release_domain_nr() is called and accesses the freed struct pci_bus. Reordering these fixes the issue. Fixes: `c14f7ccc9f` ("PCI: Assign PCI domain IDs by ida_alloc()") Link: https://lore.kernel.org/r/20230329123835.2724518-1-robh@kernel.org Link: https://lore.kernel.org/r/b529cb69-0602-9eed-fc02-2f068707a006@nvidia.com Reported-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Cc: stable@vger.kernel.org # v6.2+ Cc: Pali Rohár <pali@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-04-20 12:36:58 +02:00
Lukas Wunner	95628b8309	PCI/DOE: Fix memory leak with CONFIG_DEBUG_OBJECTS=y commit `abf04be0e7` upstream. After a pci_doe_task completes, its work_struct needs to be destroyed to avoid a memory leak with CONFIG_DEBUG_OBJECTS=y. Fixes: `9d24322e88` ("PCI/DOE: Add DOE mailbox support functions") Tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: stable@vger.kernel.org # v6.0+ Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/775768b4912531c3b887d405fc51a50e465e1bf9.1678543498.git.lukas@wunner.de Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-04-13 17:02:42 +02:00
Lukas Wunner	626f782ba4	PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y commit `92dc899c3b` upstream. Gregory Price reports a WARN splat with CONFIG_DEBUG_OBJECTS=y upon CXL probing because pci_doe_submit_task() invokes INIT_WORK() instead of INIT_WORK_ONSTACK() for a work_struct that was allocated on the stack. All callers of pci_doe_submit_task() allocate the work_struct on the stack, so replace INIT_WORK() with INIT_WORK_ONSTACK() as a backportable short-term fix. The long-term fix implemented by a subsequent commit is to move to a synchronous API which allocates the work_struct internally in the DOE library. Stacktrace for posterity: WARNING: CPU: 0 PID: 23 at lib/debugobjects.c:545 __debug_object_init.cold+0x18/0x183 CPU: 0 PID: 23 Comm: kworker/u2:1 Not tainted 6.1.0-0.rc1.20221019gitaae703b02f92.17.fc38.x86_64 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: pci_doe_submit_task+0x5d/0xd0 pci_doe_discovery+0xb4/0x100 pcim_doe_create_mb+0x219/0x290 cxl_pci_probe+0x192/0x430 local_pci_probe+0x41/0x80 pci_device_probe+0xb3/0x220 really_probe+0xde/0x380 __driver_probe_device+0x78/0x170 driver_probe_device+0x1f/0x90 __driver_attach_async_helper+0x5c/0xe0 async_run_entry_fn+0x30/0x130 process_one_work+0x294/0x5b0 Fixes: `9d24322e88` ("PCI/DOE: Add DOE mailbox support functions") Link: https://lore.kernel.org/linux-cxl/Y1bOniJliOFszvIK@memverge.com/ Reported-by: Gregory Price <gregory.price@memverge.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Gregory Price <gregory.price@memverge.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Cc: stable@vger.kernel.org # v6.0+ Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/67a9117f463ecdb38a2dbca6a20391ce2f1e7a06.1678543498.git.lukas@wunner.de Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-04-13 17:02:42 +02:00
Lukas Wunner	027882381a	cxl/pci: Fix CDAT retrieval on big endian commit `fbaa38214c` upstream. The CDAT exposed in sysfs differs between little endian and big endian arches: On big endian, every 4 bytes are byte-swapped. PCI Configuration Space is little endian (PCI r3.0 sec 6.1). Accessors such as pci_read_config_dword() implicitly swap bytes on big endian. That way, the macros in include/uapi/linux/pci_regs.h work regardless of the arch's endianness. For an example of implicit byte-swapping, see ppc4xx_pciex_read_config(), which calls in_le32(), which uses lwbrx (Load Word Byte-Reverse Indexed). DOE Read/Write Data Mailbox Registers are unlike other registers in Configuration Space in that they contain or receive a 4 byte portion of an opaque byte stream (a "Data Object" per PCIe r6.0 sec 7.9.24.5f). They need to be copied to or from the request/response buffer verbatim. So amend pci_doe_send_req() and pci_doe_recv_resp() to undo the implicit byte-swapping. The CXL_DOE_TABLE_ACCESS_* and PCI_DOE_DATA_OBJECT_DISC_* macros assume implicit byte-swapping. Byte-swap requests after constructing them with those macros and byte-swap responses before parsing them. Change the request and response type to __le32 to avoid sparse warnings. Per a request from Jonathan, replace sizeof(u32) with sizeof(__le32) for consistency. Fixes: `c97006046c` ("cxl/port: Read CDAT table") Tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: stable@vger.kernel.org # v6.0+ Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/3051114102f41d19df3debbee123129118fc5e6d.1678543498.git.lukas@wunner.de Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-04-13 17:02:42 +02:00
Yoshihiro Shimoda	4a64d35412	PCI: dwc: Fix PORT_LINK_CONTROL update when CDM check enabled [ Upstream commit `cdce670991` ] If CDM_CHECK is enabled (by the DT "snps,enable-cdm-check" property), 'val' is overwritten by PCIE_PL_CHK_REG_CONTROL_STATUS initialization. Commit `ec7b952f45` ("PCI: dwc: Always enable CDM check if "snps,enable-cdm-check" exists") did not account for further usage of 'val', so we wrote improper values to PCIE_PORT_LINK_CONTROL when the CDM check is enabled. Move the PCIE_PORT_LINK_CONTROL update to be completely after the PCIE_PL_CHK_REG_CONTROL_STATUS register initialization. [bhelgaas: commit log adapted from Serge's version] Fixes: `ec7b952f45` ("PCI: dwc: Always enable CDM check if "snps,enable-cdm-check" exists") Link: https://lore.kernel.org/r/20230310123510.675685-2-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-04-06 12:12:32 +02:00
Niklas Schnelle	b99ebf4b62	PCI: s390: Fix use-after-free of PCI resources with per-function hotplug [ Upstream commit `ab90950985` ] On s390 PCI functions may be hotplugged individually even when they belong to a multi-function device. In particular on an SR-IOV device VFs may be removed and later re-added. In commit `a50297cf82` ("s390/pci: separate zbus creation from scanning") it was missed however that struct pci_bus and struct zpci_bus's resource list retained a reference to the PCI functions MMIO resources even though those resources are released and freed on hot-unplug. These stale resources may subsequently be claimed when the PCI function re-appears resulting in use-after-free. One idea of fixing this use-after-free in s390 specific code that was investigated was to simply keep resources around from the moment a PCI function first appeared until the whole virtual PCI bus created for a multi-function device disappears. The problem with this however is that due to the requirement of artificial MMIO addreesses (address cookies) extra logic is then needed to keep the address cookies compatible on re-plug. At the same time the MMIO resources semantically belong to the PCI function so tying their lifecycle to the function seems more logical. Instead a simpler approach is to remove the resources of an individually hot-unplugged PCI function from the PCI bus's resource list while keeping the resources of other PCI functions on the PCI bus untouched. This is done by introducing pci_bus_remove_resource() to remove an individual resource. Similarly the resource also needs to be removed from the struct zpci_bus's resource list. It turns out however, that there is really no need to add the MMIO resources to the struct zpci_bus's resource list at all and instead we can simply use the zpci_bar_struct's resource pointer directly. Fixes: `a50297cf82` ("s390/pci: separate zbus creation from scanning") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20230306151014.60913-2-schnelle@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-22 13:37:46 +01:00
Manivannan Sadhasivam	aa2295bf75	PCI: pciehp: Add Qualcomm quirk for Command Completed erratum [ Upstream commit `82b34b0800` ] The Qualcomm PCI bridge device (Device ID 0x010e) found in chipsets such as SC8280XP used in Lenovo Thinkpad X13s, does not set the Command Completed bit unless writes to the Slot Command register change "Control" bits. This results in timeouts like below during boot and resume from suspend: pcieport 0002:00:00.0: pciehp: Timeout on hotplug command 0x03c0 (issued 2020 msec ago) ... pcieport 0002:00:00.0: pciehp: Timeout on hotplug command 0x13f1 (issued 107724 msec ago) Add the device to the Command Completed quirk to mark commands "completed" immediately unless they change the "Control" bits. Link: https://lore.kernel.org/r/20230213144922.89982-1-manivannan.sadhasivam@linaro.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-11 13:50:46 +01:00
Mengyuan Lou	d1200162c2	PCI: Add ACS quirk for Wangxun NICs [ Upstream commit `a2b9b123cc` ] Wangxun has verified there is no peer-to-peer between functions for the below selection of SFxxx, RP1000 and RP2000 NICS. They may be multi-function devices, but the hardware does not advertise ACS capability. Add an ACS quirk for these devices so the functions can be in independent IOMMU groups. Link: https://lore.kernel.org/r/20230207102419.44326-1-mengyuanlou@net-swift.com Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-11 13:50:45 +01:00
Huacai Chen	3dd596f248	PCI: loongson: Add more devices that need MRRS quirk [ Upstream commit `c768f8c5f4` ] Loongson-2K SOC and LS7A2000 chipset add new PCI IDs that need MRRS quirk. Add them. Link: https://lore.kernel.org/r/20230211023321.3530080-1-chenhuacai@loongson.cn Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-11 13:50:45 +01:00
Mika Westerberg	aa6030a4d0	PCI: Distribute available resources for root buses, too [ Upstream commit `7180c1d086` ] Previously we distributed spare resources only upon hot-add, so if the initial root bus scan found devices that had not been fully configured by the BIOS, we allocated only enough resources to cover what was then present. If some of those devices were hotplug bridges, we did not leave any additional resource space for future expansion. Distribute the available resources for root buses, too, to make this work the same way as the normal hotplug case. A previous commit to do this was reverted due to a regression reported by Jonathan Cameron: `e96e27fc6f` ("PCI: Distribute available resources for root buses, too") `5632e2beaf` ("Revert "PCI: Distribute available resources for root buses, too"") This commit changes pci_bridge_resources_not_assigned() to work with bridges that do not have all the resource windows programmed by the boot firmware (previously we expected all I/O, memory and prefetchable memory were programmed). Link: https://bugzilla.kernel.org/show_bug.cgi?id=216000 Link: https://lore.kernel.org/r/20220905080232.36087-5-mika.westerberg@linux.intel.com Link: https://lore.kernel.org/r/20230131092405.29121-4-mika.westerberg@linux.intel.com Reported-by: Chris Chiu <chris.chiu@canonical.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-11 13:50:44 +01:00
Mika Westerberg	6208bdb654	PCI: Take other bus devices into account when distributing resources [ Upstream commit `9db0b9b6a1` ] A PCI bridge may reside on a bus with other devices as well. The resource distribution code does not take this into account and therefore it expands the bridge resource windows too much, not leaving space for the other devices (or functions of a multifunction device). This leads to an issue that Jonathan reported when running QEMU with the following topology (QEMU parameters): -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2 \ -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on \ -device e1000,bus=root_port13,addr=0.1 \ -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3 \ -device e1000,bus=fun1 The first e1000 NIC here is another function in the switch upstream port. This leads to following errors: pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04] pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04] pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000] e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0] Fix this by taking into account bridge windows, device BARs and SR-IOV PF BARs on the bus (PF BARs include space for VF BARS so only account PF BARs), including the ones belonging to bridges themselves if it has any. Link: https://lore.kernel.org/linux-pci/20221014124553.0000696f@huawei.com/ Link: https://lore.kernel.org/linux-pci/6053736d-1923-41e7-def9-7585ce1772d9@ixsystems.com/ Link: https://lore.kernel.org/r/20230131092405.29121-3-mika.westerberg@linux.intel.com Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reported-by: Alexander Motin <mav@ixsystems.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-11 13:50:44 +01:00
Mika Westerberg	730b81ea89	PCI: Align extra resources for hotplug bridges properly [ Upstream commit `08f0a15ee8` ] After division the extra resource space per hotplug bridge may not be aligned according to the window alignment, so align it before passing it down for further distribution. Link: https://lore.kernel.org/r/20230131092405.29121-2-mika.westerberg@linux.intel.com Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-11 13:50:44 +01:00
Huacai Chen	f45374c1b2	PCI: loongson: Prevent LS7A MRRS increases [ Upstream commit `8b3517f88f` ] Except for isochronous-configured devices, software may set Max_Read_Request_Size (MRRS) to any value up to 4096. If a device issues a read request with size greater than the completer's Max_Payload_Size (MPS), the completer is required to break the response into multiple completions. Instead of correctly responding with multiple completions to a large read request, some LS7A Root Ports respond with a Completer Abort. To prevent this, the MRRS must be limited to an implementation-specific value. The OS cannot detect that value, so rely on BIOS to configure MRRS before booting, and quirk the Root Ports so we never set an MRRS larger than that BIOS value for any downstream device. N.B. Hot-added devices are not configured by BIOS, and they power up with MRRS = 512 bytes, so these devices will be limited to 512 bytes. If the LS7A limit is smaller, those hot-added devices may not work correctly, but per [1], hotplug is not supported with this chipset revision. [1] https://lore.kernel.org/r/073638a7-ae68-2847-ac3d-29e5e760d6af@loongson.cn [bhelgaas: commit log] Link: https://bugzilla.kernel.org/show_bug.cgi?id=216884 Link: https://lore.kernel.org/r/20230201043018.778499-3-chenhuacai@loongson.cn Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-11 13:50:41 +01:00
Huacai Chen	104c82d862	PCI/portdrv: Prevent LS7A Bus Master clearing on shutdown [ Upstream commit `62b6dee1b4` ] After `cc27b735ad` ("PCI/portdrv: Turn off PCIe services during shutdown") we observe hangs during poweroff/reboot on systems with LS7A chipset. This happens because the portdrv .shutdown() method (pcie_portdrv_remove()) clears PCI_COMMAND_MASTER via pci_disable_device(), which prevents bridges from forwarding memory or I/O Requests in the upstream direction (PCIe r6.0, sec 7.5.1.1.3). LS7A Root Ports have a hardware defect: clearing PCI_COMMAND_MASTER also prevents the bridge from forwarding CPU MMIO requests in the downstream direction, and these MMIO accesses to devices below the bridge happen even after .shutdown(), e.g., to print console messages. LS7A neither forwards the requests nor sends an unsuccessful completion to the CPU, so the CPU waits forever, resulting in the hang. The purpose of .shutdown() is to disable interrupts and DMA from the device. PCIe ports may generate interrupts (either MSI/MSI-X or INTx) for AER, DPC, PME, hotplug, etc., but they never perform DMA except MSI/MSI-X. Clearing PCI_COMMAND_MASTER effectively disables MSI/MSI-X, but not INTx. The port service driver .remove() methods clear the interrupt enables in PCI_ERR_ROOT_COMMAND, PCI_EXP_DPC_CTL, PCI_EXP_SLTCTL, and PCI_EXP_RTCTL, etc., which disables interrupts regardless of whether they are MSI/MSI-X or INTx. Add a pcie_portdrv_shutdown() method that calls all the port service driver .remove() methods to clear the interrupt enables for each service but does not clear Bus Mastering on the port itself. [bhelgaas: commit log] Link: https://lore.kernel.org/r/20230201043018.778499-2-chenhuacai@loongson.cn Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-11 13:50:41 +01:00
Rafael J. Wysocki	131d91b5d3	PCI/ACPI: Account for _S0W of the target bridge in acpi_pci_bridge_d3() [ Upstream commit `8133844a8f` ] It is questionable to allow a PCI bridge to go into D3 if it has _S0W returning D2 or a shallower power state, so modify acpi_pci_bridge_d3(() to always take the return value of _S0W for the target bridge into account. That is, make it return 'false' if _S0W returns D2 or a shallower power state for the target bridge regardless of its ancestor Root Port properties. Of course, this also causes 'false' to be returned if the Root Port itself is the target and its _S0W returns D2 or a shallower power state. However, still allow bridges without _S0W that are power-manageable via ACPI to enter D3 to retain the current code behavior in that case. This fixes problems where a hotplug notification is missed because a bridge is in D3. That means hot-added devices such as USB4 docks (and the devices they contain) and Thunderbolt 3 devices may not work. Link: https://lore.kernel.org/linux-pci/20221031223356.32570-1-mario.limonciello@amd.com/ Link: https://lore.kernel.org/r/12155458.O9o76ZdvQC@kreacher Reported-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-11 13:50:38 +01:00
Lukas Wunner	0081032082	PCI/DPC: Await readiness of secondary bus after reset commit `53b54ad074` upstream. pci_bridge_wait_for_secondary_bus() is called after a Secondary Bus Reset, but not after a DPC-induced Hot Reset. As a result, the delays prescribed by PCIe r6.0 sec 6.6.1 are not observed and devices on the secondary bus may be accessed before they're ready. One affected device is Intel's Ponte Vecchio HPC GPU. It comprises a PCIe switch whose upstream port is not immediately ready after reset. Because its config space is restored too early, it remains in D0uninitialized, its subordinate devices remain inaccessible and DPC recovery fails with messages such as: i915 0000:8c:00.0: can't change power state from D3cold to D0 (config space inaccessible) intel_vsec 0000:8e:00.1: can't change power state from D3cold to D0 (config space inaccessible) pcieport 0000:89:02.0: AER: device recovery failed Fix it. Link: https://lore.kernel.org/r/9f5ff00e1593d8d9a4b452398b98aa14d23fca11.1673769517.git.lukas@wunner.de Tested-by: Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:29:54 +01:00
Damien Le Moal	07a966891b	PCI: Avoid FLR for AMD FCH AHCI adapters commit `63ba51db24` upstream. PCI passthrough to VMs does not work with AMD FCH AHCI adapters: the guest OS fails to correctly probe devices attached to the controller due to FIS communication failures: ata4: softreset failed (1st FIS failed) ... ata4.00: qc timeout after 5000 msecs (cmd 0xec) ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4) Forcing the "bus" reset method before unbinding & binding the adapter to the vfio-pci driver solves this issue, e.g.: echo "bus" > /sys/bus/pci/devices/<ID>/reset_method gives a working guest OS, indicating that the default FLR reset method doesn't work correctly. Apply quirk_no_flr() to AMD FCH AHCI devices to work around this issue. Link: https://lore.kernel.org/r/20230128013951.523247-1-damien.lemoal@opensource.wdc.com Reported-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:29:54 +01:00
Lukas Wunner	3f1719324e	PCI: hotplug: Allow marking devices as disconnected during bind/unbind commit `74ff8864cc` upstream. On surprise removal, pciehp_unconfigure_device() and acpiphp's trim_stale_devices() call pci_dev_set_disconnected() to mark removed devices as permanently offline. Thereby, the PCI core and drivers know to skip device accesses. However pci_dev_set_disconnected() takes the device_lock and thus waits for a concurrent driver bind or unbind to complete. As a result, the driver's ->probe and ->remove hooks have no chance to learn that the device is gone. That doesn't make any sense, so drop the device_lock and instead use atomic xchg() and cmpxchg() operations to update the device state. As a byproduct, an AB-BA deadlock reported by Anatoli is fixed which occurs on surprise removal with AER concurrently performing a bus reset. AER bus reset: INFO: task irq/26-aerdrv:95 blocked for more than 120 seconds. Tainted: G W 6.2.0-rc3-custom-norework-jan11+ schedule rwsem_down_write_slowpath down_write_nested pciehp_reset_slot # acquires reset_lock pci_reset_hotplug_slot pci_slot_reset # acquires device_lock pci_bus_error_reset aer_root_reset pcie_do_recovery aer_process_err_devices aer_isr pciehp surprise removal: INFO: task irq/26-pciehp:96 blocked for more than 120 seconds. Tainted: G W 6.2.0-rc3-custom-norework-jan11+ schedule_preempt_disabled __mutex_lock mutex_lock_nested pci_dev_set_disconnected # acquires device_lock pci_walk_bus pciehp_unconfigure_device pciehp_disable_slot pciehp_handle_presence_or_link_change pciehp_ist # acquires reset_lock Link: https://bugzilla.kernel.org/show_bug.cgi?id=215590 Fixes: `a6bd101b8f` ("PCI: Unify device inaccessible") Link: https://lore.kernel.org/r/3dc88ea82bdc0e37d9000e413d5ebce481cbd629.1674205689.git.lukas@wunner.de Reported-by: Anatoli Antonovitch <anatoli.antonovitch@amd.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v4.20+ Cc: Keith Busch <kbusch@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:29:54 +01:00
Lukas Wunner	145cf271d5	PCI: Unify delay handling for reset and resume commit `ac91e69805` upstream. Sheng Bi reports that pci_bridge_secondary_bus_reset() may fail to wait for devices on the secondary bus to become accessible after reset: Although it does call pci_dev_wait(), it erroneously passes the bridge's pci_dev rather than that of a child. The bridge of course is always accessible while its secondary bus is reset, so pci_dev_wait() returns immediately. Sheng Bi proposes introducing a new pci_bridge_secondary_bus_wait() function which is called from pci_bridge_secondary_bus_reset(): https://lore.kernel.org/linux-pci/20220523171517.32407-1-windy.bi.enflame@gmail.com/ However we already have pci_bridge_wait_for_secondary_bus() which does almost exactly what we need. So far it's only called on resume from D3cold (which implies a Fundamental Reset per PCIe r6.0 sec 5.8). Re-using it for Secondary Bus Resets is a leaner and more rational approach than introducing a new function. That only requires a few minor tweaks: - Amend pci_bridge_wait_for_secondary_bus() to await accessibility of the first device on the secondary bus by calling pci_dev_wait() after performing the prescribed delays. pci_dev_wait() needs two parameters, a reset reason and a timeout, which callers must now pass to pci_bridge_wait_for_secondary_bus(). The timeout is 1 sec for resume (PCIe r6.0 sec 6.6.1) and 60 sec for reset (commit `821cdad5c4` ("PCI: Wait up to 60 seconds for device to become ready after FLR")). Introduce a PCI_RESET_WAIT macro for the 1 sec timeout. - Amend pci_bridge_wait_for_secondary_bus() to return 0 on success or -ENOTTY on error for consumption by pci_bridge_secondary_bus_reset(). - Drop an unnecessary 1 sec delay from pci_reset_secondary_bus() which is now performed by pci_bridge_wait_for_secondary_bus(). A static delay this long is only necessary for Conventional PCI, so modern PCIe systems benefit from shorter reset times as a side effect. Fixes: `6b2f1351af` ("PCI: Wait for device to become ready after secondary bus reset") Link: https://lore.kernel.org/r/da77c92796b99ec568bd070cbe4725074a117038.1673769517.git.lukas@wunner.de Reported-by: Sheng Bi <windy.bi.enflame@gmail.com> Tested-by: Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:29:54 +01:00
Lukas Wunner	1ec8738fe3	PCI/PM: Observe reset delay irrespective of bridge_d3 commit `8ef0217227` upstream. If a PCI bridge is suspended to D3cold upon entering system sleep, resuming it entails a Fundamental Reset per PCIe r6.0 sec 5.8. The delay prescribed after a Fundamental Reset in PCIe r6.0 sec 6.6.1 is sought to be observed by: pci_pm_resume_noirq() pci_pm_bridge_power_up_actions() pci_bridge_wait_for_secondary_bus() However, pci_bridge_wait_for_secondary_bus() bails out if the bridge_d3 flag is not set. That flag indicates whether a bridge is allowed to suspend to D3cold at runtime. Hence no delay is observed on resume from system sleep if runtime D3cold is forbidden. That doesn't make any sense, so drop the bridge_d3 check from pci_bridge_wait_for_secondary_bus(). The purpose of the bridge_d3 check was probably to avoid delays if a bridge remained in D0 during suspend. However the sole caller of pci_bridge_wait_for_secondary_bus(), pci_pm_bridge_power_up_actions(), is only invoked if the previous power state was D3cold. Hence the additional bridge_d3 check seems superfluous. Fixes: `ad9001f2f4` ("PCI/PM: Add missing link delays required by the PCIe spec") Link: https://lore.kernel.org/r/eb37fa345285ec8bacabbf06b020b803f77bdd3d.1673769517.git.lukas@wunner.de Tested-by: Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:29:53 +01:00
Johan Hovold	090745df5a	PCI: qcom: Fix host-init error handling [ Upstream commit `997e010de9` ] Implement the new host_deinit() callback so that the PHY is powered off and regulators and clocks are disabled also on late host-init errors. Link: https://lore.kernel.org/r/20221017114705.8277-2-johan+linaro@kernel.org Fixes: `82a823833f` ("PCI: qcom: Add Qualcomm PCIe controller driver") Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-10 09:28:56 +01:00
Geert Uytterhoeven	7e6f2714d9	PCI: Fix dropping valid root bus resources with .end = zero [ Upstream commit `9d8ba74a18` ] On r8a7791/koelsch: kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak) # cat /sys/kernel/debug/kmemleak unreferenced object 0xc3a34e00 (size 64): comm "swapper/0", pid 1, jiffies 4294937460 (age 199.080s) hex dump (first 32 bytes): b4 5d 81 f0 b4 5d 81 f0 c0 b0 a2 c3 00 00 00 00 .]...].......... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<fe3aa979>] __kmalloc+0xf0/0x140 [<34bd6bc0>] resource_list_create_entry+0x18/0x38 [<767046bc>] pci_add_resource_offset+0x20/0x68 [<b3f3edf2>] devm_of_pci_get_host_bridge_resources.constprop.0+0xb0/0x390 When coalescing two resources for a contiguous aperture, the second resource is enlarged to cover the full contiguous range, while the first resource is marked invalid. This invalidation is done by clearing the flags, start, and end members. When adding the initial resources to the bus later, invalid resources are skipped. Unfortunately, the check for an invalid resource considers only the end member, causing false positives. E.g. on r8a7791/koelsch, root bus resource 0 ("bus 00") is skipped, and no longer registered with pci_bus_insert_busn_res() (causing the memory leak), nor printed: pci-rcar-gen2 ee090000.pci: host bridge /soc/pci@ee090000 ranges: pci-rcar-gen2 ee090000.pci: MEM 0x00ee080000..0x00ee08ffff -> 0x00ee080000 pci-rcar-gen2 ee090000.pci: PCI: revision 11 pci-rcar-gen2 ee090000.pci: PCI host bridge to bus 0000:00 -pci_bus 0000:00: root bus resource [bus 00] pci_bus 0000:00: root bus resource [mem 0xee080000-0xee08ffff] Fix this by only skipping resources where all of the flags, start, and end members are zero. Fixes: `7c3855c423` ("PCI: Coalesce host bridge contiguous apertures") Link: https://lore.kernel.org/r/da0fcd5e86c74239be79c7cb03651c0fce31b515.1676036673.git.geert+renesas@glider.be Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-10 09:28:56 +01:00
Sergio Paracuellos	b6ad904081	PCI: mt7621: Delay phy ports initialization [ Upstream commit `0cb2a8f345` ] Some devices like ZBT WE1326 and ZBT WF3526-P and some Netgear models need to delay phy port initialization after calling the mt7621_pcie_init_port() driver function to get into reliable boots for both warm and hard resets. The delay required to detect the ports seems to be in the range [75-100] milliseconds. If the ports are not detected the controller is not functional. There is no datasheet or something similar to really understand why this extra delay is needed only for these devices and it is not for most of the boards that are built on mt7621 SoC. This issue has been reported by openWRT community and the complete discussion is in [0]. The 100 milliseconds delay has been tested in all devices to validate it. Add the extra 100 milliseconds delay to fix the issue. [0]: https://github.com/openwrt/openwrt/pull/11220 Link: https://lore.kernel.org/r/20221231074041.264738-1-sergio.paracuellos@gmail.com Fixes: `2bdd5238e7` ("PCI: mt7621: Add MediaTek MT7621 PCIe host controller driver") Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-10 09:28:54 +01:00
Yang Yingliang	cdf0f19cb9	PCI: endpoint: pci-epf-vntb: Add epf_ntb_mw_bar_clear() num_mws kernel-doc [ Upstream commit `fd858402c6` ] `8e4bfbe644` ("PCI: endpoint: pci-epf-vntb: fix error handle in epf_ntb_mw_bar_init()") added a "num_mws" parameter to epf_ntb_mw_bar_clear() but failed to add kernel-doc for num_mws. Add kernel-doc for num_mws on epf_ntb_mw_bar_clear(). Fixes: `8e4bfbe644` ("PCI: endpoint: pci-epf-vntb: fix error handle in epf_ntb_mw_bar_init()") Link: https://lore.kernel.org/r/20230103024907.293853-1-yangyingliang@huawei.com Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-10 09:28:49 +01:00
Bjorn Helgaas	ec4b595572	PCI: switchtec: Return -EFAULT for copy_to_user() errors [ Upstream commit `ddc10938e0` ] switchtec_dev_read() didn't handle copy_to_user() errors correctly: it assigned "rc = -EFAULT", but actually returned either "size", -ENXIO, or -EBADMSG instead. Update the failure cases to unlock mrpc_mutex and return -EFAULT directly. Link: https://lore.kernel.org/r/20221216162126.207863-3-helgaas@kernel.org Fixes: `080b47def5` ("MicroSemi Switchtec management interface driver") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-10 09:28:49 +01:00
Alexey V. Vissarionov	c3c277b7fd	PCI/IOV: Enlarge virtfn sysfs name buffer [ Upstream commit `ea0b5aa5f1` ] The sysfs link name "virtfn%u" constructed by pci_iov_sysfs_link() requires 17 bytes to contain the longest possible string. Increase VIRTFN_ID_LEN to accommodate that. Found by Linux Verification Center (linuxtesting.org) with SVACE. [bhelgaas: commit log, comment at #define] Fixes: `dd7cc44d0b` ("PCI: add SR-IOV API for Physical Function driver") Link: https://lore.kernel.org/r/20221218033347.23743-1-gremlin@altlinux.org Signed-off-by: Alexey V. Vissarionov <gremlin@altlinux.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-03-10 09:28:49 +01:00
Linus Torvalds	4cfd5afcd8	pci-v6.2-fixes-2 -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmPmuwEUHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vzEYg//XHHddqDRiZmx9McETDAi33rJ9DDo CMCwiydUzGlDl/IDnBxwcmq0K5wiA5jFvXlRFmzHfnHGpWpRf6ntcT436QnhKe4G /DAXxVdZGWr079m7s4NKjByDunhkkkT/elapFCtZTwXxMkUvbprM0ozMdtSMnC/M RDCJKfaV2CKUkl/5Mk9Iw3vzrr62PP8fVHHMIr+6O39frZ2+MrzYCgpGkW0pubmT He0gmeVnNFzR6qB1GraXVNwlapjPjzvHe1IggDDLJRxM4+sz8qKJz0vKew10JwSo R5s8ACfTNtHwY45af1EWIeO9BoGD3soNLvWmK/5uNrCWJx9wnczQuz4b/Km2y02Y KCJaudiC6EfAzu5gCSgao3VZ/EQ45sHrYZN9qiyDujOgAUUPl0oonwa1HW/1WUSH Pd/ff9o78vASxdZP1o1hF0davNET1HOsvXGxQj71TJLXVsB2pifWvAoNocHHnpoe cPCix8t3c4pgXzI0RG04tcfqGWAgsaVz73SdU0/g5qk+hPRvypjcY1lw6U66sk9f /ZNII5fSX6hIWTetD27JiCZNOxJq1jikxOD4/LZizMTjdZYf6VxjDxkIaLS99pZw RCOQ8chKVemr12lD//8eFUJJvblug2aTlHIwFnMuKiavy6pL5Sm1zGMBrqhYmUSO pkNXzFaZe+GyF3k= =NSFX -----END PGP SIGNATURE----- Merge tag 'pci-v6.2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI fixes from Bjorn Helgaas: - Move to a shared PCI git tree (Bjorn Helgaas) - Add Krzysztof Wilczyński as another PCI maintainer (Lorenzo Pieralisi) - Revert a couple ASPM patches to fix suspend/resume regressions (Bjorn Helgaas) * tag 'pci-v6.2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: Revert "PCI/ASPM: Refactor L1 PM Substates Control Register programming" Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume" MAINTAINERS: Promote Krzysztof to PCI controller maintainer MAINTAINERS: Move to shared PCI tree	2023-02-10 14:18:48 -08:00
Bjorn Helgaas	ff209ecc37	Revert "PCI/ASPM: Refactor L1 PM Substates Control Register programming" This reverts commit `5e85eba6f5`. Thomas Witt reported that `5e85eba6f5` ("PCI/ASPM: Refactor L1 PM Substates Control Register programming") broke suspend/resume on a Tuxedo Infinitybook S 14 v5, which seems to use a Clevo L140CU Mainboard. The main symptom is: iwlwifi 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible nvme 0000:03:00.0: Unable to change power state from D3hot to D0, device inaccessible and the machine is only partially usable after resume. It can't run dmesg and can't do a clean reboot. This happens on every suspend/resume cycle. Revert `5e85eba6f5` until we can figure out the root cause. Fixes: `5e85eba6f5` ("PCI/ASPM: Refactor L1 PM Substates Control Register programming") Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877 Reported-by: Thomas Witt <kernel@witt.link> Tested-by: Thomas Witt <kernel@witt.link> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v6.1+ Cc: Vidya Sagar <vidyas@nvidia.com>	2023-02-10 15:30:24 -06:00
Bjorn Helgaas	a7152be79b	Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume" This reverts commit `4ff116d0d5`. Tasev Nikola and Mark Enriquez reported that resume from suspend was broken in v6.1-rc1. Tasev bisected to `a47126ec29` ("PCI/PTM: Cache PTM Capability offset"), but we can't figure out how that could be related. Mark saw the same symptoms and bisected to `4ff116d0d5` ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"), which does have a connection: it restores L1 Substates configuration while ASPM L1 may be enabled: pci_restore_state pci_restore_aspm_l1ss_state aspm_program_l1ss pci_write_config_dword(PCI_L1SS_CTL1, ctl1) # L1SS restore pci_restore_pcie_state pcie_capability_write_word(PCI_EXP_LNKCTL, cap[i++]) # L1 restore which is a problem because PCIe r6.0, sec 5.5.4, requires that: If setting either or both of the enable bits for ASPM L1 PM Substates, both ports must be configured as described in this section while ASPM L1 is disabled. Separately, Thomas Witt reported that `5e85eba6f5` ("PCI/ASPM: Refactor L1 PM Substates Control Register programming") broke suspend/resume, and it depends on `4ff116d0d5`. Revert `4ff116d0d5` ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume") to fix the resume issue and enable revert of `5e85eba6f5` to fix the issue Thomas reported. Note that reverting `4ff116d0d5` means L1 Substates config may be lost on suspend/resume. As far as we know the system will use more power but will still work correctly. Fixes: `4ff116d0d5` ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume") Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877 Reported-by: Tasev Nikola <tasev.stefanoska@skynet.be> Reported-by: Mark Enriquez <enriquezmark36@gmail.com> Reported-by: Thomas Witt <kernel@witt.link> Tested-by: Mark Enriquez <enriquezmark36@gmail.com> Tested-by: Thomas Witt <kernel@witt.link> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v6.1+ Cc: Vidya Sagar <vidyas@nvidia.com>	2023-02-10 15:29:53 -06:00
Linus Torvalds	9e058c2952	pci-v6.2-fixes-1 -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmPBniAUHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vwOjRAAhjyRAgyiZV2rWS4pyvpQpqcpZWD9 796ZSqnzLJjVYCymGvUTX23FEA48n59+bCM/WpfEGUPrBf8LZQxC9YOCm6ltuM8+ FoSBykW/tHPq5IWaLzgrWpHeDOgEnZu/WFGGvrV3tl1mLpM1SJT8bGDsjHXlo+FM qkTEiA3nUEKQs5x9r2TTLCeUWGPNTIHNd2VfuxOqM3qC/nVCOfTTxU8nm6Lk7Eix nboAugAIADJIjs/+ZGekLBuzZYPkLYuDTyMYJ5hdo1p7wWCLc9gArEqvXKwVgmD3 ptenZeOlQi9Ay45HmkfIgfgKeeQ7REJj3dx04vf67neAianyUrB0EZDqDjR7LmgM ozlNt0XjyoeEhu6AQS0s1LZtbDiED1R/00P6Gb+YEjUCVipW2lEYYwP0v9dsnNoh 6wblgnkQoxLFM+5CAXRmCmpaoQn0Uam7okfVeohtsz8/kNQF2St0hjzr4Dmws+O3 k9PUqnnUl4ByElzpEDesVGZMJ3pxFVH15ufu8VnRqN60pLTvNrsPyU4cVnG176Rc 3RSDN3zMtPxnHJVy4r3bTNEZsX/7RUrOb4xScOXMmRDBMUc8QdscF8Oj1ucKlj5j mp7vB/7+VjU96uRarRyqUxGeQc77DCTcvOa1IGh/cuYom8ZJ6vpSCpKy6f6SFGuf i8iTTUcQKCdqVW4= =Fv2v -----END PGP SIGNATURE----- Merge tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci fixes from Bjorn Helgaas: - Work around apparent firmware issue that made Linux reject MMCONFIG space, which broke PCI extended config space (Bjorn Helgaas) - Fix CONFIG_PCIE_BT1 dependency due to mid-air collision between a PCI_MSI_IRQ_DOMAIN -> PCI_MSI change and addition of PCIE_BT1 (Lukas Bulwahn) * tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space x86/pci: Simplify is_mmconf_reserved() messages PCI: dwc: Adjust to recent removal of PCI_MSI_IRQ_DOMAIN	2023-01-13 17:32:22 -06:00
Linus Torvalds	bad8c4a850	xen: branch for v6.2-rc4 -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCY76ohgAKCRCAXGG7T9hj vo8fAP0XJ94B7asqcN4W3EyeyfqxUf1eZvmWRhrbKqpLnmHLaQEA/uJBkXL49Zj7 TTcbxR1coJ/hPwhtmONU4TNtCZ+RXw0= =2Ib5 -----END PGP SIGNATURE----- Merge tag 'for-linus-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - two cleanup patches - a fix of a memory leak in the Xen pvfront driver - a fix of a locking issue in the Xen hypervisor console driver * tag 'for-linus-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pvcalls: free active map buffer on pvcalls_front_free_map hvc/xen: lock console list traversal x86/xen: Remove the unused function p2m_index() xen: make remove callback of xen driver void returned	2023-01-12 17:02:20 -06:00
Lukas Bulwahn	760d560f71	PCI: dwc: Adjust to recent removal of PCI_MSI_IRQ_DOMAIN `a474d3fbe2` ("PCI/MSI: Get rid of PCI_MSI_IRQ_DOMAIN") removed PCI_MSI_IRQ_DOMAIN and changed all references to refer to PCI_MSI instead. `ba6ed462dc` ("PCI: dwc: Add Baikal-T1 PCIe controller support") independently added PCIE_BT1, depending on PCI_MSI_IRQ_DOMAIN. Both commits appeared in v6.2-rc1, so the latter missed the conversion from PCI_MSI_IRQ_DOMAIN to PCI_MSI. Update PCIE_BT1 to depend on PCI_MSI instead. [bhelgaas: commit log] Link: https://lore.kernel.org/r/20221215103452.23131-1-lukas.bulwahn@gmail.com Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Serge Semin <fancer.lancer@gmail.com>	2023-01-04 06:06:52 -06:00
Linus Torvalds	e79041113b	phy-for-6.2 - New support: - Allwinner H616 USB PHY and A100 DPHY support - TI J721s2, J784s4 and J721e support - Freescale i.MX8MP PCIe PHY support - New driver for Renesas Ethernet SERDES supporting R-Car S4-8 - Qualcomm SM8450 PCIe1 PHY support in EP mode - Updates: - again a big pile of updates on qcom-qmp-* drivers following the driver split and reorganization merged earlier - Phy order of API calls documentation update -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmOfIbYACgkQfBQHDyUj g0fSbw//Rgfk+owGLWyJ3PxRXiDhZaJdBUQNuZEe46TjGKKHvWLJ4+ig6vrXlPgr 8mVte7jEMZubO7YE/1Vifv9xiFmjo+5R4//WlfkIwy/0SFR8+N+DPQiGU7i7ecov uzkFN26qsi4aQrKmxyadGJQzHipaLViBkr6fqfuFcmyDiFII0FoVa/mV7ZQlFtl3 cDv3leFnp3HQ9mr/mKhOSmbyWCEQHqQvjDwB50R915WfH9PLV2jYddfO4Cbwpr4r 7m7wX2EiFlQ1o2gwcFQdLiDkA8YL9Kw3wOChpbcCu4gOapJ+GWqCk0AqS9m8MMWF HnyAyHw3NxDagwV6sN19Xxa7XgkPJZPn6/92BfGYeD6H5gxmYwdROeU2/x6Qt1+z scTl1m6z8X9WWwjnWK1cqVqBPUXoJJ2smym6VBHh3f4AJAVmwZy+yyk1Oar5qa2M yDWV7nIRJQmXnuQ+XsG5rmXmmMwOuBgng4NsNX9PjhdVy6/1FUOJuMCr8ldPLAkG Lpg+GN8w6tn2G0bxrHzWeAOytxjK5XuXch99BHmXDl+NgIpp/6DuyddXmvG4nrvk R6eDv86UOQgGP2h7SujUm9f6RIWb3nJrYN27r+IHK/z5LjSMfylSSu13GvMjZkt4 Et5Q4Wk9MomHFQkhiTGTd9WlSvb497RgzKhBhMg/lJoSyTi9Eew= =4HRP -----END PGP SIGNATURE----- Merge tag 'phy-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy Pull phy updates from Vinod Koul: "This tme we have again a big pile of qcom-qmp-* changes, one new driver and bunch of new hardware support. New hardware support: - Allwinner H616 USB PHY and A100 DPHY support - TI J721s2, J784s4 and J721e support - Freescale i.MX8MP PCIe PHY support - New driver for Renesas Ethernet SERDES supporting R-Car S4-8 - Qualcomm SM8450 PCIe1 PHY support in EP mode - Qualcomm SC8280XP PCIe PHY support (including x4 mode) - Fixed Qualcomm SC8280XP USB4-USB3-DP PHY DT bindings Updates: - A big pile of updates on qcom-qmp-* drivers following the driver split and reorganization merged earlier - Phy order of API calls documentation update" * tag 'phy-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (174 commits) phy: ti: phy-j721e-wiz: add j721s2-wiz-10g module support dt-bindings: phy-j721e-wiz: add j721s2 compatible string phy: use devm_platform_get_and_ioremap_resource() phy: allwinner: phy-sun6i-mipi-dphy: Add the A100 DPHY variant phy: allwinner: phy-sun6i-mipi-dphy: Add a variant power-on hook phy: allwinner: phy-sun6i-mipi-dphy: Set the enable bit last phy: allwinner: phy-sun6i-mipi-dphy: Make RX support optional dt-bindings: sun6i-a31-mipi-dphy: Add the A100 DPHY variant dt-bindings: sun6i-a31-mipi-dphy: Add the interrupts property phy: qcom-qmp-pcie: drop redundant clock allocation phy: qcom-qmp-usb: drop redundant clock allocation phy: qcom-qmp: drop unused type header phy: qcom-qmp-usb: drop sc8280xp reference-clock source dt-bindings: phy: qcom,sc8280xp-qmp-usb3-uni: drop reference-clock source phy: qcom-qmp-combo: add support for updated sc8280xp binding phy: qcom-qmp-combo: rename DP_PHY register pointer phy: qcom-qmp-combo: rename common-register pointers phy: qcom-qmp-combo: clean up DP clock callbacks phy: qcom-qmp-combo: separate clock and provider registration phy: qcom-qmp-combo: add clock registration helper ...	2022-12-19 08:40:58 -06:00
Dawei Li	7cffcade57	xen: make remove callback of xen driver void returned Since commit `fc7a6209d5` ("bus: Make remove callback return void") forces bus_type::remove be void-returned, it doesn't make much sense for any bus based driver implementing remove callbalk to return non-void to its caller. This change is for xen bus based drivers. Acked-by: Juergen Gross <jgross@suse.com> Signed-off-by: Dawei Li <set_pte_at@outlook.com> Link: https://lore.kernel.org/r/TYCP286MB23238119AB4DF190997075C9CAE39@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Juergen Gross <jgross@suse.com>	2022-12-15 16:06:10 +01:00
Linus Torvalds	c7020e1b34	pci-v6.2-changes -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmOYpTIUHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vxuZhAAhGjE8voLZeOYwxbvfL69hGTAZ+Me x2hqRWVhh/IGWXTTaoSLwSjMMokcmAKN5S/wv8qdCG5sB8EN8FyTBIZDy8PuRRdl 8UlqlBMSL+d4oSRDCnYLxFNcynLRNnmx2dfcdw9tJ4zjTLN8Y4o8PHFogR6pJ3MT sDC8S0myTQKXr4wAGzTZycKsiGManviYtByp6dCcKD3Oy5Q2uZ9OKO2DP2yQpn+F c3IJSV9oDz3KR8JVJ5Q1iz9cdMXbGwjkM3JLlHpxhedwjN4ErLumPutKcebtzO5C aTqabN7Nnzc4yJusAIfojFCWH7fgaYUyJ3pxcFyJ4tu4m9Last+2I5UB/kV2sYAD jWiCYx3sA/mRopNXOnrBGae+Lgy+sQnt8or0grySr0bK+b+ArAGis4uT4A0uASGO RUQdIQwz7zhHeQrwAladHWxnx4BEDNCatgfn38p4fklIYKydCY5nfZURMDvHezSR G6Nu08hoE9ZXlmkWTFw+5F23wPWKcCpzZj0hf7OroIouXUp8vqSFSqatH5vGkbCl bDswck9GdRJ2hl5SvFOeelaXkM42du45TMLU2JmIn6dYYFNrO93JgdvKSU7E2CpG AmDIpg1Idxo8fEPPGH1I7RVU5+ilzmmPQQY7poQW+va4/dEd/QVp1+ZZTDnMC1qk qi3ck22VdvPU2VU= =KULr -----END PGP SIGNATURE----- Merge tag 'pci-v6.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Squash portdrv_{core,pci}.c into portdrv.c to ease maintenance and make more things static. - Make portdrv bind to Switch Ports that have AER. Previously, if these Ports lacked MSI/MSI-X, portdrv failed to bind, which meant the Ports couldn't be suspended to low-power states. AER on these Ports doesn't use interrupts, and the AER driver doesn't need to claim them. - Assign PCI domain IDs using ida_alloc(), which makes host bridge add/remove work better. Resource management: - To work better with recent BIOSes that use EfiMemoryMappedIO for PCI host bridge apertures, remove those regions from the E820 map (E820 entries normally prevent us from allocating BARs). In v5.19, we added some quirks to disable E820 checking, but that's not very maintainable. EfiMemoryMappedIO means the OS needs to map the region for use by EFI runtime services; it shouldn't prevent OS from using it. PCIe native device hotplug: - Build pciehp by default if USB4 is enabled, since Thunderbolt/USB4 PCIe tunneling depends on native PCIe hotplug. - Enable Command Completed Interrupt only if supported to avoid user confusion from lspci output that says this is enabled but not supported. - Prevent pciehp from binding to Switch Upstream Ports; this happened because of interaction with acpiphp and caused devices below the Upstream Port to disappear. Power management: - Convert AGP drivers to generic power management. We hope to remove legacy power management from the PCI core eventually. Virtualization: - Fix pci_device_is_present(), which previously always returned "false" for VFs, causing virtio hangs when unbinding the driver. Miscellaneous: - Convert drivers to gpiod API to prepare for dropping some legacy code. - Fix DOE fencepost error for the maximum data object length. Baikal-T1 PCIe controller driver: - Add driver and DT bindings. Broadcom STB PCIe controller driver: - Enable Multi-MSI. - Delay 100ms after PERST# deassert to allow power and clocks to stabilize. - Configure Read Completion Boundary to 64 bytes. Freescale i.MX6 PCIe controller driver: - Initialize PHY before deasserting core reset to fix a regression in v6.0 on boards where the PHY provides the reference. - Fix imx6sx and imx8mq clock names in DT schema. Intel VMD host bridge driver: - Fix Secondary Bus Reset on VMD bridges, which allows reset of NVMe SSDs in VT-d pass-through scenarios. - Disable MSI remapping, which gets re-enabled by firmware during suspend/resume. MediaTek PCIe Gen3 controller driver: - Add MT7986 and MT8195 support. Qualcomm PCIe controller driver: - Add SC8280XP/SA8540P basic interconnect support. Rockchip DesignWare PCIe controller driver: - Base DT schema on common Synopsys schema. Synopsys DesignWare PCIe core: - Collect DT items shared between Root Port and Endpoint (PERST GPIO, PHY info, clocks, resets, link speed, number of lanes, number of iATU windows, interrupt info, etc) to snps,dw-pcie-common.yaml. - Add dma-ranges support for Root Ports and Endpoints. - Consolidate DT resource retrieval for "dbi", "dbi2", "atu", etc. to reduce code duplication. - Add generic names for clocks and resets to encourage more consistent naming across drivers using DesignWare IP. - Stop advertising PTM Responder role for Endpoints, which aren't allowed to be responders. TI J721E PCIe driver: - Add j721s2 host mode ID to DT schema. - Add interrupt properties to DT schema. Toshiba Visconti PCIe controller driver: - Fix interrupts array max constraints in DT schema" * tag 'pci-v6.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (95 commits) x86/PCI: Use pr_info() when possible x86/PCI: Fix log message typo x86/PCI: Tidy E820 removal messages PCI: Skip allocate_resource() if too little space available efi/x86: Remove EfiMemoryMappedIO from E820 map PCI/portdrv: Allow AER service only for Root Ports & RCECs PCI: xilinx-nwl: Fix coding style violations PCI: mvebu: Switch to using gpiod API PCI: pciehp: Enable Command Completed Interrupt only if supported PCI: aardvark: Switch to using devm_gpiod_get_optional() dt-bindings: PCI: mediatek-gen3: add support for mt7986 dt-bindings: PCI: mediatek-gen3: add SoC based clock config dt-bindings: PCI: qcom: Allow 'dma-coherent' property PCI: mt7621: Add sentinel to quirks table PCI: vmd: Fix secondary bus reset for Intel bridges PCI: endpoint: pci-epf-vntb: Fix sparse ntb->reg build warning PCI: endpoint: pci-epf-vntb: Fix sparse build warning for epf_db PCI: endpoint: pci-epf-vntb: Replace hardcoded 4 with sizeof(u32) PCI: endpoint: pci-epf-vntb: Remove unused epf_db_phy struct member PCI: endpoint: pci-epf-vntb: Fix call pci_epc_mem_free_addr() in error path ...	2022-12-14 09:54:10 -08:00
Linus Torvalds	08cdc21579	iommufd for 6.2 iommufd is the user API to control the IOMMU subsystem as it relates to managing IO page tables that point at user space memory. It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO container) which is the VFIO specific interface for a similar idea. We see a broad need for extended features, some being highly IOMMU device specific: - Binding iommu_domain's to PASID/SSID - Userspace IO page tables, for ARM, x86 and S390 - Kernel bypassed invalidation of user page tables - Re-use of the KVM page table in the IOMMU - Dirty page tracking in the IOMMU - Runtime Increase/Decrease of IOPTE size - PRI support with faults resolved in userspace Many of these HW features exist to support VM use cases - for instance the combination of PASID, PRI and Userspace IO Page Tables allows an implementation of DMA Shared Virtual Addressing (vSVA) within a guest. Dirty tracking enables VM live migration with SRIOV devices and PASID support allow creating "scalable IOV" devices, among other things. As these features are fundamental to a VM platform they need to be uniformly exposed to all the driver families that do DMA into VMs, which is currently VFIO and VDPA. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY5ct7wAKCRCFwuHvBreF YZZ5AQDciXfcgXLt0UBEmWupNb0f/asT6tk717pdsKm8kAZMNAEAsIyLiKT5HqGl s7fAu+CQ1pr9+9NKGevD+frw8Solsw4= =jJkd -----END PGP SIGNATURE----- Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd implementation from Jason Gunthorpe: "iommufd is the user API to control the IOMMU subsystem as it relates to managing IO page tables that point at user space memory. It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO container) which is the VFIO specific interface for a similar idea. We see a broad need for extended features, some being highly IOMMU device specific: - Binding iommu_domain's to PASID/SSID - Userspace IO page tables, for ARM, x86 and S390 - Kernel bypassed invalidation of user page tables - Re-use of the KVM page table in the IOMMU - Dirty page tracking in the IOMMU - Runtime Increase/Decrease of IOPTE size - PRI support with faults resolved in userspace Many of these HW features exist to support VM use cases - for instance the combination of PASID, PRI and Userspace IO Page Tables allows an implementation of DMA Shared Virtual Addressing (vSVA) within a guest. Dirty tracking enables VM live migration with SRIOV devices and PASID support allow creating "scalable IOV" devices, among other things. As these features are fundamental to a VM platform they need to be uniformly exposed to all the driver families that do DMA into VMs, which is currently VFIO and VDPA" For more background, see the extended explanations in Jason's pull request: https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/ * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits) iommufd: Change the order of MSI setup iommufd: Improve a few unclear bits of code iommufd: Fix comment typos vfio: Move vfio group specific code into group.c vfio: Refactor dma APIs for emulated devices vfio: Wrap vfio group module init/clean code into helpers vfio: Refactor vfio_device open and close vfio: Make vfio_device_open() truly device specific vfio: Swap order of vfio_device_container_register() and open_device() vfio: Set device->group in helper function vfio: Create wrappers for group register/unregister vfio: Move the sanity check of the group to vfio_create_group() vfio: Simplify vfio_create_group() iommufd: Allow iommufd to supply /dev/vfio/vfio vfio: Make vfio_container optionally compiled vfio: Move container related MODULE_ALIAS statements into container.c vfio-iommufd: Support iommufd for emulated VFIO devices vfio-iommufd: Support iommufd for physical VFIO devices vfio-iommufd: Allow iommufd to be used in place of a container fd vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent() ...	2022-12-14 09:15:43 -08:00
Linus Torvalds	ce8a79d560	for-6.2/block-2022-12-08 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmOScsgQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpi5ID/9pLXFYOq1+uDjU0KO/MdjMjK8Ukr34lCnk WkajRLheE8JBKOFDE54XJk56sQSZHX9bTWqziar0h1fioh7FlQR/tVvzsERCm2M9 2y9THJNJygC68wgybStyiKlshFjl7TD7Kv5N9Y3xP3mkQygT+D6o8fXZk5xQbYyH YdFSoq4rJVHxRL03yzQiReGGIYdOUEQQh8l1FiLwLlKa3lXAey1KuxWIzksVN0KK aZB4QhiBpOiPgDHUVisq2XtyQjpZ2byoCImPzgrcqk9Jo4esvm/e6esrg4xlsvII LKFFkTmbVqjUZtFjqakFHmfuzVor4nU5f+xb90ZHExuuODYckkxWp5rWhf9QwqqI 0ik6WYgI1/5vnHnX8f2DYzOFQf9qa/rLgg0CshyUODlD6RfHa9vntqYvlIFkmOBd Q7KblIoK8YTzUS1M+v7X8JQ7gDR2KwygH37Da2KJS+vgvfIb8kJGr1ZORuhJuJJ7 Bl69gaNkHTHrqufp7UI64YXfueeuNu2J9z3zwzGoxeaFaofF/phDn0/2gCQE1fQI XBhsMw+ETqI6B2SPHMnzYDu2DM1S8ZTOYQlaD4G3uqgWnAM1tG707395uAy5yu4n D5azU1fVG4UocoNIyPujpaoSRs2zWZycEFEeUQkhyDDww/j4hlHi6H33eOnk0zsr wxzFGfvHfw== =k/vv -----END PGP SIGNATURE----- Merge tag 'for-6.2/block-2022-12-08' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: - NVMe pull requests via Christoph: - Support some passthrough commands without CAP_SYS_ADMIN (Kanchan Joshi) - Refactor PCIe probing and reset (Christoph Hellwig) - Various fabrics authentication fixes and improvements (Sagi Grimberg) - Avoid fallback to sequential scan due to transient issues (Uday Shankar) - Implement support for the DEAC bit in Write Zeroes (Christoph Hellwig) - Allow overriding the IEEE OUI and firmware revision in configfs for nvmet (Aleksandr Miloserdov) - Force reconnect when number of queue changes in nvmet (Daniel Wagner) - Minor fixes and improvements (Uros Bizjak, Joel Granados, Sagi Grimberg, Christoph Hellwig, Christophe JAILLET) - Fix and cleanup nvme-fc req allocation (Chaitanya Kulkarni) - Use the common tagset helpers in nvme-pci driver (Christoph Hellwig) - Cleanup the nvme-pci removal path (Christoph Hellwig) - Use kstrtobool() instead of strtobool (Christophe JAILLET) - Allow unprivileged passthrough of Identify Controller (Joel Granados) - Support io stats on the mpath device (Sagi Grimberg) - Minor nvmet cleanup (Sagi Grimberg) - MD pull requests via Song: - Code cleanups (Christoph) - Various fixes - Floppy pull request from Denis: - Fix a memory leak in the init error path (Yuan) - Series fixing some batch wakeup issues with sbitmap (Gabriel) - Removal of the pktcdvd driver that was deprecated more than 5 years ago, and subsequent removal of the devnode callback in struct block_device_operations as no users are now left (Greg) - Fix for partition read on an exclusively opened bdev (Jan) - Series of elevator API cleanups (Jinlong, Christoph) - Series of fixes and cleanups for blk-iocost (Kemeng) - Series of fixes and cleanups for blk-throttle (Kemeng) - Series adding concurrent support for sync queues in BFQ (Yu) - Series bringing drbd a bit closer to the out-of-tree maintained version (Christian, Joel, Lars, Philipp) - Misc drbd fixes (Wang) - blk-wbt fixes and tweaks for enable/disable (Yu) - Fixes for mq-deadline for zoned devices (Damien) - Add support for read-only and offline zones for null_blk (Shin'ichiro) - Series fixing the delayed holder tracking, as used by DM (Yu, Christoph) - Series enabling bio alloc caching for IRQ based IO (Pavel) - Series enabling userspace peer-to-peer DMA (Logan) - BFQ waker fixes (Khazhismel) - Series fixing elevator refcount issues (Christoph, Jinlong) - Series cleaning up references around queue destruction (Christoph) - Series doing quiesce by tagset, enabling cleanups in drivers (Christoph, Chao) - Series untangling the queue kobject and queue references (Christoph) - Misc fixes and cleanups (Bart, David, Dawei, Jinlong, Kemeng, Ye, Yang, Waiman, Shin'ichiro, Randy, Pankaj, Christoph) * tag 'for-6.2/block-2022-12-08' of git://git.kernel.dk/linux: (247 commits) blktrace: Fix output non-blktrace event when blk_classic option enabled block: sed-opal: Don't include <linux/kernel.h> sed-opal: allow using IOC_OPAL_SAVE for locking too blk-cgroup: Fix typo in comment block: remove bio_set_op_attrs nvmet: don't open-code NVME_NS_ATTR_RO enumeration nvme-pci: use the tagset alloc/free helpers nvme: add the Apple shared tag workaround to nvme_alloc_io_tag_set nvme: only set reserved_tags in nvme_alloc_io_tag_set for fabrics controllers nvme: consolidate setting the tagset flags nvme: pass nr_maps explicitly to nvme_alloc_io_tag_set block: bio_copy_data_iter nvme-pci: split out a nvme_pci_ctrl_is_dead helper nvme-pci: return early on ctrl state mismatch in nvme_reset_work nvme-pci: rename nvme_disable_io_queues nvme-pci: cleanup nvme_suspend_queue nvme-pci: remove nvme_pci_disable nvme-pci: remove nvme_disable_admin_queue nvme: merge nvme_shutdown_ctrl into nvme_disable_ctrl nvme: use nvme_wait_ready in nvme_shutdown_ctrl ...	2022-12-13 10:43:59 -08:00
Linus Torvalds	268325bda5	Random number generator updates for Linux 6.2-rc1. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmOU+U8ACgkQSfxwEqXe A67NnQ//Y5DltmvibyPd7r1TFT2gUYv+Rx3sUV9ZE1NYptd/SWhhcL8c5FZ70Fuw bSKCa1uiWjOxosjXT1kGrWq3de7q7oUpAPSOGxgxzoaNURIt58N/ajItCX/4Au8I RlGAScHy5e5t41/26a498kB6qJ441fBEqCYKQpPLINMBAhe8TQ+NVp0rlpUwNHFX WrUGg4oKWxdBIW3HkDirQjJWDkkAiklRTifQh/Al4b6QDbOnRUGGCeckNOhixsvS waHWTld+Td8jRrA4b82tUb2uVZ2/b8dEvj/A8CuTv4yC0lywoyMgBWmJAGOC+UmT ZVNdGW02Jc2T+Iap8ZdsEmeLHNqbli4+IcbY5xNlov+tHJ2oz41H9TZoYKbudlr6 /ReAUPSn7i50PhbQlEruj3eg+M2gjOeh8OF8UKwwRK8PghvyWQ1ScW0l3kUhPIhI PdIG6j4+D2mJc1FIj2rTVB+Bg933x6S+qx4zDxGlNp62AARUFYf6EgyD6aXFQVuX RxcKb6cjRuFkzFiKc8zkqg5edZH+IJcPNuIBmABqTGBOxbZWURXzIQvK/iULqZa4 CdGAFIs6FuOh8pFHLI3R4YoHBopbHup/xKDEeAO9KZGyeVIuOSERDxxo5f/ITzcq APvT77DFOEuyvanr8RMqqh0yUjzcddXqw9+ieufsAyDwjD9DTuE= =QRhK -----END PGP SIGNATURE----- Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: - Replace prandom_u32_max() and various open-coded variants of it, there is now a new family of functions that uses fast rejection sampling to choose properly uniformly random numbers within an interval: get_random_u32_below(ceil) - [0, ceil) get_random_u32_above(floor) - (floor, U32_MAX] get_random_u32_inclusive(floor, ceil) - [floor, ceil] Coccinelle was used to convert all current users of prandom_u32_max(), as well as many open-coded patterns, resulting in improvements throughout the tree. I'll have a "late" 6.1-rc1 pull for you that removes the now unused prandom_u32_max() function, just in case any other trees add a new use case of it that needs to converted. According to linux-next, there may be two trivial cases of prandom_u32_max() reintroductions that are fixable with a 's/.../.../'. So I'll have for you a final conversion patch doing that alongside the removal patch during the second week. This is a treewide change that touches many files throughout. - More consistent use of get_random_canary(). - Updates to comments, documentation, tests, headers, and simplification in configuration. - The arch_get_random_early() abstraction was only used by arm64 and wasn't entirely useful, so this has been replaced by code that works in all relevant contexts. - The kernel will use and manage random seeds in non-volatile EFI variables, refreshing a variable with a fresh seed when the RNG is initialized. The RNG GUID namespace is then hidden from efivarfs to prevent accidental leakage. These changes are split into random.c infrastructure code used in the EFI subsystem, in this pull request, and related support inside of EFISTUB, in Ard's EFI tree. These are co-dependent for full functionality, but the order of merging doesn't matter. - Part of the infrastructure added for the EFI support is also used for an improvement to the way vsprintf initializes its siphash key, replacing an sleep loop wart. - The hardware RNG framework now always calls its correct random.c input function, add_hwgenerator_randomness(), rather than sometimes going through helpers better suited for other cases. - The add_latent_entropy() function has long been called from the fork handler, but is a no-op when the latent entropy gcc plugin isn't used, which is fine for the purposes of latent entropy. But it was missing out on the cycle counter that was also being mixed in beside the latent entropy variable. So now, if the latent entropy gcc plugin isn't enabled, add_latent_entropy() will expand to a call to add_device_randomness(NULL, 0), which adds a cycle counter, without the absent latent entropy variable. - The RNG is now reseeded from a delayed worker, rather than on demand when used. Always running from a worker allows it to make use of the CPU RNG on platforms like S390x, whose instructions are too slow to do so from interrupts. It also has the effect of adding in new inputs more frequently with more regularity, amounting to a long term transcript of random values. Plus, it helps a bit with the upcoming vDSO implementation (which isn't yet ready for 6.2). - The jitter entropy algorithm now tries to execute on many different CPUs, round-robining, in hopes of hitting even more memory latencies and other unpredictable effects. It also will mix in a cycle counter when the entropy timer fires, in addition to being mixed in from the main loop, to account more explicitly for fluctuations in that timer firing. And the state it touches is now kept within the same cache line, so that it's assured that the different execution contexts will cause latencies. tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits) random: include <linux/once.h> in the right header random: align entropy_timer_state to cache line random: mix in cycle counter when jitter timer fires random: spread out jitter callback to different CPUs random: remove extraneous period and add a missing one in comments efi: random: refresh non-volatile random seed when RNG is initialized vsprintf: initialize siphash key using notifier random: add back async readiness notifier random: reseed in delayed work rather than on-demand random: always mix cycle counter in add_latent_entropy() hw_random: use add_hwgenerator_randomness() for early entropy random: modernize documentation comment on get_random_bytes() random: adjust comment to account for removed function random: remove early archrandom abstraction random: use random.trust_{bootloader,cpu} command line option only stackprotector: actually use get_random_canary() stackprotector: move get_random_canary() into stackprotector.h treewide: use get_random_u32_inclusive() when possible treewide: use get_random_u32_{above,below}() instead of manual loop treewide: use get_random_u32_below() instead of deprecated function ...	2022-12-12 16:22:22 -08:00
Linus Torvalds	c1f0fcd85d	cxl for 6.2 - Add the cpu_cache_invalidate_memregion() API for cache flushing in response to physical memory reconfiguration, or memory-side data invalidation from operations like secure erase or memory-device unlock. - Add a facility for the kernel to warn about collisions between kernel and userspace access to PCI configuration registers - Add support for Restricted CXL Host (RCH) topologies (formerly CXL 1.1) - Add handling and reporting of CXL errors reported via the PCIe AER mechanism - Add support for CXL Persistent Memory Security commands - Add support for the "XOR" algorithm for CXL host bridge interleave - Rework / simplify CXL to NVDIMM interactions - Miscellaneous cleanups and fixes -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCY5UpyAAKCRDfioYZHlFs Z0ttAP4uxCjIibKsFVyexpSgI4vaZqQ9yt9NesmPwonc0XookwD+PlwP6Xc0d0Ox t0gJ6+pwdh11NRzhcNE1pAaPcJZU4gs= =HAQk -----END PGP SIGNATURE----- Merge tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl updates from Dan Williams: "Compute Express Link (CXL) updates for 6.2. While it may seem backwards, the CXL update this time around includes some focus on CXL 1.x enabling where the work to date had been with CXL 2.0 (VH topologies) in mind. First generation CXL can mostly be supported via BIOS, similar to DDR, however it became clear there are use cases for OS native CXL error handling and some CXL 3.0 endpoint features can be deployed on CXL 1.x hosts (Restricted CXL Host (RCH) topologies). So, this update brings RCH topologies into the Linux CXL device model. In support of the ongoing CXL 2.0+ enabling two new core kernel facilities are added. One is the ability for the kernel to flag collisions between userspace access to PCI configuration registers and kernel accesses. This is brought on by the PCIe Data-Object-Exchange (DOE) facility, a hardware mailbox over config-cycles. The other is a cpu_cache_invalidate_memregion() API that maps to wbinvd_on_all_cpus() on x86. To prevent abuse it is disabled in guest VMs and architectures that do not support it yet. The CXL paths that need it, dynamic memory region creation and security commands (erase / unlock), are disabled when it is not present. As for the CXL 2.0+ this cycle the subsystem gains support Persistent Memory Security commands, error handling in response to PCIe AER notifications, and support for the "XOR" host bridge interleave algorithm. Summary: - Add the cpu_cache_invalidate_memregion() API for cache flushing in response to physical memory reconfiguration, or memory-side data invalidation from operations like secure erase or memory-device unlock. - Add a facility for the kernel to warn about collisions between kernel and userspace access to PCI configuration registers - Add support for Restricted CXL Host (RCH) topologies (formerly CXL 1.1) - Add handling and reporting of CXL errors reported via the PCIe AER mechanism - Add support for CXL Persistent Memory Security commands - Add support for the "XOR" algorithm for CXL host bridge interleave - Rework / simplify CXL to NVDIMM interactions - Miscellaneous cleanups and fixes" * tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (71 commits) cxl/region: Fix memdev reuse check cxl/pci: Remove endian confusion cxl/pci: Add some type-safety to the AER trace points cxl/security: Drop security command ioctl uapi cxl/mbox: Add variable output size validation for internal commands cxl/mbox: Enable cxl_mbox_send_cmd() users to validate output size cxl/security: Fix Get Security State output payload endian handling cxl: update names for interleave ways conversion macros cxl: update names for interleave granularity conversion macros cxl/acpi: Warn about an invalid CHBCR in an existing CHBS entry tools/testing/cxl: Require cache invalidation bypass cxl/acpi: Fail decoder add if CXIMS for HBIG is missing cxl/region: Fix spelling mistake "memergion" -> "memregion" cxl/regs: Fix sparse warning cxl/acpi: Set ACPI's CXL _OSC to indicate RCD mode support tools/testing/cxl: Add an RCH topology cxl/port: Add RCD endpoint port enumeration cxl/mem: Move devm_cxl_add_endpoint() from cxl_core to cxl_mem tools/testing/cxl: Add XOR Math support to cxl_test cxl/acpi: Support CXL XOR Interleave Math (CXIMS) ...	2022-12-12 13:55:31 -08:00
Linus Torvalds	9d33edb20f	Updates for the interrupt core and driver subsystem: - Core: The bulk is the rework of the MSI subsystem to support per device MSI interrupt domains. This solves conceptual problems of the current PCI/MSI design which are in the way of providing support for PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device. IMS (Interrupt Message Store] is a new specification which allows device manufactures to provide implementation defined storage for MSI messages contrary to the uniform and specification defined storage mechanisms for PCI/MSI and PCI/MSI-X. IMS not only allows to overcome the size limitations of the MSI-X table, but also gives the device manufacturer the freedom to store the message in arbitrary places, even in host memory which is shared with the device. There have been several attempts to glue this into the current MSI code, but after lengthy discussions it turned out that there is a fundamental design problem in the current PCI/MSI-X implementation. This needs some historical background. When PCI/MSI[-X] support was added around 2003, interrupt management was completely different from what we have today in the actively developed architectures. Interrupt management was completely architecture specific and while there were attempts to create common infrastructure the commonalities were rudimentary and just providing shared data structures and interfaces so that drivers could be written in an architecture agnostic way. The initial PCI/MSI[-X] support obviously plugged into this model which resulted in some basic shared infrastructure in the PCI core code for setting up MSI descriptors, which are a pure software construct for holding data relevant for a particular MSI interrupt, but the actual association to Linux interrupts was completely architecture specific. This model is still supported today to keep museum architectures and notorious stranglers alive. In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel, which was creating yet another architecture specific mechanism and resulted in an unholy mess on top of the existing horrors of x86 interrupt handling. The x86 interrupt management code was already an incomprehensible maze of indirections between the CPU vector management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X] implementation. At roughly the same time ARM struggled with the ever growing SoC specific extensions which were glued on top of the architected GIC interrupt controller. This resulted in a fundamental redesign of interrupt management and provided the today prevailing concept of hierarchical interrupt domains. This allowed to disentangle the interactions between x86 vector domain and interrupt remapping and also allowed ARM to handle the zoo of SoC specific interrupt components in a sane way. The concept of hierarchical interrupt domains aims to encapsulate the functionality of particular IP blocks which are involved in interrupt delivery so that they become extensible and pluggable. The X86 encapsulation looks like this: \|--- device 1 [Vector]---[Remapping]---[PCI/MSI]--\|... \|--- device N where the remapping domain is an optional component and in case that it is not available the PCI/MSI[-X] domains have the vector domain as their parent. This reduced the required interaction between the domains pretty much to the initialization phase where it is obviously required to establish the proper parent relation ship in the components of the hierarchy. While in most cases the model is strictly representing the chain of IP blocks and abstracting them so they can be plugged together to form a hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware it's clear that the actual PCI/MSI[-X] interrupt controller is not a global entity, but strict a per PCI device entity. Here we took a short cut on the hierarchical model and went for the easy solution of providing "global" PCI/MSI domains which was possible because the PCI/MSI[-X] handling is uniform across the devices. This also allowed to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in turn made it simple to keep the existing architecture specific management alive. A similar problem was created in the ARM world with support for IP block specific message storage. Instead of going all the way to stack a IP block specific domain on top of the generic MSI domain this ended in a construct which provides a "global" platform MSI domain which allows overriding the irq_write_msi_msg() callback per allocation. In course of the lengthy discussions we identified other abuse of the MSI infrastructure in wireless drivers, NTB etc. where support for implementation specific message storage was just mindlessly glued into the existing infrastructure. Some of this just works by chance on particular platforms but will fail in hard to diagnose ways when the driver is used on platforms where the underlying MSI interrupt management code does not expect the creative abuse. Another shortcoming of today's PCI/MSI-X support is the inability to allocate or free individual vectors after the initial enablement of MSI-X. This results in an works by chance implementation of VFIO (PCI pass-through) where interrupts on the host side are not set up upfront to avoid resource exhaustion. They are expanded at run-time when the guest actually tries to use them. The way how this is implemented is that the host disables MSI-X and then re-enables it with a larger number of vectors again. That works by chance because most device drivers set up all interrupts before the device actually will utilize them. But that's not universally true because some drivers allocate a large enough number of vectors but do not utilize them until it's actually required, e.g. for acceleration support. But at that point other interrupts of the device might be in active use and the MSI-X disable/enable dance can just result in losing interrupts and therefore hard to diagnose subtle problems. Last but not least the "global" PCI/MSI-X domain approach prevents to utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS is not longer providing a uniform storage and configuration model. The solution to this is to implement the missing step and switch from global PCI/MSI domains to per device PCI/MSI domains. The resulting hierarchy then looks like this: \|--- [PCI/MSI] device 1 [Vector]---[Remapping]---\|... \|--- [PCI/MSI] device N which in turn allows to provide support for multiple domains per device: \|--- [PCI/MSI] device 1 \|--- [PCI/IMS] device 1 [Vector]---[Remapping]---\|... \|--- [PCI/MSI] device N \|--- [PCI/IMS] device N This work converts the MSI and PCI/MSI core and the x86 interrupt domains to the new model, provides new interfaces for post-enable allocation/free of MSI-X interrupts and the base framework for PCI/IMS. PCI/IMS has been verified with the work in progress IDXD driver. There is work in progress to convert ARM over which will replace the platform MSI train-wreck. The cleanup of VFIO, NTB and other creative "solutions" are in the works as well. - Drivers: - Updates for the LoongArch interrupt chip drivers - Support for MTK CIRQv2 - The usual small fixes and updates all over the place -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUsygTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoYXiD/40tXKzCzf0qFIqUlZLia1N3RRrwrNC DVTixuLtR9MrjwE+jWLQILa85SHInV8syXHSd35SzhsGDxkURFGi+HBgVWmysODf br9VSh3Gi+kt7iXtIwAg8WNWviGNmS3kPksxCko54F0YnJhMY5r5bhQVUBQkwFG2 wES1C9Uzd4pdV2bl24Z+WKL85cSmZ+pHunyKw1n401lBABXnTF9c4f13zC14jd+y wDxNrmOxeL3mEH4Pg6VyrDuTOURSf3TjJjeEq3EYqvUo0FyLt9I/cKX0AELcZQX7 fkRjrQQAvXNj39RJfeSkojDfllEPUHp7XSluhdBu5aIovSamdYGCDnuEoZ+l4MJ+ CojIErp3Dwj/uSaf5c7C3OaDAqH2CpOFWIcrUebShJE60hVKLEpUwd6W8juplaoT gxyXRb1Y+BeJvO8VhMN4i7f3232+sj8wuj+HTRTTbqMhkElnin94tAx8rgwR1sgR BiOGMJi4K2Y8s9Rqqp0Dvs01CW4guIYvSR4YY+WDbbi1xgiev89OYs6zZTJCJe4Y NUwwpqYSyP1brmtdDdBOZLqegjQm+TwUb6oOaasFem4vT1swgawgLcDnPOx45bk5 /FWt3EmnZxMz99x9jdDn1+BCqAZsKyEbEY1avvhPVMTwoVIuSX2ceTBMLseGq+jM 03JfvdxnueM3gw== =9erA -----END PGP SIGNATURE----- Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt core and driver subsystem: The bulk is the rework of the MSI subsystem to support per device MSI interrupt domains. This solves conceptual problems of the current PCI/MSI design which are in the way of providing support for PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device. IMS (Interrupt Message Store] is a new specification which allows device manufactures to provide implementation defined storage for MSI messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified message store which is uniform accross all devices). The PCI/MSI[-X] uniformity allowed us to get away with "global" PCI/MSI domains. IMS not only allows to overcome the size limitations of the MSI-X table, but also gives the device manufacturer the freedom to store the message in arbitrary places, even in host memory which is shared with the device. There have been several attempts to glue this into the current MSI code, but after lengthy discussions it turned out that there is a fundamental design problem in the current PCI/MSI-X implementation. This needs some historical background. When PCI/MSI[-X] support was added around 2003, interrupt management was completely different from what we have today in the actively developed architectures. Interrupt management was completely architecture specific and while there were attempts to create common infrastructure the commonalities were rudimentary and just providing shared data structures and interfaces so that drivers could be written in an architecture agnostic way. The initial PCI/MSI[-X] support obviously plugged into this model which resulted in some basic shared infrastructure in the PCI core code for setting up MSI descriptors, which are a pure software construct for holding data relevant for a particular MSI interrupt, but the actual association to Linux interrupts was completely architecture specific. This model is still supported today to keep museum architectures and notorious stragglers alive. In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel, which was creating yet another architecture specific mechanism and resulted in an unholy mess on top of the existing horrors of x86 interrupt handling. The x86 interrupt management code was already an incomprehensible maze of indirections between the CPU vector management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X] implementation. At roughly the same time ARM struggled with the ever growing SoC specific extensions which were glued on top of the architected GIC interrupt controller. This resulted in a fundamental redesign of interrupt management and provided the today prevailing concept of hierarchical interrupt domains. This allowed to disentangle the interactions between x86 vector domain and interrupt remapping and also allowed ARM to handle the zoo of SoC specific interrupt components in a sane way. The concept of hierarchical interrupt domains aims to encapsulate the functionality of particular IP blocks which are involved in interrupt delivery so that they become extensible and pluggable. The X86 encapsulation looks like this: \|--- device 1 [Vector]---[Remapping]---[PCI/MSI]--\|... \|--- device N where the remapping domain is an optional component and in case that it is not available the PCI/MSI[-X] domains have the vector domain as their parent. This reduced the required interaction between the domains pretty much to the initialization phase where it is obviously required to establish the proper parent relation ship in the components of the hierarchy. While in most cases the model is strictly representing the chain of IP blocks and abstracting them so they can be plugged together to form a hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware it's clear that the actual PCI/MSI[-X] interrupt controller is not a global entity, but strict a per PCI device entity. Here we took a short cut on the hierarchical model and went for the easy solution of providing "global" PCI/MSI domains which was possible because the PCI/MSI[-X] handling is uniform across the devices. This also allowed to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in turn made it simple to keep the existing architecture specific management alive. A similar problem was created in the ARM world with support for IP block specific message storage. Instead of going all the way to stack a IP block specific domain on top of the generic MSI domain this ended in a construct which provides a "global" platform MSI domain which allows overriding the irq_write_msi_msg() callback per allocation. In course of the lengthy discussions we identified other abuse of the MSI infrastructure in wireless drivers, NTB etc. where support for implementation specific message storage was just mindlessly glued into the existing infrastructure. Some of this just works by chance on particular platforms but will fail in hard to diagnose ways when the driver is used on platforms where the underlying MSI interrupt management code does not expect the creative abuse. Another shortcoming of today's PCI/MSI-X support is the inability to allocate or free individual vectors after the initial enablement of MSI-X. This results in an works by chance implementation of VFIO (PCI pass-through) where interrupts on the host side are not set up upfront to avoid resource exhaustion. They are expanded at run-time when the guest actually tries to use them. The way how this is implemented is that the host disables MSI-X and then re-enables it with a larger number of vectors again. That works by chance because most device drivers set up all interrupts before the device actually will utilize them. But that's not universally true because some drivers allocate a large enough number of vectors but do not utilize them until it's actually required, e.g. for acceleration support. But at that point other interrupts of the device might be in active use and the MSI-X disable/enable dance can just result in losing interrupts and therefore hard to diagnose subtle problems. Last but not least the "global" PCI/MSI-X domain approach prevents to utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS is not longer providing a uniform storage and configuration model. The solution to this is to implement the missing step and switch from global PCI/MSI domains to per device PCI/MSI domains. The resulting hierarchy then looks like this: \|--- [PCI/MSI] device 1 [Vector]---[Remapping]---\|... \|--- [PCI/MSI] device N which in turn allows to provide support for multiple domains per device: \|--- [PCI/MSI] device 1 \|--- [PCI/IMS] device 1 [Vector]---[Remapping]---\|... \|--- [PCI/MSI] device N \|--- [PCI/IMS] device N This work converts the MSI and PCI/MSI core and the x86 interrupt domains to the new model, provides new interfaces for post-enable allocation/free of MSI-X interrupts and the base framework for PCI/IMS. PCI/IMS has been verified with the work in progress IDXD driver. There is work in progress to convert ARM over which will replace the platform MSI train-wreck. The cleanup of VFIO, NTB and other creative "solutions" are in the works as well. Drivers: - Updates for the LoongArch interrupt chip drivers - Support for MTK CIRQv2 - The usual small fixes and updates all over the place" * tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits) irqchip/ti-sci-inta: Fix kernel doc irqchip/gic-v2m: Mark a few functions __init irqchip/gic-v2m: Include arm-gic-common.h irqchip/irq-mvebu-icu: Fix works by chance pointer assignment iommu/amd: Enable PCI/IMS iommu/vt-d: Enable PCI/IMS x86/apic/msi: Enable PCI/IMS PCI/MSI: Provide pci_ims_alloc/free_irq() PCI/MSI: Provide IMS (Interrupt Message Store) support genirq/msi: Provide constants for PCI/IMS support x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X PCI/MSI: Provide prepare_desc() MSI domain op PCI/MSI: Split MSI-X descriptor setup genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN genirq/msi: Provide msi_domain_alloc_irq_at() genirq/msi: Provide msi_domain_ops:: Prepare_desc() genirq/msi: Provide msi_desc:: Msi_data genirq/msi: Provide struct msi_map x86/apic/msi: Remove arch_create_remap_msi_irq_domain() ...	2022-12-12 11:21:29 -08:00
Bjorn Helgaas	f826afe5ea	Merge branch 'pci/kbuild' - Remove unnecessary <linux/of_irq.h> includes (Bjorn Helgaas) * pci/kbuild: PCI: Drop of_match_ptr() to avoid unused variables PCI: Remove unnecessary <linux/of_irq.h> includes PCI: xgene-msi: Include <linux/irqdomain.h> explicitly PCI: mvebu: Include <linux/irqdomain.h> explicitly PCI: microchip: Include <linux/irqdomain.h> explicitly PCI: altera-msi: Include <linux/irqdomain.h> explicitly # Conflicts: # drivers/pci/controller/pci-mvebu.c	2022-12-10 10:36:52 -06:00
Bjorn Helgaas	e4d741e9e4	Merge branch 'pci/ctrl/xilinx' - Fix whitespace issues (Michal Simek) * pci/ctrl/xilinx: PCI: xilinx-nwl: Fix coding style violations	2022-12-10 10:36:42 -06:00
Bjorn Helgaas	4e5194733a	Merge branch 'pci/ctrl/mvebu' - Switch to the gpiod API so we can make of_get_named_gpio_flags() private (Dmitry Torokhov) * pci/ctrl/mvebu: PCI: mvebu: Switch to using gpiod API	2022-12-10 10:36:41 -06:00

1 2 3 4 5 ...

9918 commits