linux-stable/drivers/cxl/core/port.c

2292 lines
56 KiB
C
Raw Normal View History

// SPDX-License-Identifier: GPL-2.0-only
/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
cxl/port: Fix cxl_test register enumeration regression The cxl_test unit test environment models a CXL topology for sysfs/user-ABI regression testing. It uses interface mocking via the "--wrap=" linker option to redirect cxl_core routines that parse hardware registers with versions that just publish objects, like devm_cxl_enumerate_decoders(). Starting with: Commit 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port") ...port register enumeration is moved into devm_cxl_add_port(). This conflicts with the "cxl_test avoids emulating registers stance" so either the port code needs to be refactored (too violent), or modified so that register enumeration is skipped on "fake" cxl_test ports (annoying, but straightforward). This conflict has happened previously and the "check for platform device" workaround to avoid instrusive refactoring was deployed in those scenarios. In general, refactoring should only benefit production code, test code needs to remain minimally instrusive to the greatest extent possible. This was missed previously because it may sometimes just cause warning messages to be emitted, but it can also cause test failures. The backport to -stable is only nice to have for clean cxl_test runs. Fixes: 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port") Cc: stable@vger.kernel.org Reported-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/169476525052.1013896.6235102957693675187.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-15 08:07:30 +00:00
#include <linux/platform_device.h>
cxl/region: Add region creation support CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-08 17:28:34 +00:00
#include <linux/memregion.h>
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
#include <linux/workqueue.h>
#include <linux/debugfs.h>
#include <linux/device.h>
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/slab.h>
#include <linux/idr.h>
#include <linux/node.h>
#include <cxlmem.h>
#include <cxlpci.h>
#include <cxl.h>
#include "core.h"
/**
* DOC: cxl core
*
* The CXL core provides a set of interfaces that can be consumed by CXL aware
* drivers. The interfaces allow for creation, modification, and destruction of
* regions, memory devices, ports, and decoders. CXL aware drivers must register
* with the CXL core via these interfaces in order to be able to participate in
* cross-device interleave coordination. The CXL core also establishes and
* maintains the bridge to the nvdimm subsystem.
*
* CXL core introduces sysfs hierarchy to control the devices that are
* instantiated by the core.
*/
/*
* All changes to the interleave configuration occur with this lock held
* for write.
*/
DECLARE_RWSEM(cxl_region_rwsem);
static DEFINE_IDA(cxl_port_ida);
static DEFINE_XARRAY(cxl_root_buses);
int cxl_num_decoders_committed(struct cxl_port *port)
{
lockdep_assert_held(&cxl_region_rwsem);
return port->commit_end + 1;
}
static ssize_t devtype_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
return sysfs_emit(buf, "%s\n", dev->type->name);
}
static DEVICE_ATTR_RO(devtype);
static int cxl_device_id(const struct device *dev)
{
if (dev->type == &cxl_nvdimm_bridge_type)
return CXL_DEVICE_NVDIMM_BRIDGE;
if (dev->type == &cxl_nvdimm_type)
return CXL_DEVICE_NVDIMM;
if (dev->type == CXL_PMEM_REGION_TYPE())
return CXL_DEVICE_PMEM_REGION;
if (dev->type == CXL_DAX_REGION_TYPE())
return CXL_DEVICE_DAX_REGION;
cxl/port: Add a driver for 'struct cxl_port' objects The need for a CXL port driver and a dedicated cxl_bus_type is driven by a need to simultaneously support 2 independent physical memory decode domains (cache coherent CXL.mem and uncached PCI.mmio) that also intersect at a single PCIe device node. A CXL Port is a device that advertises a CXL Component Register block with an "HDM Decoder Capability Structure". >From Documentation/driver-api/cxl/memory-devices.rst: Similar to how a RAID driver takes disk objects and assembles them into a new logical device, the CXL subsystem is tasked to take PCIe and ACPI objects and assemble them into a CXL.mem decode topology. The need for runtime configuration of the CXL.mem topology is also similar to RAID in that different environments with the same hardware configuration may decide to assemble the topology in contrasting ways. One may choose performance (RAID0) striping memory across multiple Host Bridges and endpoints while another may opt for fault tolerance and disable any striping in the CXL.mem topology. The port driver identifies whether an endpoint Memory Expander is connected to a CXL topology. If an active (bound to the 'cxl_port' driver) CXL Port is not found at every PCIe Switch Upstream port and an active "root" CXL Port then the device is just a plain PCIe endpoint only capable of participating in PCI.mmio and DMA cycles, not CXL.mem coherent interleave sets. The 'cxl_port' driver lets the CXL subsystem leverage driver-core infrastructure for setup and teardown of register resources and communicating device activation status to userspace. The cxl_bus_type can rendezvous the async arrival of platform level CXL resources (via the 'cxl_acpi' driver) with the asynchronous enumeration of Memory Expander endpoints, while also implementing a hierarchical locking model independent of the associated 'struct pci_dev' locking model. The locking for dport and decoder enumeration is now handled in the core rather than callers. For now the port driver only enumerates and registers CXL resources (downstream port metadata and decoder resources) later it will be used to take action on its decoders in response to CXL.mem region provisioning requests. Note1: cxlpci.h has long depended on pci.h, but port.c was the first to not include pci.h. Carry that dependency in cxlpci.h. Note2: cxl port enumeration and probing complicates CXL subsystem init to the point that it helps to have centralized debug logging of probe events in cxl_bus_probe(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/164374948116.464348.1772618057599155408.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-01 21:07:51 +00:00
if (is_cxl_port(dev)) {
if (is_cxl_root(to_cxl_port(dev)))
return CXL_DEVICE_ROOT;
return CXL_DEVICE_PORT;
}
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
if (is_cxl_memdev(dev))
return CXL_DEVICE_MEMORY_EXPANDER;
if (dev->type == CXL_REGION_TYPE())
return CXL_DEVICE_REGION;
if (dev->type == &cxl_pmu_type)
return CXL_DEVICE_PMU;
return 0;
}
static ssize_t modalias_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
return sysfs_emit(buf, CXL_MODALIAS_FMT "\n", cxl_device_id(dev));
}
static DEVICE_ATTR_RO(modalias);
static struct attribute *cxl_base_attributes[] = {
&dev_attr_devtype.attr,
&dev_attr_modalias.attr,
NULL,
};
struct attribute_group cxl_base_attribute_group = {
.attrs = cxl_base_attributes,
};
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
static ssize_t start_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct cxl_decoder *cxld = to_cxl_decoder(dev);
return sysfs_emit(buf, "%#llx\n", cxld->hpa_range.start);
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
}
static DEVICE_ATTR_ADMIN_RO(start);
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
static ssize_t size_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct cxl_decoder *cxld = to_cxl_decoder(dev);
return sysfs_emit(buf, "%#llx\n", range_len(&cxld->hpa_range));
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
}
static DEVICE_ATTR_RO(size);
#define CXL_DECODER_FLAG_ATTR(name, flag) \
static ssize_t name##_show(struct device *dev, \
struct device_attribute *attr, char *buf) \
{ \
struct cxl_decoder *cxld = to_cxl_decoder(dev); \
\
return sysfs_emit(buf, "%s\n", \
(cxld->flags & (flag)) ? "1" : "0"); \
} \
static DEVICE_ATTR_RO(name)
CXL_DECODER_FLAG_ATTR(cap_pmem, CXL_DECODER_F_PMEM);
CXL_DECODER_FLAG_ATTR(cap_ram, CXL_DECODER_F_RAM);
CXL_DECODER_FLAG_ATTR(cap_type2, CXL_DECODER_F_TYPE2);
CXL_DECODER_FLAG_ATTR(cap_type3, CXL_DECODER_F_TYPE3);
CXL_DECODER_FLAG_ATTR(locked, CXL_DECODER_F_LOCK);
static ssize_t target_type_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct cxl_decoder *cxld = to_cxl_decoder(dev);
switch (cxld->target_type) {
case CXL_DECODER_DEVMEM:
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
return sysfs_emit(buf, "accelerator\n");
case CXL_DECODER_HOSTONLYMEM:
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
return sysfs_emit(buf, "expander\n");
}
return -ENXIO;
}
static DEVICE_ATTR_RO(target_type);
static ssize_t emit_target_list(struct cxl_switch_decoder *cxlsd, char *buf)
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
{
struct cxl_decoder *cxld = &cxlsd->cxld;
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
ssize_t offset = 0;
int i, rc = 0;
for (i = 0; i < cxld->interleave_ways; i++) {
struct cxl_dport *dport = cxlsd->target[i];
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
struct cxl_dport *next = NULL;
if (!dport)
break;
if (i + 1 < cxld->interleave_ways)
next = cxlsd->target[i + 1];
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
rc = sysfs_emit_at(buf, offset, "%d%s", dport->port_id,
next ? "," : "");
if (rc < 0)
return rc;
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
offset += rc;
}
return offset;
}
static ssize_t target_list_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(dev);
ssize_t offset;
int rc;
guard(rwsem_read)(&cxl_region_rwsem);
rc = emit_target_list(cxlsd, buf);
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
if (rc < 0)
return rc;
offset = rc;
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
rc = sysfs_emit_at(buf, offset, "\n");
if (rc < 0)
return rc;
return offset + rc;
}
static DEVICE_ATTR_RO(target_list);
static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxled->mode));
}
static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t len)
{
struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
enum cxl_decoder_mode mode;
ssize_t rc;
if (sysfs_streq(buf, "pmem"))
mode = CXL_DECODER_PMEM;
else if (sysfs_streq(buf, "ram"))
mode = CXL_DECODER_RAM;
else
return -EINVAL;
rc = cxl_dpa_set_mode(cxled, mode);
if (rc)
return rc;
return len;
}
static DEVICE_ATTR_RW(mode);
static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
cxl/hdm: Fix dpa translation locking The helper, cxl_dpa_resource_start(), snapshots the dpa-address of an endpoint-decoder after acquiring the cxl_dpa_rwsem. However, it is sufficient to assert that cxl_dpa_rwsem is held rather than acquire it in the helper. Otherwise, it triggers multiple lockdep reports: 1/ Tracing callbacks are in an atomic context that can not acquire sleeping locks: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1525 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1288, name: bash preempt_count: 2, expected: 0 RCU nest depth: 0, expected: 0 [..] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023 Call Trace: <TASK> dump_stack_lvl+0x71/0x90 __might_resched+0x1b2/0x2c0 down_read+0x1a/0x190 cxl_dpa_resource_start+0x15/0x50 [cxl_core] cxl_trace_hpa+0x122/0x300 [cxl_core] trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core] 2/ The rwsem is already held in the inject poison path: WARNING: possible recursive locking detected 6.7.0-rc2+ #12 Tainted: G W OE N -------------------------------------------- bash/1288 is trying to acquire lock: ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_dpa_resource_start+0x15/0x50 [cxl_core] but task is already holding lock: ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_inject_poison+0x7d/0x1e0 [cxl_core] [..] Call Trace: <TASK> dump_stack_lvl+0x71/0x90 __might_resched+0x1b2/0x2c0 down_read+0x1a/0x190 cxl_dpa_resource_start+0x15/0x50 [cxl_core] cxl_trace_hpa+0x122/0x300 [cxl_core] trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core] __traceiter_cxl_poison+0x5c/0x80 [cxl_core] cxl_inject_poison+0x1bc/0x1e0 [cxl_core] This appears to have been an issue since the initial implementation and uncovered by the new cxl-poison.sh test [1]. That test is now passing with these changes. Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events") Link: http://lore.kernel.org/r/e4f2716646918135ddbadf4146e92abb659de734.1700615159.git.alison.schofield@intel.com [1] Cc: <stable@vger.kernel.org> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-07 03:11:14 +00:00
guard(rwsem_read)(&cxl_dpa_rwsem);
return sysfs_emit(buf, "%#llx\n", (u64)cxl_dpa_resource_start(cxled));
}
static DEVICE_ATTR_RO(dpa_resource);
static ssize_t dpa_size_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
resource_size_t size = cxl_dpa_size(cxled);
return sysfs_emit(buf, "%pa\n", &size);
}
static ssize_t dpa_size_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t len)
{
struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
unsigned long long size;
ssize_t rc;
rc = kstrtoull(buf, 0, &size);
if (rc)
return rc;
if (!IS_ALIGNED(size, SZ_256M))
return -EINVAL;
rc = cxl_dpa_free(cxled);
if (rc)
return rc;
if (size == 0)
return len;
rc = cxl_dpa_alloc(cxled, size);
if (rc)
return rc;
return len;
}
static DEVICE_ATTR_RW(dpa_size);
static ssize_t interleave_granularity_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
struct cxl_decoder *cxld = to_cxl_decoder(dev);
return sysfs_emit(buf, "%d\n", cxld->interleave_granularity);
}
static DEVICE_ATTR_RO(interleave_granularity);
static ssize_t interleave_ways_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct cxl_decoder *cxld = to_cxl_decoder(dev);
return sysfs_emit(buf, "%d\n", cxld->interleave_ways);
}
static DEVICE_ATTR_RO(interleave_ways);
static ssize_t qos_class_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
return sysfs_emit(buf, "%d\n", cxlrd->qos_class);
}
static DEVICE_ATTR_RO(qos_class);
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
static struct attribute *cxl_decoder_base_attrs[] = {
&dev_attr_start.attr,
&dev_attr_size.attr,
&dev_attr_locked.attr,
&dev_attr_interleave_granularity.attr,
&dev_attr_interleave_ways.attr,
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
NULL,
};
static struct attribute_group cxl_decoder_base_attribute_group = {
.attrs = cxl_decoder_base_attrs,
};
static struct attribute *cxl_decoder_root_attrs[] = {
&dev_attr_cap_pmem.attr,
&dev_attr_cap_ram.attr,
&dev_attr_cap_type2.attr,
&dev_attr_cap_type3.attr,
&dev_attr_target_list.attr,
&dev_attr_qos_class.attr,
cxl/region: Add region creation support CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-08 17:28:34 +00:00
SET_CXL_REGION_ATTR(create_pmem_region)
SET_CXL_REGION_ATTR(create_ram_region)
cxl/region: Add region creation support CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-08 17:28:34 +00:00
SET_CXL_REGION_ATTR(delete_region)
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
NULL,
};
cxl/region: Add region creation support CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-08 17:28:34 +00:00
static bool can_create_pmem(struct cxl_root_decoder *cxlrd)
{
unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_PMEM;
return (cxlrd->cxlsd.cxld.flags & flags) == flags;
}
static bool can_create_ram(struct cxl_root_decoder *cxlrd)
{
unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_RAM;
return (cxlrd->cxlsd.cxld.flags & flags) == flags;
}
cxl/region: Add region creation support CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-08 17:28:34 +00:00
static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *a, int n)
{
struct device *dev = kobj_to_dev(kobj);
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
if (a == CXL_REGION_ATTR(create_pmem_region) && !can_create_pmem(cxlrd))
return 0;
if (a == CXL_REGION_ATTR(create_ram_region) && !can_create_ram(cxlrd))
return 0;
if (a == CXL_REGION_ATTR(delete_region) &&
!(can_create_pmem(cxlrd) || can_create_ram(cxlrd)))
cxl/region: Add region creation support CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-08 17:28:34 +00:00
return 0;
return a->mode;
}
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
static struct attribute_group cxl_decoder_root_attribute_group = {
.attrs = cxl_decoder_root_attrs,
cxl/region: Add region creation support CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-08 17:28:34 +00:00
.is_visible = cxl_root_decoder_visible,
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
};
static const struct attribute_group *cxl_decoder_root_attribute_groups[] = {
&cxl_decoder_root_attribute_group,
&cxl_decoder_base_attribute_group,
&cxl_base_attribute_group,
NULL,
};
static struct attribute *cxl_decoder_switch_attrs[] = {
&dev_attr_target_type.attr,
&dev_attr_target_list.attr,
SET_CXL_REGION_ATTR(region)
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
NULL,
};
static struct attribute_group cxl_decoder_switch_attribute_group = {
.attrs = cxl_decoder_switch_attrs,
};
static const struct attribute_group *cxl_decoder_switch_attribute_groups[] = {
&cxl_decoder_switch_attribute_group,
&cxl_decoder_base_attribute_group,
&cxl_base_attribute_group,
NULL,
};
static struct attribute *cxl_decoder_endpoint_attrs[] = {
&dev_attr_target_type.attr,
&dev_attr_mode.attr,
&dev_attr_dpa_size.attr,
&dev_attr_dpa_resource.attr,
SET_CXL_REGION_ATTR(region)
NULL,
};
static struct attribute_group cxl_decoder_endpoint_attribute_group = {
.attrs = cxl_decoder_endpoint_attrs,
};
static const struct attribute_group *cxl_decoder_endpoint_attribute_groups[] = {
&cxl_decoder_base_attribute_group,
&cxl_decoder_endpoint_attribute_group,
&cxl_base_attribute_group,
NULL,
};
static void __cxl_decoder_release(struct cxl_decoder *cxld)
{
struct cxl_port *port = to_cxl_port(cxld->dev.parent);
ida_free(&port->decoder_ida, cxld->id);
put_device(&port->dev);
}
static void cxl_endpoint_decoder_release(struct device *dev)
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
{
struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
__cxl_decoder_release(&cxled->cxld);
kfree(cxled);
}
static void cxl_switch_decoder_release(struct device *dev)
{
struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(dev);
__cxl_decoder_release(&cxlsd->cxld);
kfree(cxlsd);
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
}
struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev)
{
if (dev_WARN_ONCE(dev, !is_root_decoder(dev),
"not a cxl_root_decoder device\n"))
return NULL;
return container_of(dev, struct cxl_root_decoder, cxlsd.cxld.dev);
}
EXPORT_SYMBOL_NS_GPL(to_cxl_root_decoder, CXL);
static void cxl_root_decoder_release(struct device *dev)
{
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
cxl/region: Add region creation support CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-08 17:28:34 +00:00
if (atomic_read(&cxlrd->region_id) >= 0)
memregion_free(atomic_read(&cxlrd->region_id));
__cxl_decoder_release(&cxlrd->cxlsd.cxld);
kfree(cxlrd);
}
static const struct device_type cxl_decoder_endpoint_type = {
.name = "cxl_decoder_endpoint",
.release = cxl_endpoint_decoder_release,
.groups = cxl_decoder_endpoint_attribute_groups,
};
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
static const struct device_type cxl_decoder_switch_type = {
.name = "cxl_decoder_switch",
.release = cxl_switch_decoder_release,
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
.groups = cxl_decoder_switch_attribute_groups,
};
static const struct device_type cxl_decoder_root_type = {
.name = "cxl_decoder_root",
.release = cxl_root_decoder_release,
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
.groups = cxl_decoder_root_attribute_groups,
};
bool is_endpoint_decoder(struct device *dev)
{
return dev->type == &cxl_decoder_endpoint_type;
}
cxl/region: Add region autodiscovery Region autodiscovery is an asynchronous state machine advanced by cxl_port_probe(). After the decoders on an endpoint port are enumerated they are scanned for actively enabled instances. Each active decoder is flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region. If a region does not already exist for the address range setting of the decoder one is created. That creation process may race with other decoders of the same region being discovered since cxl_port_probe() is asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is introduced to mitigate that race. Once all decoders have arrived, "p->nr_targets == p->interleave_ways", they are sorted by their relative decode position. The sort algorithm involves finding the point in the cxl_port topology where one leg of the decode leads to deviceA and the other deviceB. At that point in the topology the target order in the 'struct cxl_switch_decoder' indicates the relative position of those endpoint decoders in the region. >From that point the region goes through the same setup and validation steps as user-created regions, but instead of programming the decoders it validates that driver would have written the same values to the decoders as were already present. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:31:17 +00:00
EXPORT_SYMBOL_NS_GPL(is_endpoint_decoder, CXL);
bool is_root_decoder(struct device *dev)
{
return dev->type == &cxl_decoder_root_type;
}
EXPORT_SYMBOL_NS_GPL(is_root_decoder, CXL);
cxl/region: Attach endpoint decoders CXL regions (interleave sets) are made up of a set of memory devices where each device maps a portion of the interleave with one of its decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure). As endpoint decoders are identified by a provisioning tool they can be added to a region provided the region interleave properties are set (way, granularity, HPA) and DPA has been assigned to the decoder. The attach event triggers several validation checks, for example: - is the DPA sized appropriately for the region - is the decoder reachable via the host-bridges identified by the region's root decoder - is the device already active in a different region position slot - are there already regions with a higher HPA active on a given port (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming) ...and the attach event affords an opportunity to collect data and resources relevant to later programming the target lists in switch decoders, for example: - allocate a decoder at each cxl_port in the decode chain - for a given switch port, how many the region's endpoints are hosted through the port - how many unique targets (next hops) does a port need to map to reach those endpoints The act of reconciling this information and deploying it to the decoder configuration is saved for a follow-on patch. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784337277.1758207.4108508181328528703.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-06-07 17:56:10 +00:00
bool is_switch_decoder(struct device *dev)
{
return is_root_decoder(dev) || dev->type == &cxl_decoder_switch_type;
}
EXPORT_SYMBOL_NS_GPL(is_switch_decoder, CXL);
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
struct cxl_decoder *to_cxl_decoder(struct device *dev)
{
if (dev_WARN_ONCE(dev,
!is_switch_decoder(dev) && !is_endpoint_decoder(dev),
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
"not a cxl_decoder device\n"))
return NULL;
return container_of(dev, struct cxl_decoder, dev);
}
EXPORT_SYMBOL_NS_GPL(to_cxl_decoder, CXL);
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev)
{
if (dev_WARN_ONCE(dev, !is_endpoint_decoder(dev),
"not a cxl_endpoint_decoder device\n"))
return NULL;
return container_of(dev, struct cxl_endpoint_decoder, cxld.dev);
}
EXPORT_SYMBOL_NS_GPL(to_cxl_endpoint_decoder, CXL);
struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev)
{
if (dev_WARN_ONCE(dev, !is_switch_decoder(dev),
"not a cxl_switch_decoder device\n"))
return NULL;
return container_of(dev, struct cxl_switch_decoder, cxld.dev);
}
EXPORT_SYMBOL_NS_GPL(to_cxl_switch_decoder, CXL);
static void cxl_ep_release(struct cxl_ep *ep)
{
put_device(ep->ep);
kfree(ep);
}
static void cxl_ep_remove(struct cxl_port *port, struct cxl_ep *ep)
{
if (!ep)
return;
xa_erase(&port->endpoints, (unsigned long) ep->ep);
cxl_ep_release(ep);
}
static void cxl_port_release(struct device *dev)
{
struct cxl_port *port = to_cxl_port(dev);
unsigned long index;
struct cxl_ep *ep;
xa_for_each(&port->endpoints, index, ep)
cxl_ep_remove(port, ep);
xa_destroy(&port->endpoints);
xa_destroy(&port->dports);
cxl/region: Attach endpoint decoders CXL regions (interleave sets) are made up of a set of memory devices where each device maps a portion of the interleave with one of its decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure). As endpoint decoders are identified by a provisioning tool they can be added to a region provided the region interleave properties are set (way, granularity, HPA) and DPA has been assigned to the decoder. The attach event triggers several validation checks, for example: - is the DPA sized appropriately for the region - is the decoder reachable via the host-bridges identified by the region's root decoder - is the device already active in a different region position slot - are there already regions with a higher HPA active on a given port (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming) ...and the attach event affords an opportunity to collect data and resources relevant to later programming the target lists in switch decoders, for example: - allocate a decoder at each cxl_port in the decode chain - for a given switch port, how many the region's endpoints are hosted through the port - how many unique targets (next hops) does a port need to map to reach those endpoints The act of reconciling this information and deploying it to the decoder configuration is saved for a follow-on patch. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784337277.1758207.4108508181328528703.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-06-07 17:56:10 +00:00
xa_destroy(&port->regions);
ida_free(&cxl_port_ida, port->id);
if (is_cxl_root(port))
kfree(to_cxl_root(port));
else
kfree(port);
}
static ssize_t decoders_committed_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct cxl_port *port = to_cxl_port(dev);
int rc;
down_read(&cxl_region_rwsem);
rc = sysfs_emit(buf, "%d\n", cxl_num_decoders_committed(port));
up_read(&cxl_region_rwsem);
return rc;
}
static DEVICE_ATTR_RO(decoders_committed);
static struct attribute *cxl_port_attrs[] = {
&dev_attr_decoders_committed.attr,
NULL,
};
static struct attribute_group cxl_port_attribute_group = {
.attrs = cxl_port_attrs,
};
static const struct attribute_group *cxl_port_attribute_groups[] = {
&cxl_base_attribute_group,
&cxl_port_attribute_group,
NULL,
};
static const struct device_type cxl_port_type = {
.name = "cxl_port",
.release = cxl_port_release,
.groups = cxl_port_attribute_groups,
};
bool is_cxl_port(const struct device *dev)
{
return dev->type == &cxl_port_type;
}
EXPORT_SYMBOL_NS_GPL(is_cxl_port, CXL);
struct cxl_port *to_cxl_port(const struct device *dev)
{
if (dev_WARN_ONCE(dev, dev->type != &cxl_port_type,
"not a cxl_port device\n"))
return NULL;
return container_of(dev, struct cxl_port, dev);
}
EXPORT_SYMBOL_NS_GPL(to_cxl_port, CXL);
static void unregister_port(void *_port)
{
struct cxl_port *port = _port;
struct cxl_port *parent;
struct device *lock_dev;
if (is_cxl_root(port))
parent = NULL;
else
parent = to_cxl_port(port->dev.parent);
/*
* CXL root port's and the first level of ports are unregistered
* under the platform firmware device lock, all other ports are
* unregistered while holding their parent port lock.
*/
if (!parent)
lock_dev = port->uport_dev;
else if (is_cxl_root(parent))
lock_dev = parent->uport_dev;
else
lock_dev = &parent->dev;
cxl/port: Add a driver for 'struct cxl_port' objects The need for a CXL port driver and a dedicated cxl_bus_type is driven by a need to simultaneously support 2 independent physical memory decode domains (cache coherent CXL.mem and uncached PCI.mmio) that also intersect at a single PCIe device node. A CXL Port is a device that advertises a CXL Component Register block with an "HDM Decoder Capability Structure". >From Documentation/driver-api/cxl/memory-devices.rst: Similar to how a RAID driver takes disk objects and assembles them into a new logical device, the CXL subsystem is tasked to take PCIe and ACPI objects and assemble them into a CXL.mem decode topology. The need for runtime configuration of the CXL.mem topology is also similar to RAID in that different environments with the same hardware configuration may decide to assemble the topology in contrasting ways. One may choose performance (RAID0) striping memory across multiple Host Bridges and endpoints while another may opt for fault tolerance and disable any striping in the CXL.mem topology. The port driver identifies whether an endpoint Memory Expander is connected to a CXL topology. If an active (bound to the 'cxl_port' driver) CXL Port is not found at every PCIe Switch Upstream port and an active "root" CXL Port then the device is just a plain PCIe endpoint only capable of participating in PCI.mmio and DMA cycles, not CXL.mem coherent interleave sets. The 'cxl_port' driver lets the CXL subsystem leverage driver-core infrastructure for setup and teardown of register resources and communicating device activation status to userspace. The cxl_bus_type can rendezvous the async arrival of platform level CXL resources (via the 'cxl_acpi' driver) with the asynchronous enumeration of Memory Expander endpoints, while also implementing a hierarchical locking model independent of the associated 'struct pci_dev' locking model. The locking for dport and decoder enumeration is now handled in the core rather than callers. For now the port driver only enumerates and registers CXL resources (downstream port metadata and decoder resources) later it will be used to take action on its decoders in response to CXL.mem region provisioning requests. Note1: cxlpci.h has long depended on pci.h, but port.c was the first to not include pci.h. Carry that dependency in cxlpci.h. Note2: cxl port enumeration and probing complicates CXL subsystem init to the point that it helps to have centralized debug logging of probe events in cxl_bus_probe(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/164374948116.464348.1772618057599155408.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-01 21:07:51 +00:00
device_lock_assert(lock_dev);
port->dead = true;
device_unregister(&port->dev);
}
static void cxl_unlink_uport(void *_port)
{
struct cxl_port *port = _port;
sysfs_remove_link(&port->dev.kobj, "uport");
}
static int devm_cxl_link_uport(struct device *host, struct cxl_port *port)
{
int rc;
rc = sysfs_create_link(&port->dev.kobj, &port->uport_dev->kobj,
"uport");
if (rc)
return rc;
return devm_add_action_or_reset(host, cxl_unlink_uport, port);
}
static void cxl_unlink_parent_dport(void *_port)
{
struct cxl_port *port = _port;
sysfs_remove_link(&port->dev.kobj, "parent_dport");
}
static int devm_cxl_link_parent_dport(struct device *host,
struct cxl_port *port,
struct cxl_dport *parent_dport)
{
int rc;
if (!parent_dport)
return 0;
rc = sysfs_create_link(&port->dev.kobj, &parent_dport->dport_dev->kobj,
"parent_dport");
if (rc)
return rc;
return devm_add_action_or_reset(host, cxl_unlink_parent_dport, port);
}
2022-04-21 15:33:13 +00:00
static struct lock_class_key cxl_port_key;
static struct cxl_port *cxl_port_alloc(struct device *uport_dev,
struct cxl_dport *parent_dport)
{
struct cxl_root *cxl_root __free(kfree) = NULL;
struct cxl_port *port, *_port __free(kfree) = NULL;
struct device *dev;
int rc;
/* No parent_dport, root cxl_port */
if (!parent_dport) {
cxl_root = kzalloc(sizeof(*cxl_root), GFP_KERNEL);
if (!cxl_root)
return ERR_PTR(-ENOMEM);
} else {
_port = kzalloc(sizeof(*port), GFP_KERNEL);
if (!_port)
return ERR_PTR(-ENOMEM);
}
rc = ida_alloc(&cxl_port_ida, GFP_KERNEL);
if (rc < 0)
return ERR_PTR(rc);
if (cxl_root)
port = &no_free_ptr(cxl_root)->port;
else
port = no_free_ptr(_port);
port->id = rc;
port->uport_dev = uport_dev;
/*
* The top-level cxl_port "cxl_root" does not have a cxl_port as
* its parent and it does not have any corresponding component
* registers as its decode is described by a fixed platform
* description.
*/
dev = &port->dev;
if (parent_dport) {
struct cxl_port *parent_port = parent_dport->port;
struct cxl_port *iter;
dev->parent = &parent_port->dev;
2022-04-21 15:33:13 +00:00
port->depth = parent_port->depth + 1;
port->parent_dport = parent_dport;
/*
* walk to the host bridge, or the first ancestor that knows
* the host bridge
*/
iter = port;
while (!iter->host_bridge &&
!is_cxl_root(to_cxl_port(iter->dev.parent)))
iter = to_cxl_port(iter->dev.parent);
if (iter->host_bridge)
port->host_bridge = iter->host_bridge;
2022-12-03 08:40:29 +00:00
else if (parent_dport->rch)
port->host_bridge = parent_dport->dport_dev;
else
port->host_bridge = iter->uport_dev;
dev_dbg(uport_dev, "host-bridge: %s\n",
dev_name(port->host_bridge));
2022-04-21 15:33:13 +00:00
} else
dev->parent = uport_dev;
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
ida_init(&port->decoder_ida);
port->hdm_end = -1;
port->commit_end = -1;
xa_init(&port->dports);
xa_init(&port->endpoints);
cxl/region: Attach endpoint decoders CXL regions (interleave sets) are made up of a set of memory devices where each device maps a portion of the interleave with one of its decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure). As endpoint decoders are identified by a provisioning tool they can be added to a region provided the region interleave properties are set (way, granularity, HPA) and DPA has been assigned to the decoder. The attach event triggers several validation checks, for example: - is the DPA sized appropriately for the region - is the decoder reachable via the host-bridges identified by the region's root decoder - is the device already active in a different region position slot - are there already regions with a higher HPA active on a given port (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming) ...and the attach event affords an opportunity to collect data and resources relevant to later programming the target lists in switch decoders, for example: - allocate a decoder at each cxl_port in the decode chain - for a given switch port, how many the region's endpoints are hosted through the port - how many unique targets (next hops) does a port need to map to reach those endpoints The act of reconciling this information and deploying it to the decoder configuration is saved for a follow-on patch. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784337277.1758207.4108508181328528703.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-06-07 17:56:10 +00:00
xa_init(&port->regions);
device_initialize(dev);
2022-04-21 15:33:13 +00:00
lockdep_set_class_and_subclass(&dev->mutex, &cxl_port_key, port->depth);
device_set_pm_not_required(dev);
dev->bus = &cxl_bus_type;
dev->type = &cxl_port_type;
return port;
}
static int cxl_setup_comp_regs(struct device *host, struct cxl_register_map *map,
cxl/port: Store the port's Component Register mappings in struct cxl_port CXL capabilities are stored in the Component Registers. To use them, the specific I/O ranges of the capabilities must be determined by probing the registers. For this, the whole Component Register range needs to be mapped temporarily to detect the offset and length of a capability range. In order to use more than one capability of a component (e.g. RAS and HDM) the Component Register are probed and its mappings created multiple times. This also causes overlapping I/O ranges as the whole Component Register range must be mapped again while a capability's I/O range is already mapped. Different capabilities cannot be setup at the same time. E.g. the RAS capability must be made available as soon as the PCI driver is bound, the HDM decoder is setup later during port enumeration. Moreover, during early setup it is still unknown if a certain capability is needed. A central capability setup is therefore not possible, capabilities must be individually enabled once needed during initialization. To avoid a duplicate register probe and overlapping I/O mappings, only probe the Component Registers one time and store the Component Register mapping in struct port. The stored mappings can be used later to iomap the capability register range when enabling the capability, which will be implemented in a follow-on patch. Signed-off-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230622205523.85375-15-terry.bowman@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-22 20:55:10 +00:00
resource_size_t component_reg_phys)
{
*map = (struct cxl_register_map) {
.host = host,
.reg_type = CXL_REGLOC_RBI_EMPTY,
cxl/port: Store the port's Component Register mappings in struct cxl_port CXL capabilities are stored in the Component Registers. To use them, the specific I/O ranges of the capabilities must be determined by probing the registers. For this, the whole Component Register range needs to be mapped temporarily to detect the offset and length of a capability range. In order to use more than one capability of a component (e.g. RAS and HDM) the Component Register are probed and its mappings created multiple times. This also causes overlapping I/O ranges as the whole Component Register range must be mapped again while a capability's I/O range is already mapped. Different capabilities cannot be setup at the same time. E.g. the RAS capability must be made available as soon as the PCI driver is bound, the HDM decoder is setup later during port enumeration. Moreover, during early setup it is still unknown if a certain capability is needed. A central capability setup is therefore not possible, capabilities must be individually enabled once needed during initialization. To avoid a duplicate register probe and overlapping I/O mappings, only probe the Component Registers one time and store the Component Register mapping in struct port. The stored mappings can be used later to iomap the capability register range when enabling the capability, which will be implemented in a follow-on patch. Signed-off-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230622205523.85375-15-terry.bowman@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-22 20:55:10 +00:00
.resource = component_reg_phys,
};
if (component_reg_phys == CXL_RESOURCE_NONE)
return 0;
map->reg_type = CXL_REGLOC_RBI_COMPONENT;
map->max_size = CXL_COMPONENT_REG_BLOCK_SIZE;
cxl/port: Store the port's Component Register mappings in struct cxl_port CXL capabilities are stored in the Component Registers. To use them, the specific I/O ranges of the capabilities must be determined by probing the registers. For this, the whole Component Register range needs to be mapped temporarily to detect the offset and length of a capability range. In order to use more than one capability of a component (e.g. RAS and HDM) the Component Register are probed and its mappings created multiple times. This also causes overlapping I/O ranges as the whole Component Register range must be mapped again while a capability's I/O range is already mapped. Different capabilities cannot be setup at the same time. E.g. the RAS capability must be made available as soon as the PCI driver is bound, the HDM decoder is setup later during port enumeration. Moreover, during early setup it is still unknown if a certain capability is needed. A central capability setup is therefore not possible, capabilities must be individually enabled once needed during initialization. To avoid a duplicate register probe and overlapping I/O mappings, only probe the Component Registers one time and store the Component Register mapping in struct port. The stored mappings can be used later to iomap the capability register range when enabling the capability, which will be implemented in a follow-on patch. Signed-off-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230622205523.85375-15-terry.bowman@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-22 20:55:10 +00:00
return cxl_setup_regs(map);
}
cxl/port: Fix cxl_test register enumeration regression The cxl_test unit test environment models a CXL topology for sysfs/user-ABI regression testing. It uses interface mocking via the "--wrap=" linker option to redirect cxl_core routines that parse hardware registers with versions that just publish objects, like devm_cxl_enumerate_decoders(). Starting with: Commit 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port") ...port register enumeration is moved into devm_cxl_add_port(). This conflicts with the "cxl_test avoids emulating registers stance" so either the port code needs to be refactored (too violent), or modified so that register enumeration is skipped on "fake" cxl_test ports (annoying, but straightforward). This conflict has happened previously and the "check for platform device" workaround to avoid instrusive refactoring was deployed in those scenarios. In general, refactoring should only benefit production code, test code needs to remain minimally instrusive to the greatest extent possible. This was missed previously because it may sometimes just cause warning messages to be emitted, but it can also cause test failures. The backport to -stable is only nice to have for clean cxl_test runs. Fixes: 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port") Cc: stable@vger.kernel.org Reported-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/169476525052.1013896.6235102957693675187.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-15 08:07:30 +00:00
static int cxl_port_setup_regs(struct cxl_port *port,
resource_size_t component_reg_phys)
cxl/port: Store the port's Component Register mappings in struct cxl_port CXL capabilities are stored in the Component Registers. To use them, the specific I/O ranges of the capabilities must be determined by probing the registers. For this, the whole Component Register range needs to be mapped temporarily to detect the offset and length of a capability range. In order to use more than one capability of a component (e.g. RAS and HDM) the Component Register are probed and its mappings created multiple times. This also causes overlapping I/O ranges as the whole Component Register range must be mapped again while a capability's I/O range is already mapped. Different capabilities cannot be setup at the same time. E.g. the RAS capability must be made available as soon as the PCI driver is bound, the HDM decoder is setup later during port enumeration. Moreover, during early setup it is still unknown if a certain capability is needed. A central capability setup is therefore not possible, capabilities must be individually enabled once needed during initialization. To avoid a duplicate register probe and overlapping I/O mappings, only probe the Component Registers one time and store the Component Register mapping in struct port. The stored mappings can be used later to iomap the capability register range when enabling the capability, which will be implemented in a follow-on patch. Signed-off-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230622205523.85375-15-terry.bowman@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-22 20:55:10 +00:00
{
cxl/port: Fix cxl_test register enumeration regression The cxl_test unit test environment models a CXL topology for sysfs/user-ABI regression testing. It uses interface mocking via the "--wrap=" linker option to redirect cxl_core routines that parse hardware registers with versions that just publish objects, like devm_cxl_enumerate_decoders(). Starting with: Commit 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port") ...port register enumeration is moved into devm_cxl_add_port(). This conflicts with the "cxl_test avoids emulating registers stance" so either the port code needs to be refactored (too violent), or modified so that register enumeration is skipped on "fake" cxl_test ports (annoying, but straightforward). This conflict has happened previously and the "check for platform device" workaround to avoid instrusive refactoring was deployed in those scenarios. In general, refactoring should only benefit production code, test code needs to remain minimally instrusive to the greatest extent possible. This was missed previously because it may sometimes just cause warning messages to be emitted, but it can also cause test failures. The backport to -stable is only nice to have for clean cxl_test runs. Fixes: 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port") Cc: stable@vger.kernel.org Reported-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/169476525052.1013896.6235102957693675187.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-15 08:07:30 +00:00
if (dev_is_platform(port->uport_dev))
return 0;
return cxl_setup_comp_regs(&port->dev, &port->reg_map,
cxl/port: Store the port's Component Register mappings in struct cxl_port CXL capabilities are stored in the Component Registers. To use them, the specific I/O ranges of the capabilities must be determined by probing the registers. For this, the whole Component Register range needs to be mapped temporarily to detect the offset and length of a capability range. In order to use more than one capability of a component (e.g. RAS and HDM) the Component Register are probed and its mappings created multiple times. This also causes overlapping I/O ranges as the whole Component Register range must be mapped again while a capability's I/O range is already mapped. Different capabilities cannot be setup at the same time. E.g. the RAS capability must be made available as soon as the PCI driver is bound, the HDM decoder is setup later during port enumeration. Moreover, during early setup it is still unknown if a certain capability is needed. A central capability setup is therefore not possible, capabilities must be individually enabled once needed during initialization. To avoid a duplicate register probe and overlapping I/O mappings, only probe the Component Registers one time and store the Component Register mapping in struct port. The stored mappings can be used later to iomap the capability register range when enabling the capability, which will be implemented in a follow-on patch. Signed-off-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230622205523.85375-15-terry.bowman@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-22 20:55:10 +00:00
component_reg_phys);
}
cxl/port: Fix @host confusion in cxl_dport_setup_regs() commit 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport") ...moved the dport component registers from a raw component_reg_phys passed in at dport instantiation time to a 'struct cxl_register_map' populated with both the component register data *and* the "host" device for mapping operations. While typical CXL switch dports are mapped by their associated 'struct cxl_port', an RCH host bridge dport registered by cxl_acpi needs to wait until the cxl_mem driver makes the attachment to map the registers. This is because there are no intervening 'struct cxl_port' instances between the root cxl_port and the endpoint port in an RCH topology. For now just mark the host as NULL in the RCH dport case until code that needs to map the dport registers arrives. This patch is not flagged for -stable since nothing in the current driver uses the dport->comp_map. Now, I am slightly uneasy that cxl_setup_comp_regs() sets map->host to a wrong value and then cxl_dport_setup_regs() fixes it up, but the alternatives I came up with are more messy. For example, adding an @logdev to 'struct cxl_register_map' that the dev_printk()s can fall back to when @host is NULL. I settled on "post-fixup+comment" since it is only RCH dports that have this special case where register probing is split between a host-bridge RCRB lookup and when cxl_mem_probe() does the association of the cxl_memdev and endpoint port. [moved rename of @comp_map to @reg_map into next patch] Fixes: 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport") Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20231018171713.1883517-4-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-18 17:16:56 +00:00
static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
cxl/port: Fix cxl_test register enumeration regression The cxl_test unit test environment models a CXL topology for sysfs/user-ABI regression testing. It uses interface mocking via the "--wrap=" linker option to redirect cxl_core routines that parse hardware registers with versions that just publish objects, like devm_cxl_enumerate_decoders(). Starting with: Commit 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port") ...port register enumeration is moved into devm_cxl_add_port(). This conflicts with the "cxl_test avoids emulating registers stance" so either the port code needs to be refactored (too violent), or modified so that register enumeration is skipped on "fake" cxl_test ports (annoying, but straightforward). This conflict has happened previously and the "check for platform device" workaround to avoid instrusive refactoring was deployed in those scenarios. In general, refactoring should only benefit production code, test code needs to remain minimally instrusive to the greatest extent possible. This was missed previously because it may sometimes just cause warning messages to be emitted, but it can also cause test failures. The backport to -stable is only nice to have for clean cxl_test runs. Fixes: 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port") Cc: stable@vger.kernel.org Reported-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/169476525052.1013896.6235102957693675187.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-15 08:07:30 +00:00
resource_size_t component_reg_phys)
{
cxl/port: Fix @host confusion in cxl_dport_setup_regs() commit 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport") ...moved the dport component registers from a raw component_reg_phys passed in at dport instantiation time to a 'struct cxl_register_map' populated with both the component register data *and* the "host" device for mapping operations. While typical CXL switch dports are mapped by their associated 'struct cxl_port', an RCH host bridge dport registered by cxl_acpi needs to wait until the cxl_mem driver makes the attachment to map the registers. This is because there are no intervening 'struct cxl_port' instances between the root cxl_port and the endpoint port in an RCH topology. For now just mark the host as NULL in the RCH dport case until code that needs to map the dport registers arrives. This patch is not flagged for -stable since nothing in the current driver uses the dport->comp_map. Now, I am slightly uneasy that cxl_setup_comp_regs() sets map->host to a wrong value and then cxl_dport_setup_regs() fixes it up, but the alternatives I came up with are more messy. For example, adding an @logdev to 'struct cxl_register_map' that the dev_printk()s can fall back to when @host is NULL. I settled on "post-fixup+comment" since it is only RCH dports that have this special case where register probing is split between a host-bridge RCRB lookup and when cxl_mem_probe() does the association of the cxl_memdev and endpoint port. [moved rename of @comp_map to @reg_map into next patch] Fixes: 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport") Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20231018171713.1883517-4-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-18 17:16:56 +00:00
int rc;
cxl/port: Fix cxl_test register enumeration regression The cxl_test unit test environment models a CXL topology for sysfs/user-ABI regression testing. It uses interface mocking via the "--wrap=" linker option to redirect cxl_core routines that parse hardware registers with versions that just publish objects, like devm_cxl_enumerate_decoders(). Starting with: Commit 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port") ...port register enumeration is moved into devm_cxl_add_port(). This conflicts with the "cxl_test avoids emulating registers stance" so either the port code needs to be refactored (too violent), or modified so that register enumeration is skipped on "fake" cxl_test ports (annoying, but straightforward). This conflict has happened previously and the "check for platform device" workaround to avoid instrusive refactoring was deployed in those scenarios. In general, refactoring should only benefit production code, test code needs to remain minimally instrusive to the greatest extent possible. This was missed previously because it may sometimes just cause warning messages to be emitted, but it can also cause test failures. The backport to -stable is only nice to have for clean cxl_test runs. Fixes: 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port") Cc: stable@vger.kernel.org Reported-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/169476525052.1013896.6235102957693675187.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-15 08:07:30 +00:00
if (dev_is_platform(dport->dport_dev))
return 0;
cxl/port: Fix @host confusion in cxl_dport_setup_regs() commit 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport") ...moved the dport component registers from a raw component_reg_phys passed in at dport instantiation time to a 'struct cxl_register_map' populated with both the component register data *and* the "host" device for mapping operations. While typical CXL switch dports are mapped by their associated 'struct cxl_port', an RCH host bridge dport registered by cxl_acpi needs to wait until the cxl_mem driver makes the attachment to map the registers. This is because there are no intervening 'struct cxl_port' instances between the root cxl_port and the endpoint port in an RCH topology. For now just mark the host as NULL in the RCH dport case until code that needs to map the dport registers arrives. This patch is not flagged for -stable since nothing in the current driver uses the dport->comp_map. Now, I am slightly uneasy that cxl_setup_comp_regs() sets map->host to a wrong value and then cxl_dport_setup_regs() fixes it up, but the alternatives I came up with are more messy. For example, adding an @logdev to 'struct cxl_register_map' that the dev_printk()s can fall back to when @host is NULL. I settled on "post-fixup+comment" since it is only RCH dports that have this special case where register probing is split between a host-bridge RCRB lookup and when cxl_mem_probe() does the association of the cxl_memdev and endpoint port. [moved rename of @comp_map to @reg_map into next patch] Fixes: 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport") Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20231018171713.1883517-4-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-18 17:16:56 +00:00
/*
* use @dport->dport_dev for the context for error messages during
* register probing, and fixup @host after the fact, since @host may be
* NULL.
*/
rc = cxl_setup_comp_regs(dport->dport_dev, &dport->reg_map,
cxl/port: Fix @host confusion in cxl_dport_setup_regs() commit 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport") ...moved the dport component registers from a raw component_reg_phys passed in at dport instantiation time to a 'struct cxl_register_map' populated with both the component register data *and* the "host" device for mapping operations. While typical CXL switch dports are mapped by their associated 'struct cxl_port', an RCH host bridge dport registered by cxl_acpi needs to wait until the cxl_mem driver makes the attachment to map the registers. This is because there are no intervening 'struct cxl_port' instances between the root cxl_port and the endpoint port in an RCH topology. For now just mark the host as NULL in the RCH dport case until code that needs to map the dport registers arrives. This patch is not flagged for -stable since nothing in the current driver uses the dport->comp_map. Now, I am slightly uneasy that cxl_setup_comp_regs() sets map->host to a wrong value and then cxl_dport_setup_regs() fixes it up, but the alternatives I came up with are more messy. For example, adding an @logdev to 'struct cxl_register_map' that the dev_printk()s can fall back to when @host is NULL. I settled on "post-fixup+comment" since it is only RCH dports that have this special case where register probing is split between a host-bridge RCRB lookup and when cxl_mem_probe() does the association of the cxl_memdev and endpoint port. [moved rename of @comp_map to @reg_map into next patch] Fixes: 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport") Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20231018171713.1883517-4-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-18 17:16:56 +00:00
component_reg_phys);
dport->reg_map.host = host;
cxl/port: Fix @host confusion in cxl_dport_setup_regs() commit 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport") ...moved the dport component registers from a raw component_reg_phys passed in at dport instantiation time to a 'struct cxl_register_map' populated with both the component register data *and* the "host" device for mapping operations. While typical CXL switch dports are mapped by their associated 'struct cxl_port', an RCH host bridge dport registered by cxl_acpi needs to wait until the cxl_mem driver makes the attachment to map the registers. This is because there are no intervening 'struct cxl_port' instances between the root cxl_port and the endpoint port in an RCH topology. For now just mark the host as NULL in the RCH dport case until code that needs to map the dport registers arrives. This patch is not flagged for -stable since nothing in the current driver uses the dport->comp_map. Now, I am slightly uneasy that cxl_setup_comp_regs() sets map->host to a wrong value and then cxl_dport_setup_regs() fixes it up, but the alternatives I came up with are more messy. For example, adding an @logdev to 'struct cxl_register_map' that the dev_printk()s can fall back to when @host is NULL. I settled on "post-fixup+comment" since it is only RCH dports that have this special case where register probing is split between a host-bridge RCRB lookup and when cxl_mem_probe() does the association of the cxl_memdev and endpoint port. [moved rename of @comp_map to @reg_map into next patch] Fixes: 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport") Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20231018171713.1883517-4-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-18 17:16:56 +00:00
return rc;
}
static struct cxl_port *__devm_cxl_add_port(struct device *host,
struct device *uport_dev,
resource_size_t component_reg_phys,
struct cxl_dport *parent_dport)
{
struct cxl_port *port;
struct device *dev;
int rc;
port = cxl_port_alloc(uport_dev, parent_dport);
if (IS_ERR(port))
return port;
dev = &port->dev;
if (is_cxl_memdev(uport_dev)) {
struct cxl_memdev *cxlmd = to_cxl_memdev(uport_dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
rc = dev_set_name(dev, "endpoint%d", port->id);
if (rc)
goto err;
/*
* The endpoint driver already enumerated the component and RAS
* registers. Reuse that enumeration while prepping them to be
* mapped by the cxl_port driver.
*/
port->reg_map = cxlds->reg_map;
port->reg_map.host = &port->dev;
} else if (parent_dport) {
rc = dev_set_name(dev, "port%d", port->id);
if (rc)
goto err;
rc = cxl_port_setup_regs(port, component_reg_phys);
if (rc)
goto err;
} else
rc = dev_set_name(dev, "root%d", port->id);
cxl/port: Store the port's Component Register mappings in struct cxl_port CXL capabilities are stored in the Component Registers. To use them, the specific I/O ranges of the capabilities must be determined by probing the registers. For this, the whole Component Register range needs to be mapped temporarily to detect the offset and length of a capability range. In order to use more than one capability of a component (e.g. RAS and HDM) the Component Register are probed and its mappings created multiple times. This also causes overlapping I/O ranges as the whole Component Register range must be mapped again while a capability's I/O range is already mapped. Different capabilities cannot be setup at the same time. E.g. the RAS capability must be made available as soon as the PCI driver is bound, the HDM decoder is setup later during port enumeration. Moreover, during early setup it is still unknown if a certain capability is needed. A central capability setup is therefore not possible, capabilities must be individually enabled once needed during initialization. To avoid a duplicate register probe and overlapping I/O mappings, only probe the Component Registers one time and store the Component Register mapping in struct port. The stored mappings can be used later to iomap the capability register range when enabling the capability, which will be implemented in a follow-on patch. Signed-off-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230622205523.85375-15-terry.bowman@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-22 20:55:10 +00:00
if (rc)
goto err;
rc = device_add(dev);
if (rc)
goto err;
rc = devm_add_action_or_reset(host, unregister_port, port);
if (rc)
return ERR_PTR(rc);
rc = devm_cxl_link_uport(host, port);
if (rc)
return ERR_PTR(rc);
rc = devm_cxl_link_parent_dport(host, port, parent_dport);
if (rc)
return ERR_PTR(rc);
if (parent_dport && dev_is_pci(uport_dev))
port->pci_latency = cxl_pci_get_latency(to_pci_dev(uport_dev));
return port;
err:
put_device(dev);
return ERR_PTR(rc);
}
/**
* devm_cxl_add_port - register a cxl_port in CXL memory decode hierarchy
* @host: host device for devm operations
* @uport_dev: "physical" device implementing this upstream port
* @component_reg_phys: (optional) for configurable cxl_port instances
* @parent_dport: next hop up in the CXL memory decode hierarchy
*/
struct cxl_port *devm_cxl_add_port(struct device *host,
struct device *uport_dev,
resource_size_t component_reg_phys,
struct cxl_dport *parent_dport)
{
struct cxl_port *port, *parent_port;
port = __devm_cxl_add_port(host, uport_dev, component_reg_phys,
parent_dport);
parent_port = parent_dport ? parent_dport->port : NULL;
if (IS_ERR(port)) {
dev_dbg(uport_dev, "Failed to add%s%s%s: %ld\n",
parent_port ? " port to " : "",
parent_port ? dev_name(&parent_port->dev) : "",
parent_port ? "" : " root port",
PTR_ERR(port));
} else {
dev_dbg(uport_dev, "%s added%s%s%s\n",
dev_name(&port->dev),
parent_port ? " to " : "",
parent_port ? dev_name(&parent_port->dev) : "",
parent_port ? "" : " (root port)");
}
return port;
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_add_port, CXL);
struct cxl_root *devm_cxl_add_root(struct device *host,
const struct cxl_root_ops *ops)
{
struct cxl_root *cxl_root;
struct cxl_port *port;
port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL);
if (IS_ERR(port))
return (struct cxl_root *)port;
cxl_root = to_cxl_root(port);
cxl_root->ops = ops;
return cxl_root;
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_add_root, CXL);
struct pci_bus *cxl_port_to_pci_bus(struct cxl_port *port)
{
/* There is no pci_bus associated with a CXL platform-root port */
if (is_cxl_root(port))
return NULL;
if (dev_is_pci(port->uport_dev)) {
struct pci_dev *pdev = to_pci_dev(port->uport_dev);
return pdev->subordinate;
}
return xa_load(&cxl_root_buses, (unsigned long)port->uport_dev);
}
EXPORT_SYMBOL_NS_GPL(cxl_port_to_pci_bus, CXL);
static void unregister_pci_bus(void *uport_dev)
{
xa_erase(&cxl_root_buses, (unsigned long)uport_dev);
}
int devm_cxl_register_pci_bus(struct device *host, struct device *uport_dev,
struct pci_bus *bus)
{
int rc;
if (dev_is_pci(uport_dev))
return -EINVAL;
rc = xa_insert(&cxl_root_buses, (unsigned long)uport_dev, bus,
GFP_KERNEL);
if (rc)
return rc;
return devm_add_action_or_reset(host, unregister_pci_bus, uport_dev);
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_register_pci_bus, CXL);
static bool dev_is_cxl_root_child(struct device *dev)
{
struct cxl_port *port, *parent;
if (!is_cxl_port(dev))
return false;
port = to_cxl_port(dev);
if (is_cxl_root(port))
return false;
parent = to_cxl_port(port->dev.parent);
if (is_cxl_root(parent))
return true;
return false;
}
struct cxl_root *find_cxl_root(struct cxl_port *port)
{
struct cxl_port *iter = port;
while (iter && !is_cxl_root(iter))
iter = to_cxl_port(iter->dev.parent);
if (!iter)
return NULL;
get_device(&iter->dev);
return to_cxl_root(iter);
}
EXPORT_SYMBOL_NS_GPL(find_cxl_root, CXL);
void put_cxl_root(struct cxl_root *cxl_root)
{
if (!cxl_root)
return;
put_device(&cxl_root->port.dev);
}
EXPORT_SYMBOL_NS_GPL(put_cxl_root, CXL);
static struct cxl_dport *find_dport(struct cxl_port *port, int id)
{
struct cxl_dport *dport;
unsigned long index;
device_lock_assert(&port->dev);
xa_for_each(&port->dports, index, dport)
if (dport->port_id == id)
return dport;
return NULL;
}
static int add_dport(struct cxl_port *port, struct cxl_dport *dport)
{
struct cxl_dport *dup;
cxl/region: Fix 'distance' calculation with passthrough ports When programming port decode targets, the algorithm wants to ensure that two devices are compatible to be programmed as peers beneath a given port. A compatible peer is a target that shares the same dport, and where that target's interleave position also routes it to the same dport. Compatibility is determined by the device's interleave position being >= to distance. For example, if a given dport can only map every Nth position then positions less than N away from the last target programmed are incompatible. The @distance for the host-bridge's cxl_port in a simple dual-ported host-bridge configuration with 2 direct-attached devices is 1, i.e. An x2 region divided by 2 dports to reach 2 region targets. An x4 region under an x2 host-bridge would need 2 intervening switches where the @distance at the host bridge level is 2 (x4 region divided by 2 switches to reach 4 devices). However, the distance between peers underneath a single ported host-bridge is always zero because there is no limit to the number of devices that can be mapped. In other words, there are no decoders to program in a passthrough, all descendants are mapped and distance only starts matters for the intervening descendant ports of the passthrough port. Add tracking for the number of dports mapped to a port, and use that to detect the passthrough case for calculating @distance. Cc: <stable@vger.kernel.org> Reported-by: Bobo WL <lmw.bobo@gmail.com> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752185440.947915.6617495912508299445.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-04 00:30:54 +00:00
int rc;
device_lock_assert(&port->dev);
dup = find_dport(port, dport->port_id);
if (dup) {
dev_err(&port->dev,
"unable to add dport%d-%s non-unique port id (%s)\n",
dport->port_id, dev_name(dport->dport_dev),
dev_name(dup->dport_dev));
return -EBUSY;
}
cxl/region: Fix 'distance' calculation with passthrough ports When programming port decode targets, the algorithm wants to ensure that two devices are compatible to be programmed as peers beneath a given port. A compatible peer is a target that shares the same dport, and where that target's interleave position also routes it to the same dport. Compatibility is determined by the device's interleave position being >= to distance. For example, if a given dport can only map every Nth position then positions less than N away from the last target programmed are incompatible. The @distance for the host-bridge's cxl_port in a simple dual-ported host-bridge configuration with 2 direct-attached devices is 1, i.e. An x2 region divided by 2 dports to reach 2 region targets. An x4 region under an x2 host-bridge would need 2 intervening switches where the @distance at the host bridge level is 2 (x4 region divided by 2 switches to reach 4 devices). However, the distance between peers underneath a single ported host-bridge is always zero because there is no limit to the number of devices that can be mapped. In other words, there are no decoders to program in a passthrough, all descendants are mapped and distance only starts matters for the intervening descendant ports of the passthrough port. Add tracking for the number of dports mapped to a port, and use that to detect the passthrough case for calculating @distance. Cc: <stable@vger.kernel.org> Reported-by: Bobo WL <lmw.bobo@gmail.com> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752185440.947915.6617495912508299445.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-04 00:30:54 +00:00
rc = xa_insert(&port->dports, (unsigned long)dport->dport_dev, dport,
cxl/region: Fix 'distance' calculation with passthrough ports When programming port decode targets, the algorithm wants to ensure that two devices are compatible to be programmed as peers beneath a given port. A compatible peer is a target that shares the same dport, and where that target's interleave position also routes it to the same dport. Compatibility is determined by the device's interleave position being >= to distance. For example, if a given dport can only map every Nth position then positions less than N away from the last target programmed are incompatible. The @distance for the host-bridge's cxl_port in a simple dual-ported host-bridge configuration with 2 direct-attached devices is 1, i.e. An x2 region divided by 2 dports to reach 2 region targets. An x4 region under an x2 host-bridge would need 2 intervening switches where the @distance at the host bridge level is 2 (x4 region divided by 2 switches to reach 4 devices). However, the distance between peers underneath a single ported host-bridge is always zero because there is no limit to the number of devices that can be mapped. In other words, there are no decoders to program in a passthrough, all descendants are mapped and distance only starts matters for the intervening descendant ports of the passthrough port. Add tracking for the number of dports mapped to a port, and use that to detect the passthrough case for calculating @distance. Cc: <stable@vger.kernel.org> Reported-by: Bobo WL <lmw.bobo@gmail.com> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752185440.947915.6617495912508299445.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-04 00:30:54 +00:00
GFP_KERNEL);
if (rc)
return rc;
port->nr_dports++;
return 0;
}
cxl/port: Add a driver for 'struct cxl_port' objects The need for a CXL port driver and a dedicated cxl_bus_type is driven by a need to simultaneously support 2 independent physical memory decode domains (cache coherent CXL.mem and uncached PCI.mmio) that also intersect at a single PCIe device node. A CXL Port is a device that advertises a CXL Component Register block with an "HDM Decoder Capability Structure". >From Documentation/driver-api/cxl/memory-devices.rst: Similar to how a RAID driver takes disk objects and assembles them into a new logical device, the CXL subsystem is tasked to take PCIe and ACPI objects and assemble them into a CXL.mem decode topology. The need for runtime configuration of the CXL.mem topology is also similar to RAID in that different environments with the same hardware configuration may decide to assemble the topology in contrasting ways. One may choose performance (RAID0) striping memory across multiple Host Bridges and endpoints while another may opt for fault tolerance and disable any striping in the CXL.mem topology. The port driver identifies whether an endpoint Memory Expander is connected to a CXL topology. If an active (bound to the 'cxl_port' driver) CXL Port is not found at every PCIe Switch Upstream port and an active "root" CXL Port then the device is just a plain PCIe endpoint only capable of participating in PCI.mmio and DMA cycles, not CXL.mem coherent interleave sets. The 'cxl_port' driver lets the CXL subsystem leverage driver-core infrastructure for setup and teardown of register resources and communicating device activation status to userspace. The cxl_bus_type can rendezvous the async arrival of platform level CXL resources (via the 'cxl_acpi' driver) with the asynchronous enumeration of Memory Expander endpoints, while also implementing a hierarchical locking model independent of the associated 'struct pci_dev' locking model. The locking for dport and decoder enumeration is now handled in the core rather than callers. For now the port driver only enumerates and registers CXL resources (downstream port metadata and decoder resources) later it will be used to take action on its decoders in response to CXL.mem region provisioning requests. Note1: cxlpci.h has long depended on pci.h, but port.c was the first to not include pci.h. Carry that dependency in cxlpci.h. Note2: cxl port enumeration and probing complicates CXL subsystem init to the point that it helps to have centralized debug logging of probe events in cxl_bus_probe(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/164374948116.464348.1772618057599155408.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-01 21:07:51 +00:00
/*
* Since root-level CXL dports cannot be enumerated by PCI they are not
* enumerated by the common port driver that acquires the port lock over
* dport add/remove. Instead, root dports are manually added by a
* platform driver and cond_cxl_root_lock() is used to take the missing
* port lock in that case.
*/
static void cond_cxl_root_lock(struct cxl_port *port)
{
if (is_cxl_root(port))
device_lock(&port->dev);
cxl/port: Add a driver for 'struct cxl_port' objects The need for a CXL port driver and a dedicated cxl_bus_type is driven by a need to simultaneously support 2 independent physical memory decode domains (cache coherent CXL.mem and uncached PCI.mmio) that also intersect at a single PCIe device node. A CXL Port is a device that advertises a CXL Component Register block with an "HDM Decoder Capability Structure". >From Documentation/driver-api/cxl/memory-devices.rst: Similar to how a RAID driver takes disk objects and assembles them into a new logical device, the CXL subsystem is tasked to take PCIe and ACPI objects and assemble them into a CXL.mem decode topology. The need for runtime configuration of the CXL.mem topology is also similar to RAID in that different environments with the same hardware configuration may decide to assemble the topology in contrasting ways. One may choose performance (RAID0) striping memory across multiple Host Bridges and endpoints while another may opt for fault tolerance and disable any striping in the CXL.mem topology. The port driver identifies whether an endpoint Memory Expander is connected to a CXL topology. If an active (bound to the 'cxl_port' driver) CXL Port is not found at every PCIe Switch Upstream port and an active "root" CXL Port then the device is just a plain PCIe endpoint only capable of participating in PCI.mmio and DMA cycles, not CXL.mem coherent interleave sets. The 'cxl_port' driver lets the CXL subsystem leverage driver-core infrastructure for setup and teardown of register resources and communicating device activation status to userspace. The cxl_bus_type can rendezvous the async arrival of platform level CXL resources (via the 'cxl_acpi' driver) with the asynchronous enumeration of Memory Expander endpoints, while also implementing a hierarchical locking model independent of the associated 'struct pci_dev' locking model. The locking for dport and decoder enumeration is now handled in the core rather than callers. For now the port driver only enumerates and registers CXL resources (downstream port metadata and decoder resources) later it will be used to take action on its decoders in response to CXL.mem region provisioning requests. Note1: cxlpci.h has long depended on pci.h, but port.c was the first to not include pci.h. Carry that dependency in cxlpci.h. Note2: cxl port enumeration and probing complicates CXL subsystem init to the point that it helps to have centralized debug logging of probe events in cxl_bus_probe(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/164374948116.464348.1772618057599155408.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-01 21:07:51 +00:00
}
static void cond_cxl_root_unlock(struct cxl_port *port)
{
if (is_cxl_root(port))
device_unlock(&port->dev);
cxl/port: Add a driver for 'struct cxl_port' objects The need for a CXL port driver and a dedicated cxl_bus_type is driven by a need to simultaneously support 2 independent physical memory decode domains (cache coherent CXL.mem and uncached PCI.mmio) that also intersect at a single PCIe device node. A CXL Port is a device that advertises a CXL Component Register block with an "HDM Decoder Capability Structure". >From Documentation/driver-api/cxl/memory-devices.rst: Similar to how a RAID driver takes disk objects and assembles them into a new logical device, the CXL subsystem is tasked to take PCIe and ACPI objects and assemble them into a CXL.mem decode topology. The need for runtime configuration of the CXL.mem topology is also similar to RAID in that different environments with the same hardware configuration may decide to assemble the topology in contrasting ways. One may choose performance (RAID0) striping memory across multiple Host Bridges and endpoints while another may opt for fault tolerance and disable any striping in the CXL.mem topology. The port driver identifies whether an endpoint Memory Expander is connected to a CXL topology. If an active (bound to the 'cxl_port' driver) CXL Port is not found at every PCIe Switch Upstream port and an active "root" CXL Port then the device is just a plain PCIe endpoint only capable of participating in PCI.mmio and DMA cycles, not CXL.mem coherent interleave sets. The 'cxl_port' driver lets the CXL subsystem leverage driver-core infrastructure for setup and teardown of register resources and communicating device activation status to userspace. The cxl_bus_type can rendezvous the async arrival of platform level CXL resources (via the 'cxl_acpi' driver) with the asynchronous enumeration of Memory Expander endpoints, while also implementing a hierarchical locking model independent of the associated 'struct pci_dev' locking model. The locking for dport and decoder enumeration is now handled in the core rather than callers. For now the port driver only enumerates and registers CXL resources (downstream port metadata and decoder resources) later it will be used to take action on its decoders in response to CXL.mem region provisioning requests. Note1: cxlpci.h has long depended on pci.h, but port.c was the first to not include pci.h. Carry that dependency in cxlpci.h. Note2: cxl port enumeration and probing complicates CXL subsystem init to the point that it helps to have centralized debug logging of probe events in cxl_bus_probe(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/164374948116.464348.1772618057599155408.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-01 21:07:51 +00:00
}
static void cxl_dport_remove(void *data)
{
struct cxl_dport *dport = data;
struct cxl_port *port = dport->port;
xa_erase(&port->dports, (unsigned long) dport->dport_dev);
put_device(dport->dport_dev);
}
static void cxl_dport_unlink(void *data)
{
struct cxl_dport *dport = data;
struct cxl_port *port = dport->port;
char link_name[CXL_TARGET_STRLEN];
sprintf(link_name, "dport%d", dport->port_id);
sysfs_remove_link(&port->dev.kobj, link_name);
}
2022-12-03 08:40:29 +00:00
static struct cxl_dport *
__devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev,
int port_id, resource_size_t component_reg_phys,
resource_size_t rcrb)
{
char link_name[CXL_TARGET_STRLEN];
struct cxl_dport *dport;
struct device *host;
int rc;
if (is_cxl_root(port))
host = port->uport_dev;
else
host = &port->dev;
if (!host->driver) {
dev_WARN_ONCE(&port->dev, 1, "dport:%s bad devm context\n",
dev_name(dport_dev));
return ERR_PTR(-ENXIO);
}
if (snprintf(link_name, CXL_TARGET_STRLEN, "dport%d", port_id) >=
CXL_TARGET_STRLEN)
return ERR_PTR(-EINVAL);
dport = devm_kzalloc(host, sizeof(*dport), GFP_KERNEL);
if (!dport)
return ERR_PTR(-ENOMEM);
cxl/port: Fix @host confusion in cxl_dport_setup_regs() commit 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport") ...moved the dport component registers from a raw component_reg_phys passed in at dport instantiation time to a 'struct cxl_register_map' populated with both the component register data *and* the "host" device for mapping operations. While typical CXL switch dports are mapped by their associated 'struct cxl_port', an RCH host bridge dport registered by cxl_acpi needs to wait until the cxl_mem driver makes the attachment to map the registers. This is because there are no intervening 'struct cxl_port' instances between the root cxl_port and the endpoint port in an RCH topology. For now just mark the host as NULL in the RCH dport case until code that needs to map the dport registers arrives. This patch is not flagged for -stable since nothing in the current driver uses the dport->comp_map. Now, I am slightly uneasy that cxl_setup_comp_regs() sets map->host to a wrong value and then cxl_dport_setup_regs() fixes it up, but the alternatives I came up with are more messy. For example, adding an @logdev to 'struct cxl_register_map' that the dev_printk()s can fall back to when @host is NULL. I settled on "post-fixup+comment" since it is only RCH dports that have this special case where register probing is split between a host-bridge RCRB lookup and when cxl_mem_probe() does the association of the cxl_memdev and endpoint port. [moved rename of @comp_map to @reg_map into next patch] Fixes: 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport") Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20231018171713.1883517-4-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-18 17:16:56 +00:00
dport->dport_dev = dport_dev;
dport->port_id = port_id;
dport->port = port;
if (rcrb == CXL_RESOURCE_NONE) {
rc = cxl_dport_setup_regs(&port->dev, dport,
component_reg_phys);
if (rc)
return ERR_PTR(rc);
} else {
dport->rcrb.base = rcrb;
component_reg_phys = __rcrb_to_component(dport_dev, &dport->rcrb,
2023-06-25 18:35:20 +00:00
CXL_RCRB_DOWNSTREAM);
if (component_reg_phys == CXL_RESOURCE_NONE) {
dev_warn(dport_dev, "Invalid Component Registers in RCRB");
return ERR_PTR(-ENXIO);
}
cxl/port: Fix @host confusion in cxl_dport_setup_regs() commit 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport") ...moved the dport component registers from a raw component_reg_phys passed in at dport instantiation time to a 'struct cxl_register_map' populated with both the component register data *and* the "host" device for mapping operations. While typical CXL switch dports are mapped by their associated 'struct cxl_port', an RCH host bridge dport registered by cxl_acpi needs to wait until the cxl_mem driver makes the attachment to map the registers. This is because there are no intervening 'struct cxl_port' instances between the root cxl_port and the endpoint port in an RCH topology. For now just mark the host as NULL in the RCH dport case until code that needs to map the dport registers arrives. This patch is not flagged for -stable since nothing in the current driver uses the dport->comp_map. Now, I am slightly uneasy that cxl_setup_comp_regs() sets map->host to a wrong value and then cxl_dport_setup_regs() fixes it up, but the alternatives I came up with are more messy. For example, adding an @logdev to 'struct cxl_register_map' that the dev_printk()s can fall back to when @host is NULL. I settled on "post-fixup+comment" since it is only RCH dports that have this special case where register probing is split between a host-bridge RCRB lookup and when cxl_mem_probe() does the association of the cxl_memdev and endpoint port. [moved rename of @comp_map to @reg_map into next patch] Fixes: 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport") Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20231018171713.1883517-4-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-18 17:16:56 +00:00
/*
* RCH @dport is not ready to map until associated with its
* memdev
*/
rc = cxl_dport_setup_regs(NULL, dport, component_reg_phys);
if (rc)
return ERR_PTR(rc);
2023-06-25 18:35:20 +00:00
dport->rch = true;
}
if (component_reg_phys != CXL_RESOURCE_NONE)
dev_dbg(dport_dev, "Component Registers found for dport: %pa\n",
&component_reg_phys);
cxl/port: Add a driver for 'struct cxl_port' objects The need for a CXL port driver and a dedicated cxl_bus_type is driven by a need to simultaneously support 2 independent physical memory decode domains (cache coherent CXL.mem and uncached PCI.mmio) that also intersect at a single PCIe device node. A CXL Port is a device that advertises a CXL Component Register block with an "HDM Decoder Capability Structure". >From Documentation/driver-api/cxl/memory-devices.rst: Similar to how a RAID driver takes disk objects and assembles them into a new logical device, the CXL subsystem is tasked to take PCIe and ACPI objects and assemble them into a CXL.mem decode topology. The need for runtime configuration of the CXL.mem topology is also similar to RAID in that different environments with the same hardware configuration may decide to assemble the topology in contrasting ways. One may choose performance (RAID0) striping memory across multiple Host Bridges and endpoints while another may opt for fault tolerance and disable any striping in the CXL.mem topology. The port driver identifies whether an endpoint Memory Expander is connected to a CXL topology. If an active (bound to the 'cxl_port' driver) CXL Port is not found at every PCIe Switch Upstream port and an active "root" CXL Port then the device is just a plain PCIe endpoint only capable of participating in PCI.mmio and DMA cycles, not CXL.mem coherent interleave sets. The 'cxl_port' driver lets the CXL subsystem leverage driver-core infrastructure for setup and teardown of register resources and communicating device activation status to userspace. The cxl_bus_type can rendezvous the async arrival of platform level CXL resources (via the 'cxl_acpi' driver) with the asynchronous enumeration of Memory Expander endpoints, while also implementing a hierarchical locking model independent of the associated 'struct pci_dev' locking model. The locking for dport and decoder enumeration is now handled in the core rather than callers. For now the port driver only enumerates and registers CXL resources (downstream port metadata and decoder resources) later it will be used to take action on its decoders in response to CXL.mem region provisioning requests. Note1: cxlpci.h has long depended on pci.h, but port.c was the first to not include pci.h. Carry that dependency in cxlpci.h. Note2: cxl port enumeration and probing complicates CXL subsystem init to the point that it helps to have centralized debug logging of probe events in cxl_bus_probe(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/164374948116.464348.1772618057599155408.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-01 21:07:51 +00:00
cond_cxl_root_lock(port);
rc = add_dport(port, dport);
cxl/port: Add a driver for 'struct cxl_port' objects The need for a CXL port driver and a dedicated cxl_bus_type is driven by a need to simultaneously support 2 independent physical memory decode domains (cache coherent CXL.mem and uncached PCI.mmio) that also intersect at a single PCIe device node. A CXL Port is a device that advertises a CXL Component Register block with an "HDM Decoder Capability Structure". >From Documentation/driver-api/cxl/memory-devices.rst: Similar to how a RAID driver takes disk objects and assembles them into a new logical device, the CXL subsystem is tasked to take PCIe and ACPI objects and assemble them into a CXL.mem decode topology. The need for runtime configuration of the CXL.mem topology is also similar to RAID in that different environments with the same hardware configuration may decide to assemble the topology in contrasting ways. One may choose performance (RAID0) striping memory across multiple Host Bridges and endpoints while another may opt for fault tolerance and disable any striping in the CXL.mem topology. The port driver identifies whether an endpoint Memory Expander is connected to a CXL topology. If an active (bound to the 'cxl_port' driver) CXL Port is not found at every PCIe Switch Upstream port and an active "root" CXL Port then the device is just a plain PCIe endpoint only capable of participating in PCI.mmio and DMA cycles, not CXL.mem coherent interleave sets. The 'cxl_port' driver lets the CXL subsystem leverage driver-core infrastructure for setup and teardown of register resources and communicating device activation status to userspace. The cxl_bus_type can rendezvous the async arrival of platform level CXL resources (via the 'cxl_acpi' driver) with the asynchronous enumeration of Memory Expander endpoints, while also implementing a hierarchical locking model independent of the associated 'struct pci_dev' locking model. The locking for dport and decoder enumeration is now handled in the core rather than callers. For now the port driver only enumerates and registers CXL resources (downstream port metadata and decoder resources) later it will be used to take action on its decoders in response to CXL.mem region provisioning requests. Note1: cxlpci.h has long depended on pci.h, but port.c was the first to not include pci.h. Carry that dependency in cxlpci.h. Note2: cxl port enumeration and probing complicates CXL subsystem init to the point that it helps to have centralized debug logging of probe events in cxl_bus_probe(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/164374948116.464348.1772618057599155408.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-01 21:07:51 +00:00
cond_cxl_root_unlock(port);
if (rc)
return ERR_PTR(rc);
get_device(dport_dev);
rc = devm_add_action_or_reset(host, cxl_dport_remove, dport);
if (rc)
return ERR_PTR(rc);
rc = sysfs_create_link(&port->dev.kobj, &dport_dev->kobj, link_name);
if (rc)
return ERR_PTR(rc);
rc = devm_add_action_or_reset(host, cxl_dport_unlink, dport);
if (rc)
return ERR_PTR(rc);
if (dev_is_pci(dport_dev))
dport->link_latency = cxl_pci_get_latency(to_pci_dev(dport_dev));
return dport;
}
/**
2022-12-03 08:40:29 +00:00
* devm_cxl_add_dport - append VH downstream port data to a cxl_port
* @port: the cxl_port that references this dport
* @dport_dev: firmware or PCI device representing the dport
* @port_id: identifier for this dport in a decoder's target list
* @component_reg_phys: optional location of CXL component registers
*
* Note that dports are appended to the devm release action's of the
* either the port's host (for root ports), or the port itself (for
* switch ports)
*/
struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
struct device *dport_dev, int port_id,
resource_size_t component_reg_phys)
{
struct cxl_dport *dport;
dport = __devm_cxl_add_dport(port, dport_dev, port_id,
2022-12-03 08:40:29 +00:00
component_reg_phys, CXL_RESOURCE_NONE);
if (IS_ERR(dport)) {
dev_dbg(dport_dev, "failed to add dport to %s: %ld\n",
dev_name(&port->dev), PTR_ERR(dport));
} else {
dev_dbg(dport_dev, "dport added to %s\n",
dev_name(&port->dev));
}
return dport;
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_add_dport, CXL);
2022-12-03 08:40:29 +00:00
/**
* devm_cxl_add_rch_dport - append RCH downstream port data to a cxl_port
* @port: the cxl_port that references this dport
* @dport_dev: firmware or PCI device representing the dport
* @port_id: identifier for this dport in a decoder's target list
* @rcrb: mandatory location of a Root Complex Register Block
*
* See CXL 3.0 9.11.8 CXL Devices Attached to an RCH
*/
struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
struct device *dport_dev, int port_id,
resource_size_t rcrb)
{
struct cxl_dport *dport;
if (rcrb == CXL_RESOURCE_NONE) {
dev_dbg(&port->dev, "failed to add RCH dport, missing RCRB\n");
return ERR_PTR(-EINVAL);
}
dport = __devm_cxl_add_dport(port, dport_dev, port_id,
2023-06-25 18:35:20 +00:00
CXL_RESOURCE_NONE, rcrb);
2022-12-03 08:40:29 +00:00
if (IS_ERR(dport)) {
dev_dbg(dport_dev, "failed to add RCH dport to %s: %ld\n",
dev_name(&port->dev), PTR_ERR(dport));
} else {
dev_dbg(dport_dev, "RCH dport added to %s\n",
dev_name(&port->dev));
}
return dport;
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_add_rch_dport, CXL);
static int add_ep(struct cxl_ep *new)
{
struct cxl_port *port = new->dport->port;
int rc;
device_lock(&port->dev);
if (port->dead) {
device_unlock(&port->dev);
return -ENXIO;
}
rc = xa_insert(&port->endpoints, (unsigned long)new->ep, new,
GFP_KERNEL);
device_unlock(&port->dev);
return rc;
}
/**
* cxl_add_ep - register an endpoint's interest in a port
* @dport: the dport that routes to @ep_dev
* @ep_dev: device representing the endpoint
*
* Intermediate CXL ports are scanned based on the arrival of endpoints.
* When those endpoints depart the port can be destroyed once all
* endpoints that care about that port have been removed.
*/
static int cxl_add_ep(struct cxl_dport *dport, struct device *ep_dev)
{
struct cxl_ep *ep;
int rc;
ep = kzalloc(sizeof(*ep), GFP_KERNEL);
if (!ep)
return -ENOMEM;
ep->ep = get_device(ep_dev);
ep->dport = dport;
rc = add_ep(ep);
if (rc)
cxl_ep_release(ep);
return rc;
}
struct cxl_find_port_ctx {
const struct device *dport_dev;
const struct cxl_port *parent_port;
struct cxl_dport **dport;
};
static int match_port_by_dport(struct device *dev, const void *data)
{
const struct cxl_find_port_ctx *ctx = data;
struct cxl_dport *dport;
struct cxl_port *port;
if (!is_cxl_port(dev))
return 0;
if (ctx->parent_port && dev->parent != &ctx->parent_port->dev)
return 0;
port = to_cxl_port(dev);
dport = cxl_find_dport_by_dev(port, ctx->dport_dev);
if (ctx->dport)
*ctx->dport = dport;
return dport != NULL;
}
static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx)
{
struct device *dev;
if (!ctx->dport_dev)
return NULL;
dev = bus_find_device(&cxl_bus_type, NULL, ctx, match_port_by_dport);
if (dev)
return to_cxl_port(dev);
return NULL;
}
static struct cxl_port *find_cxl_port(struct device *dport_dev,
struct cxl_dport **dport)
{
struct cxl_find_port_ctx ctx = {
.dport_dev = dport_dev,
.dport = dport,
};
struct cxl_port *port;
port = __find_cxl_port(&ctx);
return port;
}
static struct cxl_port *find_cxl_port_at(struct cxl_port *parent_port,
struct device *dport_dev,
struct cxl_dport **dport)
{
struct cxl_find_port_ctx ctx = {
.dport_dev = dport_dev,
.parent_port = parent_port,
.dport = dport,
};
struct cxl_port *port;
port = __find_cxl_port(&ctx);
return port;
}
/*
* All users of grandparent() are using it to walk PCIe-like switch port
* hierarchy. A PCIe switch is comprised of a bridge device representing the
* upstream switch port and N bridges representing downstream switch ports. When
* bridges stack the grand-parent of a downstream switch port is another
* downstream switch port in the immediate ancestor switch.
*/
static struct device *grandparent(struct device *dev)
{
if (dev && dev->parent)
return dev->parent->parent;
return NULL;
}
cxl/port: Fix delete_endpoint() vs parent unregistration race The CXL subsystem, at cxl_mem ->probe() time, establishes a lineage of ports (struct cxl_port objects) between an endpoint and the root of a CXL topology. Each port including the endpoint port is attached to the cxl_port driver. Given that setup, it follows that when either any port in that lineage goes through a cxl_port ->remove() event, or the memdev goes through a cxl_mem ->remove() event. The hierarchy below the removed port, or the entire hierarchy if the memdev is removed needs to come down. The delete_endpoint() callback is careful to check whether it is being called to tear down the hierarchy, or if it is only being called to teardown the memdev because an ancestor port is going through ->remove(). That care needs to take the device_lock() of the endpoint's parent. Which requires 2 bugs to be fixed: 1/ A reference on the parent is needed to prevent use-after-free scenarios like this signature: BUG: spinlock bad magic on CPU#0, kworker/u56:0/11 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023 Workqueue: cxl_port detach_memdev [cxl_core] RIP: 0010:spin_bug+0x65/0xa0 Call Trace: do_raw_spin_lock+0x69/0xa0 __mutex_lock+0x695/0xb80 delete_endpoint+0xad/0x150 [cxl_core] devres_release_all+0xb8/0x110 device_unbind_cleanup+0xe/0x70 device_release_driver_internal+0x1d2/0x210 detach_memdev+0x15/0x20 [cxl_core] process_one_work+0x1e3/0x4c0 worker_thread+0x1dd/0x3d0 2/ In the case of RCH topologies, the parent device that needs to be locked is not always @port->dev as returned by cxl_mem_find_port(), use endpoint->dev.parent instead. Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: <stable@vger.kernel.org> Reported-by: Robert Richter <rrichter@amd.com> Closes: http://lore.kernel.org/r/20231018171713.1883517-2-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-28 03:13:23 +00:00
static struct device *endpoint_host(struct cxl_port *endpoint)
{
struct cxl_port *port = to_cxl_port(endpoint->dev.parent);
if (is_cxl_root(port))
return port->uport_dev;
return &port->dev;
}
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
static void delete_endpoint(void *data)
{
struct cxl_memdev *cxlmd = data;
struct cxl_port *endpoint = cxlmd->endpoint;
cxl/port: Fix delete_endpoint() vs parent unregistration race The CXL subsystem, at cxl_mem ->probe() time, establishes a lineage of ports (struct cxl_port objects) between an endpoint and the root of a CXL topology. Each port including the endpoint port is attached to the cxl_port driver. Given that setup, it follows that when either any port in that lineage goes through a cxl_port ->remove() event, or the memdev goes through a cxl_mem ->remove() event. The hierarchy below the removed port, or the entire hierarchy if the memdev is removed needs to come down. The delete_endpoint() callback is careful to check whether it is being called to tear down the hierarchy, or if it is only being called to teardown the memdev because an ancestor port is going through ->remove(). That care needs to take the device_lock() of the endpoint's parent. Which requires 2 bugs to be fixed: 1/ A reference on the parent is needed to prevent use-after-free scenarios like this signature: BUG: spinlock bad magic on CPU#0, kworker/u56:0/11 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023 Workqueue: cxl_port detach_memdev [cxl_core] RIP: 0010:spin_bug+0x65/0xa0 Call Trace: do_raw_spin_lock+0x69/0xa0 __mutex_lock+0x695/0xb80 delete_endpoint+0xad/0x150 [cxl_core] devres_release_all+0xb8/0x110 device_unbind_cleanup+0xe/0x70 device_release_driver_internal+0x1d2/0x210 detach_memdev+0x15/0x20 [cxl_core] process_one_work+0x1e3/0x4c0 worker_thread+0x1dd/0x3d0 2/ In the case of RCH topologies, the parent device that needs to be locked is not always @port->dev as returned by cxl_mem_find_port(), use endpoint->dev.parent instead. Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: <stable@vger.kernel.org> Reported-by: Robert Richter <rrichter@amd.com> Closes: http://lore.kernel.org/r/20231018171713.1883517-2-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-28 03:13:23 +00:00
struct device *host = endpoint_host(endpoint);
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
cxl/port: Fix delete_endpoint() vs parent unregistration race The CXL subsystem, at cxl_mem ->probe() time, establishes a lineage of ports (struct cxl_port objects) between an endpoint and the root of a CXL topology. Each port including the endpoint port is attached to the cxl_port driver. Given that setup, it follows that when either any port in that lineage goes through a cxl_port ->remove() event, or the memdev goes through a cxl_mem ->remove() event. The hierarchy below the removed port, or the entire hierarchy if the memdev is removed needs to come down. The delete_endpoint() callback is careful to check whether it is being called to tear down the hierarchy, or if it is only being called to teardown the memdev because an ancestor port is going through ->remove(). That care needs to take the device_lock() of the endpoint's parent. Which requires 2 bugs to be fixed: 1/ A reference on the parent is needed to prevent use-after-free scenarios like this signature: BUG: spinlock bad magic on CPU#0, kworker/u56:0/11 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023 Workqueue: cxl_port detach_memdev [cxl_core] RIP: 0010:spin_bug+0x65/0xa0 Call Trace: do_raw_spin_lock+0x69/0xa0 __mutex_lock+0x695/0xb80 delete_endpoint+0xad/0x150 [cxl_core] devres_release_all+0xb8/0x110 device_unbind_cleanup+0xe/0x70 device_release_driver_internal+0x1d2/0x210 detach_memdev+0x15/0x20 [cxl_core] process_one_work+0x1e3/0x4c0 worker_thread+0x1dd/0x3d0 2/ In the case of RCH topologies, the parent device that needs to be locked is not always @port->dev as returned by cxl_mem_find_port(), use endpoint->dev.parent instead. Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: <stable@vger.kernel.org> Reported-by: Robert Richter <rrichter@amd.com> Closes: http://lore.kernel.org/r/20231018171713.1883517-2-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-28 03:13:23 +00:00
device_lock(host);
if (host->driver && !endpoint->dead) {
devm_release_action(host, cxl_unlink_parent_dport, endpoint);
devm_release_action(host, cxl_unlink_uport, endpoint);
devm_release_action(host, unregister_port, endpoint);
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
}
cxlmd->endpoint = NULL;
cxl/port: Fix delete_endpoint() vs parent unregistration race The CXL subsystem, at cxl_mem ->probe() time, establishes a lineage of ports (struct cxl_port objects) between an endpoint and the root of a CXL topology. Each port including the endpoint port is attached to the cxl_port driver. Given that setup, it follows that when either any port in that lineage goes through a cxl_port ->remove() event, or the memdev goes through a cxl_mem ->remove() event. The hierarchy below the removed port, or the entire hierarchy if the memdev is removed needs to come down. The delete_endpoint() callback is careful to check whether it is being called to tear down the hierarchy, or if it is only being called to teardown the memdev because an ancestor port is going through ->remove(). That care needs to take the device_lock() of the endpoint's parent. Which requires 2 bugs to be fixed: 1/ A reference on the parent is needed to prevent use-after-free scenarios like this signature: BUG: spinlock bad magic on CPU#0, kworker/u56:0/11 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023 Workqueue: cxl_port detach_memdev [cxl_core] RIP: 0010:spin_bug+0x65/0xa0 Call Trace: do_raw_spin_lock+0x69/0xa0 __mutex_lock+0x695/0xb80 delete_endpoint+0xad/0x150 [cxl_core] devres_release_all+0xb8/0x110 device_unbind_cleanup+0xe/0x70 device_release_driver_internal+0x1d2/0x210 detach_memdev+0x15/0x20 [cxl_core] process_one_work+0x1e3/0x4c0 worker_thread+0x1dd/0x3d0 2/ In the case of RCH topologies, the parent device that needs to be locked is not always @port->dev as returned by cxl_mem_find_port(), use endpoint->dev.parent instead. Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: <stable@vger.kernel.org> Reported-by: Robert Richter <rrichter@amd.com> Closes: http://lore.kernel.org/r/20231018171713.1883517-2-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-28 03:13:23 +00:00
device_unlock(host);
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
put_device(&endpoint->dev);
cxl/port: Fix delete_endpoint() vs parent unregistration race The CXL subsystem, at cxl_mem ->probe() time, establishes a lineage of ports (struct cxl_port objects) between an endpoint and the root of a CXL topology. Each port including the endpoint port is attached to the cxl_port driver. Given that setup, it follows that when either any port in that lineage goes through a cxl_port ->remove() event, or the memdev goes through a cxl_mem ->remove() event. The hierarchy below the removed port, or the entire hierarchy if the memdev is removed needs to come down. The delete_endpoint() callback is careful to check whether it is being called to tear down the hierarchy, or if it is only being called to teardown the memdev because an ancestor port is going through ->remove(). That care needs to take the device_lock() of the endpoint's parent. Which requires 2 bugs to be fixed: 1/ A reference on the parent is needed to prevent use-after-free scenarios like this signature: BUG: spinlock bad magic on CPU#0, kworker/u56:0/11 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023 Workqueue: cxl_port detach_memdev [cxl_core] RIP: 0010:spin_bug+0x65/0xa0 Call Trace: do_raw_spin_lock+0x69/0xa0 __mutex_lock+0x695/0xb80 delete_endpoint+0xad/0x150 [cxl_core] devres_release_all+0xb8/0x110 device_unbind_cleanup+0xe/0x70 device_release_driver_internal+0x1d2/0x210 detach_memdev+0x15/0x20 [cxl_core] process_one_work+0x1e3/0x4c0 worker_thread+0x1dd/0x3d0 2/ In the case of RCH topologies, the parent device that needs to be locked is not always @port->dev as returned by cxl_mem_find_port(), use endpoint->dev.parent instead. Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: <stable@vger.kernel.org> Reported-by: Robert Richter <rrichter@amd.com> Closes: http://lore.kernel.org/r/20231018171713.1883517-2-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-28 03:13:23 +00:00
put_device(host);
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
}
int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
{
cxl/port: Fix delete_endpoint() vs parent unregistration race The CXL subsystem, at cxl_mem ->probe() time, establishes a lineage of ports (struct cxl_port objects) between an endpoint and the root of a CXL topology. Each port including the endpoint port is attached to the cxl_port driver. Given that setup, it follows that when either any port in that lineage goes through a cxl_port ->remove() event, or the memdev goes through a cxl_mem ->remove() event. The hierarchy below the removed port, or the entire hierarchy if the memdev is removed needs to come down. The delete_endpoint() callback is careful to check whether it is being called to tear down the hierarchy, or if it is only being called to teardown the memdev because an ancestor port is going through ->remove(). That care needs to take the device_lock() of the endpoint's parent. Which requires 2 bugs to be fixed: 1/ A reference on the parent is needed to prevent use-after-free scenarios like this signature: BUG: spinlock bad magic on CPU#0, kworker/u56:0/11 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023 Workqueue: cxl_port detach_memdev [cxl_core] RIP: 0010:spin_bug+0x65/0xa0 Call Trace: do_raw_spin_lock+0x69/0xa0 __mutex_lock+0x695/0xb80 delete_endpoint+0xad/0x150 [cxl_core] devres_release_all+0xb8/0x110 device_unbind_cleanup+0xe/0x70 device_release_driver_internal+0x1d2/0x210 detach_memdev+0x15/0x20 [cxl_core] process_one_work+0x1e3/0x4c0 worker_thread+0x1dd/0x3d0 2/ In the case of RCH topologies, the parent device that needs to be locked is not always @port->dev as returned by cxl_mem_find_port(), use endpoint->dev.parent instead. Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: <stable@vger.kernel.org> Reported-by: Robert Richter <rrichter@amd.com> Closes: http://lore.kernel.org/r/20231018171713.1883517-2-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-28 03:13:23 +00:00
struct device *host = endpoint_host(endpoint);
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
struct device *dev = &cxlmd->dev;
cxl/port: Fix delete_endpoint() vs parent unregistration race The CXL subsystem, at cxl_mem ->probe() time, establishes a lineage of ports (struct cxl_port objects) between an endpoint and the root of a CXL topology. Each port including the endpoint port is attached to the cxl_port driver. Given that setup, it follows that when either any port in that lineage goes through a cxl_port ->remove() event, or the memdev goes through a cxl_mem ->remove() event. The hierarchy below the removed port, or the entire hierarchy if the memdev is removed needs to come down. The delete_endpoint() callback is careful to check whether it is being called to tear down the hierarchy, or if it is only being called to teardown the memdev because an ancestor port is going through ->remove(). That care needs to take the device_lock() of the endpoint's parent. Which requires 2 bugs to be fixed: 1/ A reference on the parent is needed to prevent use-after-free scenarios like this signature: BUG: spinlock bad magic on CPU#0, kworker/u56:0/11 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023 Workqueue: cxl_port detach_memdev [cxl_core] RIP: 0010:spin_bug+0x65/0xa0 Call Trace: do_raw_spin_lock+0x69/0xa0 __mutex_lock+0x695/0xb80 delete_endpoint+0xad/0x150 [cxl_core] devres_release_all+0xb8/0x110 device_unbind_cleanup+0xe/0x70 device_release_driver_internal+0x1d2/0x210 detach_memdev+0x15/0x20 [cxl_core] process_one_work+0x1e3/0x4c0 worker_thread+0x1dd/0x3d0 2/ In the case of RCH topologies, the parent device that needs to be locked is not always @port->dev as returned by cxl_mem_find_port(), use endpoint->dev.parent instead. Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: <stable@vger.kernel.org> Reported-by: Robert Richter <rrichter@amd.com> Closes: http://lore.kernel.org/r/20231018171713.1883517-2-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-28 03:13:23 +00:00
get_device(host);
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
get_device(&endpoint->dev);
cxlmd->endpoint = endpoint;
cxl/memdev: Fix endpoint port removal Testing of ram region support [1], stimulates a long standing bug in cxl_detach_ep() where some cxl_ep_remove() cleanup is skipped due to inability to walk ports after dports have been unregistered. That results in a failure to re-register a memdev after the port is re-enabled leading to a crash like the following: cxl_port_setup_targets: cxl region4: cxl_host_bridge.0:port4 iw: 1 ig: 256 general protection fault, ... [..] RIP: 0010:cxl_region_setup_targets+0x897/0x9e0 [cxl_core] dev_name at include/linux/device.h:700 (inlined by) cxl_port_setup_targets at drivers/cxl/core/region.c:1155 (inlined by) cxl_region_setup_targets at drivers/cxl/core/region.c:1249 [..] Call Trace: <TASK> attach_target+0x39a/0x760 [cxl_core] ? __mutex_unlock_slowpath+0x3a/0x290 cxl_add_to_region+0xb8/0x340 [cxl_core] ? lockdep_hardirqs_on+0x7d/0x100 discover_region+0x4b/0x80 [cxl_port] ? __pfx_discover_region+0x10/0x10 [cxl_port] device_for_each_child+0x58/0x90 cxl_port_probe+0x10e/0x130 [cxl_port] cxl_bus_probe+0x17/0x50 [cxl_core] Change the port ancestry walk to be by depth rather than by dport. This ensures that even if a port has unregistered its dports a deferred memdev cleanup will still be able to cleanup the memdev's interest in that port. The parent_port->dev.driver check is only needed for determining if the bottom up removal beat the top-down removal, but cxl_ep_remove() can always proceed given the port is pinned. That is, the two sources of cxl_ep_remove() are in cxl_detach_ep() and cxl_port_release(), and cxl_port_release() can not run if cxl_detach_ep() holds a reference. Fixes: 2703c16c75ae ("cxl/core/port: Add switch port enumeration") Link: http://lore.kernel.org/r/167564534874.847146.5222419648551436750.stgit@dwillia2-xfh.jf.intel.com [1] Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/167601992789.1924368.8083994227892600608.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:29:09 +00:00
cxlmd->depth = endpoint->depth;
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
return devm_add_action_or_reset(dev, delete_endpoint, cxlmd);
}
EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL);
/*
* The natural end of life of a non-root 'cxl_port' is when its parent port goes
* through a ->remove() event ("top-down" unregistration). The unnatural trigger
* for a port to be unregistered is when all memdevs beneath that port have gone
* through ->remove(). This "bottom-up" removal selectively removes individual
* child ports manually. This depends on devm_cxl_add_port() to not change is
* devm action registration order, and for dports to have already been
* destroyed by reap_dports().
*/
static void delete_switch_port(struct cxl_port *port)
{
devm_release_action(port->dev.parent, cxl_unlink_parent_dport, port);
devm_release_action(port->dev.parent, cxl_unlink_uport, port);
devm_release_action(port->dev.parent, unregister_port, port);
}
static void reap_dports(struct cxl_port *port)
{
struct cxl_dport *dport;
unsigned long index;
device_lock_assert(&port->dev);
xa_for_each(&port->dports, index, dport) {
devm_release_action(&port->dev, cxl_dport_unlink, dport);
devm_release_action(&port->dev, cxl_dport_remove, dport);
devm_kfree(&port->dev, dport);
}
}
cxl/memdev: Fix endpoint port removal Testing of ram region support [1], stimulates a long standing bug in cxl_detach_ep() where some cxl_ep_remove() cleanup is skipped due to inability to walk ports after dports have been unregistered. That results in a failure to re-register a memdev after the port is re-enabled leading to a crash like the following: cxl_port_setup_targets: cxl region4: cxl_host_bridge.0:port4 iw: 1 ig: 256 general protection fault, ... [..] RIP: 0010:cxl_region_setup_targets+0x897/0x9e0 [cxl_core] dev_name at include/linux/device.h:700 (inlined by) cxl_port_setup_targets at drivers/cxl/core/region.c:1155 (inlined by) cxl_region_setup_targets at drivers/cxl/core/region.c:1249 [..] Call Trace: <TASK> attach_target+0x39a/0x760 [cxl_core] ? __mutex_unlock_slowpath+0x3a/0x290 cxl_add_to_region+0xb8/0x340 [cxl_core] ? lockdep_hardirqs_on+0x7d/0x100 discover_region+0x4b/0x80 [cxl_port] ? __pfx_discover_region+0x10/0x10 [cxl_port] device_for_each_child+0x58/0x90 cxl_port_probe+0x10e/0x130 [cxl_port] cxl_bus_probe+0x17/0x50 [cxl_core] Change the port ancestry walk to be by depth rather than by dport. This ensures that even if a port has unregistered its dports a deferred memdev cleanup will still be able to cleanup the memdev's interest in that port. The parent_port->dev.driver check is only needed for determining if the bottom up removal beat the top-down removal, but cxl_ep_remove() can always proceed given the port is pinned. That is, the two sources of cxl_ep_remove() are in cxl_detach_ep() and cxl_port_release(), and cxl_port_release() can not run if cxl_detach_ep() holds a reference. Fixes: 2703c16c75ae ("cxl/core/port: Add switch port enumeration") Link: http://lore.kernel.org/r/167564534874.847146.5222419648551436750.stgit@dwillia2-xfh.jf.intel.com [1] Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/167601992789.1924368.8083994227892600608.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:29:09 +00:00
struct detach_ctx {
struct cxl_memdev *cxlmd;
int depth;
};
static int port_has_memdev(struct device *dev, const void *data)
{
const struct detach_ctx *ctx = data;
struct cxl_port *port;
if (!is_cxl_port(dev))
return 0;
port = to_cxl_port(dev);
if (port->depth != ctx->depth)
return 0;
return !!cxl_ep_load(port, ctx->cxlmd);
}
static void cxl_detach_ep(void *data)
{
struct cxl_memdev *cxlmd = data;
cxl/memdev: Fix endpoint port removal Testing of ram region support [1], stimulates a long standing bug in cxl_detach_ep() where some cxl_ep_remove() cleanup is skipped due to inability to walk ports after dports have been unregistered. That results in a failure to re-register a memdev after the port is re-enabled leading to a crash like the following: cxl_port_setup_targets: cxl region4: cxl_host_bridge.0:port4 iw: 1 ig: 256 general protection fault, ... [..] RIP: 0010:cxl_region_setup_targets+0x897/0x9e0 [cxl_core] dev_name at include/linux/device.h:700 (inlined by) cxl_port_setup_targets at drivers/cxl/core/region.c:1155 (inlined by) cxl_region_setup_targets at drivers/cxl/core/region.c:1249 [..] Call Trace: <TASK> attach_target+0x39a/0x760 [cxl_core] ? __mutex_unlock_slowpath+0x3a/0x290 cxl_add_to_region+0xb8/0x340 [cxl_core] ? lockdep_hardirqs_on+0x7d/0x100 discover_region+0x4b/0x80 [cxl_port] ? __pfx_discover_region+0x10/0x10 [cxl_port] device_for_each_child+0x58/0x90 cxl_port_probe+0x10e/0x130 [cxl_port] cxl_bus_probe+0x17/0x50 [cxl_core] Change the port ancestry walk to be by depth rather than by dport. This ensures that even if a port has unregistered its dports a deferred memdev cleanup will still be able to cleanup the memdev's interest in that port. The parent_port->dev.driver check is only needed for determining if the bottom up removal beat the top-down removal, but cxl_ep_remove() can always proceed given the port is pinned. That is, the two sources of cxl_ep_remove() are in cxl_detach_ep() and cxl_port_release(), and cxl_port_release() can not run if cxl_detach_ep() holds a reference. Fixes: 2703c16c75ae ("cxl/core/port: Add switch port enumeration") Link: http://lore.kernel.org/r/167564534874.847146.5222419648551436750.stgit@dwillia2-xfh.jf.intel.com [1] Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/167601992789.1924368.8083994227892600608.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:29:09 +00:00
for (int i = cxlmd->depth - 1; i >= 1; i--) {
struct cxl_port *port, *parent_port;
cxl/memdev: Fix endpoint port removal Testing of ram region support [1], stimulates a long standing bug in cxl_detach_ep() where some cxl_ep_remove() cleanup is skipped due to inability to walk ports after dports have been unregistered. That results in a failure to re-register a memdev after the port is re-enabled leading to a crash like the following: cxl_port_setup_targets: cxl region4: cxl_host_bridge.0:port4 iw: 1 ig: 256 general protection fault, ... [..] RIP: 0010:cxl_region_setup_targets+0x897/0x9e0 [cxl_core] dev_name at include/linux/device.h:700 (inlined by) cxl_port_setup_targets at drivers/cxl/core/region.c:1155 (inlined by) cxl_region_setup_targets at drivers/cxl/core/region.c:1249 [..] Call Trace: <TASK> attach_target+0x39a/0x760 [cxl_core] ? __mutex_unlock_slowpath+0x3a/0x290 cxl_add_to_region+0xb8/0x340 [cxl_core] ? lockdep_hardirqs_on+0x7d/0x100 discover_region+0x4b/0x80 [cxl_port] ? __pfx_discover_region+0x10/0x10 [cxl_port] device_for_each_child+0x58/0x90 cxl_port_probe+0x10e/0x130 [cxl_port] cxl_bus_probe+0x17/0x50 [cxl_core] Change the port ancestry walk to be by depth rather than by dport. This ensures that even if a port has unregistered its dports a deferred memdev cleanup will still be able to cleanup the memdev's interest in that port. The parent_port->dev.driver check is only needed for determining if the bottom up removal beat the top-down removal, but cxl_ep_remove() can always proceed given the port is pinned. That is, the two sources of cxl_ep_remove() are in cxl_detach_ep() and cxl_port_release(), and cxl_port_release() can not run if cxl_detach_ep() holds a reference. Fixes: 2703c16c75ae ("cxl/core/port: Add switch port enumeration") Link: http://lore.kernel.org/r/167564534874.847146.5222419648551436750.stgit@dwillia2-xfh.jf.intel.com [1] Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/167601992789.1924368.8083994227892600608.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:29:09 +00:00
struct detach_ctx ctx = {
.cxlmd = cxlmd,
.depth = i,
};
struct device *dev;
struct cxl_ep *ep;
bool died = false;
cxl/memdev: Fix endpoint port removal Testing of ram region support [1], stimulates a long standing bug in cxl_detach_ep() where some cxl_ep_remove() cleanup is skipped due to inability to walk ports after dports have been unregistered. That results in a failure to re-register a memdev after the port is re-enabled leading to a crash like the following: cxl_port_setup_targets: cxl region4: cxl_host_bridge.0:port4 iw: 1 ig: 256 general protection fault, ... [..] RIP: 0010:cxl_region_setup_targets+0x897/0x9e0 [cxl_core] dev_name at include/linux/device.h:700 (inlined by) cxl_port_setup_targets at drivers/cxl/core/region.c:1155 (inlined by) cxl_region_setup_targets at drivers/cxl/core/region.c:1249 [..] Call Trace: <TASK> attach_target+0x39a/0x760 [cxl_core] ? __mutex_unlock_slowpath+0x3a/0x290 cxl_add_to_region+0xb8/0x340 [cxl_core] ? lockdep_hardirqs_on+0x7d/0x100 discover_region+0x4b/0x80 [cxl_port] ? __pfx_discover_region+0x10/0x10 [cxl_port] device_for_each_child+0x58/0x90 cxl_port_probe+0x10e/0x130 [cxl_port] cxl_bus_probe+0x17/0x50 [cxl_core] Change the port ancestry walk to be by depth rather than by dport. This ensures that even if a port has unregistered its dports a deferred memdev cleanup will still be able to cleanup the memdev's interest in that port. The parent_port->dev.driver check is only needed for determining if the bottom up removal beat the top-down removal, but cxl_ep_remove() can always proceed given the port is pinned. That is, the two sources of cxl_ep_remove() are in cxl_detach_ep() and cxl_port_release(), and cxl_port_release() can not run if cxl_detach_ep() holds a reference. Fixes: 2703c16c75ae ("cxl/core/port: Add switch port enumeration") Link: http://lore.kernel.org/r/167564534874.847146.5222419648551436750.stgit@dwillia2-xfh.jf.intel.com [1] Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/167601992789.1924368.8083994227892600608.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:29:09 +00:00
dev = bus_find_device(&cxl_bus_type, NULL, &ctx,
port_has_memdev);
if (!dev)
continue;
cxl/memdev: Fix endpoint port removal Testing of ram region support [1], stimulates a long standing bug in cxl_detach_ep() where some cxl_ep_remove() cleanup is skipped due to inability to walk ports after dports have been unregistered. That results in a failure to re-register a memdev after the port is re-enabled leading to a crash like the following: cxl_port_setup_targets: cxl region4: cxl_host_bridge.0:port4 iw: 1 ig: 256 general protection fault, ... [..] RIP: 0010:cxl_region_setup_targets+0x897/0x9e0 [cxl_core] dev_name at include/linux/device.h:700 (inlined by) cxl_port_setup_targets at drivers/cxl/core/region.c:1155 (inlined by) cxl_region_setup_targets at drivers/cxl/core/region.c:1249 [..] Call Trace: <TASK> attach_target+0x39a/0x760 [cxl_core] ? __mutex_unlock_slowpath+0x3a/0x290 cxl_add_to_region+0xb8/0x340 [cxl_core] ? lockdep_hardirqs_on+0x7d/0x100 discover_region+0x4b/0x80 [cxl_port] ? __pfx_discover_region+0x10/0x10 [cxl_port] device_for_each_child+0x58/0x90 cxl_port_probe+0x10e/0x130 [cxl_port] cxl_bus_probe+0x17/0x50 [cxl_core] Change the port ancestry walk to be by depth rather than by dport. This ensures that even if a port has unregistered its dports a deferred memdev cleanup will still be able to cleanup the memdev's interest in that port. The parent_port->dev.driver check is only needed for determining if the bottom up removal beat the top-down removal, but cxl_ep_remove() can always proceed given the port is pinned. That is, the two sources of cxl_ep_remove() are in cxl_detach_ep() and cxl_port_release(), and cxl_port_release() can not run if cxl_detach_ep() holds a reference. Fixes: 2703c16c75ae ("cxl/core/port: Add switch port enumeration") Link: http://lore.kernel.org/r/167564534874.847146.5222419648551436750.stgit@dwillia2-xfh.jf.intel.com [1] Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/167601992789.1924368.8083994227892600608.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:29:09 +00:00
port = to_cxl_port(dev);
parent_port = to_cxl_port(port->dev.parent);
device_lock(&parent_port->dev);
device_lock(&port->dev);
ep = cxl_ep_load(port, cxlmd);
dev_dbg(&cxlmd->dev, "disconnect %s from %s\n",
ep ? dev_name(ep->ep) : "", dev_name(&port->dev));
cxl_ep_remove(port, ep);
if (ep && !port->dead && xa_empty(&port->endpoints) &&
cxl/memdev: Fix endpoint port removal Testing of ram region support [1], stimulates a long standing bug in cxl_detach_ep() where some cxl_ep_remove() cleanup is skipped due to inability to walk ports after dports have been unregistered. That results in a failure to re-register a memdev after the port is re-enabled leading to a crash like the following: cxl_port_setup_targets: cxl region4: cxl_host_bridge.0:port4 iw: 1 ig: 256 general protection fault, ... [..] RIP: 0010:cxl_region_setup_targets+0x897/0x9e0 [cxl_core] dev_name at include/linux/device.h:700 (inlined by) cxl_port_setup_targets at drivers/cxl/core/region.c:1155 (inlined by) cxl_region_setup_targets at drivers/cxl/core/region.c:1249 [..] Call Trace: <TASK> attach_target+0x39a/0x760 [cxl_core] ? __mutex_unlock_slowpath+0x3a/0x290 cxl_add_to_region+0xb8/0x340 [cxl_core] ? lockdep_hardirqs_on+0x7d/0x100 discover_region+0x4b/0x80 [cxl_port] ? __pfx_discover_region+0x10/0x10 [cxl_port] device_for_each_child+0x58/0x90 cxl_port_probe+0x10e/0x130 [cxl_port] cxl_bus_probe+0x17/0x50 [cxl_core] Change the port ancestry walk to be by depth rather than by dport. This ensures that even if a port has unregistered its dports a deferred memdev cleanup will still be able to cleanup the memdev's interest in that port. The parent_port->dev.driver check is only needed for determining if the bottom up removal beat the top-down removal, but cxl_ep_remove() can always proceed given the port is pinned. That is, the two sources of cxl_ep_remove() are in cxl_detach_ep() and cxl_port_release(), and cxl_port_release() can not run if cxl_detach_ep() holds a reference. Fixes: 2703c16c75ae ("cxl/core/port: Add switch port enumeration") Link: http://lore.kernel.org/r/167564534874.847146.5222419648551436750.stgit@dwillia2-xfh.jf.intel.com [1] Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/167601992789.1924368.8083994227892600608.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:29:09 +00:00
!is_cxl_root(parent_port) && parent_port->dev.driver) {
/*
* This was the last ep attached to a dynamically
* enumerated port. Block new cxl_add_ep() and garbage
* collect the port.
*/
died = true;
port->dead = true;
reap_dports(port);
}
device_unlock(&port->dev);
if (died) {
dev_dbg(&cxlmd->dev, "delete %s\n",
dev_name(&port->dev));
delete_switch_port(port);
}
put_device(&port->dev);
device_unlock(&parent_port->dev);
}
}
static resource_size_t find_component_registers(struct device *dev)
{
struct cxl_register_map map;
struct pci_dev *pdev;
/*
* Theoretically, CXL component registers can be hosted on a
* non-PCI device, in practice, only cxl_test hits this case.
*/
if (!dev_is_pci(dev))
return CXL_RESOURCE_NONE;
pdev = to_pci_dev(dev);
cxl_find_regblock(pdev, CXL_REGLOC_RBI_COMPONENT, &map);
return map.resource;
}
static int add_port_attach_ep(struct cxl_memdev *cxlmd,
struct device *uport_dev,
struct device *dport_dev)
{
struct device *dparent = grandparent(dport_dev);
struct cxl_port *port, *parent_port = NULL;
struct cxl_dport *dport, *parent_dport;
resource_size_t component_reg_phys;
int rc;
if (!dparent) {
/*
* The iteration reached the topology root without finding the
* CXL-root 'cxl_port' on a previous iteration, fail for now to
* be re-probed after platform driver attaches.
*/
dev_dbg(&cxlmd->dev, "%s is a root dport\n",
dev_name(dport_dev));
return -ENXIO;
}
parent_port = find_cxl_port(dparent, &parent_dport);
if (!parent_port) {
/* iterate to create this parent_port */
return -EAGAIN;
}
device_lock(&parent_port->dev);
if (!parent_port->dev.driver) {
dev_warn(&cxlmd->dev,
"port %s:%s disabled, failed to enumerate CXL.mem\n",
dev_name(&parent_port->dev), dev_name(uport_dev));
port = ERR_PTR(-ENXIO);
goto out;
}
port = find_cxl_port_at(parent_port, dport_dev, &dport);
if (!port) {
component_reg_phys = find_component_registers(uport_dev);
port = devm_cxl_add_port(&parent_port->dev, uport_dev,
component_reg_phys, parent_dport);
/* retry find to pick up the new dport information */
if (!IS_ERR(port))
port = find_cxl_port_at(parent_port, dport_dev, &dport);
}
out:
device_unlock(&parent_port->dev);
if (IS_ERR(port))
rc = PTR_ERR(port);
else {
dev_dbg(&cxlmd->dev, "add to new port %s:%s\n",
dev_name(&port->dev), dev_name(port->uport_dev));
rc = cxl_add_ep(dport, &cxlmd->dev);
if (rc == -EBUSY) {
/*
* "can't" happen, but this error code means
* something to the caller, so translate it.
*/
rc = -ENXIO;
}
put_device(&port->dev);
}
put_device(&parent_port->dev);
return rc;
}
int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
{
struct device *dev = &cxlmd->dev;
struct device *iter;
int rc;
cxl/port: Add RCD endpoint port enumeration Unlike a CXL memory expander in a VH topology that has at least one intervening 'struct cxl_port' instance between itself and the CXL root device, an RCD attaches one-level higher. For example: VH ┌──────────┐ │ ACPI0017 │ │ root0 │ └─────┬────┘ │ ┌─────┴────┐ │ dport0 │ ┌─────┤ ACPI0016 ├─────┐ │ │ port1 │ │ │ └────┬─────┘ │ │ │ │ ┌──┴───┐ ┌──┴───┐ ┌───┴──┐ │dport0│ │dport1│ │dport2│ │ RP0 │ │ RP1 │ │ RP2 │ └──────┘ └──┬───┘ └──────┘ │ ┌───┴─────┐ │endpoint0│ │ port2 │ └─────────┘ ...vs: RCH ┌──────────┐ │ ACPI0017 │ │ root0 │ └────┬─────┘ │ ┌───┴────┐ │ dport0 │ │ACPI0016│ └───┬────┘ │ ┌────┴─────┐ │endpoint0 │ │ port1 │ └──────────┘ So arrange for endpoint port in the RCH/RCD case to appear directly connected to the host-bridge in its singular role as a dport. Compare that to the VH case where the host-bridge serves a dual role as a 'cxl_dport' for the CXL root device *and* a 'cxl_port' upstream port for the Root Ports in the Root Complex that are modeled as 'cxl_dport' instances in the CXL topology. Another deviation from the VH case is that RCDs may need to look up their component registers from the Root Complex Register Block (RCRB). That platform firmware specified RCRB area is cached by the cxl_acpi driver and conveyed via the host-bridge dport to the cxl_mem driver to perform the cxl_rcrb_to_component() lookup for the endpoint port (See 9.11.8 CXL Devices Attached to an RCH for the lookup of the upstream port component registers). Tested-by: Robert Richter <rrichter@amd.com> Link: https://lore.kernel.org/r/166993045621.1882361.1730100141527044744.stgit@dwillia2-xfh.jf.intel.com Reviewed-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Camerom <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-01 21:34:16 +00:00
/*
* Skip intermediate port enumeration in the RCH case, there
* are no ports in between a host bridge and an endpoint.
*/
if (cxlmd->cxlds->rcd)
return 0;
rc = devm_add_action_or_reset(&cxlmd->dev, cxl_detach_ep, cxlmd);
if (rc)
return rc;
/*
* Scan for and add all cxl_ports in this device's ancestry.
* Repeat until no more ports are added. Abort if a port add
* attempt fails.
*/
retry:
for (iter = dev; iter; iter = grandparent(iter)) {
struct device *dport_dev = grandparent(iter);
struct device *uport_dev;
struct cxl_dport *dport;
struct cxl_port *port;
/*
* The terminal "grandparent" in PCI is NULL and @platform_bus
* for platform devices
*/
if (!dport_dev || dport_dev == &platform_bus)
return 0;
uport_dev = dport_dev->parent;
if (!uport_dev) {
dev_warn(dev, "at %s no parent for dport: %s\n",
dev_name(iter), dev_name(dport_dev));
return -ENXIO;
}
dev_dbg(dev, "scan: iter: %s dport_dev: %s parent: %s\n",
dev_name(iter), dev_name(dport_dev),
dev_name(uport_dev));
port = find_cxl_port(dport_dev, &dport);
if (port) {
dev_dbg(&cxlmd->dev,
"found already registered port %s:%s\n",
dev_name(&port->dev),
dev_name(port->uport_dev));
rc = cxl_add_ep(dport, &cxlmd->dev);
/*
* If the endpoint already exists in the port's list,
* that's ok, it was added on a previous pass.
* Otherwise, retry in add_port_attach_ep() after taking
* the parent_port lock as the current port may be being
* reaped.
*/
if (rc && rc != -EBUSY) {
put_device(&port->dev);
return rc;
}
/* Any more ports to add between this one and the root? */
if (!dev_is_cxl_root_child(&port->dev)) {
put_device(&port->dev);
continue;
}
put_device(&port->dev);
return 0;
}
rc = add_port_attach_ep(cxlmd, uport_dev, dport_dev);
/* port missing, try to add parent */
if (rc == -EAGAIN)
continue;
/* failed to add ep or port */
if (rc)
return rc;
/* port added, new descendants possible, start over */
goto retry;
}
return 0;
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_enumerate_ports, CXL);
struct cxl_port *cxl_pci_find_port(struct pci_dev *pdev,
struct cxl_dport **dport)
{
return find_cxl_port(pdev->dev.parent, dport);
}
EXPORT_SYMBOL_NS_GPL(cxl_pci_find_port, CXL);
struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
struct cxl_dport **dport)
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
{
return find_cxl_port(grandparent(&cxlmd->dev), dport);
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
}
EXPORT_SYMBOL_NS_GPL(cxl_mem_find_port, CXL);
static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd,
struct cxl_port *port, int *target_map)
{
int i;
if (!target_map)
return 0;
device_lock_assert(&port->dev);
if (xa_empty(&port->dports))
return -EINVAL;
guard(rwsem_write)(&cxl_region_rwsem);
cxl/port: Fix decoder initialization when nr_targets > interleave_ways The decoder_populate_targets() helper walks all of the targets in a port and makes sure they can be looked up in @target_map. Where @target_map is a lookup table from target position to target id (corresponding to a cxl_dport instance). However @target_map is only responsible for conveying the active dport instances as indicated by interleave_ways. When nr_targets > interleave_ways it results in decoder_populate_targets() walking off the end of the valid entries in @target_map. Given target_map is initialized to 0 it results in the dport lookup failing if position 0 is not mapped to a dport with an id of 0: cxl_port port3: Failed to populate active decoder targets cxl_port port3: Failed to add decoder cxl_port port3: Failed to add decoder3.0 cxl_bus_probe: cxl_port port3: probe: -6 This bug also highlights that when the decoder's ->targets[] array is written in cxl_port_setup_targets() it is missing a hold of the targets_lock to synchronize against sysfs readers of the target list. A fix for that is saved for a later patch. Fixes: a5c258021689 ("cxl/bus: Populate the target list at decoder create") Cc: <stable@vger.kernel.org> Signed-off-by: Huang, Ying <ying.huang@intel.com> [djbw: rewrite the changelog, find the Fixes: tag] Co-developed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-08 03:06:36 +00:00
for (i = 0; i < cxlsd->cxld.interleave_ways; i++) {
struct cxl_dport *dport = find_dport(port, target_map[i]);
if (!dport)
return -ENXIO;
cxlsd->target[i] = dport;
}
return 0;
}
struct cxl_dport *cxl_hb_modulo(struct cxl_root_decoder *cxlrd, int pos)
{
struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
struct cxl_decoder *cxld = &cxlsd->cxld;
int iw;
iw = cxld->interleave_ways;
if (dev_WARN_ONCE(&cxld->dev, iw != cxlsd->nr_targets,
"misconfigured root decoder\n"))
return NULL;
return cxlrd->cxlsd.target[pos % iw];
}
EXPORT_SYMBOL_NS_GPL(cxl_hb_modulo, CXL);
2022-04-21 15:33:13 +00:00
static struct lock_class_key cxl_decoder_key;
/**
* cxl_decoder_init - Common decoder setup / initialization
* @port: owning port of this decoder
* @cxld: common decoder properties to initialize
*
* A port may contain one or more decoders. Each of those decoders
* enable some address space for CXL.mem utilization. A decoder is
* expected to be configured by the caller before registering via
* cxl_decoder_add()
*/
static int cxl_decoder_init(struct cxl_port *port, struct cxl_decoder *cxld)
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
{
struct device *dev;
int rc;
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
rc = ida_alloc(&port->decoder_ida, GFP_KERNEL);
if (rc < 0)
return rc;
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
/* need parent to stick around to release the id */
get_device(&port->dev);
cxld->id = rc;
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
dev = &cxld->dev;
device_initialize(dev);
2022-04-21 15:33:13 +00:00
lockdep_set_class(&dev->mutex, &cxl_decoder_key);
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
device_set_pm_not_required(dev);
dev->parent = &port->dev;
dev->bus = &cxl_bus_type;
/* Pre initialize an "empty" decoder */
cxld->interleave_ways = 1;
cxld->interleave_granularity = PAGE_SIZE;
cxld->target_type = CXL_DECODER_HOSTONLYMEM;
cxld->hpa_range = (struct range) {
.start = 0,
.end = -1,
};
return 0;
}
static int cxl_switch_decoder_init(struct cxl_port *port,
struct cxl_switch_decoder *cxlsd,
int nr_targets)
{
if (nr_targets > CXL_DECODER_MAX_INTERLEAVE)
return -EINVAL;
cxlsd->nr_targets = nr_targets;
return cxl_decoder_init(port, &cxlsd->cxld);
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
}
/**
* cxl_root_decoder_alloc - Allocate a root level decoder
* @port: owning CXL root of this decoder
* @nr_targets: static number of downstream targets
* @calc_hb: which host bridge covers the n'th position by granularity
*
* Return: A new cxl decoder to be registered by cxl_decoder_add(). A
* 'CXL root' decoder is one that decodes from a top-level / static platform
* firmware description of CXL resources into a CXL standard decode
* topology.
*/
struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets,
cxl_calc_hb_fn calc_hb)
{
struct cxl_root_decoder *cxlrd;
struct cxl_switch_decoder *cxlsd;
struct cxl_decoder *cxld;
int rc;
if (!is_cxl_root(port))
return ERR_PTR(-EINVAL);
cxlrd = kzalloc(struct_size(cxlrd, cxlsd.target, nr_targets),
GFP_KERNEL);
if (!cxlrd)
return ERR_PTR(-ENOMEM);
cxlsd = &cxlrd->cxlsd;
rc = cxl_switch_decoder_init(port, cxlsd, nr_targets);
if (rc) {
kfree(cxlrd);
return ERR_PTR(rc);
}
cxlrd->calc_hb = calc_hb;
cxl/region: Add region autodiscovery Region autodiscovery is an asynchronous state machine advanced by cxl_port_probe(). After the decoders on an endpoint port are enumerated they are scanned for actively enabled instances. Each active decoder is flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region. If a region does not already exist for the address range setting of the decoder one is created. That creation process may race with other decoders of the same region being discovered since cxl_port_probe() is asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is introduced to mitigate that race. Once all decoders have arrived, "p->nr_targets == p->interleave_ways", they are sorted by their relative decode position. The sort algorithm involves finding the point in the cxl_port topology where one leg of the decode leads to deviceA and the other deviceB. At that point in the topology the target order in the 'struct cxl_switch_decoder' indicates the relative position of those endpoint decoders in the region. >From that point the region goes through the same setup and validation steps as user-created regions, but instead of programming the decoders it validates that driver would have written the same values to the decoders as were already present. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:31:17 +00:00
mutex_init(&cxlrd->range_lock);
cxld = &cxlsd->cxld;
cxld->dev.type = &cxl_decoder_root_type;
cxl/region: Add region creation support CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-08 17:28:34 +00:00
/*
* cxl_root_decoder_release() special cases negative ids to
* detect memregion_alloc() failures.
*/
atomic_set(&cxlrd->region_id, -1);
rc = memregion_alloc(GFP_KERNEL);
if (rc < 0) {
put_device(&cxld->dev);
return ERR_PTR(rc);
}
atomic_set(&cxlrd->region_id, rc);
cxlrd->qos_class = CXL_QOS_CLASS_INVALID;
return cxlrd;
}
EXPORT_SYMBOL_NS_GPL(cxl_root_decoder_alloc, CXL);
/**
* cxl_switch_decoder_alloc - Allocate a switch level decoder
* @port: owning CXL switch port of this decoder
* @nr_targets: max number of dynamically addressable downstream targets
*
* Return: A new cxl decoder to be registered by cxl_decoder_add(). A
* 'switch' decoder is any decoder that can be enumerated by PCIe
* topology and the HDM Decoder Capability. This includes the decoders
* that sit between Switch Upstream Ports / Switch Downstream Ports and
* Host Bridges / Root Ports.
*/
struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets)
{
struct cxl_switch_decoder *cxlsd;
struct cxl_decoder *cxld;
int rc;
if (is_cxl_root(port) || is_cxl_endpoint(port))
return ERR_PTR(-EINVAL);
cxlsd = kzalloc(struct_size(cxlsd, target, nr_targets), GFP_KERNEL);
if (!cxlsd)
return ERR_PTR(-ENOMEM);
rc = cxl_switch_decoder_init(port, cxlsd, nr_targets);
if (rc) {
kfree(cxlsd);
return ERR_PTR(rc);
}
cxld = &cxlsd->cxld;
cxld->dev.type = &cxl_decoder_switch_type;
return cxlsd;
}
EXPORT_SYMBOL_NS_GPL(cxl_switch_decoder_alloc, CXL);
/**
* cxl_endpoint_decoder_alloc - Allocate an endpoint decoder
* @port: owning port of this decoder
*
* Return: A new cxl decoder to be registered by cxl_decoder_add()
*/
struct cxl_endpoint_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port)
{
struct cxl_endpoint_decoder *cxled;
struct cxl_decoder *cxld;
int rc;
if (!is_cxl_endpoint(port))
return ERR_PTR(-EINVAL);
cxled = kzalloc(sizeof(*cxled), GFP_KERNEL);
if (!cxled)
return ERR_PTR(-ENOMEM);
cxled->pos = -1;
cxld = &cxled->cxld;
rc = cxl_decoder_init(port, cxld);
if (rc) {
kfree(cxled);
return ERR_PTR(rc);
}
cxld->dev.type = &cxl_decoder_endpoint_type;
return cxled;
}
EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_alloc, CXL);
/**
* cxl_decoder_add_locked - Add a decoder with targets
* @cxld: The cxl decoder allocated by cxl_<type>_decoder_alloc()
* @target_map: A list of downstream ports that this decoder can direct memory
* traffic to. These numbers should correspond with the port number
* in the PCIe Link Capabilities structure.
*
* Certain types of decoders may not have any targets. The main example of this
* is an endpoint device. A more awkward example is a hostbridge whose root
* ports get hot added (technically possible, though unlikely).
*
* This is the locked variant of cxl_decoder_add().
*
* Context: Process context. Expects the device lock of the port that owns the
* @cxld to be held.
*
* Return: Negative error code if the decoder wasn't properly configured; else
* returns 0.
*/
int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map)
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
{
struct cxl_port *port;
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
struct device *dev;
int rc;
if (WARN_ON_ONCE(!cxld))
return -EINVAL;
if (WARN_ON_ONCE(IS_ERR(cxld)))
return PTR_ERR(cxld);
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
if (cxld->interleave_ways < 1)
return -EINVAL;
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
dev = &cxld->dev;
port = to_cxl_port(cxld->dev.parent);
if (!is_endpoint_decoder(dev)) {
struct cxl_switch_decoder *cxlsd = to_cxl_switch_decoder(dev);
rc = decoder_populate_targets(cxlsd, port, target_map);
if (rc && (cxld->flags & CXL_DECODER_F_ENABLE)) {
dev_err(&port->dev,
"Failed to populate active decoder targets\n");
return rc;
}
}
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
rc = dev_set_name(dev, "decoder%d.%d", port->id, cxld->id);
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
if (rc)
return rc;
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
return device_add(dev);
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
}
EXPORT_SYMBOL_NS_GPL(cxl_decoder_add_locked, CXL);
/**
* cxl_decoder_add - Add a decoder with targets
* @cxld: The cxl decoder allocated by cxl_<type>_decoder_alloc()
* @target_map: A list of downstream ports that this decoder can direct memory
* traffic to. These numbers should correspond with the port number
* in the PCIe Link Capabilities structure.
*
* This is the unlocked variant of cxl_decoder_add_locked().
* See cxl_decoder_add_locked().
*
* Context: Process context. Takes and releases the device lock of the port that
* owns the @cxld.
*/
int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map)
{
struct cxl_port *port;
int rc;
if (WARN_ON_ONCE(!cxld))
return -EINVAL;
if (WARN_ON_ONCE(IS_ERR(cxld)))
return PTR_ERR(cxld);
port = to_cxl_port(cxld->dev.parent);
device_lock(&port->dev);
rc = cxl_decoder_add_locked(cxld, target_map);
device_unlock(&port->dev);
return rc;
}
EXPORT_SYMBOL_NS_GPL(cxl_decoder_add, CXL);
static void cxld_unregister(void *dev)
{
struct cxl_endpoint_decoder *cxled;
if (is_endpoint_decoder(dev)) {
cxled = to_cxl_endpoint_decoder(dev);
cxl_decoder_kill_region(cxled);
}
device_unregister(dev);
}
int cxl_decoder_autoremove(struct device *host, struct cxl_decoder *cxld)
{
return devm_add_action_or_reset(host, cxld_unregister, &cxld->dev);
}
EXPORT_SYMBOL_NS_GPL(cxl_decoder_autoremove, CXL);
/**
* __cxl_driver_register - register a driver for the cxl bus
* @cxl_drv: cxl driver structure to attach
* @owner: owning module/driver
* @modname: KBUILD_MODNAME for parent driver
*/
int __cxl_driver_register(struct cxl_driver *cxl_drv, struct module *owner,
const char *modname)
{
if (!cxl_drv->probe) {
pr_debug("%s ->probe() must be specified\n", modname);
return -EINVAL;
}
if (!cxl_drv->name) {
pr_debug("%s ->name must be specified\n", modname);
return -EINVAL;
}
if (!cxl_drv->id) {
pr_debug("%s ->id must be specified\n", modname);
return -EINVAL;
}
cxl_drv->drv.bus = &cxl_bus_type;
cxl_drv->drv.owner = owner;
cxl_drv->drv.mod_name = modname;
cxl_drv->drv.name = cxl_drv->name;
return driver_register(&cxl_drv->drv);
}
EXPORT_SYMBOL_NS_GPL(__cxl_driver_register, CXL);
void cxl_driver_unregister(struct cxl_driver *cxl_drv)
{
driver_unregister(&cxl_drv->drv);
}
EXPORT_SYMBOL_NS_GPL(cxl_driver_unregister, CXL);
static int cxl_bus_uevent(const struct device *dev, struct kobj_uevent_env *env)
{
return add_uevent_var(env, "MODALIAS=" CXL_MODALIAS_FMT,
cxl_device_id(dev));
}
static int cxl_bus_match(struct device *dev, struct device_driver *drv)
{
return cxl_device_id(dev) == to_cxl_drv(drv)->id;
}
static int cxl_bus_probe(struct device *dev)
{
int rc;
rc = to_cxl_drv(dev->driver)->probe(dev);
cxl/port: Add a driver for 'struct cxl_port' objects The need for a CXL port driver and a dedicated cxl_bus_type is driven by a need to simultaneously support 2 independent physical memory decode domains (cache coherent CXL.mem and uncached PCI.mmio) that also intersect at a single PCIe device node. A CXL Port is a device that advertises a CXL Component Register block with an "HDM Decoder Capability Structure". >From Documentation/driver-api/cxl/memory-devices.rst: Similar to how a RAID driver takes disk objects and assembles them into a new logical device, the CXL subsystem is tasked to take PCIe and ACPI objects and assemble them into a CXL.mem decode topology. The need for runtime configuration of the CXL.mem topology is also similar to RAID in that different environments with the same hardware configuration may decide to assemble the topology in contrasting ways. One may choose performance (RAID0) striping memory across multiple Host Bridges and endpoints while another may opt for fault tolerance and disable any striping in the CXL.mem topology. The port driver identifies whether an endpoint Memory Expander is connected to a CXL topology. If an active (bound to the 'cxl_port' driver) CXL Port is not found at every PCIe Switch Upstream port and an active "root" CXL Port then the device is just a plain PCIe endpoint only capable of participating in PCI.mmio and DMA cycles, not CXL.mem coherent interleave sets. The 'cxl_port' driver lets the CXL subsystem leverage driver-core infrastructure for setup and teardown of register resources and communicating device activation status to userspace. The cxl_bus_type can rendezvous the async arrival of platform level CXL resources (via the 'cxl_acpi' driver) with the asynchronous enumeration of Memory Expander endpoints, while also implementing a hierarchical locking model independent of the associated 'struct pci_dev' locking model. The locking for dport and decoder enumeration is now handled in the core rather than callers. For now the port driver only enumerates and registers CXL resources (downstream port metadata and decoder resources) later it will be used to take action on its decoders in response to CXL.mem region provisioning requests. Note1: cxlpci.h has long depended on pci.h, but port.c was the first to not include pci.h. Carry that dependency in cxlpci.h. Note2: cxl port enumeration and probing complicates CXL subsystem init to the point that it helps to have centralized debug logging of probe events in cxl_bus_probe(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/164374948116.464348.1772618057599155408.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-01 21:07:51 +00:00
dev_dbg(dev, "probe: %d\n", rc);
return rc;
}
bus: Make remove callback return void The driver core ignores the return value of this callback because there is only little it can do when a device disappears. This is the final bit of a long lasting cleanup quest where several buses were converted to also return void from their remove callback. Additionally some resource leaks were fixed that were caused by drivers returning an error code in the expectation that the driver won't go away. With struct bus_type::remove returning void it's prevented that newly implemented buses return an ignored error code and so don't anticipate wrong expectations for driver authors. Reviewed-by: Tom Rix <trix@redhat.com> (For fpga) Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> (For drivers/s390 and drivers/vfio) Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> (For ARM, Amba and related parts) Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Chen-Yu Tsai <wens@csie.org> (for sunxi-rsb) Acked-by: Pali Rohár <pali@kernel.org> Acked-by: Mauro Carvalho Chehab <mchehab@kernel.org> (for media) Acked-by: Hans de Goede <hdegoede@redhat.com> (For drivers/platform) Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Acked-By: Vinod Koul <vkoul@kernel.org> Acked-by: Juergen Gross <jgross@suse.com> (For xen) Acked-by: Lee Jones <lee.jones@linaro.org> (For mfd) Acked-by: Johannes Thumshirn <jth@kernel.org> (For mcb) Acked-by: Johan Hovold <johan@kernel.org> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> (For slimbus) Acked-by: Kirti Wankhede <kwankhede@nvidia.com> (For vfio) Acked-by: Maximilian Luz <luzmaximilian@gmail.com> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> (For ulpi and typec) Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> (For ipack) Acked-by: Geoff Levand <geoff@infradead.org> (For ps3) Acked-by: Yehezkel Bernat <YehezkelShB@gmail.com> (For thunderbolt) Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> (For intel_th) Acked-by: Dominik Brodowski <linux@dominikbrodowski.net> (For pcmcia) Acked-by: Rafael J. Wysocki <rafael@kernel.org> (For ACPI) Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> (rpmsg and apr) Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> (For intel-ish-hid) Acked-by: Dan Williams <dan.j.williams@intel.com> (For CXL, DAX, and NVDIMM) Acked-by: William Breathitt Gray <vilhelm.gray@gmail.com> (For isa) Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (For firewire) Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> (For hid) Acked-by: Thorsten Scherer <t.scherer@eckelmann.de> (For siox) Acked-by: Sven Van Asbroeck <TheSven73@gmail.com> (For anybuss) Acked-by: Ulf Hansson <ulf.hansson@linaro.org> (For MMC) Acked-by: Wolfram Sang <wsa@kernel.org> # for I2C Acked-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Finn Thain <fthain@linux-m68k.org> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20210713193522.1770306-6-u.kleine-koenig@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-13 19:35:22 +00:00
static void cxl_bus_remove(struct device *dev)
{
struct cxl_driver *cxl_drv = to_cxl_drv(dev->driver);
if (cxl_drv->remove)
cxl_drv->remove(dev);
}
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
static struct workqueue_struct *cxl_bus_wq;
static void cxl_bus_rescan_queue(struct work_struct *w)
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
{
int rc = bus_rescan_devices(&cxl_bus_type);
pr_debug("CXL bus rescan result: %d\n", rc);
}
void cxl_bus_rescan(void)
{
static DECLARE_WORK(rescan_work, cxl_bus_rescan_queue);
queue_work(cxl_bus_wq, &rescan_work);
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
}
EXPORT_SYMBOL_NS_GPL(cxl_bus_rescan, CXL);
void cxl_bus_drain(void)
{
drain_workqueue(cxl_bus_wq);
}
EXPORT_SYMBOL_NS_GPL(cxl_bus_drain, CXL);
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd)
{
return queue_work(cxl_bus_wq, &cxlmd->detach_work);
}
EXPORT_SYMBOL_NS_GPL(schedule_cxl_memdev_detach, CXL);
/**
* cxl_hb_get_perf_coordinates - Retrieve performance numbers between initiator
* and host bridge
*
* @port: endpoint cxl_port
* @coord: output access coordinates
*
* Return: errno on failure, 0 on success.
*/
int cxl_hb_get_perf_coordinates(struct cxl_port *port,
struct access_coordinate *coord)
{
struct cxl_port *iter = port;
struct cxl_dport *dport;
if (!is_cxl_endpoint(port))
return -EINVAL;
dport = iter->parent_dport;
while (iter && !is_cxl_root(to_cxl_port(iter->dev.parent))) {
iter = to_cxl_port(iter->dev.parent);
dport = iter->parent_dport;
}
coord[ACCESS_COORDINATE_LOCAL] =
dport->hb_coord[ACCESS_COORDINATE_LOCAL];
coord[ACCESS_COORDINATE_CPU] =
dport->hb_coord[ACCESS_COORDINATE_CPU];
return 0;
}
static bool parent_port_is_cxl_root(struct cxl_port *port)
{
return is_cxl_root(to_cxl_port(port->dev.parent));
}
/**
* cxl_endpoint_get_perf_coordinates - Retrieve performance numbers stored in dports
* of CXL path
* @port: endpoint cxl_port
* @coord: output performance data
*
* Return: errno on failure, 0 on success.
*/
int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
struct access_coordinate *coord)
{
struct access_coordinate c = {
.read_bandwidth = UINT_MAX,
.write_bandwidth = UINT_MAX,
};
struct cxl_port *iter = port;
struct cxl_dport *dport;
struct pci_dev *pdev;
unsigned int bw;
bool is_cxl_root;
if (!is_cxl_endpoint(port))
return -EINVAL;
/*
* Exit the loop when the parent port of the current iter port is cxl
* root. The iterative loop starts at the endpoint and gathers the
* latency of the CXL link from the current device/port to the connected
* downstream port each iteration.
*/
do {
dport = iter->parent_dport;
iter = to_cxl_port(iter->dev.parent);
is_cxl_root = parent_port_is_cxl_root(iter);
/*
* There's no valid access_coordinate for a root port since RPs do not
* have CDAT and therefore needs to be skipped.
*/
if (!is_cxl_root)
cxl_coordinates_combine(&c, &c, &dport->sw_coord);
c.write_latency += dport->link_latency;
c.read_latency += dport->link_latency;
} while (!is_cxl_root);
/* Get the calculated PCI paths bandwidth */
pdev = to_pci_dev(port->uport_dev->parent);
bw = pcie_bandwidth_available(pdev, NULL, NULL, NULL);
if (bw == 0)
return -ENXIO;
bw /= BITS_PER_BYTE;
c.write_bandwidth = min(c.write_bandwidth, bw);
c.read_bandwidth = min(c.read_bandwidth, bw);
*coord = c;
return 0;
}
EXPORT_SYMBOL_NS_GPL(cxl_endpoint_get_perf_coordinates, CXL);
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
/* for user tooling to ensure port disable work has completed */
driver core: bus: mark the struct bus_type for sysfs callbacks as constant struct bus_type should never be modified in a sysfs callback as there is nothing in the structure to modify, and frankly, the structure is almost never used in a sysfs callback, so mark it as constant to allow struct bus_type to be moved to read-only memory. Cc: "David S. Miller" <davem@davemloft.net> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Bounine <alex.bou9@gmail.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Ben Widawsky <bwidawsk@kernel.org> Cc: Dexuan Cui <decui@microsoft.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Harald Freudenberger <freude@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hu Haowen <src.res@email.cn> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Stuart Yoder <stuyoder@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Acked-by: Ilya Dryomov <idryomov@gmail.com> # rbd Acked-by: Ira Weiny <ira.weiny@intel.com> # cxl Reviewed-by: Alex Shi <alexs@kernel.org> Acked-by: Iwona Winiarska <iwona.winiarska@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> # scsi Link: https://lore.kernel.org/r/20230313182918.1312597-23-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-13 18:29:05 +00:00
static ssize_t flush_store(const struct bus_type *bus, const char *buf, size_t count)
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
{
if (sysfs_streq(buf, "1")) {
flush_workqueue(cxl_bus_wq);
return count;
}
return -EINVAL;
}
static BUS_ATTR_WO(flush);
static struct attribute *cxl_bus_attributes[] = {
&bus_attr_flush.attr,
NULL,
};
static struct attribute_group cxl_bus_attribute_group = {
.attrs = cxl_bus_attributes,
};
static const struct attribute_group *cxl_bus_attribute_groups[] = {
&cxl_bus_attribute_group,
NULL,
};
struct bus_type cxl_bus_type = {
.name = "cxl",
.uevent = cxl_bus_uevent,
.match = cxl_bus_match,
.probe = cxl_bus_probe,
.remove = cxl_bus_remove,
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
.bus_groups = cxl_bus_attribute_groups,
};
EXPORT_SYMBOL_NS_GPL(cxl_bus_type, CXL);
static struct dentry *cxl_debugfs;
struct dentry *cxl_debugfs_create_dir(const char *dir)
{
return debugfs_create_dir(dir, cxl_debugfs);
}
EXPORT_SYMBOL_NS_GPL(cxl_debugfs_create_dir, CXL);
static __init int cxl_core_init(void)
{
int rc;
cxl_debugfs = debugfs_create_dir("cxl", NULL);
cxl_mbox_init();
rc = cxl_memdev_init();
if (rc)
return rc;
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
cxl_bus_wq = alloc_ordered_workqueue("cxl_port", 0);
if (!cxl_bus_wq) {
rc = -ENOMEM;
goto err_wq;
}
rc = bus_register(&cxl_bus_type);
if (rc)
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
goto err_bus;
rc = cxl_region_init();
if (rc)
goto err_region;
return 0;
err_region:
bus_unregister(&cxl_bus_type);
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
err_bus:
destroy_workqueue(cxl_bus_wq);
err_wq:
cxl_memdev_exit();
return rc;
}
static void cxl_core_exit(void)
{
cxl_region_exit();
bus_unregister(&cxl_bus_type);
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
destroy_workqueue(cxl_bus_wq);
cxl_memdev_exit();
debugfs_remove_recursive(cxl_debugfs);
}
subsys_initcall(cxl_core_init);
module_exit(cxl_core_exit);
MODULE_LICENSE("GPL v2");
MODULE_IMPORT_NS(CXL);