linux-stable/drivers/pci/pci-sysfs.c

1545 lines
38 KiB
C
Raw Normal View History

License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 14:07:57 +00:00
// SPDX-License-Identifier: GPL-2.0
/*
* (C) Copyright 2002-2004 Greg Kroah-Hartman <greg@kroah.com>
* (C) Copyright 2002-2004 IBM Corp.
* (C) Copyright 2003 Matthew Wilcox
* (C) Copyright 2003 Hewlett-Packard
* (C) Copyright 2004 Jon Smirl <jonsmirl@yahoo.com>
* (C) Copyright 2004 Silicon Graphics, Inc. Jesse Barnes <jbarnes@sgi.com>
*
* File attributes for PCI devices
*
* Modeled after usb's driverfs.c
*/
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/pci.h>
#include <linux/stat.h>
#include <linux/export.h>
#include <linux/topology.h>
#include <linux/mm.h>
#include <linux/fs.h>
#include <linux/capability.h>
#include <linux/security.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 08:04:11 +00:00
#include <linux/slab.h>
#include <linux/vgaarb.h>
PCI/PM: add PCIe runtime D3cold support This patch adds runtime D3cold support and corresponding ACPI platform support. This patch only enables runtime D3cold support; it does not enable D3cold support during system suspend/hibernate. D3cold is the deepest power saving state for a PCIe device, where its main power is removed. While it is in D3cold, you can't access the device at all, not even its configuration space (which is still accessible in D3hot). Therefore the PCI PM registers can not be used to transition into/out of the D3cold state; that must be done by platform logic such as ACPI _PR3. To support wakeup from D3cold, a system may provide auxiliary power, which allows a device to request wakeup using a Beacon or the sideband WAKE# signal. WAKE# is usually connected to platform logic such as ACPI GPE. This is quite different from other power saving states, where devices request wakeup via a PME message on the PCIe link. Some devices, such as those in plug-in slots, have no direct platform logic. For example, there is usually no ACPI _PR3 for them. D3cold support for these devices can be done via the PCIe Downstream Port leading to the device. When the PCIe port is powered on/off, the device is powered on/off too. Wakeup events from the device will be notified to the corresponding PCIe port. For more information about PCIe D3cold and corresponding ACPI support, please refer to: - PCI Express Base Specification Revision 2.0 - Advanced Configuration and Power Interface Specification Revision 5.0 [bhelgaas: changelog] Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl> Originally-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-06-23 02:23:51 +00:00
#include <linux/pm_runtime.h>
#include <linux/of.h>
#include "pci.h"
static int sysfs_initialized; /* = 0 */
/* show configuration fields */
#define pci_config_attr(field, format_string) \
static ssize_t \
field##_show(struct device *dev, struct device_attribute *attr, char *buf) \
{ \
struct pci_dev *pdev; \
\
pdev = to_pci_dev(dev); \
return sysfs_emit(buf, format_string, pdev->field); \
} \
static DEVICE_ATTR_RO(field)
pci_config_attr(vendor, "0x%04x\n");
pci_config_attr(device, "0x%04x\n");
pci_config_attr(subsystem_vendor, "0x%04x\n");
pci_config_attr(subsystem_device, "0x%04x\n");
pci_config_attr(revision, "0x%02x\n");
pci_config_attr(class, "0x%06x\n");
pci_config_attr(irq, "%u\n");
static ssize_t broken_parity_status_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
struct pci_dev *pdev = to_pci_dev(dev);
return sysfs_emit(buf, "%u\n", pdev->broken_parity_status);
}
static ssize_t broken_parity_status_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
struct pci_dev *pdev = to_pci_dev(dev);
unsigned long val;
if (kstrtoul(buf, 0, &val) < 0)
return -EINVAL;
pdev->broken_parity_status = !!val;
return count;
}
static DEVICE_ATTR_RW(broken_parity_status);
static ssize_t pci_dev_show_local_cpu(struct device *dev, bool list,
struct device_attribute *attr, char *buf)
{
const struct cpumask *mask;
#ifdef CONFIG_NUMA
mask = (dev_to_node(dev) == -1) ? cpu_online_mask :
cpumask_of_node(dev_to_node(dev));
#else
mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
#endif
return cpumap_print_to_pagebuf(list, buf, mask);
}
static ssize_t local_cpus_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
return pci_dev_show_local_cpu(dev, false, attr, buf);
}
static DEVICE_ATTR_RO(local_cpus);
static ssize_t local_cpulist_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
return pci_dev_show_local_cpu(dev, true, attr, buf);
}
static DEVICE_ATTR_RO(local_cpulist);
/*
* PCI Bus Class Devices
*/
static ssize_t cpuaffinity_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
const struct cpumask *cpumask = cpumask_of_pcibus(to_pci_bus(dev));
return cpumap_print_to_pagebuf(false, buf, cpumask);
}
static DEVICE_ATTR_RO(cpuaffinity);
static ssize_t cpulistaffinity_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
const struct cpumask *cpumask = cpumask_of_pcibus(to_pci_bus(dev));
return cpumap_print_to_pagebuf(true, buf, cpumask);
}
static DEVICE_ATTR_RO(cpulistaffinity);
static ssize_t power_state_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct pci_dev *pdev = to_pci_dev(dev);
return sysfs_emit(buf, "%s\n", pci_power_name(pdev->current_state));
}
static DEVICE_ATTR_RO(power_state);
/* show resources */
static ssize_t resource_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
int i;
int max;
resource_size_t start, end;
size_t len = 0;
if (pci_dev->subordinate)
max = DEVICE_COUNT_RESOURCE;
else
max = PCI_BRIDGE_RESOURCES;
for (i = 0; i < max; i++) {
struct resource *res = &pci_dev->resource[i];
pci_resource_to_user(pci_dev, i, res, &start, &end);
len += sysfs_emit_at(buf, len, "0x%016llx 0x%016llx 0x%016llx\n",
(unsigned long long)start,
(unsigned long long)end,
(unsigned long long)res->flags);
}
return len;
}
static DEVICE_ATTR_RO(resource);
static ssize_t max_link_speed_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct pci_dev *pdev = to_pci_dev(dev);
return sysfs_emit(buf, "%s\n",
pci_speed_string(pcie_get_speed_cap(pdev)));
}
static DEVICE_ATTR_RO(max_link_speed);
static ssize_t max_link_width_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct pci_dev *pdev = to_pci_dev(dev);
return sysfs_emit(buf, "%u\n", pcie_get_width_cap(pdev));
}
static DEVICE_ATTR_RO(max_link_width);
static ssize_t current_link_speed_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
u16 linkstat;
int err;
PCI: Use pci_speed_string() for all PCI/PCI-X/PCIe strings Previously some PCI speed strings came from pci_speed_string(), some came from the PCIe-specific PCIE_SPEED2STR(), and some came from a PCIe-specific switch statement. These methods were inconsistent: pci_speed_string() PCIE_SPEED2STR() switch ------------------ ---------------- ------ 33 MHz PCI ... 2.5 GT/s PCIe 2.5 GT/s 2.5 GT/s 5.0 GT/s PCIe 5 GT/s 5 GT/s 8.0 GT/s PCIe 8 GT/s 8 GT/s 16.0 GT/s PCIe 16 GT/s 16 GT/s 32.0 GT/s PCIe 32 GT/s 32 GT/s Standardize on pci_speed_string() as the single source of these strings. Note that this adds ".0" and "PCIe" to some messages, including sysfs "max_link_speed" files, a brcmstb "link up" message, and the link status dmesg logging, e.g., nvme 0000:01:00.0: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:00:01.1 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link) I think it's better to standardize on a single version of the speed text. Previously we had strings like this: /sys/bus/pci/slots/0/cur_bus_speed: 8.0 GT/s PCIe /sys/bus/pci/slots/0/max_bus_speed: 8.0 GT/s PCIe /sys/devices/pci0000:00/0000:00:1c.0/current_link_speed: 8 GT/s /sys/devices/pci0000:00/0000:00:1c.0/max_link_speed: 8 GT/s This changes the latter two to match the slots files: /sys/devices/pci0000:00/0000:00:1c.0/current_link_speed: 8.0 GT/s PCIe /sys/devices/pci0000:00/0000:00:1c.0/max_link_speed: 8.0 GT/s PCIe Based-on-patch by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-02-28 21:24:52 +00:00
enum pci_bus_speed speed;
err = pcie_capability_read_word(pci_dev, PCI_EXP_LNKSTA, &linkstat);
if (err)
return -EINVAL;
PCI: Use pci_speed_string() for all PCI/PCI-X/PCIe strings Previously some PCI speed strings came from pci_speed_string(), some came from the PCIe-specific PCIE_SPEED2STR(), and some came from a PCIe-specific switch statement. These methods were inconsistent: pci_speed_string() PCIE_SPEED2STR() switch ------------------ ---------------- ------ 33 MHz PCI ... 2.5 GT/s PCIe 2.5 GT/s 2.5 GT/s 5.0 GT/s PCIe 5 GT/s 5 GT/s 8.0 GT/s PCIe 8 GT/s 8 GT/s 16.0 GT/s PCIe 16 GT/s 16 GT/s 32.0 GT/s PCIe 32 GT/s 32 GT/s Standardize on pci_speed_string() as the single source of these strings. Note that this adds ".0" and "PCIe" to some messages, including sysfs "max_link_speed" files, a brcmstb "link up" message, and the link status dmesg logging, e.g., nvme 0000:01:00.0: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:00:01.1 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link) I think it's better to standardize on a single version of the speed text. Previously we had strings like this: /sys/bus/pci/slots/0/cur_bus_speed: 8.0 GT/s PCIe /sys/bus/pci/slots/0/max_bus_speed: 8.0 GT/s PCIe /sys/devices/pci0000:00/0000:00:1c.0/current_link_speed: 8 GT/s /sys/devices/pci0000:00/0000:00:1c.0/max_link_speed: 8 GT/s This changes the latter two to match the slots files: /sys/devices/pci0000:00/0000:00:1c.0/current_link_speed: 8.0 GT/s PCIe /sys/devices/pci0000:00/0000:00:1c.0/max_link_speed: 8.0 GT/s PCIe Based-on-patch by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-02-28 21:24:52 +00:00
speed = pcie_link_speed[linkstat & PCI_EXP_LNKSTA_CLS];
return sysfs_emit(buf, "%s\n", pci_speed_string(speed));
}
static DEVICE_ATTR_RO(current_link_speed);
static ssize_t current_link_width_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
u16 linkstat;
int err;
err = pcie_capability_read_word(pci_dev, PCI_EXP_LNKSTA, &linkstat);
if (err)
return -EINVAL;
return sysfs_emit(buf, "%u\n",
(linkstat & PCI_EXP_LNKSTA_NLW) >> PCI_EXP_LNKSTA_NLW_SHIFT);
}
static DEVICE_ATTR_RO(current_link_width);
static ssize_t secondary_bus_number_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
u8 sec_bus;
int err;
err = pci_read_config_byte(pci_dev, PCI_SECONDARY_BUS, &sec_bus);
if (err)
return -EINVAL;
return sysfs_emit(buf, "%u\n", sec_bus);
}
static DEVICE_ATTR_RO(secondary_bus_number);
static ssize_t subordinate_bus_number_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
u8 sub_bus;
int err;
err = pci_read_config_byte(pci_dev, PCI_SUBORDINATE_BUS, &sub_bus);
if (err)
return -EINVAL;
return sysfs_emit(buf, "%u\n", sub_bus);
}
static DEVICE_ATTR_RO(subordinate_bus_number);
static ssize_t ari_enabled_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
return sysfs_emit(buf, "%u\n", pci_ari_enabled(pci_dev->bus));
}
static DEVICE_ATTR_RO(ari_enabled);
static ssize_t modalias_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
return sysfs_emit(buf, "pci:v%08Xd%08Xsv%08Xsd%08Xbc%02Xsc%02Xi%02X\n",
pci_dev->vendor, pci_dev->device,
pci_dev->subsystem_vendor, pci_dev->subsystem_device,
(u8)(pci_dev->class >> 16), (u8)(pci_dev->class >> 8),
(u8)(pci_dev->class));
}
static DEVICE_ATTR_RO(modalias);
PCI: switch pci_{enable,disable}_device() to be nestable Changes the pci_{enable,disable}_device() functions to work in a nested basis, so that eg, three calls to enable_device() require three calls to disable_device(). The reason for this is to simplify PCI drivers for multi-interface/capability devices. These are devices that cram more than one interface in a single function. A relevant example of that is the Wireless [USB] Host Controller Interface (similar to EHCI) [see http://www.intel.com/technology/comms/wusb/whci.htm]. In these kind of devices, multiple interfaces are accessed through a single bar and IRQ line. For that, the drivers map only the smallest area of the bar to access their register banks and use shared IRQ handlers. However, because the order at which those drivers load cannot be known ahead of time, the sequence in which the calls to pci_enable_device() and pci_disable_device() cannot be predicted. Thus: 1. driverA starts pci_enable_device() 2. driverB starts pci_enable_device() 3. driverA shutdown pci_disable_device() 4. driverB shutdown pci_disable_device() between steps 3 and 4, driver B would loose access to it's device, even if it didn't intend to. By using this modification, the device won't be disabled until all the callers to enable() have called disable(). This is implemented by replacing 'struct pci_dev->is_enabled' from a bitfield to an atomic use count. Each caller to enable increments it, each caller to disable decrements it. When the count increments from 0 to 1, __pci_enable_device() is called to actually enable the device. When it drops to zero, pci_disable_device() actually does the disabling. We keep the backend __pci_enable_device() for pci_default_resume() to use and also change the sysfs method implementation, so that userspace enabling/disabling the device doesn't disable it one time too much. Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-11-22 20:40:31 +00:00
static ssize_t enable_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
struct pci_dev *pdev = to_pci_dev(dev);
unsigned long val;
ssize_t result = kstrtoul(buf, 0, &val);
if (result < 0)
return result;
/* this can crash the machine when done on the "wrong" device */
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
device_lock(dev);
if (dev->driver)
result = -EBUSY;
else if (val)
PCI: switch pci_{enable,disable}_device() to be nestable Changes the pci_{enable,disable}_device() functions to work in a nested basis, so that eg, three calls to enable_device() require three calls to disable_device(). The reason for this is to simplify PCI drivers for multi-interface/capability devices. These are devices that cram more than one interface in a single function. A relevant example of that is the Wireless [USB] Host Controller Interface (similar to EHCI) [see http://www.intel.com/technology/comms/wusb/whci.htm]. In these kind of devices, multiple interfaces are accessed through a single bar and IRQ line. For that, the drivers map only the smallest area of the bar to access their register banks and use shared IRQ handlers. However, because the order at which those drivers load cannot be known ahead of time, the sequence in which the calls to pci_enable_device() and pci_disable_device() cannot be predicted. Thus: 1. driverA starts pci_enable_device() 2. driverB starts pci_enable_device() 3. driverA shutdown pci_disable_device() 4. driverB shutdown pci_disable_device() between steps 3 and 4, driver B would loose access to it's device, even if it didn't intend to. By using this modification, the device won't be disabled until all the callers to enable() have called disable(). This is implemented by replacing 'struct pci_dev->is_enabled' from a bitfield to an atomic use count. Each caller to enable increments it, each caller to disable decrements it. When the count increments from 0 to 1, __pci_enable_device() is called to actually enable the device. When it drops to zero, pci_disable_device() actually does the disabling. We keep the backend __pci_enable_device() for pci_default_resume() to use and also change the sysfs method implementation, so that userspace enabling/disabling the device doesn't disable it one time too much. Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-11-22 20:40:31 +00:00
result = pci_enable_device(pdev);
else if (pci_is_enabled(pdev))
pci_disable_device(pdev);
else
result = -EIO;
device_unlock(dev);
PCI: switch pci_{enable,disable}_device() to be nestable Changes the pci_{enable,disable}_device() functions to work in a nested basis, so that eg, three calls to enable_device() require three calls to disable_device(). The reason for this is to simplify PCI drivers for multi-interface/capability devices. These are devices that cram more than one interface in a single function. A relevant example of that is the Wireless [USB] Host Controller Interface (similar to EHCI) [see http://www.intel.com/technology/comms/wusb/whci.htm]. In these kind of devices, multiple interfaces are accessed through a single bar and IRQ line. For that, the drivers map only the smallest area of the bar to access their register banks and use shared IRQ handlers. However, because the order at which those drivers load cannot be known ahead of time, the sequence in which the calls to pci_enable_device() and pci_disable_device() cannot be predicted. Thus: 1. driverA starts pci_enable_device() 2. driverB starts pci_enable_device() 3. driverA shutdown pci_disable_device() 4. driverB shutdown pci_disable_device() between steps 3 and 4, driver B would loose access to it's device, even if it didn't intend to. By using this modification, the device won't be disabled until all the callers to enable() have called disable(). This is implemented by replacing 'struct pci_dev->is_enabled' from a bitfield to an atomic use count. Each caller to enable increments it, each caller to disable decrements it. When the count increments from 0 to 1, __pci_enable_device() is called to actually enable the device. When it drops to zero, pci_disable_device() actually does the disabling. We keep the backend __pci_enable_device() for pci_default_resume() to use and also change the sysfs method implementation, so that userspace enabling/disabling the device doesn't disable it one time too much. Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-11-22 20:40:31 +00:00
return result < 0 ? result : count;
}
static ssize_t enable_show(struct device *dev, struct device_attribute *attr,
char *buf)
PCI: switch pci_{enable,disable}_device() to be nestable Changes the pci_{enable,disable}_device() functions to work in a nested basis, so that eg, three calls to enable_device() require three calls to disable_device(). The reason for this is to simplify PCI drivers for multi-interface/capability devices. These are devices that cram more than one interface in a single function. A relevant example of that is the Wireless [USB] Host Controller Interface (similar to EHCI) [see http://www.intel.com/technology/comms/wusb/whci.htm]. In these kind of devices, multiple interfaces are accessed through a single bar and IRQ line. For that, the drivers map only the smallest area of the bar to access their register banks and use shared IRQ handlers. However, because the order at which those drivers load cannot be known ahead of time, the sequence in which the calls to pci_enable_device() and pci_disable_device() cannot be predicted. Thus: 1. driverA starts pci_enable_device() 2. driverB starts pci_enable_device() 3. driverA shutdown pci_disable_device() 4. driverB shutdown pci_disable_device() between steps 3 and 4, driver B would loose access to it's device, even if it didn't intend to. By using this modification, the device won't be disabled until all the callers to enable() have called disable(). This is implemented by replacing 'struct pci_dev->is_enabled' from a bitfield to an atomic use count. Each caller to enable increments it, each caller to disable decrements it. When the count increments from 0 to 1, __pci_enable_device() is called to actually enable the device. When it drops to zero, pci_disable_device() actually does the disabling. We keep the backend __pci_enable_device() for pci_default_resume() to use and also change the sysfs method implementation, so that userspace enabling/disabling the device doesn't disable it one time too much. Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-11-22 20:40:31 +00:00
{
struct pci_dev *pdev;
pdev = to_pci_dev(dev);
return sysfs_emit(buf, "%u\n", atomic_read(&pdev->enable_cnt));
}
static DEVICE_ATTR_RW(enable);
#ifdef CONFIG_NUMA
static ssize_t numa_node_store(struct device *dev,
struct device_attribute *attr, const char *buf,
size_t count)
{
struct pci_dev *pdev = to_pci_dev(dev);
int node, ret;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
ret = kstrtoint(buf, 0, &node);
if (ret)
return ret;
if ((node < 0 && node != NUMA_NO_NODE) || node >= MAX_NUMNODES)
return -EINVAL;
if (node != NUMA_NO_NODE && !node_online(node))
return -EINVAL;
add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
pci_alert(pdev, FW_BUG "Overriding NUMA node to %d. Contact your vendor for updates.",
node);
dev->numa_node = node;
return count;
}
static ssize_t numa_node_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
return sysfs_emit(buf, "%d\n", dev->numa_node);
}
static DEVICE_ATTR_RW(numa_node);
#endif
static ssize_t dma_mask_bits_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct pci_dev *pdev = to_pci_dev(dev);
return sysfs_emit(buf, "%d\n", fls64(pdev->dma_mask));
}
static DEVICE_ATTR_RO(dma_mask_bits);
static ssize_t consistent_dma_mask_bits_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
return sysfs_emit(buf, "%d\n", fls64(dev->coherent_dma_mask));
}
static DEVICE_ATTR_RO(consistent_dma_mask_bits);
static ssize_t msi_bus_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct pci_dev *pdev = to_pci_dev(dev);
struct pci_bus *subordinate = pdev->subordinate;
return sysfs_emit(buf, "%u\n", subordinate ?
!(subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
: !pdev->no_msi);
}
static ssize_t msi_bus_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
struct pci_dev *pdev = to_pci_dev(dev);
struct pci_bus *subordinate = pdev->subordinate;
unsigned long val;
if (kstrtoul(buf, 0, &val) < 0)
return -EINVAL;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
/*
* "no_msi" and "bus_flags" only affect what happens when a driver
* requests MSI or MSI-X. They don't affect any drivers that have
* already requested MSI or MSI-X.
*/
if (!subordinate) {
pdev->no_msi = !val;
pci_info(pdev, "MSI/MSI-X %s for future drivers\n",
val ? "allowed" : "disallowed");
return count;
}
if (val)
subordinate->bus_flags &= ~PCI_BUS_FLAGS_NO_MSI;
else
subordinate->bus_flags |= PCI_BUS_FLAGS_NO_MSI;
dev_info(&subordinate->dev, "MSI/MSI-X %s for future drivers of devices on this bus\n",
val ? "allowed" : "disallowed");
return count;
}
static DEVICE_ATTR_RW(msi_bus);
static ssize_t rescan_store(struct bus_type *bus, const char *buf, size_t count)
{
unsigned long val;
struct pci_bus *b = NULL;
if (kstrtoul(buf, 0, &val) < 0)
return -EINVAL;
if (val) {
pci_lock_rescan_remove();
while ((b = pci_find_next_bus(b)) != NULL)
pci_rescan_bus(b);
pci_unlock_rescan_remove();
}
return count;
}
static BUS_ATTR_WO(rescan);
static struct attribute *pci_bus_attrs[] = {
&bus_attr_rescan.attr,
NULL,
};
static const struct attribute_group pci_bus_group = {
.attrs = pci_bus_attrs,
};
const struct attribute_group *pci_bus_groups[] = {
&pci_bus_group,
NULL,
};
static ssize_t dev_rescan_store(struct device *dev,
struct device_attribute *attr, const char *buf,
size_t count)
{
unsigned long val;
struct pci_dev *pdev = to_pci_dev(dev);
if (kstrtoul(buf, 0, &val) < 0)
return -EINVAL;
if (val) {
pci_lock_rescan_remove();
pci_rescan_bus(pdev->bus);
pci_unlock_rescan_remove();
}
return count;
}
static struct device_attribute dev_attr_dev_rescan = __ATTR(rescan, 0200, NULL,
dev_rescan_store);
static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
unsigned long val;
if (kstrtoul(buf, 0, &val) < 0)
return -EINVAL;
if (val && device_remove_file_self(dev, attr))
pci_stop_and_remove_bus_device_locked(to_pci_dev(dev));
return count;
}
static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0220, NULL,
remove_store);
static ssize_t bus_rescan_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
unsigned long val;
struct pci_bus *bus = to_pci_bus(dev);
if (kstrtoul(buf, 0, &val) < 0)
return -EINVAL;
if (val) {
pci_lock_rescan_remove();
if (!pci_is_root_bus(bus) && list_empty(&bus->devices))
pci_rescan_bus_bridge_resize(bus->self);
else
pci_rescan_bus(bus);
pci_unlock_rescan_remove();
}
return count;
}
static struct device_attribute dev_attr_bus_rescan = __ATTR(rescan, 0200, NULL,
bus_rescan_store);
#if defined(CONFIG_PM) && defined(CONFIG_ACPI)
PCI/PM: add PCIe runtime D3cold support This patch adds runtime D3cold support and corresponding ACPI platform support. This patch only enables runtime D3cold support; it does not enable D3cold support during system suspend/hibernate. D3cold is the deepest power saving state for a PCIe device, where its main power is removed. While it is in D3cold, you can't access the device at all, not even its configuration space (which is still accessible in D3hot). Therefore the PCI PM registers can not be used to transition into/out of the D3cold state; that must be done by platform logic such as ACPI _PR3. To support wakeup from D3cold, a system may provide auxiliary power, which allows a device to request wakeup using a Beacon or the sideband WAKE# signal. WAKE# is usually connected to platform logic such as ACPI GPE. This is quite different from other power saving states, where devices request wakeup via a PME message on the PCIe link. Some devices, such as those in plug-in slots, have no direct platform logic. For example, there is usually no ACPI _PR3 for them. D3cold support for these devices can be done via the PCIe Downstream Port leading to the device. When the PCIe port is powered on/off, the device is powered on/off too. Wakeup events from the device will be notified to the corresponding PCIe port. For more information about PCIe D3cold and corresponding ACPI support, please refer to: - PCI Express Base Specification Revision 2.0 - Advanced Configuration and Power Interface Specification Revision 5.0 [bhelgaas: changelog] Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl> Originally-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-06-23 02:23:51 +00:00
static ssize_t d3cold_allowed_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
struct pci_dev *pdev = to_pci_dev(dev);
unsigned long val;
if (kstrtoul(buf, 0, &val) < 0)
PCI/PM: add PCIe runtime D3cold support This patch adds runtime D3cold support and corresponding ACPI platform support. This patch only enables runtime D3cold support; it does not enable D3cold support during system suspend/hibernate. D3cold is the deepest power saving state for a PCIe device, where its main power is removed. While it is in D3cold, you can't access the device at all, not even its configuration space (which is still accessible in D3hot). Therefore the PCI PM registers can not be used to transition into/out of the D3cold state; that must be done by platform logic such as ACPI _PR3. To support wakeup from D3cold, a system may provide auxiliary power, which allows a device to request wakeup using a Beacon or the sideband WAKE# signal. WAKE# is usually connected to platform logic such as ACPI GPE. This is quite different from other power saving states, where devices request wakeup via a PME message on the PCIe link. Some devices, such as those in plug-in slots, have no direct platform logic. For example, there is usually no ACPI _PR3 for them. D3cold support for these devices can be done via the PCIe Downstream Port leading to the device. When the PCIe port is powered on/off, the device is powered on/off too. Wakeup events from the device will be notified to the corresponding PCIe port. For more information about PCIe D3cold and corresponding ACPI support, please refer to: - PCI Express Base Specification Revision 2.0 - Advanced Configuration and Power Interface Specification Revision 5.0 [bhelgaas: changelog] Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl> Originally-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-06-23 02:23:51 +00:00
return -EINVAL;
pdev->d3cold_allowed = !!val;
PCI: Put PCIe ports into D3 during suspend Currently the Linux PCI core does not touch power state of PCI bridges and PCIe ports when system suspend is entered. Leaving them in D0 consumes power unnecessarily and may prevent the CPU from entering deeper C-states. With recent PCIe hardware we can power down the ports to save power given that we take into account few restrictions: - The PCIe port hardware is recent enough, starting from 2015. - Devices connected to PCIe ports are effectively in D3cold once the port is transitioned to D3 (the config space is not accessible anymore and the link may be powered down). - Devices behind the PCIe port need to be allowed to transition to D3cold and back. There is a way both drivers and userspace can forbid this. - If the device behind the PCIe port is capable of waking the system it needs to be able to do so from D3cold. This patch adds a new flag to struct pci_device called 'bridge_d3'. This flag is set and cleared by the PCI core whenever there is a change in power management state of any of the devices behind the PCIe port. When system later on is suspended we only need to check this flag and if it is true transition the port to D3 otherwise we leave it in D0. Also provide override mechanism via command line parameter "pcie_port_pm=[off|force]" that can be used to disable or enable the feature regardless of the BIOS manufacturing date. Tested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-02 08:17:12 +00:00
if (pdev->d3cold_allowed)
pci_d3cold_enable(pdev);
else
pci_d3cold_disable(pdev);
PCI/PM: add PCIe runtime D3cold support This patch adds runtime D3cold support and corresponding ACPI platform support. This patch only enables runtime D3cold support; it does not enable D3cold support during system suspend/hibernate. D3cold is the deepest power saving state for a PCIe device, where its main power is removed. While it is in D3cold, you can't access the device at all, not even its configuration space (which is still accessible in D3hot). Therefore the PCI PM registers can not be used to transition into/out of the D3cold state; that must be done by platform logic such as ACPI _PR3. To support wakeup from D3cold, a system may provide auxiliary power, which allows a device to request wakeup using a Beacon or the sideband WAKE# signal. WAKE# is usually connected to platform logic such as ACPI GPE. This is quite different from other power saving states, where devices request wakeup via a PME message on the PCIe link. Some devices, such as those in plug-in slots, have no direct platform logic. For example, there is usually no ACPI _PR3 for them. D3cold support for these devices can be done via the PCIe Downstream Port leading to the device. When the PCIe port is powered on/off, the device is powered on/off too. Wakeup events from the device will be notified to the corresponding PCIe port. For more information about PCIe D3cold and corresponding ACPI support, please refer to: - PCI Express Base Specification Revision 2.0 - Advanced Configuration and Power Interface Specification Revision 5.0 [bhelgaas: changelog] Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl> Originally-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-06-23 02:23:51 +00:00
pm_runtime_resume(dev);
return count;
}
static ssize_t d3cold_allowed_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct pci_dev *pdev = to_pci_dev(dev);
return sysfs_emit(buf, "%u\n", pdev->d3cold_allowed);
PCI/PM: add PCIe runtime D3cold support This patch adds runtime D3cold support and corresponding ACPI platform support. This patch only enables runtime D3cold support; it does not enable D3cold support during system suspend/hibernate. D3cold is the deepest power saving state for a PCIe device, where its main power is removed. While it is in D3cold, you can't access the device at all, not even its configuration space (which is still accessible in D3hot). Therefore the PCI PM registers can not be used to transition into/out of the D3cold state; that must be done by platform logic such as ACPI _PR3. To support wakeup from D3cold, a system may provide auxiliary power, which allows a device to request wakeup using a Beacon or the sideband WAKE# signal. WAKE# is usually connected to platform logic such as ACPI GPE. This is quite different from other power saving states, where devices request wakeup via a PME message on the PCIe link. Some devices, such as those in plug-in slots, have no direct platform logic. For example, there is usually no ACPI _PR3 for them. D3cold support for these devices can be done via the PCIe Downstream Port leading to the device. When the PCIe port is powered on/off, the device is powered on/off too. Wakeup events from the device will be notified to the corresponding PCIe port. For more information about PCIe D3cold and corresponding ACPI support, please refer to: - PCI Express Base Specification Revision 2.0 - Advanced Configuration and Power Interface Specification Revision 5.0 [bhelgaas: changelog] Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl> Originally-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-06-23 02:23:51 +00:00
}
static DEVICE_ATTR_RW(d3cold_allowed);
PCI/PM: add PCIe runtime D3cold support This patch adds runtime D3cold support and corresponding ACPI platform support. This patch only enables runtime D3cold support; it does not enable D3cold support during system suspend/hibernate. D3cold is the deepest power saving state for a PCIe device, where its main power is removed. While it is in D3cold, you can't access the device at all, not even its configuration space (which is still accessible in D3hot). Therefore the PCI PM registers can not be used to transition into/out of the D3cold state; that must be done by platform logic such as ACPI _PR3. To support wakeup from D3cold, a system may provide auxiliary power, which allows a device to request wakeup using a Beacon or the sideband WAKE# signal. WAKE# is usually connected to platform logic such as ACPI GPE. This is quite different from other power saving states, where devices request wakeup via a PME message on the PCIe link. Some devices, such as those in plug-in slots, have no direct platform logic. For example, there is usually no ACPI _PR3 for them. D3cold support for these devices can be done via the PCIe Downstream Port leading to the device. When the PCIe port is powered on/off, the device is powered on/off too. Wakeup events from the device will be notified to the corresponding PCIe port. For more information about PCIe D3cold and corresponding ACPI support, please refer to: - PCI Express Base Specification Revision 2.0 - Advanced Configuration and Power Interface Specification Revision 5.0 [bhelgaas: changelog] Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl> Originally-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-06-23 02:23:51 +00:00
#endif
#ifdef CONFIG_OF
static ssize_t devspec_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct pci_dev *pdev = to_pci_dev(dev);
struct device_node *np = pci_device_to_OF_node(pdev);
if (np == NULL)
return 0;
return sysfs_emit(buf, "%pOF", np);
}
static DEVICE_ATTR_RO(devspec);
#endif
PCI: Introduce new device binding path using pci_dev.driver_override The driver_override field allows us to specify the driver for a device rather than relying on the driver to provide a positive match of the device. This shortcuts the existing process of looking up the vendor and device ID, adding them to the driver new_id, binding the device, then removing the ID, but it also provides a couple advantages. First, the above existing process allows the driver to bind to any device matching the new_id for the window where it's enabled. This is often not desired, such as the case of trying to bind a single device to a meta driver like pci-stub or vfio-pci. Using driver_override we can do this deterministically using: echo pci-stub > /sys/bus/pci/devices/0000:03:00.0/driver_override echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind echo 0000:03:00.0 > /sys/bus/pci/drivers_probe Previously we could not invoke drivers_probe after adding a device to new_id for a driver as we get non-deterministic behavior whether the driver we intend or the standard driver will claim the device. Now it becomes a deterministic process, only the driver matching driver_override will probe the device. To return the device to the standard driver, we simply clear the driver_override and reprobe the device: echo > /sys/bus/pci/devices/0000:03:00.0/driver_override echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind echo 0000:03:00.0 > /sys/bus/pci/drivers_probe Another advantage to this approach is that we can specify a driver override to force a specific binding or prevent any binding. For instance when an IOMMU group is exposed to userspace through VFIO we require that all devices within that group are owned by VFIO. However, devices can be hot-added into an IOMMU group, in which case we want to prevent the device from binding to any driver (override driver = "none") or perhaps have it automatically bind to vfio-pci. With driver_override it's a simple matter for this field to be set internally when the device is first discovered to prevent driver matches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Alexander Graf <agraf@suse.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-20 14:53:21 +00:00
static ssize_t driver_override_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
struct pci_dev *pdev = to_pci_dev(dev);
char *driver_override, *old, *cp;
PCI: Introduce new device binding path using pci_dev.driver_override The driver_override field allows us to specify the driver for a device rather than relying on the driver to provide a positive match of the device. This shortcuts the existing process of looking up the vendor and device ID, adding them to the driver new_id, binding the device, then removing the ID, but it also provides a couple advantages. First, the above existing process allows the driver to bind to any device matching the new_id for the window where it's enabled. This is often not desired, such as the case of trying to bind a single device to a meta driver like pci-stub or vfio-pci. Using driver_override we can do this deterministically using: echo pci-stub > /sys/bus/pci/devices/0000:03:00.0/driver_override echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind echo 0000:03:00.0 > /sys/bus/pci/drivers_probe Previously we could not invoke drivers_probe after adding a device to new_id for a driver as we get non-deterministic behavior whether the driver we intend or the standard driver will claim the device. Now it becomes a deterministic process, only the driver matching driver_override will probe the device. To return the device to the standard driver, we simply clear the driver_override and reprobe the device: echo > /sys/bus/pci/devices/0000:03:00.0/driver_override echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind echo 0000:03:00.0 > /sys/bus/pci/drivers_probe Another advantage to this approach is that we can specify a driver override to force a specific binding or prevent any binding. For instance when an IOMMU group is exposed to userspace through VFIO we require that all devices within that group are owned by VFIO. However, devices can be hot-added into an IOMMU group, in which case we want to prevent the device from binding to any driver (override driver = "none") or perhaps have it automatically bind to vfio-pci. With driver_override it's a simple matter for this field to be set internally when the device is first discovered to prevent driver matches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Alexander Graf <agraf@suse.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-20 14:53:21 +00:00
/* We need to keep extra room for a newline */
if (count >= (PAGE_SIZE - 1))
PCI: Introduce new device binding path using pci_dev.driver_override The driver_override field allows us to specify the driver for a device rather than relying on the driver to provide a positive match of the device. This shortcuts the existing process of looking up the vendor and device ID, adding them to the driver new_id, binding the device, then removing the ID, but it also provides a couple advantages. First, the above existing process allows the driver to bind to any device matching the new_id for the window where it's enabled. This is often not desired, such as the case of trying to bind a single device to a meta driver like pci-stub or vfio-pci. Using driver_override we can do this deterministically using: echo pci-stub > /sys/bus/pci/devices/0000:03:00.0/driver_override echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind echo 0000:03:00.0 > /sys/bus/pci/drivers_probe Previously we could not invoke drivers_probe after adding a device to new_id for a driver as we get non-deterministic behavior whether the driver we intend or the standard driver will claim the device. Now it becomes a deterministic process, only the driver matching driver_override will probe the device. To return the device to the standard driver, we simply clear the driver_override and reprobe the device: echo > /sys/bus/pci/devices/0000:03:00.0/driver_override echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind echo 0000:03:00.0 > /sys/bus/pci/drivers_probe Another advantage to this approach is that we can specify a driver override to force a specific binding or prevent any binding. For instance when an IOMMU group is exposed to userspace through VFIO we require that all devices within that group are owned by VFIO. However, devices can be hot-added into an IOMMU group, in which case we want to prevent the device from binding to any driver (override driver = "none") or perhaps have it automatically bind to vfio-pci. With driver_override it's a simple matter for this field to be set internally when the device is first discovered to prevent driver matches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Alexander Graf <agraf@suse.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-20 14:53:21 +00:00
return -EINVAL;
driver_override = kstrndup(buf, count, GFP_KERNEL);
if (!driver_override)
return -ENOMEM;
cp = strchr(driver_override, '\n');
if (cp)
*cp = '\0';
device_lock(dev);
old = pdev->driver_override;
PCI: Introduce new device binding path using pci_dev.driver_override The driver_override field allows us to specify the driver for a device rather than relying on the driver to provide a positive match of the device. This shortcuts the existing process of looking up the vendor and device ID, adding them to the driver new_id, binding the device, then removing the ID, but it also provides a couple advantages. First, the above existing process allows the driver to bind to any device matching the new_id for the window where it's enabled. This is often not desired, such as the case of trying to bind a single device to a meta driver like pci-stub or vfio-pci. Using driver_override we can do this deterministically using: echo pci-stub > /sys/bus/pci/devices/0000:03:00.0/driver_override echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind echo 0000:03:00.0 > /sys/bus/pci/drivers_probe Previously we could not invoke drivers_probe after adding a device to new_id for a driver as we get non-deterministic behavior whether the driver we intend or the standard driver will claim the device. Now it becomes a deterministic process, only the driver matching driver_override will probe the device. To return the device to the standard driver, we simply clear the driver_override and reprobe the device: echo > /sys/bus/pci/devices/0000:03:00.0/driver_override echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind echo 0000:03:00.0 > /sys/bus/pci/drivers_probe Another advantage to this approach is that we can specify a driver override to force a specific binding or prevent any binding. For instance when an IOMMU group is exposed to userspace through VFIO we require that all devices within that group are owned by VFIO. However, devices can be hot-added into an IOMMU group, in which case we want to prevent the device from binding to any driver (override driver = "none") or perhaps have it automatically bind to vfio-pci. With driver_override it's a simple matter for this field to be set internally when the device is first discovered to prevent driver matches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Alexander Graf <agraf@suse.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-20 14:53:21 +00:00
if (strlen(driver_override)) {
pdev->driver_override = driver_override;
} else {
kfree(driver_override);
pdev->driver_override = NULL;
}
device_unlock(dev);
PCI: Introduce new device binding path using pci_dev.driver_override The driver_override field allows us to specify the driver for a device rather than relying on the driver to provide a positive match of the device. This shortcuts the existing process of looking up the vendor and device ID, adding them to the driver new_id, binding the device, then removing the ID, but it also provides a couple advantages. First, the above existing process allows the driver to bind to any device matching the new_id for the window where it's enabled. This is often not desired, such as the case of trying to bind a single device to a meta driver like pci-stub or vfio-pci. Using driver_override we can do this deterministically using: echo pci-stub > /sys/bus/pci/devices/0000:03:00.0/driver_override echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind echo 0000:03:00.0 > /sys/bus/pci/drivers_probe Previously we could not invoke drivers_probe after adding a device to new_id for a driver as we get non-deterministic behavior whether the driver we intend or the standard driver will claim the device. Now it becomes a deterministic process, only the driver matching driver_override will probe the device. To return the device to the standard driver, we simply clear the driver_override and reprobe the device: echo > /sys/bus/pci/devices/0000:03:00.0/driver_override echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind echo 0000:03:00.0 > /sys/bus/pci/drivers_probe Another advantage to this approach is that we can specify a driver override to force a specific binding or prevent any binding. For instance when an IOMMU group is exposed to userspace through VFIO we require that all devices within that group are owned by VFIO. However, devices can be hot-added into an IOMMU group, in which case we want to prevent the device from binding to any driver (override driver = "none") or perhaps have it automatically bind to vfio-pci. With driver_override it's a simple matter for this field to be set internally when the device is first discovered to prevent driver matches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Alexander Graf <agraf@suse.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-20 14:53:21 +00:00
kfree(old);
return count;
}
static ssize_t driver_override_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct pci_dev *pdev = to_pci_dev(dev);
ssize_t len;
PCI: Introduce new device binding path using pci_dev.driver_override The driver_override field allows us to specify the driver for a device rather than relying on the driver to provide a positive match of the device. This shortcuts the existing process of looking up the vendor and device ID, adding them to the driver new_id, binding the device, then removing the ID, but it also provides a couple advantages. First, the above existing process allows the driver to bind to any device matching the new_id for the window where it's enabled. This is often not desired, such as the case of trying to bind a single device to a meta driver like pci-stub or vfio-pci. Using driver_override we can do this deterministically using: echo pci-stub > /sys/bus/pci/devices/0000:03:00.0/driver_override echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind echo 0000:03:00.0 > /sys/bus/pci/drivers_probe Previously we could not invoke drivers_probe after adding a device to new_id for a driver as we get non-deterministic behavior whether the driver we intend or the standard driver will claim the device. Now it becomes a deterministic process, only the driver matching driver_override will probe the device. To return the device to the standard driver, we simply clear the driver_override and reprobe the device: echo > /sys/bus/pci/devices/0000:03:00.0/driver_override echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind echo 0000:03:00.0 > /sys/bus/pci/drivers_probe Another advantage to this approach is that we can specify a driver override to force a specific binding or prevent any binding. For instance when an IOMMU group is exposed to userspace through VFIO we require that all devices within that group are owned by VFIO. However, devices can be hot-added into an IOMMU group, in which case we want to prevent the device from binding to any driver (override driver = "none") or perhaps have it automatically bind to vfio-pci. With driver_override it's a simple matter for this field to be set internally when the device is first discovered to prevent driver matches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Alexander Graf <agraf@suse.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-20 14:53:21 +00:00
device_lock(dev);
len = sysfs_emit(buf, "%s\n", pdev->driver_override);
device_unlock(dev);
return len;
PCI: Introduce new device binding path using pci_dev.driver_override The driver_override field allows us to specify the driver for a device rather than relying on the driver to provide a positive match of the device. This shortcuts the existing process of looking up the vendor and device ID, adding them to the driver new_id, binding the device, then removing the ID, but it also provides a couple advantages. First, the above existing process allows the driver to bind to any device matching the new_id for the window where it's enabled. This is often not desired, such as the case of trying to bind a single device to a meta driver like pci-stub or vfio-pci. Using driver_override we can do this deterministically using: echo pci-stub > /sys/bus/pci/devices/0000:03:00.0/driver_override echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind echo 0000:03:00.0 > /sys/bus/pci/drivers_probe Previously we could not invoke drivers_probe after adding a device to new_id for a driver as we get non-deterministic behavior whether the driver we intend or the standard driver will claim the device. Now it becomes a deterministic process, only the driver matching driver_override will probe the device. To return the device to the standard driver, we simply clear the driver_override and reprobe the device: echo > /sys/bus/pci/devices/0000:03:00.0/driver_override echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind echo 0000:03:00.0 > /sys/bus/pci/drivers_probe Another advantage to this approach is that we can specify a driver override to force a specific binding or prevent any binding. For instance when an IOMMU group is exposed to userspace through VFIO we require that all devices within that group are owned by VFIO. However, devices can be hot-added into an IOMMU group, in which case we want to prevent the device from binding to any driver (override driver = "none") or perhaps have it automatically bind to vfio-pci. With driver_override it's a simple matter for this field to be set internally when the device is first discovered to prevent driver matches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Alexander Graf <agraf@suse.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-20 14:53:21 +00:00
}
static DEVICE_ATTR_RW(driver_override);
static struct attribute *pci_dev_attrs[] = {
&dev_attr_power_state.attr,
&dev_attr_resource.attr,
&dev_attr_vendor.attr,
&dev_attr_device.attr,
&dev_attr_subsystem_vendor.attr,
&dev_attr_subsystem_device.attr,
&dev_attr_revision.attr,
&dev_attr_class.attr,
&dev_attr_irq.attr,
&dev_attr_local_cpus.attr,
&dev_attr_local_cpulist.attr,
&dev_attr_modalias.attr,
#ifdef CONFIG_NUMA
&dev_attr_numa_node.attr,
#endif
&dev_attr_dma_mask_bits.attr,
&dev_attr_consistent_dma_mask_bits.attr,
&dev_attr_enable.attr,
&dev_attr_broken_parity_status.attr,
&dev_attr_msi_bus.attr,
#if defined(CONFIG_PM) && defined(CONFIG_ACPI)
&dev_attr_d3cold_allowed.attr,
#endif
#ifdef CONFIG_OF
&dev_attr_devspec.attr,
#endif
PCI: Introduce new device binding path using pci_dev.driver_override The driver_override field allows us to specify the driver for a device rather than relying on the driver to provide a positive match of the device. This shortcuts the existing process of looking up the vendor and device ID, adding them to the driver new_id, binding the device, then removing the ID, but it also provides a couple advantages. First, the above existing process allows the driver to bind to any device matching the new_id for the window where it's enabled. This is often not desired, such as the case of trying to bind a single device to a meta driver like pci-stub or vfio-pci. Using driver_override we can do this deterministically using: echo pci-stub > /sys/bus/pci/devices/0000:03:00.0/driver_override echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind echo 0000:03:00.0 > /sys/bus/pci/drivers_probe Previously we could not invoke drivers_probe after adding a device to new_id for a driver as we get non-deterministic behavior whether the driver we intend or the standard driver will claim the device. Now it becomes a deterministic process, only the driver matching driver_override will probe the device. To return the device to the standard driver, we simply clear the driver_override and reprobe the device: echo > /sys/bus/pci/devices/0000:03:00.0/driver_override echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind echo 0000:03:00.0 > /sys/bus/pci/drivers_probe Another advantage to this approach is that we can specify a driver override to force a specific binding or prevent any binding. For instance when an IOMMU group is exposed to userspace through VFIO we require that all devices within that group are owned by VFIO. However, devices can be hot-added into an IOMMU group, in which case we want to prevent the device from binding to any driver (override driver = "none") or perhaps have it automatically bind to vfio-pci. With driver_override it's a simple matter for this field to be set internally when the device is first discovered to prevent driver matches. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Alexander Graf <agraf@suse.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-20 14:53:21 +00:00
&dev_attr_driver_override.attr,
&dev_attr_ari_enabled.attr,
NULL,
};
static struct attribute *pci_bridge_attrs[] = {
&dev_attr_subordinate_bus_number.attr,
&dev_attr_secondary_bus_number.attr,
NULL,
};
static struct attribute *pcie_dev_attrs[] = {
&dev_attr_current_link_speed.attr,
&dev_attr_current_link_width.attr,
&dev_attr_max_link_width.attr,
&dev_attr_max_link_speed.attr,
NULL,
};
static struct attribute *pcibus_attrs[] = {
&dev_attr_bus_rescan.attr,
&dev_attr_cpuaffinity.attr,
&dev_attr_cpulistaffinity.attr,
NULL,
};
static const struct attribute_group pcibus_group = {
.attrs = pcibus_attrs,
};
const struct attribute_group *pcibus_groups[] = {
&pcibus_group,
NULL,
};
static ssize_t boot_vga_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct pci_dev *pdev = to_pci_dev(dev);
struct pci_dev *vga_dev = vga_default_device();
if (vga_dev)
return sysfs_emit(buf, "%u\n", (pdev == vga_dev));
return sysfs_emit(buf, "%u\n",
!!(pdev->resource[PCI_ROM_RESOURCE].flags &
IORESOURCE_ROM_SHADOW));
}
static DEVICE_ATTR_RO(boot_vga);
static ssize_t pci_read_config(struct file *filp, struct kobject *kobj,
struct bin_attribute *bin_attr, char *buf,
loff_t off, size_t count)
{
struct pci_dev *dev = to_pci_dev(kobj_to_dev(kobj));
unsigned int size = 64;
loff_t init_off = off;
u8 *data = (u8 *) buf;
/* Several chips lock up trying to read undefined config space */
if (file_ns_capable(filp, &init_user_ns, CAP_SYS_ADMIN))
size = dev->cfg_size;
else if (dev->hdr_type == PCI_HEADER_TYPE_CARDBUS)
size = 128;
if (off > size)
return 0;
if (off + count > size) {
size -= off;
count = size;
} else {
size = count;
}
pci_config_pm_runtime_get(dev);
if ((off & 1) && size) {
u8 val;
pci_user_read_config_byte(dev, off, &val);
data[off - init_off] = val;
off++;
size--;
}
if ((off & 3) && size > 2) {
u16 val;
pci_user_read_config_word(dev, off, &val);
data[off - init_off] = val & 0xff;
data[off - init_off + 1] = (val >> 8) & 0xff;
off += 2;
size -= 2;
}
while (size > 3) {
u32 val;
pci_user_read_config_dword(dev, off, &val);
data[off - init_off] = val & 0xff;
data[off - init_off + 1] = (val >> 8) & 0xff;
data[off - init_off + 2] = (val >> 16) & 0xff;
data[off - init_off + 3] = (val >> 24) & 0xff;
off += 4;
size -= 4;
cond_resched();
}
if (size >= 2) {
u16 val;
pci_user_read_config_word(dev, off, &val);
data[off - init_off] = val & 0xff;
data[off - init_off + 1] = (val >> 8) & 0xff;
off += 2;
size -= 2;
}
if (size > 0) {
u8 val;
pci_user_read_config_byte(dev, off, &val);
data[off - init_off] = val;
off++;
--size;
}
pci_config_pm_runtime_put(dev);
return count;
}
static ssize_t pci_write_config(struct file *filp, struct kobject *kobj,
struct bin_attribute *bin_attr, char *buf,
loff_t off, size_t count)
{
struct pci_dev *dev = to_pci_dev(kobj_to_dev(kobj));
unsigned int size = count;
loff_t init_off = off;
u8 *data = (u8 *) buf;
int ret;
ret = security_locked_down(LOCKDOWN_PCI_ACCESS);
if (ret)
return ret;
if (off > dev->cfg_size)
return 0;
if (off + count > dev->cfg_size) {
size = dev->cfg_size - off;
count = size;
}
pci_config_pm_runtime_get(dev);
if ((off & 1) && size) {
pci_user_write_config_byte(dev, off, data[off - init_off]);
off++;
size--;
}
if ((off & 3) && size > 2) {
u16 val = data[off - init_off];
val |= (u16) data[off - init_off + 1] << 8;
pci_user_write_config_word(dev, off, val);
off += 2;
size -= 2;
}
while (size > 3) {
u32 val = data[off - init_off];
val |= (u32) data[off - init_off + 1] << 8;
val |= (u32) data[off - init_off + 2] << 16;
val |= (u32) data[off - init_off + 3] << 24;
pci_user_write_config_dword(dev, off, val);
off += 4;
size -= 4;
}
if (size >= 2) {
u16 val = data[off - init_off];
val |= (u16) data[off - init_off + 1] << 8;
pci_user_write_config_word(dev, off, val);
off += 2;
size -= 2;
}
if (size) {
pci_user_write_config_byte(dev, off, data[off - init_off]);
off++;
--size;
}
pci_config_pm_runtime_put(dev);
return count;
}
PCI/sysfs: Convert "config" to static attribute The "config" sysfs attribute allows access to either the legacy (PCI and PCI-X Mode 1) or the extended (PCI-X Mode 2 and PCIe) device configuration space. Previously it was dynamically created either when a device was added (for hot-added devices) or via a late_initcall (for devices present at boot): pci_bus_add_devices pci_bus_add_device pci_create_sysfs_dev_files if (!sysfs_initialized) return sysfs_create_bin_file # for hot-added devices pci_sysfs_init # late_initcall sysfs_initialized = 1 for_each_pci_dev(pdev) pci_create_sysfs_dev_files(pdev) # for devices present at boot And dynamically removed when the PCI device is stopped and removed: pci_stop_bus_device pci_stop_dev pci_remove_sysfs_dev_files sysfs_remove_bin_file This attribute does not need to be created or removed dynamically, so we can use a static attribute so the device model takes care of addition and removal automatically. Convert "config" to a static attribute and use the .is_bin_visible() callback to set the correct object size (either 256 bytes or 4 KiB) at runtime. The pci_sysfs_init() scheme was added in the pre-git era by https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/drivers/pci/pci-sysfs.c?id=f6d553444da2 [bhelgaas: commit log] Suggested-by: Oliver O'Halloran <oohall@gmail.com> Link: https://lore.kernel.org/r/CAOSf1CHss03DBSDO4PmTtMp0tCEu5kScn704ZEwLKGXQzBfqaA@mail.gmail.com Link: https://lore.kernel.org/r/20210416205856.3234481-2-kw@linux.com Signed-off-by: Krzysztof Wilczyński <kw@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-04-16 20:58:37 +00:00
static BIN_ATTR(config, 0644, pci_read_config, pci_write_config, 0);
static struct bin_attribute *pci_dev_config_attrs[] = {
&bin_attr_config,
NULL,
};
static umode_t pci_dev_config_attr_is_visible(struct kobject *kobj,
struct bin_attribute *a, int n)
{
struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
a->size = PCI_CFG_SPACE_SIZE;
if (pdev->cfg_size > PCI_CFG_SPACE_SIZE)
a->size = PCI_CFG_SPACE_EXP_SIZE;
return a->attr.mode;
}
static const struct attribute_group pci_dev_config_attr_group = {
.bin_attrs = pci_dev_config_attrs,
.is_bin_visible = pci_dev_config_attr_is_visible,
};
#ifdef HAVE_PCI_LEGACY
/**
* pci_read_legacy_io - read byte(s) from legacy I/O port space
* @filp: open sysfs file
* @kobj: kobject corresponding to file to read from
* @bin_attr: struct bin_attribute for this file
* @buf: buffer to store results
* @off: offset into legacy I/O port space
* @count: number of bytes to read
*
* Reads 1, 2, or 4 bytes from legacy I/O port space using an arch specific
* callback routine (pci_legacy_read).
*/
static ssize_t pci_read_legacy_io(struct file *filp, struct kobject *kobj,
struct bin_attribute *bin_attr, char *buf,
loff_t off, size_t count)
{
struct pci_bus *bus = to_pci_bus(kobj_to_dev(kobj));
/* Only support 1, 2 or 4 byte accesses */
if (count != 1 && count != 2 && count != 4)
return -EINVAL;
return pci_legacy_read(bus, off, (u32 *)buf, count);
}
/**
* pci_write_legacy_io - write byte(s) to legacy I/O port space
* @filp: open sysfs file
* @kobj: kobject corresponding to file to read from
* @bin_attr: struct bin_attribute for this file
* @buf: buffer containing value to be written
* @off: offset into legacy I/O port space
* @count: number of bytes to write
*
* Writes 1, 2, or 4 bytes from legacy I/O port space using an arch specific
* callback routine (pci_legacy_write).
*/
static ssize_t pci_write_legacy_io(struct file *filp, struct kobject *kobj,
struct bin_attribute *bin_attr, char *buf,
loff_t off, size_t count)
{
struct pci_bus *bus = to_pci_bus(kobj_to_dev(kobj));
/* Only support 1, 2 or 4 byte accesses */
if (count != 1 && count != 2 && count != 4)
return -EINVAL;
return pci_legacy_write(bus, off, *(u32 *)buf, count);
}
/**
* pci_mmap_legacy_mem - map legacy PCI memory into user memory space
* @filp: open sysfs file
* @kobj: kobject corresponding to device to be mapped
* @attr: struct bin_attribute for this file
* @vma: struct vm_area_struct passed to mmap
*
* Uses an arch specific callback, pci_mmap_legacy_mem_page_range, to mmap
* legacy memory space (first meg of bus space) into application virtual
* memory space.
*/
static int pci_mmap_legacy_mem(struct file *filp, struct kobject *kobj,
struct bin_attribute *attr,
struct vm_area_struct *vma)
{
struct pci_bus *bus = to_pci_bus(kobj_to_dev(kobj));
return pci_mmap_legacy_page_range(bus, vma, pci_mmap_mem);
}
/**
* pci_mmap_legacy_io - map legacy PCI IO into user memory space
* @filp: open sysfs file
* @kobj: kobject corresponding to device to be mapped
* @attr: struct bin_attribute for this file
* @vma: struct vm_area_struct passed to mmap
*
* Uses an arch specific callback, pci_mmap_legacy_io_page_range, to mmap
* legacy IO space (first meg of bus space) into application virtual
* memory space. Returns -ENOSYS if the operation isn't supported
*/
static int pci_mmap_legacy_io(struct file *filp, struct kobject *kobj,
struct bin_attribute *attr,
struct vm_area_struct *vma)
{
struct pci_bus *bus = to_pci_bus(kobj_to_dev(kobj));
return pci_mmap_legacy_page_range(bus, vma, pci_mmap_io);
}
/**
* pci_adjust_legacy_attr - adjustment of legacy file attributes
* @b: bus to create files under
* @mmap_type: I/O port or memory
*
* Stub implementation. Can be overridden by arch if necessary.
*/
void __weak pci_adjust_legacy_attr(struct pci_bus *b,
enum pci_mmap_state mmap_type)
{
}
/**
* pci_create_legacy_files - create legacy I/O port and memory files
* @b: bus to create files under
*
* Some platforms allow access to legacy I/O port and ISA memory space on
* a per-bus basis. This routine creates the files and ties them into
* their associated read, write and mmap files from pci-sysfs.c
*
* On error unwind, but don't propagate the error to the caller
* as it is ok to set up the PCI bus without these files.
*/
void pci_create_legacy_files(struct pci_bus *b)
{
int error;
PCI: Also set up legacy files only after sysfs init We are already doing this for all the regular sysfs files on PCI devices, but not yet on the legacy io files on the PCI buses. Thus far no problem, but in the next patch I want to wire up iomem revoke support. That needs the vfs up and running already to make sure that iomem_get_mapping() works. Wire it up exactly like the existing code in pci_create_sysfs_dev_files(). Note that pci_remove_legacy_files() doesn't need a check since the one for pci_bus->legacy_io is sufficient. An alternative solution would be to implement a callback in sysfs to set up the address space from iomem_get_mapping() when userspace calls mmap(). This also works, but Greg didn't really like that just to work around an ordering issue when the kernel loads initially. v2: Improve commit message (Bjorn) Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20210205133632.2827730-1-daniel.vetter@ffwll.ch
2021-02-05 13:36:32 +00:00
if (!sysfs_initialized)
return;
treewide: kzalloc() -> kcalloc() The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 21:03:40 +00:00
b->legacy_io = kcalloc(2, sizeof(struct bin_attribute),
GFP_ATOMIC);
if (!b->legacy_io)
goto kzalloc_err;
sysfs_bin_attr_init(b->legacy_io);
b->legacy_io->attr.name = "legacy_io";
b->legacy_io->size = 0xffff;
b->legacy_io->attr.mode = 0600;
b->legacy_io->read = pci_read_legacy_io;
b->legacy_io->write = pci_write_legacy_io;
b->legacy_io->mmap = pci_mmap_legacy_io;
PCI: Revoke mappings like devmem Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region") /dev/kmem zaps PTEs when the kernel requests exclusive acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is the default for all driver uses. Except there are two more ways to access PCI BARs: sysfs and proc mmap support. Let's plug that hole. For revoke_devmem() to work we need to link our vma into the same address_space, with consistent vma->vm_pgoff. ->pgoff is already adjusted, because that's how (io_)remap_pfn_range works, but for the mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is to adjust this at at ->open time: - for sysfs this is easy, now that binary attributes support this. We just set bin_attr->mapping when mmap is supported - for procfs it's a bit more tricky, since procfs PCI access has only one file per device, and access to a specific resource first needs to be set up with some ioctl calls. But mmap is only supported for the same resources as sysfs exposes with mmap support, and otherwise rejected, so we can set the mapping unconditionally at open time without harm. A special consideration is for arch_can_pci_mmap_io() - we need to make sure that the ->f_mapping doesn't alias between ioport and iomem space. There are only 2 ways in-tree to support mmap of ioports: generic PCI mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single architecture hand-rolling. Both approaches support ioport mmap through a special PFN range and not through magic PTE attributes. Aliasing is therefore not a problem. The only difference in access checks left is that sysfs PCI mmap does not check for CAP_RAWIO. I'm not really sure whether that should be added or not. Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20210204165831.2703772-3-daniel.vetter@ffwll.ch
2021-02-04 16:58:31 +00:00
b->legacy_io->mapping = iomem_get_mapping();
pci_adjust_legacy_attr(b, pci_mmap_io);
error = device_create_bin_file(&b->dev, b->legacy_io);
if (error)
goto legacy_io_err;
/* Allocated above after the legacy_io struct */
b->legacy_mem = b->legacy_io + 1;
sysfs_bin_attr_init(b->legacy_mem);
b->legacy_mem->attr.name = "legacy_mem";
b->legacy_mem->size = 1024*1024;
b->legacy_mem->attr.mode = 0600;
b->legacy_mem->mmap = pci_mmap_legacy_mem;
PCI: Revoke mappings like devmem Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region") /dev/kmem zaps PTEs when the kernel requests exclusive acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is the default for all driver uses. Except there are two more ways to access PCI BARs: sysfs and proc mmap support. Let's plug that hole. For revoke_devmem() to work we need to link our vma into the same address_space, with consistent vma->vm_pgoff. ->pgoff is already adjusted, because that's how (io_)remap_pfn_range works, but for the mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is to adjust this at at ->open time: - for sysfs this is easy, now that binary attributes support this. We just set bin_attr->mapping when mmap is supported - for procfs it's a bit more tricky, since procfs PCI access has only one file per device, and access to a specific resource first needs to be set up with some ioctl calls. But mmap is only supported for the same resources as sysfs exposes with mmap support, and otherwise rejected, so we can set the mapping unconditionally at open time without harm. A special consideration is for arch_can_pci_mmap_io() - we need to make sure that the ->f_mapping doesn't alias between ioport and iomem space. There are only 2 ways in-tree to support mmap of ioports: generic PCI mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single architecture hand-rolling. Both approaches support ioport mmap through a special PFN range and not through magic PTE attributes. Aliasing is therefore not a problem. The only difference in access checks left is that sysfs PCI mmap does not check for CAP_RAWIO. I'm not really sure whether that should be added or not. Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20210204165831.2703772-3-daniel.vetter@ffwll.ch
2021-02-04 16:58:31 +00:00
b->legacy_io->mapping = iomem_get_mapping();
pci_adjust_legacy_attr(b, pci_mmap_mem);
error = device_create_bin_file(&b->dev, b->legacy_mem);
if (error)
goto legacy_mem_err;
return;
legacy_mem_err:
device_remove_bin_file(&b->dev, b->legacy_io);
legacy_io_err:
kfree(b->legacy_io);
b->legacy_io = NULL;
kzalloc_err:
dev_warn(&b->dev, "could not create legacy I/O port and ISA memory resources in sysfs\n");
}
void pci_remove_legacy_files(struct pci_bus *b)
{
if (b->legacy_io) {
device_remove_bin_file(&b->dev, b->legacy_io);
device_remove_bin_file(&b->dev, b->legacy_mem);
kfree(b->legacy_io); /* both are allocated here */
}
}
#endif /* HAVE_PCI_LEGACY */
#if defined(HAVE_PCI_MMAP) || defined(ARCH_GENERIC_PCI_MMAP_RESOURCE)
int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vma,
enum pci_mmap_api mmap_api)
{
unsigned long nr, start, size;
resource_size_t pci_start = 0, pci_end;
if (pci_resource_len(pdev, resno) == 0)
return 0;
nr = vma_pages(vma);
start = vma->vm_pgoff;
size = ((pci_resource_len(pdev, resno) - 1) >> PAGE_SHIFT) + 1;
if (mmap_api == PCI_MMAP_PROCFS) {
pci_resource_to_user(pdev, resno, &pdev->resource[resno],
&pci_start, &pci_end);
pci_start >>= PAGE_SHIFT;
}
if (start >= pci_start && start < pci_start + size &&
start + nr <= pci_start + size)
return 1;
return 0;
}
/**
* pci_mmap_resource - map a PCI resource into user memory space
* @kobj: kobject for mapping
* @attr: struct bin_attribute for the file being mapped
* @vma: struct vm_area_struct passed into the mmap
* @write_combine: 1 for write_combine mapping
*
* Use the regular PCI mapping routines to map a PCI resource into userspace.
*/
static int pci_mmap_resource(struct kobject *kobj, struct bin_attribute *attr,
struct vm_area_struct *vma, int write_combine)
{
struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
int bar = (unsigned long)attr->private;
enum pci_mmap_state mmap_type;
struct resource *res = &pdev->resource[bar];
int ret;
ret = security_locked_down(LOCKDOWN_PCI_ACCESS);
if (ret)
return ret;
if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(res->start))
return -EINVAL;
if (!pci_mmap_fits(pdev, bar, vma, PCI_MMAP_SYSFS))
return -EINVAL;
mmap_type = res->flags & IORESOURCE_MEM ? pci_mmap_mem : pci_mmap_io;
return pci_mmap_resource_range(pdev, bar, vma, mmap_type, write_combine);
}
static int pci_mmap_resource_uc(struct file *filp, struct kobject *kobj,
struct bin_attribute *attr,
struct vm_area_struct *vma)
{
return pci_mmap_resource(kobj, attr, vma, 0);
}
static int pci_mmap_resource_wc(struct file *filp, struct kobject *kobj,
struct bin_attribute *attr,
struct vm_area_struct *vma)
{
return pci_mmap_resource(kobj, attr, vma, 1);
}
static ssize_t pci_resource_io(struct file *filp, struct kobject *kobj,
struct bin_attribute *attr, char *buf,
loff_t off, size_t count, bool write)
{
struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
int bar = (unsigned long)attr->private;
unsigned long port = off;
port += pci_resource_start(pdev, bar);
if (port > pci_resource_end(pdev, bar))
return 0;
if (port + count - 1 > pci_resource_end(pdev, bar))
return -EINVAL;
switch (count) {
case 1:
if (write)
outb(*(u8 *)buf, port);
else
*(u8 *)buf = inb(port);
return 1;
case 2:
if (write)
outw(*(u16 *)buf, port);
else
*(u16 *)buf = inw(port);
return 2;
case 4:
if (write)
outl(*(u32 *)buf, port);
else
*(u32 *)buf = inl(port);
return 4;
}
return -EINVAL;
}
static ssize_t pci_read_resource_io(struct file *filp, struct kobject *kobj,
struct bin_attribute *attr, char *buf,
loff_t off, size_t count)
{
return pci_resource_io(filp, kobj, attr, buf, off, count, false);
}
static ssize_t pci_write_resource_io(struct file *filp, struct kobject *kobj,
struct bin_attribute *attr, char *buf,
loff_t off, size_t count)
{
int ret;
ret = security_locked_down(LOCKDOWN_PCI_ACCESS);
if (ret)
return ret;
return pci_resource_io(filp, kobj, attr, buf, off, count, true);
}
/**
* pci_remove_resource_files - cleanup resource files
* @pdev: dev to cleanup
*
* If we created resource files for @pdev, remove them from sysfs and
* free their resources.
*/
static void pci_remove_resource_files(struct pci_dev *pdev)
{
int i;
for (i = 0; i < PCI_STD_NUM_BARS; i++) {
struct bin_attribute *res_attr;
res_attr = pdev->res_attr[i];
if (res_attr) {
sysfs_remove_bin_file(&pdev->dev.kobj, res_attr);
kfree(res_attr);
}
res_attr = pdev->res_attr_wc[i];
if (res_attr) {
sysfs_remove_bin_file(&pdev->dev.kobj, res_attr);
kfree(res_attr);
}
}
}
static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
{
/* allocate attribute structure, piggyback attribute name */
int name_len = write_combine ? 13 : 10;
struct bin_attribute *res_attr;
char *res_attr_name;
int retval;
res_attr = kzalloc(sizeof(*res_attr) + name_len, GFP_ATOMIC);
if (!res_attr)
return -ENOMEM;
res_attr_name = (char *)(res_attr + 1);
sysfs_bin_attr_init(res_attr);
if (write_combine) {
pdev->res_attr_wc[num] = res_attr;
sprintf(res_attr_name, "resource%d_wc", num);
res_attr->mmap = pci_mmap_resource_wc;
} else {
pdev->res_attr[num] = res_attr;
sprintf(res_attr_name, "resource%d", num);
if (pci_resource_flags(pdev, num) & IORESOURCE_IO) {
res_attr->read = pci_read_resource_io;
res_attr->write = pci_write_resource_io;
if (arch_can_pci_mmap_io())
res_attr->mmap = pci_mmap_resource_uc;
} else {
res_attr->mmap = pci_mmap_resource_uc;
}
}
PCI: Revoke mappings like devmem Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region") /dev/kmem zaps PTEs when the kernel requests exclusive acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is the default for all driver uses. Except there are two more ways to access PCI BARs: sysfs and proc mmap support. Let's plug that hole. For revoke_devmem() to work we need to link our vma into the same address_space, with consistent vma->vm_pgoff. ->pgoff is already adjusted, because that's how (io_)remap_pfn_range works, but for the mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is to adjust this at at ->open time: - for sysfs this is easy, now that binary attributes support this. We just set bin_attr->mapping when mmap is supported - for procfs it's a bit more tricky, since procfs PCI access has only one file per device, and access to a specific resource first needs to be set up with some ioctl calls. But mmap is only supported for the same resources as sysfs exposes with mmap support, and otherwise rejected, so we can set the mapping unconditionally at open time without harm. A special consideration is for arch_can_pci_mmap_io() - we need to make sure that the ->f_mapping doesn't alias between ioport and iomem space. There are only 2 ways in-tree to support mmap of ioports: generic PCI mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single architecture hand-rolling. Both approaches support ioport mmap through a special PFN range and not through magic PTE attributes. Aliasing is therefore not a problem. The only difference in access checks left is that sysfs PCI mmap does not check for CAP_RAWIO. I'm not really sure whether that should be added or not. Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20210204165831.2703772-3-daniel.vetter@ffwll.ch
2021-02-04 16:58:31 +00:00
if (res_attr->mmap)
res_attr->mapping = iomem_get_mapping();
res_attr->attr.name = res_attr_name;
res_attr->attr.mode = 0600;
res_attr->size = pci_resource_len(pdev, num);
res_attr->private = (void *)(unsigned long)num;
retval = sysfs_create_bin_file(&pdev->dev.kobj, res_attr);
if (retval)
kfree(res_attr);
return retval;
}
/**
* pci_create_resource_files - create resource files in sysfs for @dev
* @pdev: dev in question
*
* Walk the resources in @pdev creating files for each resource available.
*/
static int pci_create_resource_files(struct pci_dev *pdev)
{
int i;
int retval;
/* Expose the PCI resources from this device as files */
for (i = 0; i < PCI_STD_NUM_BARS; i++) {
/* skip empty resources */
if (!pci_resource_len(pdev, i))
continue;
retval = pci_create_attr(pdev, i, 0);
/* for prefetchable resources, create a WC mappable file */
if (!retval && arch_can_pci_mmap_wc() &&
pdev->resource[i].flags & IORESOURCE_PREFETCH)
retval = pci_create_attr(pdev, i, 1);
if (retval) {
pci_remove_resource_files(pdev);
return retval;
}
}
return 0;
}
#else /* !(defined(HAVE_PCI_MMAP) || defined(ARCH_GENERIC_PCI_MMAP_RESOURCE)) */
int __weak pci_create_resource_files(struct pci_dev *dev) { return 0; }
void __weak pci_remove_resource_files(struct pci_dev *dev) { return; }
#endif
/**
* pci_write_rom - used to enable access to the PCI ROM display
* @filp: sysfs file
* @kobj: kernel object handle
* @bin_attr: struct bin_attribute for this file
* @buf: user input
* @off: file offset
* @count: number of byte in input
*
* writing anything except 0 enables it
*/
static ssize_t pci_write_rom(struct file *filp, struct kobject *kobj,
struct bin_attribute *bin_attr, char *buf,
loff_t off, size_t count)
{
struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
if ((off == 0) && (*buf == '0') && (count == 2))
pdev->rom_attr_enabled = 0;
else
pdev->rom_attr_enabled = 1;
return count;
}
/**
* pci_read_rom - read a PCI ROM
* @filp: sysfs file
* @kobj: kernel object handle
* @bin_attr: struct bin_attribute for this file
* @buf: where to put the data we read from the ROM
* @off: file offset
* @count: number of bytes to read
*
* Put @count bytes starting at @off into @buf from the ROM in the PCI
* device corresponding to @kobj.
*/
static ssize_t pci_read_rom(struct file *filp, struct kobject *kobj,
struct bin_attribute *bin_attr, char *buf,
loff_t off, size_t count)
{
struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
void __iomem *rom;
size_t size;
if (!pdev->rom_attr_enabled)
return -EINVAL;
rom = pci_map_rom(pdev, &size); /* size starts out as PCI window size */
if (!rom || !size)
return -EIO;
if (off >= size)
count = 0;
else {
if (off + count > size)
count = size - off;
memcpy_fromio(buf, rom + off, count);
}
pci_unmap_rom(pdev, rom);
return count;
}
static BIN_ATTR(rom, 0600, pci_read_rom, pci_write_rom, 0);
static struct bin_attribute *pci_dev_rom_attrs[] = {
&bin_attr_rom,
NULL,
};
static umode_t pci_dev_rom_attr_is_visible(struct kobject *kobj,
struct bin_attribute *a, int n)
{
struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
size_t rom_size;
/* If the device has a ROM, try to expose it in sysfs. */
rom_size = pci_resource_len(pdev, PCI_ROM_RESOURCE);
if (!rom_size)
return 0;
a->size = rom_size;
return a->attr.mode;
}
static const struct attribute_group pci_dev_rom_attr_group = {
.bin_attrs = pci_dev_rom_attrs,
.is_bin_visible = pci_dev_rom_attr_is_visible,
};
static ssize_t reset_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
struct pci_dev *pdev = to_pci_dev(dev);
unsigned long val;
ssize_t result = kstrtoul(buf, 0, &val);
if (result < 0)
return result;
if (val != 1)
return -EINVAL;
pm_runtime_get_sync(dev);
result = pci_reset_function(pdev);
pm_runtime_put(dev);
if (result < 0)
return result;
return count;
}
static DEVICE_ATTR_WO(reset);
static struct attribute *pci_dev_reset_attrs[] = {
&dev_attr_reset.attr,
NULL,
};
static umode_t pci_dev_reset_attr_is_visible(struct kobject *kobj,
struct attribute *a, int n)
{
struct pci_dev *pdev = to_pci_dev(kobj_to_dev(kobj));
if (!pdev->reset_fn)
return 0;
return a->mode;
}
static const struct attribute_group pci_dev_reset_attr_group = {
.attrs = pci_dev_reset_attrs,
.is_visible = pci_dev_reset_attr_is_visible,
};
int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev)
{
if (!sysfs_initialized)
return -EACCES;
return pci_create_resource_files(pdev);
}
/**
* pci_remove_sysfs_dev_files - cleanup PCI specific sysfs files
* @pdev: device whose entries we should free
*
* Cleanup when @pdev is removed from sysfs.
*/
void pci_remove_sysfs_dev_files(struct pci_dev *pdev)
{
if (!sysfs_initialized)
return;
pci_remove_resource_files(pdev);
}
static int __init pci_sysfs_init(void)
{
struct pci_dev *pdev = NULL;
PCI: Also set up legacy files only after sysfs init We are already doing this for all the regular sysfs files on PCI devices, but not yet on the legacy io files on the PCI buses. Thus far no problem, but in the next patch I want to wire up iomem revoke support. That needs the vfs up and running already to make sure that iomem_get_mapping() works. Wire it up exactly like the existing code in pci_create_sysfs_dev_files(). Note that pci_remove_legacy_files() doesn't need a check since the one for pci_bus->legacy_io is sufficient. An alternative solution would be to implement a callback in sysfs to set up the address space from iomem_get_mapping() when userspace calls mmap(). This also works, but Greg didn't really like that just to work around an ordering issue when the kernel loads initially. v2: Improve commit message (Bjorn) Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20210205133632.2827730-1-daniel.vetter@ffwll.ch
2021-02-05 13:36:32 +00:00
struct pci_bus *pbus = NULL;
int retval;
sysfs_initialized = 1;
for_each_pci_dev(pdev) {
retval = pci_create_sysfs_dev_files(pdev);
if (retval) {
pci_dev_put(pdev);
return retval;
}
}
PCI: Also set up legacy files only after sysfs init We are already doing this for all the regular sysfs files on PCI devices, but not yet on the legacy io files on the PCI buses. Thus far no problem, but in the next patch I want to wire up iomem revoke support. That needs the vfs up and running already to make sure that iomem_get_mapping() works. Wire it up exactly like the existing code in pci_create_sysfs_dev_files(). Note that pci_remove_legacy_files() doesn't need a check since the one for pci_bus->legacy_io is sufficient. An alternative solution would be to implement a callback in sysfs to set up the address space from iomem_get_mapping() when userspace calls mmap(). This also works, but Greg didn't really like that just to work around an ordering issue when the kernel loads initially. v2: Improve commit message (Bjorn) Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20210205133632.2827730-1-daniel.vetter@ffwll.ch
2021-02-05 13:36:32 +00:00
while ((pbus = pci_find_next_bus(pbus)))
pci_create_legacy_files(pbus);
return 0;
}
late_initcall(pci_sysfs_init);
static struct attribute *pci_dev_dev_attrs[] = {
&dev_attr_boot_vga.attr,
NULL,
};
static umode_t pci_dev_attrs_are_visible(struct kobject *kobj,
struct attribute *a, int n)
{
struct device *dev = kobj_to_dev(kobj);
struct pci_dev *pdev = to_pci_dev(dev);
if (a == &dev_attr_boot_vga.attr)
if ((pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
return 0;
return a->mode;
}
static struct attribute *pci_dev_hp_attrs[] = {
&dev_attr_remove.attr,
&dev_attr_dev_rescan.attr,
NULL,
};
static umode_t pci_dev_hp_attrs_are_visible(struct kobject *kobj,
struct attribute *a, int n)
{
struct device *dev = kobj_to_dev(kobj);
struct pci_dev *pdev = to_pci_dev(dev);
if (pdev->is_virtfn)
return 0;
return a->mode;
}
static umode_t pci_bridge_attrs_are_visible(struct kobject *kobj,
struct attribute *a, int n)
{
struct device *dev = kobj_to_dev(kobj);
struct pci_dev *pdev = to_pci_dev(dev);
if (pci_is_bridge(pdev))
return a->mode;
return 0;
}
static umode_t pcie_dev_attrs_are_visible(struct kobject *kobj,
struct attribute *a, int n)
{
struct device *dev = kobj_to_dev(kobj);
struct pci_dev *pdev = to_pci_dev(dev);
if (pci_is_pcie(pdev))
return a->mode;
return 0;
}
static const struct attribute_group pci_dev_group = {
.attrs = pci_dev_attrs,
};
const struct attribute_group *pci_dev_groups[] = {
&pci_dev_group,
PCI/sysfs: Convert "config" to static attribute The "config" sysfs attribute allows access to either the legacy (PCI and PCI-X Mode 1) or the extended (PCI-X Mode 2 and PCIe) device configuration space. Previously it was dynamically created either when a device was added (for hot-added devices) or via a late_initcall (for devices present at boot): pci_bus_add_devices pci_bus_add_device pci_create_sysfs_dev_files if (!sysfs_initialized) return sysfs_create_bin_file # for hot-added devices pci_sysfs_init # late_initcall sysfs_initialized = 1 for_each_pci_dev(pdev) pci_create_sysfs_dev_files(pdev) # for devices present at boot And dynamically removed when the PCI device is stopped and removed: pci_stop_bus_device pci_stop_dev pci_remove_sysfs_dev_files sysfs_remove_bin_file This attribute does not need to be created or removed dynamically, so we can use a static attribute so the device model takes care of addition and removal automatically. Convert "config" to a static attribute and use the .is_bin_visible() callback to set the correct object size (either 256 bytes or 4 KiB) at runtime. The pci_sysfs_init() scheme was added in the pre-git era by https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/drivers/pci/pci-sysfs.c?id=f6d553444da2 [bhelgaas: commit log] Suggested-by: Oliver O'Halloran <oohall@gmail.com> Link: https://lore.kernel.org/r/CAOSf1CHss03DBSDO4PmTtMp0tCEu5kScn704ZEwLKGXQzBfqaA@mail.gmail.com Link: https://lore.kernel.org/r/20210416205856.3234481-2-kw@linux.com Signed-off-by: Krzysztof Wilczyński <kw@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-04-16 20:58:37 +00:00
&pci_dev_config_attr_group,
&pci_dev_rom_attr_group,
&pci_dev_reset_attr_group,
&pci_dev_vpd_attr_group,
#ifdef CONFIG_DMI
&pci_dev_smbios_attr_group,
#endif
#ifdef CONFIG_ACPI
&pci_dev_acpi_attr_group,
#endif
NULL,
};
static const struct attribute_group pci_dev_hp_attr_group = {
.attrs = pci_dev_hp_attrs,
.is_visible = pci_dev_hp_attrs_are_visible,
};
static const struct attribute_group pci_dev_attr_group = {
.attrs = pci_dev_dev_attrs,
.is_visible = pci_dev_attrs_are_visible,
};
static const struct attribute_group pci_bridge_attr_group = {
.attrs = pci_bridge_attrs,
.is_visible = pci_bridge_attrs_are_visible,
};
static const struct attribute_group pcie_dev_attr_group = {
.attrs = pcie_dev_attrs,
.is_visible = pcie_dev_attrs_are_visible,
};
static const struct attribute_group *pci_dev_attr_groups[] = {
&pci_dev_attr_group,
&pci_dev_hp_attr_group,
#ifdef CONFIG_PCI_IOV
PCI/IOV: Add sysfs MSI-X vector assignment interface A typical cloud provider SR-IOV use case is to create many VFs for use by guest VMs. The VFs may not be assigned to a VM until a customer requests a VM of a certain size, e.g., number of CPUs. A VF may need MSI-X vectors proportional to the number of CPUs in the VM, but there is no standard way to change the number of MSI-X vectors supported by a VF. Some Mellanox ConnectX devices support dynamic assignment of MSI-X vectors to SR-IOV VFs. This can be done by the PF driver after VFs are enabled, and it can be done without affecting VFs that are already in use. The hardware supports a limited pool of MSI-X vectors that can be assigned to the PF or to individual VFs. This is device-specific behavior that requires support in the PF driver. Add a read-only "sriov_vf_total_msix" sysfs file for the PF and a writable "sriov_vf_msix_count" file for each VF. Management software may use these to learn how many MSI-X vectors are available and to dynamically assign them to VFs before the VFs are passed through to a VM. If the PF driver implements the ->sriov_get_vf_total_msix() callback, "sriov_vf_total_msix" contains the total number of MSI-X vectors available for distribution among VFs. If no driver is bound to the VF, writing "N" to "sriov_vf_msix_count" uses the PF driver ->sriov_set_msix_vec_count() callback to assign "N" MSI-X vectors to the VF. When a VF driver subsequently reads the MSI-X Message Control register, it will see the new Table Size "N". Link: https://lore.kernel.org/linux-pci/20210314124256.70253-2-leon@kernel.org Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-04-04 07:22:18 +00:00
&sriov_pf_dev_attr_group,
&sriov_vf_dev_attr_group,
#endif
&pci_bridge_attr_group,
&pcie_dev_attr_group,
#ifdef CONFIG_PCIEAER
&aer_stats_attr_group,
#endif
#ifdef CONFIG_PCIEASPM
&aspm_ctrl_attr_group,
#endif
NULL,
};
const struct device_type pci_dev_type = {
.groups = pci_dev_attr_groups,
};