Commit graph

9 commits

Author SHA1 Message Date
Dan Williams
7dab174e2e dax/hmem: Move hmem device registration to dax_hmem.ko
In preparation for the CXL region driver to take over the responsibility
of registering device-dax instances for CXL regions, move the
registration of "hmem" devices to dax_hmem.ko.

Previously the builtin component of this enabling
(drivers/dax/hmem/device.o) would register platform devices for each
address range and trigger the dax_hmem.ko module to load and attach
device-dax instances to those devices. Now, the ranges are collected
from the HMAT and EFI memory map walking, but the device creation is
deferred. A new "hmem_platform" device is created which triggers
dax_hmem.ko to load and register the platform devices.

Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/167602002771.1924368.5653558226424530127.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-10 17:33:34 -08:00
Dan Williams
fe098574a9 dax/hmem: Convey the dax range via memregion_info()
In preparation for hmem platform devices to be unregistered, stop using
platform_device_add_resources() to convey the address range. The
platform_device_add_resources() API causes an existing "Soft Reserved"
iomem resource to be re-parented under an inserted platform device
resource. When that platform device is deleted it removes the platform
device resource and all children.

Instead, it is sufficient to convey just the address range and let
request_mem_region() insert resources to indicate the devices active in
the range. This allows the "Soft Reserved" resource to be re-enumerated
upon the next probe event.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167602002217.1924368.7036275892522551624.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-10 17:33:29 -08:00
Dan Williams
df2798bc77 dax/hmem: Move HMAT and Soft reservation probe initcall level
In preparation for moving more filtering of "hmem" ranges into the
dax_hmem.ko module, update the initcall levels. HMAT range registration
moves to subsys_initcall() to be done before Soft Reservation probing,
and Soft Reservation probing is moved to device_initcall() to be done
before dax_hmem.ko initialization if it is built-in.

Tested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167602001107.1924368.11562316181038595611.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-10 17:33:18 -08:00
Dan Williams
472faf72b3 device-dax: Fix duplicate 'hmem' device registration
So called "soft-reserved" memory is an EFI conventional memory range
with the EFI_MEMORY_SP attribute set. That attribute indicates that the
memory is not part of the platform general purpose memory pool and may
want some consideration from the system administrator about whether to
keep that memory set aside for dedicated access through device-dax (map
a device file), or assigned to the page allocator as another general
purpose memory node target.

Absent an ACPI HMAT table the default device-dax registration creates
coarse grained devices that are delineated by EFI Memory Map entries.
With the HMAT the devices are delineated by the finer grained ranges
associated with the proximity domain of the memory target. I.e. the HMAT
describes the properties of performance differentiated memory and each
unique performance description results in a unique target proximity
domain where each memory proximity domain has an associated SRAT entry
that delineates the address range.

The intent was that SRAT-defined device-dax instances are registered
first. Then any left-over address range with the EFI_MEMORY_SP
attribute, but not covered by the SRAT, would have a coarse grained
device-dax instance established. However, the scheme to detect what
ranges are left to be assigned to a device was buggy and resulted in
multiple overlapping device-dax instances. Fix this by using explicit
tracking for which ranges have been handled.

Now, this new approach may leave memory stranded in the presence of
broken platform firmware that fails to fully describe all EFI_MEMORY_SP
ranges in the HMAT. That requires a deeper fix if it becomes a problem
in practice.

Reported-by: "Tallam Mahendra Kumar" <tallam.mahendra.kumar@intel.com>
Reported-by: Mustafa Hajeer <mustafa.hajeer@intel.com>
Debugged-by: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/166890823379.4183293.15333502171004313377.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-21 15:34:40 -08:00
Linus Torvalds
19d17ab7c6 libnvdimm for 6.1
- Fix nvdimm namespace creation on platforms that do not publish
   associated 'DIMM' metadata for a persistent memory region.
 
 - Miscellaneous fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCY0nx4QAKCRDfioYZHlFs
 Zwg3AQD6t9gvqu7AV1eTEGrypC3C47Z5yeeEqLC7U9DjtnxP9wD/VWGNX7uYu6Ck
 rf8vyT7NFqg0khpU6XeaaDWQtasRggs=
 =WCrX
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull nvdimm updates from Dan Williams:
 "Some small cleanups and fixes in and around the nvdimm subsystem. The
  most significant change is a regression fix for nvdimm namespace
  (volume) creation when the namespace size is smaller than 2MB/

  Summary:

   - Fix nvdimm namespace creation on platforms that do not publish
     associated 'DIMM' metadata for a persistent memory region.

   - Miscellaneous fixes and cleanups"

* tag 'libnvdimm-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  ACPI: HMAT: Release platform device in case of platform_device_add_data() fails
  dax: Remove usage of the deprecated ida_simple_xxx API
  libnvdimm/region: Allow setting align attribute on regions without mappings
  nvdimm/namespace: Fix comment typo
  nvdimm: make __nvdimm_security_overwrite_query static
  nvdimm/region: Fix kernel-doc
  nvdimm/namespace: drop unneeded temporary variable in size_store()
  nvdimm/namespace: return uuid_null only once in nd_dev_to_uuid()
2022-10-14 18:41:41 -07:00
Lin Yujun
6a02124c87 ACPI: HMAT: Release platform device in case of platform_device_add_data() fails
The platform device is not released when platform_device_add_data()
fails. And platform_device_put() perfom one more pointer check than
put_device() to check for errors in the 'pdev' pointer.

Use platform_device_put() to release platform device in
platform_device_add()/platform_device_add_data()/
platform_device_add_resources() error case.

Fixes: c01044cc81 ("ACPI: HMAT: refactor hmat_register_target_device to hmem_register_device")
Signed-off-by: Lin Yujun <linyujun809@huawei.com>
Link: https://lore.kernel.org/r/20220914033755.99924-1-linyujun809@huawei.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-09-29 17:54:29 -07:00
Dan Williams
67feaba413 devdax: Fix soft-reservation memory description
The "hmem" platform-devices that are created to represent the
platform-advertised "Soft Reserved" memory ranges end up inserting a
resource that causes the iomem_resource tree to look like this:

340000000-43fffffff : hmem.0
  340000000-43fffffff : Soft Reserved
    340000000-43fffffff : dax0.0

This is because insert_resource() reparents ranges when they completely
intersect an existing range.

This matters because code that uses region_intersects() to scan for a
given IORES_DESC will only check that top-level 'hmem.0' resource and
not the 'Soft Reserved' descendant.

So, to support EINJ (via einj_error_inject()) to inject errors into
memory hosted by a dax-device, be sure to describe the memory as
IORES_DESC_SOFT_RESERVED. This is a follow-on to:

commit b13a3e5fd4 ("ACPI: APEI: Fix _EINJ vs EFI_MEMORY_SP")

...that fixed EINJ support for "Soft Reserved" ranges in the first
instance.

Fixes: 262b45ae3a ("x86/efi: EFI soft reservation to E820 enumeration")
Reported-by: Ricardo Sandoval Torres <ricardo.sandoval.torres@intel.com>
Tested-by: Ricardo Sandoval Torres <ricardo.sandoval.torres@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Omar Avelar <omar.avelar@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mark Gross <markgross@kernel.org>
Link: https://lore.kernel.org/r/166397075670.389916.7435722208896316387.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-09-24 18:05:53 -07:00
Dan Williams
5ccac54f3e ACPI: HMAT: attach a device for each soft-reserved range
The hmem enabling in commit cf8741ac57 ("ACPI: NUMA: HMAT: Register
"soft reserved" memory as an "hmem" device") only registered ranges to the
hmem driver for each soft-reservation that also appeared in the HMAT.
While this is meant to encourage platform firmware to "do the right thing"
and publish an HMAT, the corollary is that platforms that fail to publish
an accurate HMAT will strand memory from Linux usage.  Additionally, the
"efi_fake_mem" kernel command line option enabling will strand memory by
default without an HMAT.

Arrange for "soft reserved" memory that goes unclaimed by HMAT entries to
be published as raw resource ranges for the hmem driver to consume.

Include a module parameter to disable either this fallback behavior, or
the hmat enabling from creating hmem devices.  The module parameter
requires the hmem device enabling to have unique name in the module
namespace: "device_hmem".

The driver depends on the architecture providing phys_to_target_node()
which is only x86 via numa_meminfo() and arm64 via a generic memblock
implementation.

[joao.m.martins@oracle.com: require NUMA_KEEP_MEMINFO for phys_to_target_node()]
  Link: https://lkml.kernel.org/r/aaae71a7-4846-f5cc-5acf-cf05fdb1f2dc@oracle.com

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jia He <justin.he@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: https://lkml.kernel.org/r/159643098298.4062302.17587338161136144730.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:28 -07:00
Dan Williams
c01044cc81 ACPI: HMAT: refactor hmat_register_target_device to hmem_register_device
In preparation for exposing "Soft Reserved" memory ranges without an HMAT,
move the hmem device registration to its own compilation unit and make the
implementation generic.

The generic implementation drops usage acpi_map_pxm_to_online_node() that
was translating ACPI proximity domain values and instead relies on
numa_map_to_online_node() to determine the numa node for the device.

[joao.m.martins@oracle.com: CONFIG_DEV_DAX_HMEM_DEVICES should depend on CONFIG_DAX=y]
  Link: https://lkml.kernel.org/r/8f34727f-ec2d-9395-cb18-969ec8a5d0d4@oracle.com

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: https://lkml.kernel.org/r/159643096584.4062302.5035370788475153738.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lore.kernel.org/r/158318761484.2216124.2049322072599482736.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:27 -07:00