linux-stable/drivers/cxl
Dan Williams 7c7371b41a cxl/mem: Fix shutdown order
[ Upstream commit 88d3917f82 ]

Ira reports that removing cxl_mock_mem causes a crash with the following
trace:

 BUG: kernel NULL pointer dereference, address: 0000000000000044
 [..]
 RIP: 0010:cxl_region_decode_reset+0x7f/0x180 [cxl_core]
 [..]
 Call Trace:
  <TASK>
  cxl_region_detach+0xe8/0x210 [cxl_core]
  cxl_decoder_kill_region+0x27/0x40 [cxl_core]
  cxld_unregister+0x29/0x40 [cxl_core]
  devres_release_all+0xb8/0x110
  device_unbind_cleanup+0xe/0x70
  device_release_driver_internal+0x1d2/0x210
  bus_remove_device+0xd7/0x150
  device_del+0x155/0x3e0
  device_unregister+0x13/0x60
  devm_release_action+0x4d/0x90
  ? __pfx_unregister_port+0x10/0x10 [cxl_core]
  delete_endpoint+0x121/0x130 [cxl_core]
  devres_release_all+0xb8/0x110
  device_unbind_cleanup+0xe/0x70
  device_release_driver_internal+0x1d2/0x210
  bus_remove_device+0xd7/0x150
  device_del+0x155/0x3e0
  ? lock_release+0x142/0x290
  cdev_device_del+0x15/0x50
  cxl_memdev_unregister+0x54/0x70 [cxl_core]

This crash is due to the clearing out the cxl_memdev's driver context
(@cxlds) before the subsystem is done with it. This is ultimately due to
the region(s), that this memdev is a member, being torn down and expecting
to be able to de-reference @cxlds, like here:

static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
...
                if (cxlds->rcd)
                        goto endpoint_reset;
...

Fix it by keeping the driver context valid until memdev-device
unregistration, and subsequently the entire stack of related
dependencies, unwinds.

Fixes: 9cc238c7a5 ("cxl/pci: Introduce cdevm_file_operations")
Reported-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20 11:52:13 +01:00
..
core cxl/mem: Fix shutdown order 2023-11-20 11:52:13 +01:00
acpi.c cxl/acpi: Return 'rc' instead of '0' in cxl_parse_cfmws() 2023-08-03 10:24:04 +02:00
cxl.h cxl/region: Fix 'distance' calculation with passthrough ports 2022-11-04 16:01:24 -07:00
cxlmem.h cxl/region: Attach endpoint decoders 2022-07-25 12:18:07 -07:00
cxlpci.h cxl: Wait Memory_Info_Valid before access memory related info 2023-05-30 14:03:32 +01:00
Kconfig cxl/region: Allocate HPA capacity to regions 2022-07-25 12:18:06 -07:00
Makefile PM: CXL: Disable suspend 2022-04-22 16:09:42 -07:00
mem.c cxl/mem: Enumerate port targets before adding endpoints 2022-07-21 17:19:25 -07:00
pci.c cxl/pci: Create PCI DOE mailbox's for memory devices 2022-07-19 15:38:04 -07:00
pmem.c cxl/pmem: Fix nvdimm registration races 2023-03-10 09:34:20 +01:00
port.c cxl/port: Read CDAT table 2022-07-19 15:38:05 -07:00