linux-stable/drivers
Dan Williams 53989fad12 cxl/pmem: Fix module reload vs workqueue state
A test of the form:

    while true; do modprobe -r cxl_pmem; modprobe cxl_pmem; done

May lead to a crash signature of the form:

    BUG: unable to handle page fault for address: ffffffffc0660030
    #PF: supervisor instruction fetch in kernel mode
    #PF: error_code(0x0010) - not-present page
    [..]
    Workqueue: cxl_pmem 0xffffffffc0660030
    RIP: 0010:0xffffffffc0660030
    Code: Unable to access opcode bytes at RIP 0xffffffffc0660006.
    [..]
    Call Trace:
     ? process_one_work+0x4ec/0x9c0
     ? pwq_dec_nr_in_flight+0x100/0x100
     ? rwlock_bug.part.0+0x60/0x60
     ? worker_thread+0x2eb/0x700

In that report the 0xffffffffc0660030 address corresponds to the former
function address of cxl_nvb_update_state() from a previous load of the
module, not the current address. Fix that by arranging for ->state_work
in the 'struct cxl_nvdimm_bridge' object to be reinitialized on cxl_pmem
module reload.

Details:

Recall that CXL subsystem wants to link a CXL memory expander device to
an NVDIMM sub-hierarchy when both a persistent memory range has been
registered by the CXL platform driver (cxl_acpi) *and* when that CXL
memory expander has published persistent memory capacity (Get Partition
Info). To this end the cxl_nvdimm_bridge driver arranges to rescan the
CXL bus when either of those conditions change. The helper
bus_rescan_devices() can not be called underneath the device_lock() for
any device on that bus, so the cxl_nvdimm_bridge driver uses a workqueue
for the rescan.

Typically a driver allocates driver data to hold a 'struct work_struct'
for a driven device, but for a workqueue that may run after ->remove()
returns, driver data will have been freed. The 'struct
cxl_nvdimm_bridge' object holds the state and work_struct directly.
Unfortunately it was only arranging for that infrastructure to be
initialized once per device creation rather than the necessary once per
workqueue (cxl_pmem_wq) creation.

Introduce is_cxl_nvdimm_bridge() and cxl_nvdimm_bridge_reset() in
support of invalidating stale references to a recently destroyed
cxl_pmem_wq.

Cc: <stable@vger.kernel.org>
Fixes: 8fdcb1704f ("cxl/pmem: Add initial infrastructure for pmem support")
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/163665474585.3505991.8397182770066720755.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-11-15 11:03:00 -08:00
..
accessibility
acpi ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT 2021-11-15 11:03:00 -08:00
amba
android Char/Misc driver update for 5.16-rc1 2021-11-04 08:21:47 -07:00
ata libata: libahci: declare ahci_shost_attr_group as static 2021-11-12 08:05:47 +09:00
atm
auxdisplay auxdisplay: cfag12864bfb: code indent should use tabs where possible 2021-10-22 00:13:16 +02:00
base arch_topology: Fix missing clear cluster_cpumask in remove_cpu_topology() 2021-11-11 13:09:33 +01:00
bcma pci-v5.16-changes 2021-11-06 14:36:12 -07:00
block for-5.16/drivers-2021-11-09 2021-11-09 11:24:08 -08:00
bluetooth TTY / Serial driver update for 5.16-rc1 2021-11-04 09:09:37 -07:00
bus - Config updates for BMIPS platform 2021-11-13 09:11:33 -08:00
cdrom for-5.16/cdrom-2021-10-29 2021-11-01 10:09:14 -07:00
char Char/Misc driver update for 5.16-rc1 2021-11-04 08:21:47 -07:00
clk Devicetree fixes for v5.16, take 1: 2021-11-14 11:11:51 -08:00
clocksource ARM: 2021-11-02 11:24:14 -07:00
comedi comedi: dt9812: fix DMA buffers on stack 2021-10-30 10:54:47 +02:00
connector
counter
cpufreq cpufreq: intel_pstate: Clear HWP Status during HWP Interrupt enable 2021-11-04 19:48:47 +01:00
cpuidle ARM: SoC drivers for 5.16 2021-11-03 17:00:52 -07:00
crypto pci-v5.16-changes 2021-11-06 14:36:12 -07:00
cxl cxl/pmem: Fix module reload vs workqueue state 2021-11-15 11:03:00 -08:00
dax
dca
devfreq Merge branches 'pm-opp' and 'pm-cpufreq' 2021-11-10 14:06:51 +01:00
dio
dma dmaengine updates for v5.16-rc1 2021-11-10 11:47:55 -08:00
dma-buf drm next/fixes for 5.16-rc1 2021-11-12 12:11:07 -08:00
edac - amd64_edac: Add support for three-rank interleaving mode which is 2021-11-01 15:02:49 -07:00
eisa
extcon extcon: usbc-tusb320: Add support for TUSB320L 2021-10-27 14:13:39 +09:00
firewire SCSI misc on 20211105 2021-11-05 08:42:02 -07:00
firmware Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-11-10 16:15:54 -08:00
fpga
fsi fsi: sbefifo: Use interruptible mutex locking 2021-10-22 09:54:33 +10:30
gnss
gpio gpio updates for v5.16 2021-11-08 11:55:21 -08:00
gpu drm next/fixes for 5.16-rc1 2021-11-12 12:11:07 -08:00
greybus
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid 2021-11-05 08:31:51 -07:00
hsi HSI changes for the 5.16 series 2021-11-04 13:56:55 -07:00
hv hyperv-next for 5.16 2021-11-02 10:56:49 -07:00
hwmon Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
hwspinlock
hwtracing coresight: trbe: Work around write to out of range 2021-10-27 11:46:01 -06:00
i2c More ACPI updates for 5.16-rc1 2021-11-10 11:52:40 -08:00
i3c
idle
iio chrome platform changes for 5.16 2021-11-10 11:36:43 -08:00
infiniband SCSI misc on 20211105 2021-11-05 08:42:02 -07:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2021-11-12 11:53:16 -08:00
interconnect
iommu pci-v5.16-changes 2021-11-06 14:36:12 -07:00
ipack
irqchip irqchip/sifive-plic: Fixup EOI failed when masked 2021-11-12 16:09:51 +00:00
isdn
leds
macintosh Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
mailbox mailbox: imx: support i.MX8ULP S4 MU 2021-10-29 23:03:09 -05:00
mcb
md for-5.16/drivers-2021-11-09 2021-11-09 11:24:08 -08:00
media More ACPI updates for 5.16-rc1 2021-11-10 11:52:40 -08:00
memory Memory controller drivers for v5.16, part two 2021-10-21 21:34:21 +02:00
memstick
message pci-v5.16-changes 2021-11-06 14:36:12 -07:00
mfd chrome platform changes for 5.16 2021-11-10 11:36:43 -08:00
misc More ACPI updates for 5.16-rc1 2021-11-10 11:52:40 -08:00
mmc Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
most most: fix control-message timeouts 2021-10-26 19:12:01 +02:00
mtd for-5.16/drivers-2021-11-09 2021-11-09 11:24:08 -08:00
mux mux: add support for delay after muxing 2021-10-21 20:02:42 +01:00
net Networking fixes for 5.16-rc1, including fixes from bpf, can 2021-11-11 09:49:36 -08:00
nfc nfc: pn533: Fix double free when pn533_fill_fragment_skbs() fails 2021-11-07 19:37:04 +00:00
ntb
nubus
nvdimm libnvdimm for v5.16 2021-11-10 10:56:02 -08:00
nvme for-5.16/block-2021-11-09 2021-11-09 11:20:07 -08:00
nvmem
of Devicetree fixes for v5.16, take 1: 2021-11-14 11:11:51 -08:00
opp
parisc
parport
pci A set of fixes for the interrupt subsystem: 2021-11-14 10:38:27 -08:00
pcmcia Core: 2021-11-02 06:20:58 -07:00
perf ACPI updates for 5.16-rc1 2021-11-02 15:58:39 -07:00
phy Char/Misc driver update for 5.16-rc1 2021-11-04 08:21:47 -07:00
pinctrl Pin control changes for the v5.16 kernel cycle 2021-11-05 08:24:17 -07:00
platform chrome platform changes for 5.16 2021-11-10 11:36:43 -08:00
pnp
power power: supply: bq25890: Fix initial setting of the F_CONV_RATE field 2021-11-02 16:48:47 +01:00
powercap
pps
ps3
ptp ptp: fix code indentation issues 2021-10-28 14:42:20 +01:00
pwm pwm: vt8500: Rename pwm_busy_wait() to make it obviously driver-specific 2021-11-05 11:57:13 +01:00
rapidio rapidio: avoid bogus __alloc_size warning 2021-11-06 13:30:33 -07:00
ras
regulator - Remove Drivers 2021-11-08 12:07:52 -08:00
remoteproc
reset ARM: SoC drivers for 5.16 2021-11-03 17:00:52 -07:00
rpmsg remoteproc updates for v5.16 2021-11-10 09:07:26 -08:00
rtc RTC for 5.16 2021-11-12 11:44:31 -08:00
s390 s390/cio: check the subchannel validity for dev_busid 2021-11-08 14:17:49 +01:00
sbus
scsi SCSI misc on 20211112 2021-11-12 12:25:50 -08:00
sh
siox
slimbus
soc Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-11-10 16:15:54 -08:00
soundwire
spi spi: Updates for v5.16 2021-11-01 19:09:04 -07:00
spmi
ssb
staging Merge branch 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-11-10 16:15:54 -08:00
target SCSI misc on 20211112 2021-11-12 12:25:50 -08:00
tc
tee Char/Misc driver update for 5.16-rc1 2021-11-04 08:21:47 -07:00
thermal thermal: int340x: fix build on 32-bit targets 2021-11-12 10:56:25 -08:00
thunderbolt thunderbolt: Changes for v5.16 merge window 2021-10-25 13:17:29 +02:00
tty TTY / Serial driver update for 5.16-rc1 2021-11-04 09:09:37 -07:00
uio Drivers: hv: vmbus: Mark vmbus ring buffer visible to host in Isolation VM 2021-10-28 11:22:23 +00:00
usb USB fixes for 5.16-rc1 2021-11-11 09:40:15 -08:00
vdpa vhost,virtio,vhost: fixes,features 2021-11-03 15:00:39 -07:00
vfio
vhost vdpa: Introduce and use vdpa device get, set config helpers 2021-11-01 05:26:49 -04:00
video drm next/fixes for 5.16-rc1 2021-11-12 12:11:07 -08:00
virt
virtio virtio-mem: support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE 2021-11-10 15:32:38 +01:00
visorbus
vlynq
vme
w1
watchdog linux-watchdog 5.16-rc1 tag 2021-11-10 09:41:22 -08:00
xen xen: branch for v5.16-rc1 2021-11-10 11:14:21 -08:00
zorro
Kconfig
Makefile