linux-stable/drivers/net/ethernet/amd
Brett Creeley 38407914d4 pds_core: Fix pdsc_check_pci_health function to use work thread
[ Upstream commit 81665adf25 ]

When the driver notices fw_status == 0xff it tries to perform a PCI
reset on itself via pci_reset_function() in the context of the driver's
health thread. However, pdsc_reset_prepare() calls
pdsc_stop_health_thread(), which attempts to stop/flush the health
thread. This results in a deadlock because the stop/flush will never
complete since the driver called pci_reset_function() from the health
thread context. Fix by changing pdsc_check_pci_health()
to queue a newly introduced pdsc_pci_reset_thread() on the pdsc's
work queue.
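
The self-flush problem described above can be modeled in user space. The sketch below is not the pds_core code: it is a single-threaded model in which every name (struct model, queue_work_item(), stop_health(), would_deadlock()) is hypothetical, and a work item that tries to flush itself is recorded as a would-be deadlock instead of actually hanging.

```c
/* User-space model only; every name here is hypothetical, not driver code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FW_STATUS_DEAD 0xff
#define MAX_PENDING 4

struct model;
typedef void (*work_fn)(struct model *m);

struct model {
	unsigned char fw_status;
	bool defer_reset;             /* true: queue the reset; false: call it inline */
	work_fn pending[MAX_PENDING]; /* FIFO of queued work items */
	size_t head, tail;
	work_fn running;              /* work item currently executing, if any */
	bool self_flush;              /* a work item tried to flush itself */
	bool reset_done;
};

static void health_work(struct model *m);

static void queue_work_item(struct model *m, work_fn fn)
{
	assert(m->tail < MAX_PENDING);
	m->pending[m->tail++] = fn;
}

/* Stand-in for stopping/flushing the health work before a reset. */
static void stop_health(struct model *m)
{
	/* Flushing an item from inside that same item can never complete. */
	if (m->running == health_work)
		m->self_flush = true;
}

static void reset_work(struct model *m)
{
	stop_health(m);
	m->reset_done = true;
}

static void health_work(struct model *m)
{
	if (m->fw_status != FW_STATUS_DEAD)
		return;
	if (m->defer_reset)
		queue_work_item(m, reset_work); /* runs later, outside health_work */
	else
		reset_work(m);                  /* runs inside health_work */
}

static void run_pending(struct model *m)
{
	while (m->head < m->tail) {
		m->running = m->pending[m->head++];
		m->running(m);
		m->running = NULL;
	}
}

/* Returns true if the reset path tried to flush the health work from itself. */
bool would_deadlock(bool defer_reset)
{
	struct model m = { .fw_status = FW_STATUS_DEAD, .defer_reset = defer_reset };

	queue_work_item(&m, health_work);
	run_pending(&m);
	assert(m.reset_done);
	return m.self_flush;
}
```

In this model, would_deadlock(false) corresponds to calling the reset inline from the health work, and would_deadlock(true) corresponds to queuing the reset as its own work item so that the health work has already returned by the time it is stopped.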

Unloading the driver in the fw_down/dead state uncovered another issue,
which can be seen in the following trace:

WARNING: CPU: 51 PID: 6914 at kernel/workqueue.c:1450 __queue_work+0x358/0x440
[...]
RIP: 0010:__queue_work+0x358/0x440
[...]
Call Trace:
 <TASK>
 ? __warn+0x85/0x140
 ? __queue_work+0x358/0x440
 ? report_bug+0xfc/0x1e0
 ? handle_bug+0x3f/0x70
 ? exc_invalid_op+0x17/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? __queue_work+0x358/0x440
 queue_work_on+0x28/0x30
 pdsc_devcmd_locked+0x96/0xe0 [pds_core]
 pdsc_devcmd_reset+0x71/0xb0 [pds_core]
 pdsc_teardown+0x51/0xe0 [pds_core]
 pdsc_remove+0x106/0x200 [pds_core]
 pci_device_remove+0x37/0xc0
 device_release_driver_internal+0xae/0x140
 driver_detach+0x48/0x90
 bus_remove_driver+0x6d/0xf0
 pci_unregister_driver+0x2e/0xa0
 pdsc_cleanup_module+0x10/0x780 [pds_core]
 __x64_sys_delete_module+0x142/0x2b0
 ? syscall_trace_enter.isra.18+0x126/0x1a0
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7fbd9d03a14b
[...]

Fix this by preventing the devcmd reset if the FW is not running.
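
As a rough illustration of that guard, the following sketch skips the reset command when the firmware is down. It is a simplified model under stated assumptions: struct fake_dev, fake_devcmd_reset(), fake_teardown() and devcmds_sent_on_teardown() are hypothetical names, and placing the check inside the reset helper is an assumption rather than a statement about where the driver performs it.

```c
/* Simplified model; struct fake_dev and these helpers are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_dev {
	bool fw_running;
	int devcmds_sent;
};

/* Stand-in for a reset devcmd: skipped entirely when the FW is down. */
static int fake_devcmd_reset(struct fake_dev *d)
{
	if (!d->fw_running)
		return 0; /* nothing to talk to, so nothing is sent or queued */
	d->devcmds_sent++;
	return 0;
}

/* Stand-in for the teardown path reached from driver removal. */
static void fake_teardown(struct fake_dev *d)
{
	assert(d != NULL);
	fake_devcmd_reset(d);
}

/* Returns how many device commands a teardown issued for the given FW state. */
int devcmds_sent_on_teardown(bool fw_running)
{
	struct fake_dev d = { .fw_running = fw_running };

	fake_teardown(&d);
	return d.devcmds_sent;
}
```

With the firmware down, the modeled teardown issues no device command, so the path that triggered the __queue_work warning in the trace above is never entered.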

Fixes: d9407ff118 ("pds_core: Prevent health thread from running during reset/remove")
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-17 11:23:31 +02:00
pds_core pds_core: Fix pdsc_check_pci_health function to use work thread 2024-04-17 11:23:31 +02:00
xgbe net: ethtool: pass a pointer to parameters to get/set_rxfh ethtool ops 2023-12-13 22:07:16 -08:00
7990.c
7990.h
Kconfig pds_core: add AUXILIARY_BUS and NET_DEVLINK to Kconfig 2023-05-03 09:16:53 +01:00
Makefile pds_core: Kconfig and pds_core.rst 2023-04-21 08:29:14 +01:00
a2065.c net: amd: Unified the comparison between pointers and NULL to the same writing 2022-09-16 10:27:47 +01:00
a2065.h
amd8111e.c net: amd: Switch and case should be at the same indent 2022-09-16 10:27:47 +01:00
amd8111e.h net: amd: Correct spelling errors 2022-09-16 10:27:47 +01:00
ariadne.c net: amd: Unified the comparison between pointers and NULL to the same writing 2022-09-16 10:27:47 +01:00
ariadne.h
atarilance.c ethernet: atarilance: mark init function static 2023-08-11 18:24:02 -07:00
au1000_eth.c net: ethernet: amd: Convert to platform remove callback returning void 2023-09-20 09:06:37 +01:00
au1000_eth.h au1000_eth: stop using virt_to_bus() 2022-06-08 11:32:02 -07:00
declance.c amd: declance: use eth_hw_addr_set() 2022-01-25 09:00:53 -08:00
hplance.c
hplance.h
lance.c net: isa: include net/Space.h 2023-05-17 21:27:30 -07:00
mvme147.c amd: mvme147: use eth_hw_addr_set() 2021-11-19 11:05:20 +00:00
nmclan_cs.c net: amd: Fix link leak when verifying config failed 2023-04-25 09:41:18 +01:00
pcnet32.c net: amd: Unified the comparison between pointers and NULL to the same writing 2022-09-16 10:27:47 +01:00
sun3lance.c net: amd: Unified the comparison between pointers and NULL to the same writing 2022-09-16 10:27:47 +01:00
sunlance.c net: ethernet: amd: Convert to platform remove callback returning void 2023-09-20 09:06:37 +01:00