linux-stable/drivers/clk
Stephen Boyd a29ec0465d clk: Get runtime PM before walking tree during disable_unused
[ Upstream commit e581cf5d21 ]

Doug reported [1] the following hung task:

 INFO: task swapper/0:1 blocked for more than 122 seconds.
       Not tainted 5.15.149-21875-gf795ebc40eb8 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:swapper/0       state:D stack:    0 pid:    1 ppid:     0 flags:0x00000008
 Call trace:
  __switch_to+0xf4/0x1f4
  __schedule+0x418/0xb80
  schedule+0x5c/0x10c
  rpm_resume+0xe0/0x52c
  rpm_resume+0x178/0x52c
  __pm_runtime_resume+0x58/0x98
  clk_pm_runtime_get+0x30/0xb0
  clk_disable_unused_subtree+0x58/0x208
  clk_disable_unused_subtree+0x38/0x208
  clk_disable_unused_subtree+0x38/0x208
  clk_disable_unused_subtree+0x38/0x208
  clk_disable_unused_subtree+0x38/0x208
  clk_disable_unused+0x4c/0xe4
  do_one_initcall+0xcc/0x2d8
  do_initcall_level+0xa4/0x148
  do_initcalls+0x5c/0x9c
  do_basic_setup+0x24/0x30
  kernel_init_freeable+0xec/0x164
  kernel_init+0x28/0x120
  ret_from_fork+0x10/0x20
 INFO: task kworker/u16:0:9 blocked for more than 122 seconds.
       Not tainted 5.15.149-21875-gf795ebc40eb8 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/u16:0   state:D stack:    0 pid:    9 ppid:     2 flags:0x00000008
 Workqueue: events_unbound deferred_probe_work_func
 Call trace:
  __switch_to+0xf4/0x1f4
  __schedule+0x418/0xb80
  schedule+0x5c/0x10c
  schedule_preempt_disabled+0x2c/0x48
  __mutex_lock+0x238/0x488
  __mutex_lock_slowpath+0x1c/0x28
  mutex_lock+0x50/0x74
  clk_prepare_lock+0x7c/0x9c
  clk_core_prepare_lock+0x20/0x44
  clk_prepare+0x24/0x30
  clk_bulk_prepare+0x40/0xb0
  mdss_runtime_resume+0x54/0x1c8
  pm_generic_runtime_resume+0x30/0x44
  __genpd_runtime_resume+0x68/0x7c
  genpd_runtime_resume+0x108/0x1f4
  __rpm_callback+0x84/0x144
  rpm_callback+0x30/0x88
  rpm_resume+0x1f4/0x52c
  rpm_resume+0x178/0x52c
  __pm_runtime_resume+0x58/0x98
  __device_attach+0xe0/0x170
  device_initial_probe+0x1c/0x28
  bus_probe_device+0x3c/0x9c
  device_add+0x644/0x814
  mipi_dsi_device_register_full+0xe4/0x170
  devm_mipi_dsi_device_register_full+0x28/0x70
  ti_sn_bridge_probe+0x1dc/0x2c0
  auxiliary_bus_probe+0x4c/0x94
  really_probe+0xcc/0x2c8
  __driver_probe_device+0xa8/0x130
  driver_probe_device+0x48/0x110
  __device_attach_driver+0xa4/0xcc
  bus_for_each_drv+0x8c/0xd8
  __device_attach+0xf8/0x170
  device_initial_probe+0x1c/0x28
  bus_probe_device+0x3c/0x9c
  deferred_probe_work_func+0x9c/0xd8
  process_one_work+0x148/0x518
  worker_thread+0x138/0x350
  kthread+0x138/0x1e0
  ret_from_fork+0x10/0x20

The first thread is walking the clk tree and calling
clk_pm_runtime_get() to power on devices required to read the clk
hardware via struct clk_ops::is_enabled(). This thread holds the clk
prepare_lock, and is trying to runtime PM resume a device, when it finds
that the device is in the process of resuming so the thread schedule()s
away waiting for the device to finish resuming before continuing. The
second thread is runtime PM resuming the same device, but the runtime
resume callback is calling clk_prepare(), trying to grab the
prepare_lock waiting on the first thread.
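
The wait-for relationship in the two traces above can be sketched as a tiny
userspace model. This is only an illustration: the `wait_state` structure,
the enum names, and `model_deadlocked()` are hypothetical and do not exist in
the kernel. The model records which thread owns each resource and which
resource each thread is blocked on, then follows the chain to see whether it
loops back to the starting thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two hung tasks and the two contended resources. */
enum { T_DISABLE_UNUSED, T_DEFERRED_PROBE, NTHREADS };
enum { R_PREPARE_LOCK, R_DEVICE_RESUME, NRES };

struct wait_state {
	int owner[NRES];         /* thread owning each resource, -1 if free */
	int waits_for[NTHREADS]; /* resource each thread blocks on, -1 if running */
};

/*
 * Follow thread -> awaited resource -> owning thread. Return true if the
 * chain leads back to the thread we started from, i.e. nobody can progress.
 */
static bool model_deadlocked(const struct wait_state *s, int start)
{
	int t = start;

	assert(start >= 0 && start < NTHREADS);
	for (int hops = 0; hops < NTHREADS; hops++) {
		int res = s->waits_for[t];

		if (res < 0)
			return false; /* this thread is runnable */
		t = s->owner[res];
		if (t < 0)
			return false; /* resource is free */
		if (t == start)
			return true;  /* cycle: ABBA */
	}
	return false;
}

/* The state described by the hung task report, as read from the traces. */
static struct wait_state model_hung_task_state(void)
{
	struct wait_state s = {
		.owner = {
			[R_PREPARE_LOCK]  = T_DISABLE_UNUSED,
			[R_DEVICE_RESUME] = T_DEFERRED_PROBE,
		},
		.waits_for = {
			[T_DISABLE_UNUSED] = R_DEVICE_RESUME,
			[T_DEFERRED_PROBE] = R_PREPARE_LOCK,
		},
	};
	return s;
}
```

In this model, breaking either edge (for example, the disable_unused thread
never waiting on a resume while it holds the lock) removes the cycle.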

This is a classic ABBA deadlock. To properly fix the deadlock, we must
never runtime PM resume or suspend a device with the clk prepare_lock
held. Actually doing that is near impossible today because the global
prepare_lock would have to be dropped in the middle of the tree, the
device runtime PM resumed/suspended, and then the prepare_lock grabbed
again to ensure consistency of the clk tree topology. If anything
changes with the clk tree in the meantime, we've lost and will need to
start the operation all over again.

Luckily, most of the time we're simply incrementing or decrementing the
runtime PM count on an active device, so we don't have the chance to
schedule away with the prepare_lock held. Let's fix this immediate
problem that can be triggered more easily by simply booting on Qualcomm
sc7180.

Introduce a list of clk_core structures that have been registered, or
are in the process of being registered, that require runtime PM to
operate. Iterate this list and call clk_pm_runtime_get() on each of them
without holding the prepare_lock during clk_disable_unused(). This way
we can be certain that the runtime PM state of the devices will be
active and resumed so we can't schedule away while walking the clk tree
with the prepare_lock held. Similarly, call clk_pm_runtime_put() without
the prepare_lock held to properly drop the runtime PM reference. We
remove the calls to clk_pm_runtime_{get,put}() in this path because
they're superfluous now that we know the devices are runtime resumed.
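
The ordering described above can be sketched in plain userspace C. This is a
simplified, single-threaded model under stated assumptions, not the code in
clk.c: every `model_*` name is hypothetical, the prepare_lock is reduced to a
flag so the ordering rule can be checked with assertions, and runtime PM is
reduced to a usage counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device with runtime PM. */
struct model_dev {
	int usage_count; /* runtime PM usage counter */
	bool active;     /* true while the device is resumed */
};

/* Hypothetical stand-in for a registered clk. */
struct model_clk {
	struct model_dev *dev;      /* NULL if the clk needs no runtime PM */
	struct model_clk *rpm_next; /* link in the list of runtime PM clks */
	bool enabled;
	int users;
};

static struct model_clk *model_rpm_list;
static bool model_prepare_lock_held;

/* Only clks that require runtime PM are added to the list. */
static void model_clk_register(struct model_clk *clk)
{
	if (!clk->dev)
		return;
	clk->rpm_next = model_rpm_list;
	model_rpm_list = clk;
}

/* A resume may sleep, so it must never run with the lock held. */
static void model_rpm_get(struct model_dev *dev)
{
	assert(!model_prepare_lock_held);
	if (dev->usage_count++ == 0)
		dev->active = true;
}

static void model_rpm_put(struct model_dev *dev)
{
	assert(!model_prepare_lock_held);
	if (--dev->usage_count == 0)
		dev->active = false;
}

/* Returns the number of clks that were disabled. */
static int model_disable_unused(struct model_clk **clks, size_t n)
{
	struct model_clk *c;
	int disabled = 0;

	/* 1. Resume every runtime PM device before taking the lock. */
	for (c = model_rpm_list; c; c = c->rpm_next)
		model_rpm_get(c->dev);

	/* 2. Walk the clks with the lock held; no resume happens here. */
	model_prepare_lock_held = true;
	for (size_t i = 0; i < n; i++) {
		c = clks[i];
		assert(!c->dev || c->dev->active);
		if (c->enabled && c->users == 0) {
			c->enabled = false;
			disabled++;
		}
	}
	model_prepare_lock_held = false;

	/* 3. Drop the references, again without the lock. */
	for (c = model_rpm_list; c; c = c->rpm_next)
		model_rpm_put(c->dev);

	return disabled;
}
```

The assertions in model_rpm_get() and model_rpm_put() encode the rule this
patch relies on: the walk in step 2 never has to resume or suspend a device,
so it cannot schedule away while the lock is held.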

Reported-by: Douglas Anderson <dianders@chromium.org>
Closes: https://lore.kernel.org/all/20220922084322.RFC.2.I375b6b9e0a0a5348962f004beb3dafee6a12dfbb@changeid/ [1]
Closes: https://issuetracker.google.com/328070191
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Fixes: 9a34b45397 ("clk: Add support for runtime PM")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lore.kernel.org/r/20240325184204.745706-5-sboyd@kernel.org
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-27 17:05:26 +02:00
actions clk: actions: Terminate clk_div_table with sentinel element 2022-04-08 14:23:48 +02:00
analogbits
at91 clk: at91: clk-sam9x60-pll: fix return value check 2023-05-11 23:00:35 +09:00
axis
axs10x
baikal-t1 clk: baikal-t1: Add SATA internal ref clock buffer 2022-10-26 12:35:20 +02:00
bcm clk: bcm2835: Round UART input clock up 2022-10-26 12:35:52 +02:00
berlin clk: berlin: Add of_node_put() for of_get_parent() 2022-10-26 12:35:04 +02:00
davinci
h8300
hisilicon clk: hisilicon: hi3559a: Fix an erroneous devm_kfree() 2024-03-26 18:21:29 -04:00
imgtec
imx clk: imx8mp: add clkout1/2 support 2024-03-01 13:21:53 +01:00
ingenic clk: ingenic: jz4760: Update M/N/OD calculation algorithm 2023-02-14 19:18:03 +01:00
keystone clk: keystone: pll: fix a couple NULL vs IS_ERR() checks 2023-11-20 11:08:17 +01:00
loongson1 clk: loongson1: Terminate clk_div_table with sentinel element 2022-04-08 14:23:48 +02:00
mediatek clk: mediatek: clk-mt2701: Add check for mtk_alloc_clk_data 2023-11-20 11:08:18 +01:00
meson clk: meson: Add missing clocks to axg_clk_regmaps 2024-03-26 18:21:26 -04:00
microchip
mmp clk: mmp: pxa168: Fix memory leak in pxa168_clk_init() 2024-02-23 08:54:48 +01:00
mstar
mvebu clk: mvebu: ap-cpu-clk: Fix a memory leak in error handling paths 2021-11-18 19:16:46 +01:00
mxs
nxp
pistachio clk: pistachio: Make it selectable for generic MIPS kernel 2021-08-12 16:01:49 +02:00
pxa
qcom clk: qcom: mmcc-msm8974: fix terminating of frequency table arrays 2024-04-10 16:18:37 +02:00
ralink clk: ralink: avoid to set 'CLK_IS_CRITICAL' flag for gates 2021-08-28 22:24:06 -07:00
renesas clk: renesas: cpg-mssr: Remove superfluous check in resume code 2024-03-01 13:21:51 +01:00
rockchip clk: rockchip: rk3128: Fix HCLK_OTG gate register 2024-01-25 14:52:29 -08:00
samsung clk: samsung: Fix memory leak in _samsung_clk_register_pll() 2022-12-31 13:14:18 +01:00
sifive
socfpga clk: socfpga: Fix undefined behavior bug in struct stratix10_clock_data 2023-11-28 16:56:28 +00:00
spear
sprd clk: sprd: set max_register according to mapping range 2023-04-20 12:13:52 +02:00
st clk: st: Fix memory leak in st_of_quadfs_setup() 2022-12-31 13:14:43 +01:00
sunxi clk: sunxi: sun9i-mmc: check return value after calling platform_get_resource() 2022-05-09 09:14:37 +02:00
sunxi-ng clk: sunxi-ng: Modify mismatched function name 2023-09-19 12:22:40 +02:00
tegra clk: tegra: fix error return case for recalc_rate 2023-10-06 13:18:12 +02:00
ti clk: ti: fix double free in of_ti_divider_clk_setup() 2023-11-20 11:08:18 +01:00
uniphier clk: uniphier: Fix fixed-rate initialization 2022-04-08 14:22:50 +02:00
ux500 mfd: db8500-prcmu: Handle missing FW variant 2021-08-09 09:33:29 +01:00
versatile
x86 clk: mxl: syscon_node_to_regmap() returns error pointers 2023-02-25 12:06:43 +01:00
xilinx
zynq clk: zynq: Prevent null pointer dereference caused by kmalloc failure 2024-03-26 18:21:32 -04:00
zynqmp drivers: clk: zynqmp: update divider round rate logic 2024-01-25 14:52:44 -08:00
clk-asm9260.c clk: asm9260: use parent index to link the reference clock 2024-01-25 14:52:44 -08:00
clk-aspeed.c
clk-aspeed.h
clk-ast2600.c clk: ast2600: BCLK comes from EPLL 2022-10-26 12:35:20 +02:00
clk-axi-clkgen.c
clk-axm5516.c
clk-bd718x7.c
clk-bm1880.c clk: bm1880: remove kfrees on static allocations 2022-01-27 11:04:20 +01:00
clk-bulk.c
clk-cdce706.c
clk-cdce925.c clk: cdce925: check return value of kasprintf() 2023-07-23 13:47:09 +02:00
clk-clps711x.c clk: clps711x: Terminate clk_div_table with sentinel element 2022-04-08 14:23:48 +02:00
clk-composite.c clk: composite: Also consider .determine_rate for rate + mux composites 2021-10-18 12:59:42 -07:00
clk-conf.c clk: add missing of_node_put() in "assigned-clocks" property parsing 2023-05-11 23:00:36 +09:00
clk-cs2000-cp.c
clk-devres.c clk: Fix slab-out-of-bounds error in devm_clk_release() 2023-08-30 16:18:16 +02:00
clk-divider.c clk: divider: Implement and wire up .determine_rate by default 2021-08-05 17:35:58 -07:00
clk-fixed-factor.c
clk-fixed-mmio.c
clk-fixed-rate.c clk: fixed-rate: add devm_clk_hw_register_fixed_rate 2024-01-25 14:52:44 -08:00
clk-fractional-divider.c clk: fractional-divider: Document the arithmetics used behind the code 2021-08-12 12:42:00 -07:00
clk-fractional-divider.h clk: fractional-divider: Hide clk_fractional_divider_ops from wide audience 2021-08-12 12:42:00 -07:00
clk-fsl-flexspi.c
clk-fsl-sai.c
clk-gate.c
clk-gemini.c
clk-gpio.c
clk-hi655x.c
clk-highbank.c
clk-hsdk-pll.c
clk-k210.c
clk-lmk04832.c
clk-lochnagar.c
clk-max9485.c
clk-max77686.c
clk-milbeaut.c
clk-moxart.c
clk-multiplier.c
clk-mux.c
clk-nomadik.c
clk-npcm7xx.c clk: npcm7xx: Fix incorrect kfree 2023-11-20 11:08:18 +01:00
clk-nspire.c
clk-oxnas.c clk: oxnas: Hold reference returned by of_get_parent() 2022-10-26 12:35:03 +02:00
clk-palmas.c clk: palmas: Add a missing SPDX license header 2021-08-05 17:34:30 -07:00
clk-plldig.c
clk-pwm.c
clk-qoriq.c clk: qoriq: Hold reference returned by of_get_parent() 2022-10-26 12:35:04 +02:00
clk-rk808.c
clk-s2mps11.c
clk-scmi.c clk: scmi: Free scmi_clk allocated when the clocks with invalid info are skipped 2023-11-20 11:08:21 +01:00
clk-scpi.c
clk-si514.c
clk-si544.c
clk-si570.c
clk-si5341.c clk: si5341: fix an error code problem in si5341_output_clk_set_rate 2024-01-25 14:52:44 -08:00
clk-si5351.c
clk-si5351.h
clk-sparx5.c
clk-stm32f4.c clk: stm32: Fix ltdc's clock turn off by clk_disable_unused() after system enter shell 2022-01-27 11:04:12 +01:00
clk-stm32h7.c clk: stm32h7: Switch to clk_divider.determine_rate 2021-08-05 17:36:10 -07:00
clk-stm32mp1.c clk: stm32mp1: Switch to clk_divider.determine_rate 2021-08-05 17:36:10 -07:00
clk-twl6040.c
clk-versaclock5.c clk: vc5: check memory returned by kasprintf() 2023-07-23 13:47:09 +02:00
clk-vt8500.c
clk-wm831x.c
clk-xgene.c
clk.c clk: Get runtime PM before walking tree during disable_unused 2024-04-27 17:05:26 +02:00
clk.h
clkdev.c
Kconfig clk: fixed-mmio: make COMMON_CLK_FIXED_MMIO depend on HAS_IOMEM 2023-09-19 12:22:28 +02:00
Makefile clk: pistachio: Make it selectable for generic MIPS kernel 2021-08-12 16:01:49 +02:00