linux-stable/drivers/nvme
Chunguang Xu ff2f90f88d nvme: fix reconnection fail due to reserved tag allocation
[ Upstream commit de105068fe ]

We found an issue in a production environment while using NVMe over
RDMA: admin_q reconnect failed forever even though the remote target and
network were fine. After digging into it, we found it may be caused by
an ABBA deadlock due to tag allocation. In my case, the tag was held by
a keep alive request waiting inside admin_q. As we quiesce admin_q
while resetting the ctrl, the request is marked as idle and will not be
processed before the reset succeeds. As fabric_q shares its tagset with
admin_q, we need a tag for the connect command while reconnecting to
the remote target, but the only reserved tag was held by the keep alive
command waiting inside admin_q. As a result, we failed to reconnect
admin_q forever. To fix this issue, I think we should keep two reserved
tags for the admin queue.
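
The scenario above can be replayed with a toy model of a reserved-tag
pool. This is only a sketch under stated assumptions, not the kernel's
blk-mq code: struct tag_pool, tag_alloc(), tag_free() and
reconnect_succeeds() are hypothetical names made up for illustration,
and the connect allocation is modelled as a non-blocking attempt that
simply fails when no reserved tag is free.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the reserved tags in the shared tagset (hypothetical). */
struct tag_pool {
	unsigned int nr_reserved;	/* reserved tags available in total */
	unsigned int in_use;		/* reserved tags currently held */
};

/* Non-blocking allocation: fails when every reserved tag is held. */
static bool tag_alloc(struct tag_pool *pool)
{
	if (pool->in_use == pool->nr_reserved)
		return false;
	pool->in_use++;
	return true;
}

static void tag_free(struct tag_pool *pool)
{
	assert(pool->in_use > 0);
	pool->in_use--;
}

/*
 * Replay the reported sequence:
 *  1. a keep alive request takes a reserved tag;
 *  2. admin_q is quiesced for the reset, so the keep alive stays
 *     queued and keeps its tag;
 *  3. the connect command needs a reserved tag from the same tagset.
 * Returns true if the connect command could get a tag.
 */
static bool reconnect_succeeds(unsigned int nr_reserved)
{
	struct tag_pool pool = { .nr_reserved = nr_reserved, .in_use = 0 };
	bool connected;

	if (!tag_alloc(&pool))		/* step 1: keep alive */
		return false;

	/* step 2: nothing completes here, the keep alive holds its tag */

	connected = tag_alloc(&pool);	/* step 3: connect */
	if (connected) {
		tag_free(&pool);	/* connect completes */
		tag_free(&pool);	/* admin_q resumes, keep alive completes */
	}
	return connected;
}
```

In this model reconnect_succeeds(1) returns false, because the keep
alive holds the only reserved tag, while reconnect_succeeds(2) returns
true, which mirrors the reasoning for keeping two reserved tags.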

Fixes: ed01fee283 ("nvme-fabrics: only reserve a single tag")
Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-26 18:20:09 -04:00
common treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
host nvme: fix reconnection fail due to reserved tag allocation 2024-03-26 18:20:09 -04:00
target nvmet-fc: take ref count on tgtport before delete assoc 2024-03-01 13:34:52 +01:00
Kconfig nvme: implement In-Band authentication 2022-08-02 17:14:49 -06:00
Makefile nvme: implement In-Band authentication 2022-08-02 17:14:49 -06:00