linux-stable/drivers/infiniband/hw/hfi1
Kaike Wan b2ff0d5101 IB/hfi1: Adjust flow PSN with the correct resync_psn
When a TID RDMA ACK to RESYNC request is received, the flow PSNs for
pending TID RDMA WRITE segments will be adjusted with the next flow
generation number, based on the resync_psn value extracted from the flow
PSN of the TID RDMA ACK packet. The resync_psn value indicates the last
flow PSN for which a TID RDMA WRITE DATA packet has been received by the
responder and the requester should resend TID RDMA WRITE DATA packets,
starting from the next flow PSN.

However, if resync_psn points to the last flow PSN for a segment and the
next segment flow PSN starts with a new generation number, use of the old
resync_psn to adjust the flow PSN for the next segment will lead to
miscalculation, resulting in WARN_ON and sge rewinding errors:

  WARNING: CPU: 4 PID: 146961 at /nfs/site/home/phcvs2/gitrepo/ifs-all/components/Drivers/tmp/rpmbuild/BUILD/ifs-kernel-updates-3.10.0_957.el7.x86_64/hfi1/tid_rdma.c:4764 hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1]
  Modules linked in: ib_ipoib(OE) hfi1(OE) rdmavt(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfsv3 nfs_acl nfs lockd grace fscache iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel ib_isert iscsi_target_mod target_core_mod aesni_intel lrw gf128mul glue_helper ablk_helper cryptd rpcrdma sunrpc opa_vnic ast ttm ib_iser libiscsi drm_kms_helper scsi_transport_iscsi ipmi_ssif syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev ipmi_si pcspkr sg drm_panel_orientation_quirks ipmi_devintf lpc_ich i2c_i801 ipmi_msghandler wmi rdma_ucm ib_ucm ib_uverbs acpi_cpufreq acpi_power_meter ib_umad rdma_cm ib_cm iw_cm ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul i2c_algo_bit crct10dif_common
   crc32c_intel e1000e ib_core ahci libahci ptp libata pps_core nfit libnvdimm [last unloaded: rdmavt]
  CPU: 4 PID: 146961 Comm: kworker/4:0H Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-957.el7.x86_64 #1
  Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.0X.02.0117.040420182310 04/04/2018
  Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1]
  Call Trace:
   <IRQ>  [<ffffffff9e361dc1>] dump_stack+0x19/0x1b
   [<ffffffff9dc97648>] __warn+0xd8/0x100
   [<ffffffff9dc9778d>] warn_slowpath_null+0x1d/0x20
   [<ffffffffc05d28c6>] hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1]
   [<ffffffffc05c21cc>] hfi1_kdeth_eager_rcv+0x1dc/0x210 [hfi1]
   [<ffffffffc05c23ef>] ? hfi1_kdeth_expected_rcv+0x1ef/0x210 [hfi1]
   [<ffffffffc0574f15>] kdeth_process_eager+0x35/0x90 [hfi1]
   [<ffffffffc0575b5a>] handle_receive_interrupt_nodma_rtail+0x17a/0x2b0 [hfi1]
   [<ffffffffc056a623>] receive_context_interrupt+0x23/0x40 [hfi1]
   [<ffffffff9dd4a294>] __handle_irq_event_percpu+0x44/0x1c0
   [<ffffffff9dd4a442>] handle_irq_event_percpu+0x32/0x80
   [<ffffffff9dd4a4cc>] handle_irq_event+0x3c/0x60
   [<ffffffff9dd4d27f>] handle_edge_irq+0x7f/0x150
   [<ffffffff9dc2e554>] handle_irq+0xe4/0x1a0
   [<ffffffff9e3795dd>] do_IRQ+0x4d/0xf0
   [<ffffffff9e36b362>] common_interrupt+0x162/0x162
   <EOI>  [<ffffffff9dfa0f79>] ? swiotlb_map_page+0x49/0x150
   [<ffffffffc05c2ed1>] hfi1_verbs_send_dma+0x291/0xb70 [hfi1]
   [<ffffffffc05c2c40>] ? hfi1_wait_kmem+0xf0/0xf0 [hfi1]
   [<ffffffffc05c3f26>] hfi1_verbs_send+0x126/0x2b0 [hfi1]
   [<ffffffffc05ce683>] _hfi1_do_tid_send+0x1d3/0x320 [hfi1]
   [<ffffffff9dcb9d4f>] process_one_work+0x17f/0x440
   [<ffffffff9dcbade6>] worker_thread+0x126/0x3c0
   [<ffffffff9dcbacc0>] ? manage_workers.isra.25+0x2a0/0x2a0
   [<ffffffff9dcc1c31>] kthread+0xd1/0xe0
   [<ffffffff9dcc1b60>] ? insert_kthread_work+0x40/0x40
   [<ffffffff9e374c1d>] ret_from_fork_nospec_begin+0x7/0x21
   [<ffffffff9dcc1b60>] ? insert_kthread_work+0x40/0x40

This patch fixes the issue by adjusting the resync_psn first if the flow
generation has been advanced for a pending segment.

Fixes: 9e93e967f7 ("IB/hfi1: Add a function to receive TID RDMA ACK packet")
Link: https://lore.kernel.org/r/20191219231920.51069.37147.stgit@awfm-01.aw.intel.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 16:48:01 -04:00
..
affinity.c sched/core: Provide a pointer to the valid CPU mask 2019-06-03 11:49:37 +02:00
affinity.h
aspm.c IB/hfi1: Reduce excessive aspm inlines 2019-06-28 22:34:26 -03:00
aspm.h IB/hfi1: Reduce excessive aspm inlines 2019-06-28 22:34:26 -03:00
chip.c IB/{rdmavt, hfi1, qib}: Add a counter for credit waits 2019-09-13 16:59:55 -03:00
chip.h IB/{rdmavt, hfi1, qib}: Add a counter for credit waits 2019-09-13 16:59:55 -03:00
chip_registers.h IB/hfi1: Add selected Rcv counters 2019-04-24 11:48:10 -03:00
common.h IB/hfi1: Remove reference to RHF.VCRCErr 2019-04-24 11:48:11 -03:00
debugfs.c IB/hfi1: No need to use try_module_get for debugfs 2019-06-28 22:34:26 -03:00
debugfs.h infiniband: hfi1: drop crazy DEBUGFS_SEQ_FILE_CREATE() macro 2019-01-24 09:22:29 -07:00
device.c
device.h
driver.c IB/hfi1: Remove reference to RHF.VCRCErr 2019-04-24 11:48:11 -03:00
efivar.c
efivar.h
eprom.c
eprom.h
exp_rcv.c IB/hfi1: Remove WARN_ON when freeing expected receive groups 2019-04-03 15:27:30 -03:00
exp_rcv.h
fault.c infiniband: hfi1: fix memory leaks 2019-08-20 13:44:45 -04:00
fault.h
file_ops.c RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv 2019-11-23 19:56:44 -04:00
firmware.c
hfi.h RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv 2019-11-23 19:56:44 -04:00
init.c IB/hfi1: Calculate flow weight based on QP MTU for TID RDMA 2019-11-06 13:15:36 -04:00
intr.c
iowait.c IB/hfi1: Don't cancel unused work item 2020-01-03 16:41:51 -04:00
iowait.h IB/hfi1: Prioritize the sending of ACK packets 2019-02-05 18:07:44 -05:00
Kconfig treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
mad.c RDMA: Change MAD processing function to remove extra casting and parameter 2019-11-12 20:20:15 -04:00
mad.h
Makefile IB/hfi1: Reduce excessive aspm inlines 2019-06-28 22:34:26 -03:00
mmu_rb.c mm/mmu_notifier: use structure for invalidate_range_start/end callback 2018-12-28 12:11:50 -08:00
mmu_rb.h
msix.c
msix.h
opa_compat.h
opfn.c IB/hfi1: Add TID RDMA retry timer 2019-02-05 18:07:43 -05:00
opfn.h IB/hfi1: Make opfn.h self sufficient 2019-04-24 11:31:49 -03:00
pcie.c IB/hfi1: Ensure full Gen3 speed in a Gen4 system 2019-11-06 13:13:43 -04:00
pio.c Linux 5.2-rc6 2019-06-28 21:18:23 -03:00
pio.h IB/hfi1: Reduce lock contention on iowait_lock for sdma and pio 2018-12-06 20:15:36 -07:00
pio_copy.c
platform.c IB/hfi1: remove redundant assignment to variable ret 2019-11-25 10:31:47 -04:00
platform.h
qp.c IB/{rdmavt, hfi1, qib}: Add helpers to hide SWQE WR details 2019-06-28 22:34:26 -03:00
qp.h IB/hfi1: Add the dual leg code 2019-02-05 18:07:44 -05:00
qsfp.c
qsfp.h
rc.c IB/hfi1: TID RDMA WRITE should not return IB_WC_RNR_RETRY_EXC_ERR 2019-11-06 13:15:36 -04:00
rc.h IB/hfi1: Delay the release of destination mr for TID RDMA WRITE DATA 2019-04-03 15:27:30 -03:00
ruc.c IB/{rdmavt, hfi1): Miscellaneous comment fixes 2019-04-24 11:31:48 -03:00
sdma.c treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
sdma.h IB/hfi1: Reduce lock contention on iowait_lock for sdma and pio 2018-12-06 20:15:36 -07:00
sdma_txreq.h IB/hfi1: Prioritize the sending of ACK packets 2019-02-05 18:07:44 -05:00
sysfs.c RDMA: Introduce and use rdma_device_to_ibdev() 2019-01-14 13:12:03 -07:00
tid_rdma.c IB/hfi1: Adjust flow PSN with the correct resync_psn 2020-01-03 16:48:01 -04:00
tid_rdma.h IB/hfi1: Calculate flow weight based on QP MTU for TID RDMA 2019-11-06 13:15:36 -04:00
trace.c IB/hfi1: Add static trace for TID RDMA WRITE protocol 2019-02-05 18:07:44 -05:00
trace.h IB/hfi1: Add static trace for OPFN 2019-01-31 11:37:40 -05:00
trace_ctxts.h
trace_dbg.h IB/hfi1: Fix two format strings 2019-03-28 11:03:49 -03:00
trace_ibhdrs.h IB/hfi1: Add missing INVALIDATE opcodes for trace 2019-06-28 22:34:26 -03:00
trace_iowait.h IB/hfi1: Add static trace for iowait 2018-09-30 19:21:12 -06:00
trace_misc.h
trace_mmu.h
trace_rc.h IB/hfi1: Add static trace for TID RDMA READ protocol 2019-02-05 17:53:56 -05:00
trace_rx.h IB/hfi1: Add static trace for OPFN 2019-01-31 11:37:40 -05:00
trace_tid.h IB/hfi1: Add traces for TID RDMA READ 2019-09-13 16:59:55 -03:00
trace_tx.h IB/hfi1: Add static trace for TID RDMA WRITE protocol 2019-02-05 18:07:44 -05:00
uc.c IB/{hfi1, qib, rdmavt}: Put qp in error state when cq is full 2019-06-28 22:34:26 -03:00
ud.c IB/{rdmavt, hfi1, qib}: Add helpers to hide SWQE WR details 2019-06-28 22:34:26 -03:00
user_exp_rcv.c RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv 2019-11-23 19:56:44 -04:00
user_exp_rcv.h RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv 2019-11-23 19:56:44 -04:00
user_pages.c mm/gup: add make_dirty arg to put_user_pages_dirty_lock() 2019-09-24 15:54:08 -07:00
user_sdma.c IB/hfi1: Close PSM sdma_progress sleep window 2019-06-11 17:06:45 -03:00
user_sdma.h IB/hfi1: Remove unused define 2019-07-22 16:10:48 -03:00
verbs.c IB/hfi1: Use a common pad buffer for 9B and 16B packets 2019-10-17 16:32:25 -04:00
verbs.h treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
verbs_txreq.c IB/hfi1: Silence txreq allocation warnings 2019-06-17 21:15:40 -04:00
verbs_txreq.h IB/hfi1: Silence txreq allocation warnings 2019-06-17 21:15:40 -04:00
vnic.h
vnic_main.c 5.2 Merge Window pull request 2019-05-09 09:02:46 -07:00
vnic_sdma.c net: Use skb_frag_off accessors 2019-07-30 14:21:32 -07:00