linux-stable/drivers
Vadim Pasternak 4b5e32df66 platform/mellanox: mlxreg-hotplug: add extra cycle for hotplug work queue
Add extra cycle for hotplug work queue to handle the case when a signal is
It adds missed logic for signal acknowledge, by adding an extra run for
received, but no specific signal assertion is detected. Such case
theoretically can happen for example in case several units are removed or
inserted at the same time. In this situation acknowledge for some signal
can be missed at signal top aggreagation status level. The extra run will
allow to handler to acknowledge the missed signal.

The interrupt handling flow performs the next steps:
(1)
Enter mlxreg_hotplug_work_handler due to signal assertion.
Aggregation status register is changed for example from 0xff to 0xfd
(event signal group related to bit 1).
(2)
Mask aggregation interrupts, read aggregation status register and save it
(0xfd) in aggr_cache, then traverse down to handle signal from groups
related to the changed bit.
(3)
Read and mask group related signal.
Acknowledge and unmask group related signal (acknowledge should clear
aggregation status register from 0xfd back to 0xff).
(4)
Re-schedule work queue for the immediate execution.
(5)
Enter mlxreg_hotplug_work_handler due to re-scheduling.
Aggregation status is changed from previous 0xfd to 0xff.
Go over steps (2) - (5) and in case no new signal assertion
is detected - unmask aggregation interrupts.

The possible race could happen in case new signal from the same group is
asserted after step (3) and prior step (5). In such case aggregation
status will change back from 0xff to 0xfd and the value read from the
aggregation status register will be the same as a value saved in
aggr_cache. As a result the handler will not traverse down and signal
will stay unhandled.

Example of faulty flow:
The signal routing flow is as following (f.e. for of FANi removing):
 - FAN status and event registers related bit is changed;
 -- intermediate aggregation status register is changed;
 --- top aggregation status register is changed;
 ---- interrupt routed to CPU and interrupt handler is invoked.

When interrupt handler is invoked it follows the next simple logic (f.e
FAN3 is removed):
 (a1) mask top aggregation interrupt mask register;
 (a2) read top aggregation interrupt status register and test to which
      underling group belongs a signal (FANs in this case and is changed
	  from 0xff to 0xfb and 0xfb is saved as a last status value);
   (b1) mask FANs interrupt mask register;
   (b2) read FANs status register and test which FAN has been changed
        FAN3 in this example);
     (c1) perform relevant action;
              <--------------- (FAN2 is removed at this point)
   (b3) clear FANs interrupt event register to acknowledge FAN3 signal;
   (b4) unmask FANs interrupt mask register
 (a3) unmask top aggregation interrupt mask register;

 An interrupt handler is invoked, since FAN2 interrupt is not acknowledge.
 It should set top aggregation interrupt status register bit 6 (0xfb).
 In step (a2)
 (a2) read top aggregation interrupt and comparing it with saved value
      does not show change (same 0xfb) and after (a2) execution jumps to
	  (a3) and signal leaved unhandled

The fix will enforce handler to traverse down in case the signal is
received, but signal assertion is not detected.

Fixes: 304887041d ("platform/x86: Introduce support for Mellanox hotplug driver")
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-06-01 09:54:34 -07:00
..
accessibility
acpi xen: fixes for 4.17-rc1 2018-04-12 11:04:35 -07:00
amba
android
ata Merge branch 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2018-04-03 17:42:25 -07:00
atm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-01 19:49:34 -04:00
auxdisplay
base mm: check __highest_present_section_nr directly in memory_dev_init() 2018-04-11 10:28:31 -07:00
bcma
block for-linus-20180413 2018-04-13 15:15:15 -07:00
bluetooth Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME 2018-04-01 21:43:02 +03:00
bus pci-v4.17-changes 2018-04-06 18:31:06 -07:00
cdrom
char RTC for 4.17 2018-04-10 10:22:27 -07:00
clk The large diff this time around is from the addition of a new clk driver 2018-04-13 15:51:06 -07:00
clocksource ARM: SoC platform updates for 4.17 2018-04-05 21:21:08 -07:00
connector
cpufreq cpufreq: Drop cpufreq_table_validate_and_show() 2018-04-10 08:40:45 +02:00
cpuidle cpuidle: menu: Avoid selecting shallow states with stopped tick 2018-04-09 11:54:57 +02:00
crypto .gitignore: move *-asn1.[ch] patterns to the top-level .gitignore 2018-04-07 19:04:02 +09:00
dax libnvdimm for 4.17 2018-04-10 10:25:57 -07:00
dca
devfreq
dio
dma DMAengine updates for v4.17-rc1 2018-04-10 12:14:37 -07:00
dma-buf
edac * Add NVDIMM support to EDAC (Tony Luck) 2018-04-05 14:21:13 -07:00
eisa
extcon Char/Misc patches for 4.17-rc1 2018-04-04 20:07:20 -07:00
firewire
firmware Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging 2018-04-13 16:32:16 -07:00
fmc
fpga
fsi
gpio DeviceTree updates for 4.17: 2018-04-05 21:03:42 -07:00
gpu amdgpu, omap and snd regression fix 2018-04-12 20:56:10 -07:00
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2018-04-05 11:53:34 -07:00
hsi
hv ARM: 2018-04-09 11:42:31 -07:00
hwmon hwmon updates for v4.17 2018-04-09 19:59:54 -07:00
hwspinlock
hwtracing Char/Misc patches for 4.17-rc1 2018-04-04 20:07:20 -07:00
i2c i2c: add param sanity check to i2c_transfer() 2018-04-11 23:33:46 +02:00
ide for-4.17/block-20180402 2018-04-05 14:27:02 -07:00
idle
iio This is the bulk of GPIO changes for the v4.17 kernel cycle: 2018-04-05 09:51:41 -07:00
infiniband Merge candidates for 4.17 merge window 2018-04-06 17:35:43 -07:00
input Changes to chrome-platform for v4.17 2018-04-13 16:20:36 -07:00
iommu IOMMU Updates for Linux v4.17 2018-04-11 18:50:41 -07:00
ipack
irqchip IOMMU Updates for Linux v4.17 2018-04-11 18:50:41 -07:00
isdn Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-04-03 14:04:18 -07:00
leds
lightnvm lightnvm: pblk: remove some unnecessary NULL checks 2018-03-29 17:29:09 -06:00
macintosh powerpc updates for 4.17 2018-04-07 12:08:19 -07:00
mailbox
mcb
md libnvdimm for 4.17 2018-04-10 10:25:57 -07:00
media remoteproc updates for v4.17 2018-04-10 12:09:27 -07:00
memory
memstick
message
mfd platform/chrome: mfd/cros_ec_dev: Add sysfs entry to set keyboard wake lid angle 2018-04-10 22:25:07 -07:00
misc * Fix 2032 time access issues and new compiler warnings 2018-04-12 10:21:19 -07:00
mmc MMC core: 2018-04-12 10:59:03 -07:00
mtd This pull request contains updates for both UBI and UBIFS: 2018-04-11 16:39:34 -07:00
mux
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-12 11:09:05 -07:00
nfc
ntb
nubus
nvdimm libnvdimm for 4.17 2018-04-10 10:25:57 -07:00
nvme nvme: expand nvmf_check_if_ready checks 2018-04-12 09:58:27 -06:00
nvmem Char/Misc patches for 4.17-rc1 2018-04-04 20:07:20 -07:00
of Kbuild updates for v4.17 (2nd) 2018-04-15 17:21:30 -07:00
opp
oprofile oprofilefs: don't oops on allocation failure 2018-03-29 15:07:48 -04:00
parisc parisc/pci: Switch LBA PCI bus from Hard Fail to Soft Fail mode 2018-03-27 18:52:22 +02:00
parport Char/Misc patches for 4.17-rc1 2018-04-04 20:07:20 -07:00
pci PCI: Remove messages about reassigning resources 2018-04-11 08:46:50 -05:00
pcmcia Merge branch 'for-linus-sa1100' of git://git.armlinux.org.uk/~rmk/linux-arm 2018-04-09 09:26:36 -07:00
perf ARM: SoC driver updates for 4.17 2018-04-05 21:29:35 -07:00
phy ARM: SoC platform updates for 4.17 2018-04-05 21:21:08 -07:00
pinctrl This is the bulk of GPIO changes for the v4.17 kernel cycle: 2018-04-05 09:51:41 -07:00
platform platform/mellanox: mlxreg-hotplug: add extra cycle for hotplug work queue 2018-06-01 09:54:34 -07:00
pnp
power ARM: SoC platform updates for 4.17 2018-04-05 21:21:08 -07:00
powercap
pps
ps3
ptp
pwm pwm: Changes for v4.17-rc1 2018-04-13 15:46:21 -07:00
rapidio rapidio: use a reference count for struct mport_dma_req 2018-04-11 10:28:37 -07:00
ras
regulator Merge remote-tracking branches 'regulator/topic/88pg86x', 'regulator/topic/dt', 'regulator/topic/formatting' and 'regulator/topic/gpio' into regulator-next 2018-03-28 10:33:53 +08:00
remoteproc remoteproc: fix null pointer dereference on glink only platforms 2018-04-05 22:53:16 -07:00
reset
rpmsg rpmsg: smd: Use announce_create to process any receive work 2018-03-27 21:54:37 -07:00
rtc RTC for 4.17 2018-04-10 10:22:27 -07:00
s390 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2018-04-13 09:43:20 -07:00
sbus sparc64: Properly range check DAX completion index 2018-04-01 20:07:00 -04:00
scsi SCSI fixes on 20180415 2018-04-15 17:24:12 -07:00
sfi
sh
siox
slimbus
sn
soc The large diff this time around is from the addition of a new clk driver 2018-04-13 15:51:06 -07:00
soundwire
spi spi: SPI updates for v4.17 2018-04-03 12:06:21 -07:00
spmi
ssb
staging page cache: use xa_lock 2018-04-11 10:28:39 -07:00
target for-4.17/block-20180402 2018-04-05 14:27:02 -07:00
tc
tee
thermal Merge branches 'thermal-core' and 'thermal-soc' into next 2018-04-13 14:11:53 +08:00
thunderbolt
tty Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2018-04-09 09:04:10 -07:00
uio
usb Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2018-04-07 11:11:41 -07:00
uwb
vfio VFIO updates for v4.17-rc1 2018-04-06 19:44:27 -07:00
vhost vhost: return bool from *_access_ok() functions 2018-04-11 10:54:06 -04:00
video fbdev changes for v4.17: 2018-04-10 10:20:00 -07:00
virt
virtio virtio: feature 2018-04-11 18:58:27 -07:00
visorbus
vlynq
vme
w1
watchdog linux-watchdog 4.17-rc1 merge window tag 2018-04-13 15:43:50 -07:00
xen xen: fixes for 4.17-rc1 2018-04-12 11:04:35 -07:00
zorro
Kconfig hwtracing: Add HW tracing support menu 2018-03-29 13:38:10 +03:00
Makefile