linux-stable/drivers
Yufeng Mo f710323dcd bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler()
[ Upstream commit 220ade7745 ]

Some time ago, I reported a calltrace issue,
"did not find a suitable aggregator"; please see [1].
After a period of analysis and reproduction, I found
that this problem is caused by a concurrency issue.

Before the problem occurs, the bond structure is as follows:

bond0 - slaver0(eth0) - agg0.lag_ports -> port0 - port1
                      \
                        port0
      \
        slaver1(eth1) - agg1.lag_ports -> NULL
                      \
                        port1

If we run 'ifenslave bond0 -d eth1', the sequence is as follows:

executing __bond_release_one()
|
bond_upper_dev_unlink()[step1]
|                       |                       |
|                       |                       bond_3ad_lacpdu_recv()
|                       |                       ->bond_3ad_rx_indication()
|                       |                       spin_lock_bh()
|                       |                       ->ad_rx_machine()
|                       |                       ->__record_pdu()[step2]
|                       |                       spin_unlock_bh()
|                       |                       |
|                       bond_3ad_state_machine_handler()
|                       spin_lock_bh()
|                       ->ad_port_selection_logic()
|                       ->try to find free aggregator[step3]
|                       ->try to find suitable aggregator[step4]
|                       ->did not find a suitable aggregator[step5]
|                       spin_unlock_bh()
|                       |
|                       |
bond_3ad_unbind_slave() |
spin_lock_bh()
spin_unlock_bh()

step1: slaver1(eth1) has already been removed from the list, but
       port1 remains.
step2: a LACPDU is received and port0 is updated.
step3: port0 is removed from agg0.lag_ports. The structure is now
       "agg0.lag_ports -> port1", so agg0 is not free. At the same
       time, slaver1/agg1 has been removed from the list by step1,
       so no free aggregator can be found.
step4: no suitable aggregator can be found because of step2.
step5: a calltrace is triggered since port->aggregator is NULL.

To solve this concurrency problem, put bond_upper_dev_unlink()
after bond_3ad_unbind_slave(). In this way, we invalidate the port
first, so that bond_3ad_state_machine_handler() skips it. This
eliminates the situation in which the slaver has been removed from
the list but the port is still valid.
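
The effect of the reordering can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the kernel
code: every name below (toy_slave, toy_port, toy_unlink(), toy_unbind(),
toy_release_exposes_window()) is hypothetical, and the two boolean flags
stand in for "slave is on the bond's list" and "port is still valid". It
checks whether a state machine run that lands between the two release
steps could observe a slave that is already unlinked while its port is
still valid.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in the bonding driver. */
struct toy_port {
	bool valid;		/* cleared by the "unbind" step */
};

struct toy_slave {
	bool linked;		/* false once the "unlink" step has run */
	struct toy_port port;
};

enum release_order { UNLINK_THEN_UNBIND, UNBIND_THEN_UNLINK };

/* Stand-in for removing the slave from the bond's list. */
static void toy_unlink(struct toy_slave *s) { s->linked = false; }

/* Stand-in for invalidating the slave's port. */
static void toy_unbind(struct toy_slave *s) { s->port.valid = false; }

/* The state described in the commit: slave gone, port still valid. */
static bool toy_inconsistent(const struct toy_slave *s)
{
	return !s->linked && s->port.valid;
}

/*
 * Run the two release steps in the given order and report whether an
 * observer interleaved between them would see the inconsistent state.
 */
static bool toy_release_exposes_window(enum release_order order)
{
	struct toy_slave s = { .linked = true, .port = { .valid = true } };
	bool seen;

	if (order == UNLINK_THEN_UNBIND)
		toy_unlink(&s);
	else
		toy_unbind(&s);

	/* A concurrent state machine run would interleave here. */
	seen = toy_inconsistent(&s);

	if (order == UNLINK_THEN_UNBIND)
		toy_unbind(&s);
	else
		toy_unlink(&s);

	/* Either order ends in a consistent state. */
	assert(!toy_inconsistent(&s));
	return seen;
}
```

In this model, calling toy_release_exposes_window() with
UNLINK_THEN_UNBIND (the old order) reports that the window exists, while
UNBIND_THEN_UNLINK (the order this patch moves to) reports that it does
not. The model ignores locking and the aggregator selection logic
entirely; it only shows why the order of the two steps matters.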

[1] https://lore.kernel.org/netdev/10374.1611947473@famine/

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-18 13:40:24 +02:00
..
accessibility
acpi ACPI: NFIT: Fix support for virtual SPA ranges 2021-08-18 08:59:07 +02:00
amba
android
ata ata: sata_dwc_460ex: No need to call phy_exit() befre phy_init() 2021-09-18 13:40:23 +02:00
atm atm: nicstar: register the interrupt handler in the right place 2021-07-19 09:44:52 +02:00
auxdisplay
base driver core: Fix error return code in really_probe() 2021-09-15 09:50:33 +02:00
bcma bcma: Fix memory leak for internally-handled cores 2021-09-15 09:50:45 +02:00
block Revert "block: nbd: add sanity check for first_minor" 2021-09-16 12:51:23 +02:00
bluetooth Bluetooth: btusb: check conditions before enabling USB ALT 3 for WBS 2021-09-03 10:09:28 +02:00
bus bus: fsl-mc: fix mmio base address for child DPRCs 2021-09-18 13:40:20 +02:00
cdrom
char tpm: ibmvtpm: Avoid error message when process gets signal while waiting 2021-09-15 09:50:30 +02:00
clk clk: at91: clk-generated: Limit the requested rate to our range 2021-09-18 13:40:16 +02:00
clocksource clocksource/drivers/sh_cmt: Fix wrong setting if don't request IRQ for clock source channel 2021-09-15 09:50:29 +02:00
connector
counter counter: 104-quad-8: Return error when invalid mode during ceiling_write 2021-09-15 09:50:38 +02:00
cpufreq cpufreq: blocklist Qualcomm sm8150 in cpufreq-dt-platdev 2021-09-03 10:09:26 +02:00
cpuidle cpuidle: pseries: Mark pseries_idle_proble() as __init 2021-09-18 13:40:12 +02:00
crypto crypto: mxs-dcp - Use sg_mapping_iter to copy data 2021-09-18 13:40:17 +02:00
dax
dca
devfreq PM / devfreq: Add missing error code in devfreq_add_device() 2021-07-14 16:56:11 +02:00
dio
dma dmaengine: imx-sdma: remove duplicated sdma_load_context 2021-09-18 13:40:09 +02:00
dma-buf dma-buf/sync_file: Don't leak fences on merge failure 2021-07-25 14:36:20 +02:00
edac EDAC/i10nm: Fix NVDIMM detection 2021-09-15 09:50:30 +02:00
eisa
extcon extcon: intel-mrfld: Sync hardware and software state on init 2021-07-19 09:45:00 +02:00
firewire
firmware firmware: raspberrypi: Fix a leak in 'rpi_firmware_get()' 2021-09-15 09:50:41 +02:00
fpga fpga: dfl: fme: Fix cpu hotplug issue in performance reporting 2021-08-12 13:22:15 +02:00
fsi fsi: Add missing MODULE_DEVICE_TABLE 2021-07-20 16:05:42 +02:00
gnss
gpio Revert "gpio: mpc8xxx: change the gpio interrupt flags." 2021-08-12 13:22:16 +02:00
gpu drm/bridge: nwl-dsi: Avoid potential multiplication overflow on 32-bit 2021-09-18 13:40:21 +02:00
greybus
hid HID: i2c-hid: Fix Elan touchpad regression 2021-09-18 13:40:15 +02:00
hsi
hv drivers: hv: Fix missing error code in vmbus_connect() 2021-07-14 16:55:59 +02:00
hwmon hwmon: (max31790) Fix fan speed reporting for fan7..12 2021-07-14 16:56:08 +02:00
hwspinlock
hwtracing intel_th: Wait until port is in reset before programming it 2021-07-20 16:05:46 +02:00
i2c i2c: xlp9xx: fix main IRQ check 2021-09-15 09:50:44 +02:00
i3c
ide
idle
iio iio: dac: ad5624r: Fix incorrect handling of an optional regulator. 2021-09-18 13:40:18 +02:00
infiniband RDMA/hns: Fix QP's resp incomplete assignment 2021-09-18 13:40:15 +02:00
input Input: hideep - fix the uninitialized use in hideep_nvm_unlock() 2021-07-20 16:05:44 +02:00
interconnect interconnect: qcom: icc-rpmh: Ensure floor BW is enforced for all nodes 2021-08-12 13:22:18 +02:00
iommu iommu/vt-d: Update the virtual command related registers 2021-09-18 13:40:14 +02:00
ipack ipack: tpci200: fix memory leak in the tpci200_register 2021-08-26 08:35:55 -04:00
irqchip irqchip/gic-v3: Fix priority comparison when non-secure priorities are used 2021-09-15 09:50:29 +02:00
isdn mISDN: fix possible use-after-free in HFC_cleanup() 2021-07-19 09:44:38 +02:00
leds leds: trigger: audio: Add an activate callback to ensure the initial brightness is set 2021-09-15 09:50:36 +02:00
lightnvm
macintosh
mailbox soc: mediatek: cmdq: add address shift in jump 2021-09-18 13:40:16 +02:00
mcb
md dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc() 2021-09-18 13:40:08 +02:00
media media: platform: stm32: unprepare clocks at handling errors in probe 2021-09-18 13:40:20 +02:00
memory memory: tegra: Fix compilation warnings on 64bit platforms 2021-07-25 14:36:14 +02:00
memstick memstick: rtsx_usb_ms: fix UAF 2021-07-14 16:55:53 +02:00
message
mfd mfd: cpcap: Fix cpcap dmamask not set warnings 2021-07-20 16:05:42 +02:00
misc VMCI: fix NULL pointer dereference when unmapping queue pair 2021-09-18 13:40:09 +02:00
mmc mmc: moxart: Fix issue with uninitialized dma_slave_config 2021-09-15 09:50:43 +02:00
most
mtd mtd: spinand: Fix incorrect parameters for on-die ECC 2021-09-03 10:09:28 +02:00
mux
net bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler() 2021-09-18 13:40:24 +02:00
nfc nfc: nfcsim: fix use after free during module unload 2021-08-04 12:46:41 +02:00
ntb
nubus
nvdimm libnvdimm/region: Fix label activation vs errors 2021-08-18 08:59:07 +02:00
nvme nvmet: pass back cntlid on successful completion 2021-09-15 09:50:25 +02:00
nvmem nvmem: core: add a missing of_node_put 2021-07-19 09:45:00 +02:00
of of: Fix truncation of memory sizes on 32-bit platforms 2021-07-14 16:56:46 +02:00
opp opp: remove WARN when no valid OPPs remain 2021-09-03 10:09:26 +02:00
oprofile
parisc
parport
pci PCI: Use pci_update_current_state() in pci_enable_device_flags() 2021-09-18 13:40:17 +02:00
pcmcia pcmcia: i82092: fix a null pointer dereference bug 2021-08-12 13:22:16 +02:00
perf perf/arm-cmn: Fix invalid pointer when access dtc object sharing the same IRQ number 2021-07-14 16:56:08 +02:00
phy phy: intel: Fix for warnings due to EMMC clock 175Mhz change in FIP 2021-07-20 16:05:46 +02:00
pinctrl pinctrl: single: Fix error return code in pcs_parse_bits_in_pinctrl_entry() 2021-09-18 13:40:14 +02:00
platform platform/x86: dell-smbios-wmi: Add missing kfree in error-exit from run_smbios_call 2021-09-18 13:40:15 +02:00
pnp
power power: supply: max17042: handle fails of reading status register 2021-09-18 13:40:08 +02:00
powercap
pps
ps3
ptp ptp_pch: Restore dependency on PCI 2021-08-26 08:35:46 -04:00
pwm pwm: sprd: Ensure configuring period and duty_cycle isn't wrongly skipped 2021-07-28 14:35:34 +02:00
rapidio
ras
regulator regulator: vctrl: Avoid lockdep warning in enable/disable ops 2021-09-15 09:50:30 +02:00
remoteproc remoteproc: k3-r5: Fix an error message 2021-07-20 16:05:50 +02:00
reset reset: reset-zynqmp: Fixed the argument data type 2021-09-08 08:49:00 +02:00
rpmsg
rtc rtc: tps65910: Correct driver module alias 2021-09-18 13:40:05 +02:00
s390 s390/qdio: cancel the ESTABLISH ccw after timeout 2021-09-18 13:40:09 +02:00
sbus
scsi scsi: ufs: ufs-exynos: Fix static checker warning 2021-09-18 13:40:15 +02:00
sfi
sh
siox
slimbus slimbus: ngd: reset dma setup during runtime pm 2021-08-26 08:35:55 -04:00
soc soc: aspeed: p2a-ctrl: Fix boundary check for mmap 2021-09-18 13:40:08 +02:00
soundwire soundwire: stream: Fix test for DP prepare complete 2021-07-14 16:56:47 +02:00
spi spi: spi-zynq-qspi: use wait_for_completion_timeout to make zynq_qspi_exec_mem_op not interruptible 2021-09-15 09:50:30 +02:00
spmi
ssb ssb: Fix error return code in ssb_bus_scan() 2021-07-14 16:56:21 +02:00
staging staging: ks7010: Fix the initialization of the 'sleep_status' structure 2021-09-18 13:40:23 +02:00
target scsi: target: Fix protect handling in WRITE SAME(32) 2021-07-28 14:35:39 +02:00
tc
tee tee: Correct inappropriate usage of TEE_SHM_DMA_BUF flag 2021-08-15 14:00:24 +02:00
thermal thermal/core/thermal_of: Stop zone device before unregistering it 2021-07-25 14:36:17 +02:00
thunderbolt thunderbolt: Bond lanes only when dual_link_port != NULL in alloc_dev_default() 2021-07-14 16:56:44 +02:00
tty serial: 8250_pci: make setup_port() parameters explicitly unsigned 2021-09-18 13:40:23 +02:00
uio
usb usb: gadget: composite: Allow bMaxPower=0 if self-powered 2021-09-18 13:40:20 +02:00
vdpa vdpa/mlx5: Avoid destroying MR on empty iotlb 2021-08-26 08:35:42 -04:00
vfio vfio: Use config not menuconfig for VFIO_NOIOMMU 2021-09-18 13:40:12 +02:00
vhost vringh: Use wiov->used to check for read/write desc order 2021-09-03 10:09:27 +02:00
video video: fbdev: riva: Error out if 'pixclock' equals zero 2021-09-18 13:40:22 +02:00
virt
virtio virtio_vdpa: reject invalid vq indices 2021-09-03 10:09:27 +02:00
visorbus visorbus: fix error return code in visorchipset_init() 2021-07-14 16:56:41 +02:00
vlynq
vme
w1 w1: ds2438: fixing bug that would always get page0 2021-07-20 16:05:39 +02:00
watchdog Revert "watchdog: iTCO_wdt: Account for rebooting on second timeout" 2021-08-08 09:05:24 +02:00
xen xen/events: Fix race in set_evtchn_to_irq 2021-08-18 08:59:14 +02:00
zorro
Kconfig
Makefile