linux-stable/drivers
Shay Drory 9c2d080109 net/mlx5: Free irqs only on shutdown callback
Whenever a shutdown is invoked, free irqs only and keep mlx5_irq
synthetic wrapper intact in order to avoid use-after-free on
system shutdown.

for example:
==================================================================
BUG: KASAN: use-after-free in _find_first_bit+0x66/0x80
Read of size 8 at addr ffff88823fc0d318 by task kworker/u192:0/13608

CPU: 25 PID: 13608 Comm: kworker/u192:0 Tainted: G    B   W  O  6.1.21-cloudflare-kasan-2023.3.21 #1
Hardware name: GIGABYTE R162-R2-GEN0/MZ12-HD2-CD, BIOS R14 05/03/2021
Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core]
Call Trace:
  <TASK>
  dump_stack_lvl+0x34/0x48
  print_report+0x170/0x473
  ? _find_first_bit+0x66/0x80
  kasan_report+0xad/0x130
  ? _find_first_bit+0x66/0x80
  _find_first_bit+0x66/0x80
  mlx5e_open_channels+0x3c5/0x3a10 [mlx5_core]
  ? console_unlock+0x2fa/0x430
  ? _raw_spin_lock_irqsave+0x8d/0xf0
  ? _raw_spin_unlock_irqrestore+0x42/0x80
  ? preempt_count_add+0x7d/0x150
  ? __wake_up_klogd.part.0+0x7d/0xc0
  ? vprintk_emit+0xfe/0x2c0
  ? mlx5e_trigger_napi_sched+0x40/0x40 [mlx5_core]
  ? dev_attr_show.cold+0x35/0x35
  ? devlink_health_do_dump.part.0+0x174/0x340
  ? devlink_health_report+0x504/0x810
  ? mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core]
  ? mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core]
  ? process_one_work+0x680/0x1050
  mlx5e_safe_switch_params+0x156/0x220 [mlx5_core]
  ? mlx5e_switch_priv_channels+0x310/0x310 [mlx5_core]
  ? mlx5_eq_poll_irq_disabled+0xb6/0x100 [mlx5_core]
  mlx5e_tx_reporter_timeout_recover+0x123/0x240 [mlx5_core]
  ? __mutex_unlock_slowpath.constprop.0+0x2b0/0x2b0
  devlink_health_reporter_recover+0xa6/0x1f0
  devlink_health_report+0x2f7/0x810
  ? vsnprintf+0x854/0x15e0
  mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core]
  ? mlx5e_reporter_tx_err_cqe+0x1a0/0x1a0 [mlx5_core]
  ? mlx5e_tx_reporter_timeout_dump+0x50/0x50 [mlx5_core]
  ? mlx5e_tx_reporter_dump_sq+0x260/0x260 [mlx5_core]
  ? newidle_balance+0x9b7/0xe30
  ? psi_group_change+0x6a7/0xb80
  ? mutex_lock+0x96/0xf0
  ? __mutex_lock_slowpath+0x10/0x10
  mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core]
  process_one_work+0x680/0x1050
  worker_thread+0x5a0/0xeb0
  ? process_one_work+0x1050/0x1050
  kthread+0x2a2/0x340
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x22/0x30
  </TASK>

Freed by task 1:
  kasan_save_stack+0x23/0x50
  kasan_set_track+0x21/0x30
  kasan_save_free_info+0x2a/0x40
  ____kasan_slab_free+0x169/0x1d0
  slab_free_freelist_hook+0xd2/0x190
  __kmem_cache_free+0x1a1/0x2f0
  irq_pool_free+0x138/0x200 [mlx5_core]
  mlx5_irq_table_destroy+0xf6/0x170 [mlx5_core]
  mlx5_core_eq_free_irqs+0x74/0xf0 [mlx5_core]
  shutdown+0x194/0x1aa [mlx5_core]
  pci_device_shutdown+0x75/0x120
  device_shutdown+0x35c/0x620
  kernel_restart+0x60/0xa0
  __do_sys_reboot+0x1cb/0x2c0
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x4b/0xb5

The buggy address belongs to the object at ffff88823fc0d300
  which belongs to the cache kmalloc-192 of size 192
The buggy address is located 24 bytes inside of
  192-byte region [ffff88823fc0d300, ffff88823fc0d3c0)

The buggy address belongs to the physical page:
page:0000000010139587 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x23fc0c
head:0000000010139587 order:1 compound_mapcount:0 compound_pincount:0
flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004ca00
raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff88823fc0d200: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff88823fc0d280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 >ffff88823fc0d300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                             ^
  ffff88823fc0d380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
  ffff88823fc0d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
general protection fault, probably for non-canonical address
0xdffffc005c40d7ac: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: probably user-memory-access in range [0x00000002e206bd60-0x00000002e206bd67]
CPU: 25 PID: 13608 Comm: kworker/u192:0 Tainted: G    B   W  O  6.1.21-cloudflare-kasan-2023.3.21 #1
Hardware name: GIGABYTE R162-R2-GEN0/MZ12-HD2-CD, BIOS R14 05/03/2021
Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core]
RIP: 0010:__alloc_pages+0x141/0x5c0
Call Trace:
  <TASK>
  ? sysvec_apic_timer_interrupt+0xa0/0xc0
  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
  ? __alloc_pages_slowpath.constprop.0+0x1ec0/0x1ec0
  ? _raw_spin_unlock_irqrestore+0x3d/0x80
  __kmalloc_large_node+0x80/0x120
  ? kvmalloc_node+0x4e/0x170
  __kmalloc_node+0xd4/0x150
  kvmalloc_node+0x4e/0x170
  mlx5e_open_channels+0x631/0x3a10 [mlx5_core]
  ? console_unlock+0x2fa/0x430
  ? _raw_spin_lock_irqsave+0x8d/0xf0
  ? _raw_spin_unlock_irqrestore+0x42/0x80
  ? preempt_count_add+0x7d/0x150
  ? __wake_up_klogd.part.0+0x7d/0xc0
  ? vprintk_emit+0xfe/0x2c0
  ? mlx5e_trigger_napi_sched+0x40/0x40 [mlx5_core]
  ? dev_attr_show.cold+0x35/0x35
  ? devlink_health_do_dump.part.0+0x174/0x340
  ? devlink_health_report+0x504/0x810
  ? mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core]
  ? mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core]
  ? process_one_work+0x680/0x1050
  mlx5e_safe_switch_params+0x156/0x220 [mlx5_core]
  ? mlx5e_switch_priv_channels+0x310/0x310 [mlx5_core]
  ? mlx5_eq_poll_irq_disabled+0xb6/0x100 [mlx5_core]
  mlx5e_tx_reporter_timeout_recover+0x123/0x240 [mlx5_core]
  ? __mutex_unlock_slowpath.constprop.0+0x2b0/0x2b0
  devlink_health_reporter_recover+0xa6/0x1f0
  devlink_health_report+0x2f7/0x810
  ? vsnprintf+0x854/0x15e0
  mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core]
  ? mlx5e_reporter_tx_err_cqe+0x1a0/0x1a0 [mlx5_core]
  ? mlx5e_tx_reporter_timeout_dump+0x50/0x50 [mlx5_core]
  ? mlx5e_tx_reporter_dump_sq+0x260/0x260 [mlx5_core]
  ? newidle_balance+0x9b7/0xe30
  ? psi_group_change+0x6a7/0xb80
  ? mutex_lock+0x96/0xf0
  ? __mutex_lock_slowpath+0x10/0x10
  mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core]
  process_one_work+0x680/0x1050
  worker_thread+0x5a0/0xeb0
  ? process_one_work+0x1050/0x1050
  kthread+0x2a2/0x340
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x22/0x30
  </TASK>
---[ end trace 0000000000000000  ]---
RIP: 0010:__alloc_pages+0x141/0x5c0
Code: e0 39 a3 96 89 e9 b8 22 01 32 01 83 e1 0f 48 89 fa 01 c9 48 c1 ea
03 d3 f8 83 e0 03 89 44 24 6c 48 b8 00 00 00 00 00 fc ff df <80> 3c 02
00 0f 85 fc 03 00 00 89 e8 4a 8b 14 f5 e0 39 a3 96 4c 89
RSP: 0018:ffff888251f0f438 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 1ffff1104a3e1e8b RCX: 0000000000000000
RDX: 000000005c40d7ac RSI: 0000000000000003 RDI: 00000002e206bd60
RBP: 0000000000052dc0 R08: ffff8882b0044218 R09: ffff8882b0045e8a
R10: fffffbfff300fefc R11: ffff888167af4000 R12: 0000000000000003
R13: 0000000000000000 R14: 00000000696c7070 R15: ffff8882373f4380
FS:  0000000000000000(0000) GS:ffff88bf2be80000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005641d031eee8 CR3: 0000002e7ca14000 CR4: 0000000000350ee0
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x11000000 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception  ]---]

Reported-by: Frederick Lawler <fred@cloudflare.com>
Link: https://lore.kernel.org/netdev/be5b9271-7507-19c5-ded1-fa78f1980e69@cloudflare.com
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22 22:38:06 -07:00
..
accel Driver core changes for 6.4-rc1 2023-04-27 11:53:57 -07:00
accessibility
acpi ACPI: video: Remove acpi_backlight=video quirk for Lenovo ThinkPad W530 2023-05-04 20:23:41 +02:00
amba
android
ata Driver core changes for 6.4-rc1 2023-04-27 11:53:57 -07:00
atm
auxdisplay
base - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
bcma
block ublk: fix command op code check 2023-05-12 09:09:06 -06:00
bluetooth Bluetooth: btnxpuart: Fix compiler warnings 2023-05-19 15:38:29 -07:00
bus modules-6.4-rc1 2023-04-27 16:36:55 -07:00
cdrom
cdx cdx: fix build failure due to sysfs 'bus_type' argument needing to be const 2023-04-27 16:21:32 -07:00
char tpm/tpm_tis: Disable interrupts for more Lenovo devices 2023-05-16 02:48:23 +03:00
clk A couple more patches that would be good to get into -rc1. 2023-05-07 10:31:45 -07:00
clocksource Timekeeping and clocksource/event driver updates the second batch: 2023-04-29 10:24:30 -07:00
comedi
connector
counter - New Drivers 2023-05-02 10:41:31 -07:00
cpufreq Driver core changes for 6.4-rc1 2023-04-27 11:53:57 -07:00
cpuidle RISC-V: Align SBI probe implementation with spec 2023-04-29 13:04:50 -07:00
crypto This push fixes the following problems: 2023-05-07 10:57:14 -07:00
cxl cxl: Add missing return to cdat read error path 2023-05-13 00:20:06 -07:00
dax
dca Mainly singleton patches all over the place. Series of note are: 2023-04-27 19:57:00 -07:00
devfreq Driver core changes for 6.4-rc1 2023-04-27 11:53:57 -07:00
dio
dma dmaengine updates for v6.4 2023-05-03 11:11:56 -07:00
dma-buf - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
edac Driver core changes for 6.4-rc1 2023-04-27 11:53:57 -07:00
eisa
extcon
firewire firewire: net: fix unexpected release of object for asynchronous request packet 2023-05-11 09:06:49 +09:00
firmware Merge tag 'drm-misc-fixes-2023-05-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes 2023-05-12 05:32:36 +10:00
fpga Char/Misc drivers for 6.4-rc1 2023-04-27 12:07:50 -07:00
fsi
gnss
gpio hte: Changes for v6.4-rc1 2023-05-03 11:00:27 -07:00
gpu Merge tag 'amd-drm-fixes-6.4-2023-05-11' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes 2023-05-12 06:46:34 +10:00
greybus
hid Driver core changes for 6.4-rc1 2023-04-27 11:53:57 -07:00
hsi
hte Devicetree updates for v6.4, part 2: 2023-04-27 10:09:05 -07:00
hv hyperv-next for v6.4 2023-04-27 17:17:12 -07:00
hwmon hwmon: (k10temp) Add PCI ID for family 19, model 78h 2023-05-08 11:36:19 +02:00
hwspinlock
hwtracing coresight: Updates for v6.4 2023-04-19 15:08:11 +02:00
i2c i2c: gxp: fix build failure without CONFIG_I2C_SLAVE 2023-05-03 17:27:29 +02:00
i3c i3c: ast2600: set variable ast2600_i3c_ops storage-class-specifier to static 2023-04-30 23:50:26 +02:00
idle intel_idle: mark few variables as __read_mostly 2023-04-27 19:37:36 +02:00
iio Char/Misc drivers for 6.4-rc1 2023-04-27 12:07:50 -07:00
infiniband v6.4 merge window RDMA pull request 2023-04-29 17:21:24 -07:00
input Input updates for 6.4 merge window: 2023-05-01 17:18:56 -07:00
interconnect modules-6.4-rc1 2023-04-27 16:36:55 -07:00
iommu IOMMU Updates for Linux 6.4 2023-04-30 13:00:38 -07:00
ipack
irqchip - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
isdn Including fixes from netfilter. 2023-05-05 19:12:01 -07:00
leds - New Drivers 2023-05-02 10:36:02 -07:00
macintosh powerpc updates for 6.4 2023-04-28 16:24:32 -07:00
mailbox - mailbox api: allow direct registration to a channel 2023-05-07 10:17:33 -07:00
mcb mcb-lpc: Reallocate memory region to avoid memory overlapping 2023-04-20 14:24:01 +02:00
md for-6.4/block-2023-05-06 2023-05-06 08:28:58 -07:00
media media: dvb-core: Fix use-after-free due to race condition at dvb_ca_en50221 2023-05-14 16:04:48 +01:00
memory ARM: SoC drivers for v6.4 2023-04-25 12:02:16 -07:00
memstick
message Objtool changes for v6.4: 2023-04-28 14:02:54 -07:00
mfd - New Drivers 2023-05-02 10:41:31 -07:00
misc Objtool changes for v6.4: 2023-04-28 14:02:54 -07:00
mmc TTY/Serial changes for 6.4-rc1 2023-04-27 11:46:26 -07:00
most
mtd This pull request contains updates for UBI and UBIFS 2023-05-03 18:58:59 -07:00
mux
net net/mlx5: Free irqs only on shutdown callback 2023-05-22 22:38:06 -07:00
nfc drivers: nfc: nfcsim: remove return value check of `dev_dir` 2023-04-24 18:12:42 -07:00
ntb
nubus
nvdimm
nvme for-6.4/io_uring-2023-05-07 2023-05-07 10:00:09 -07:00
nvmem modules-6.4-rc1 2023-04-27 16:36:55 -07:00
of Devicetree fixes for 6.4, part 1: 2023-05-05 13:27:59 -07:00
opp Devicetree updates for v6.4, part 2: 2023-04-27 10:09:05 -07:00
parisc parisc: Replace regular spinlock with spin_trylock on panic path 2023-05-03 17:43:26 +02:00
parport
pci cxl for v6.4 2023-04-30 11:51:51 -07:00
pcmcia
peci
perf RISC-V: Align SBI probe implementation with spec 2023-04-29 13:04:50 -07:00
phy phy fixes for 6.4 2023-05-05 11:57:29 -07:00
pinctrl Pin control bulk changes for the v6.4 kernel: 2023-05-02 15:40:41 -07:00
platform platform/mellanox: fix potential race in mlxbf-tmfifo driver 2023-05-09 11:54:35 +02:00
pnp
power power supply and reset changes for the v6.4 series 2023-04-29 17:37:02 -07:00
powercap
pps
ps3
ptp Driver core changes for 6.4-rc1 2023-04-27 11:53:57 -07:00
pwm pwm: Changes for v6.4-rc1 2023-05-03 11:25:01 -07:00
rapidio Mainly singleton patches all over the place. Series of note are: 2023-04-27 19:57:00 -07:00
ras
regulator modules-6.4-rc1 2023-04-27 16:36:55 -07:00
remoteproc Mainly singleton patches all over the place. Series of note are: 2023-04-27 19:57:00 -07:00
reset Nothing looks out of the ordinary in this batch of clk driver updates. There 2023-04-29 17:29:39 -07:00
rpmsg Driver core changes for 6.4-rc1 2023-04-27 11:53:57 -07:00
rtc - New Drivers 2023-05-02 10:41:31 -07:00
s390 s390 updates for the 6.4 merge window 2023-04-30 11:43:31 -07:00
sbus Driver core changes for 6.4-rc1 2023-04-27 11:53:57 -07:00
scsi SCSI misc on 20230506 2023-05-06 08:37:28 -07:00
sh
siox
slimbus
soc modules-6.4-rc1 2023-04-27 16:36:55 -07:00
soundwire
spi Char/Misc drivers for 6.4-rc1 2023-04-27 12:07:50 -07:00
spmi spmi: Add a check for remove callback when removing a SPMI driver 2023-04-20 14:16:39 +02:00
ssb
staging modules-6.4-rc1 2023-04-27 16:36:55 -07:00
target
tc
tee Driver core changes for 6.4-rc1 2023-04-27 11:53:57 -07:00
thermal thermal: intel: powerclamp: Fix NULL pointer access issue 2023-05-04 20:30:18 +02:00
thunderbolt thunderbolt: Changes for v6.4 merge window 2023-04-19 11:42:44 +02:00
tty Char/Misc drivers for 6.4-rc1 2023-04-27 12:07:50 -07:00
ufs scsi: ufs: core: Fix I/O hang that occurs when BKOPS fails in W-LUN suspend 2023-05-08 07:15:05 -04:00
uio
usb Char/Misc drivers for 6.4-rc1 2023-04-27 12:07:50 -07:00
vdpa virtio,vhost,vdpa: features, fixes, cleanups 2023-04-27 17:05:34 -07:00
vfio VFIO updates for v6.4-rc1 2023-05-02 11:56:43 -07:00
vhost Scheduler changes for v6.4: 2023-04-28 14:53:30 -07:00
video fbdev: stifb: Fix info entry in sti_struct on error path 2023-05-12 11:50:33 +02:00
virt Devicetree updates for v6.4, part 2: 2023-04-27 10:09:05 -07:00
virtio - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of 2023-04-27 19:42:02 -07:00
vlynq
w1 Char/Misc drivers for 6.4-rc1 2023-04-27 12:07:50 -07:00
watchdog linux-watchdog 6.4-rc1 tag 2023-05-04 18:33:56 -07:00
xen xen: branch for v6.4-rc1 2023-04-27 17:27:06 -07:00
zorro
Kconfig
Makefile