Commit graph

67996 commits

Author SHA1 Message Date
Eric Dumazet
d95d6320ba ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
Because fib6_info_hw_flags_set() is called without any synchronization,
all accesses to gi6->offload, fi->trap and fi->offload_failed
need some basic protection like READ_ONCE()/WRITE_ONCE().

BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt

read to 0xffff8881087d5886 of 1 bytes by task 13953 on cpu 0:
 fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1007 [inline]
 fib6_purge_rt+0x4f/0x580 net/ipv6/ip6_fib.c:1033
 fib6_del_route net/ipv6/ip6_fib.c:1983 [inline]
 fib6_del+0x696/0x890 net/ipv6/ip6_fib.c:2028
 __ip6_del_rt net/ipv6/route.c:3876 [inline]
 ip6_del_rt+0x83/0x140 net/ipv6/route.c:3891
 __ipv6_dev_ac_dec+0x2b5/0x370 net/ipv6/anycast.c:374
 ipv6_dev_ac_dec net/ipv6/anycast.c:387 [inline]
 __ipv6_sock_ac_close+0x141/0x200 net/ipv6/anycast.c:207
 ipv6_sock_ac_close+0x79/0x90 net/ipv6/anycast.c:220
 inet6_release+0x32/0x50 net/ipv6/af_inet6.c:476
 __sock_release net/socket.c:650 [inline]
 sock_close+0x6c/0x150 net/socket.c:1318
 __fput+0x295/0x520 fs/file_table.c:280
 ____fput+0x11/0x20 fs/file_table.c:313
 task_work_run+0x8e/0x110 kernel/task_work.c:164
 tracehook_notify_resume include/linux/tracehook.h:189 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
 exit_to_user_mode_prepare+0x160/0x190 kernel/entry/common.c:207
 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
 syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
 do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x44/0xae

write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1:
 fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230
 nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline]
 nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline]
 nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline]
 nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline]
 nsim_fib_event_work+0x284f/0x2cf0 drivers/net/netdevsim/fib.c:1477
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 worker_thread+0x616/0xa70 kernel/workqueue.c:2454
 kthread+0x2c7/0x2e0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30

value changed: 0x22 -> 0x2a

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work

Fixes: 0c5fcf9e24 ("IPv6: Add "offload failed" indication to routes")
Fixes: bb3c4ab93e ("ipv6: Add "offload" and "trap" indications to routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amit Cohen <amcohen@nvidia.com>
Cc: Ido Schimmel <idosch@nvidia.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220216173217.3792411-2-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 09:48:24 -08:00
Eric Dumazet
9fcf986cc4 ipv4: fix data races in fib_alias_hw_flags_set
fib_alias_hw_flags_set() can be used by concurrent threads,
and is only RCU protected.

We need to annotate accesses to following fields of struct fib_alias:

    offload, trap, offload_failed

Because of READ_ONCE()WRITE_ONCE() limitations, make these
field u8.

BUG: KCSAN: data-race in fib_alias_hw_flags_set / fib_alias_hw_flags_set

read to 0xffff888134224a6a of 1 bytes by task 2013 on cpu 1:
 fib_alias_hw_flags_set+0x28a/0x470 net/ipv4/fib_trie.c:1050
 nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
 nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
 nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
 nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
 nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 process_scheduled_works kernel/workqueue.c:2370 [inline]
 worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
 kthread+0x1bf/0x1e0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30

write to 0xffff888134224a6a of 1 bytes by task 4872 on cpu 0:
 fib_alias_hw_flags_set+0x2d5/0x470 net/ipv4/fib_trie.c:1054
 nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
 nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
 nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
 nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
 nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 process_scheduled_works kernel/workqueue.c:2370 [inline]
 worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
 kthread+0x1bf/0x1e0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30

value changed: 0x00 -> 0x02

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 4872 Comm: kworker/0:0 Not tainted 5.17.0-rc3-syzkaller-00188-g1d41d2e82623-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work

Fixes: 90b93f1b31 ("ipv4: Add "offload" and "trap" indications to routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220216173217.3792411-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 09:48:24 -08:00
Mans Rullgard
017b355bbd net: dsa: lan9303: handle hwaccel VLAN tags
Check for a hwaccel VLAN tag on rx and use it if present.  Otherwise,
use __skb_vlan_pop() like the other tag parsers do.  This fixes the case
where the VLAN tag has already been consumed by the master.

Fixes: a1292595e0 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans@mansr.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 09:32:06 -08:00
Seth Forshee
b9208492fc vsock: remove vsock from connected table when connect is interrupted by a signal
vsock_connect() expects that the socket could already be in the
TCP_ESTABLISHED state when the connecting task wakes up with a signal
pending. If this happens the socket will be in the connected table, and
it is not removed when the socket state is reset. In this situation it's
common for the process to retry connect(), and if the connection is
successful the socket will be added to the connected table a second
time, corrupting the list.

Prevent this by calling vsock_remove_connected() if a signal is received
while waiting for a connection. This is harmless if the socket is not in
the connected table, and if it is in the table then removing it will
prevent list corruption from a double add.

Note for backporting: this patch requires d5afa82c97 ("vsock: correct
removal of socket from the list"), which is in all current stable trees
except 4.9.y.

Fixes: d021c34405 ("VSOCK: Introduce VM Sockets")
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220217141312.2297547-1-sforshee@digitalocean.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 08:56:02 -08:00
Xin Long
35a79e64de ping: fix the dif and sdif check in ping_lookup
When 'ping' changes to use PING socket instead of RAW socket by:

   # sysctl -w net.ipv4.ping_group_range="0 100"

There is another regression caused when matching sk_bound_dev_if
and dif, RAW socket is using inet_iif() while PING socket lookup
is using skb->dev->ifindex, the cmd below fails due to this:

  # ip link add dummy0 type dummy
  # ip link set dummy0 up
  # ip addr add 192.168.111.1/24 dev dummy0
  # ping -I dummy0 192.168.111.1 -c1

The issue was also reported on:

  https://github.com/iputils/iputils/issues/104

But fixed in iputils in a wrong way by not binding to device when
destination IP is on device, and it will cause some of kselftests
to fail, as Jianlin noticed.

This patch is to use inet(6)_iif and inet(6)_sdif to get dif and
sdif for PING socket, and keep consistent with RAW socket.

Fixes: c319b4d76b ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17 15:01:20 +00:00
Eric Dumazet
5740d06890 net: sched: limit TC_ACT_REPEAT loops
We have been living dangerously, at the mercy of malicious users,
abusing TC_ACT_REPEAT, as shown by this syzpot report [1].

Add an arbitrary limit (32) to the number of times an action can
return TC_ACT_REPEAT.

v2: switch the limit to 32 instead of 10.
    Use net_warn_ratelimited() instead of pr_err_once().

[1] (C repro available on demand)

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:    1-...!: (10500 ticks this GP) idle=021/1/0x4000000000000000 softirq=5592/5592 fqs=0
        (t=10502 jiffies g=5305 q=190)
rcu: rcu_preempt kthread timer wakeup didn't happen for 10502 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
rcu:    Possible timer handling issue on cpu=0 timer-softirq=3527
rcu: rcu_preempt kthread starved for 10505 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
rcu:    Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:I stack:29344 pid:   14 ppid:     2 flags:0x00004000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:4986 [inline]
 __schedule+0xab2/0x4db0 kernel/sched/core.c:6295
 schedule+0xd2/0x260 kernel/sched/core.c:6368
 schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
 rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1963
 rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2136
 kthread+0x2e9/0x3a0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 3646 Comm: syz-executor358 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd315f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:rep_nop arch/x86/include/asm/vdso/processor.h:13 [inline]
RIP: 0010:cpu_relax arch/x86/include/asm/vdso/processor.h:18 [inline]
RIP: 0010:pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:437 [inline]
RIP: 0010:__pv_queued_spin_lock_slowpath+0x3b8/0xb40 kernel/locking/qspinlock.c:508
Code: 48 89 eb c6 45 01 01 41 bc 00 80 00 00 48 c1 e9 03 83 e3 07 41 be 01 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 2c 01 eb 0c <f3> 90 41 83 ec 01 0f 84 72 04 00 00 41 0f b6 45 00 38 d8 7f 08 84
RSP: 0018:ffffc9000283f1b0 EFLAGS: 00000206
RAX: 0000000000000003 RBX: 0000000000000000 RCX: 1ffff1100fc0071e
RDX: 0000000000000001 RSI: 0000000000000201 RDI: 0000000000000000
RBP: ffff88807e0038f0 R08: 0000000000000001 R09: ffffffff8ffbf9ff
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000004c1e
R13: ffffed100fc0071e R14: 0000000000000001 R15: ffff8880b9c3aa80
FS:  00005555562bf300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffdbfef12b8 CR3: 00000000723c2000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:591 [inline]
 queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline]
 queued_spin_lock include/asm-generic/qspinlock.h:85 [inline]
 do_raw_spin_lock+0x200/0x2b0 kernel/locking/spinlock_debug.c:115
 spin_lock_bh include/linux/spinlock.h:354 [inline]
 sch_tree_lock include/net/sch_generic.h:610 [inline]
 sch_tree_lock include/net/sch_generic.h:605 [inline]
 prio_tune+0x3b9/0xb50 net/sched/sch_prio.c:211
 prio_init+0x5c/0x80 net/sched/sch_prio.c:244
 qdisc_create.constprop.0+0x44a/0x10f0 net/sched/sch_api.c:1253
 tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660
 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5594
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:725
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f7ee98aae99
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffdbfef12d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007ffdbfef1300 RCX: 00007f7ee98aae99
RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000003
RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d
R10: 000000000000000d R11: 0000000000000246 R12: 00007ffdbfef12f0
R13: 00000000000f4240 R14: 000000000004ca47 R15: 00007ffdbfef12e4
 </TASK>
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.293 msecs
NMI backtrace for cpu 1
CPU: 1 PID: 3260 Comm: kworker/1:3 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd315f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: mld mld_ifc_work
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343
 print_cpu_stall kernel/rcu/tree_stall.h:604 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:688 [inline]
 rcu_pending kernel/rcu/tree.c:3919 [inline]
 rcu_sched_clock_irq.cold+0x5c/0x759 kernel/rcu/tree.c:2617
 update_process_times+0x16d/0x200 kernel/time/timer.c:1785
 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
 tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428
 __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
 __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749
 hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
 __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103
 sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:__sanitizer_cov_trace_const_cmp4+0xc/0x70 kernel/kcov.c:286
Code: 00 00 00 48 89 7c 30 e8 48 89 4c 30 f0 4c 89 54 d8 20 48 89 10 5b c3 0f 1f 80 00 00 00 00 41 89 f8 bf 03 00 00 00 4c 8b 14 24 <89> f1 65 48 8b 34 25 00 70 02 00 e8 14 f9 ff ff 84 c0 74 4b 48 8b
RSP: 0018:ffffc90002c5eea8 EFLAGS: 00000246
RAX: 0000000000000007 RBX: ffff88801c625800 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: ffff8880137d3100 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff874fcd88 R11: 0000000000000000 R12: ffff88801d692dc0
R13: ffff8880137d3104 R14: 0000000000000000 R15: ffff88801d692de8
 tcf_police_act+0x358/0x11d0 net/sched/act_police.c:256
 tcf_action_exec net/sched/act_api.c:1049 [inline]
 tcf_action_exec+0x1a6/0x530 net/sched/act_api.c:1026
 tcf_exts_exec include/net/pkt_cls.h:326 [inline]
 route4_classify+0xef0/0x1400 net/sched/cls_route.c:179
 __tcf_classify net/sched/cls_api.c:1549 [inline]
 tcf_classify+0x3e8/0x9d0 net/sched/cls_api.c:1615
 prio_classify net/sched/sch_prio.c:42 [inline]
 prio_enqueue+0x3a7/0x790 net/sched/sch_prio.c:75
 dev_qdisc_enqueue+0x40/0x300 net/core/dev.c:3668
 __dev_xmit_skb net/core/dev.c:3756 [inline]
 __dev_queue_xmit+0x1f61/0x3660 net/core/dev.c:4081
 neigh_hh_output include/net/neighbour.h:533 [inline]
 neigh_output include/net/neighbour.h:547 [inline]
 ip_finish_output2+0x14dc/0x2170 net/ipv4/ip_output.c:228
 __ip_finish_output net/ipv4/ip_output.c:306 [inline]
 __ip_finish_output+0x396/0x650 net/ipv4/ip_output.c:288
 ip_finish_output+0x32/0x200 net/ipv4/ip_output.c:316
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip_output+0x196/0x310 net/ipv4/ip_output.c:430
 dst_output include/net/dst.h:451 [inline]
 ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:126
 iptunnel_xmit+0x628/0xa50 net/ipv4/ip_tunnel_core.c:82
 geneve_xmit_skb drivers/net/geneve.c:966 [inline]
 geneve_xmit+0x10c8/0x3530 drivers/net/geneve.c:1077
 __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
 netdev_start_xmit include/linux/netdevice.h:4697 [inline]
 xmit_one net/core/dev.c:3473 [inline]
 dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3489
 __dev_queue_xmit+0x2985/0x3660 net/core/dev.c:4116
 neigh_hh_output include/net/neighbour.h:533 [inline]
 neigh_output include/net/neighbour.h:547 [inline]
 ip6_finish_output2+0xf7a/0x14f0 net/ipv6/ip6_output.c:126
 __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
 __ip6_finish_output+0x61e/0xe90 net/ipv6/ip6_output.c:170
 ip6_finish_output+0x32/0x200 net/ipv6/ip6_output.c:201
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:224
 dst_output include/net/dst.h:451 [inline]
 NF_HOOK include/linux/netfilter.h:307 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 mld_sendpack+0x9a3/0xe40 net/ipv6/mcast.c:1826
 mld_send_cr net/ipv6/mcast.c:2127 [inline]
 mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2659
 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
 worker_thread+0x657/0x1110 kernel/workqueue.c:2454
 kthread+0x2e9/0x3a0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
----------------
Code disassembly (best guess):
   0:   48 89 eb                mov    %rbp,%rbx
   3:   c6 45 01 01             movb   $0x1,0x1(%rbp)
   7:   41 bc 00 80 00 00       mov    $0x8000,%r12d
   d:   48 c1 e9 03             shr    $0x3,%rcx
  11:   83 e3 07                and    $0x7,%ebx
  14:   41 be 01 00 00 00       mov    $0x1,%r14d
  1a:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
  21:   fc ff df
  24:   4c 8d 2c 01             lea    (%rcx,%rax,1),%r13
  28:   eb 0c                   jmp    0x36
* 2a:   f3 90                   pause <-- trapping instruction
  2c:   41 83 ec 01             sub    $0x1,%r12d
  30:   0f 84 72 04 00 00       je     0x4a8
  36:   41 0f b6 45 00          movzbl 0x0(%r13),%eax
  3b:   38 d8                   cmp    %bl,%al
  3d:   7f 08                   jg     0x47
  3f:   84                      .byte 0x84

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220215235305.3272331-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-16 20:45:55 -08:00
Jon Maloy
c08e58438d tipc: fix wrong notification node addresses
The previous bug fix had an unfortunate side effect that broke
distribution of binding table entries between nodes. The updated
tipc_sock_addr struct is also used further down in the same
function, and there the old value is still the correct one.

Fixes: 032062f363 ("tipc: fix wrong publisher node address in link publications")
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Link: https://lore.kernel.org/r/20220216020009.3404578-1-jmaloy@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-16 20:44:40 -08:00
Willem de Bruijn
0b0dff5b3b ipv6: per-netns exclusive flowlabel checks
Ipv6 flowlabels historically require a reservation before use.
Optionally in exclusive mode (e.g., user-private).

Commit 59c820b231 ("ipv6: elide flowlabel check if no exclusive
leases exist") introduced a fastpath that avoids this check when no
exclusive leases exist in the system, and thus any flowlabel use
will be granted.

That allows skipping the control operation to reserve a flowlabel
entirely. Though with a warning if the fast path fails:

  This is an optimization. Robust applications still have to revert to
  requesting leases if the fast path fails due to an exclusive lease.

Still, this is subtle. Better isolate network namespaces from each
other. Flowlabels are per-netns. Also record per-netns whether
exclusive leases are in use. Then behavior does not change based on
activity in other netns.

Changes
  v2
    - wrap in IS_ENABLED(CONFIG_IPV6) to avoid breakage if disabled

Fixes: 59c820b231 ("ipv6: elide flowlabel check if no exclusive leases exist")
Link: https://lore.kernel.org/netdev/MWHPR2201MB1072BCCCFCE779E4094837ACD0329@MWHPR2201MB1072.namprd22.prod.outlook.com/
Reported-by: Congyu Liu <liu3101@purdue.edu>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Tested-by: Congyu Liu <liu3101@purdue.edu>
Link: https://lore.kernel.org/r/20220215160037.1976072-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-16 20:37:47 -08:00
Oleksandr Mazur
c832962ac9 net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
Whenever bridge driver hits the max capacity of MDBs, it disables
the MC processing (by setting corresponding bridge option), but never
notifies switchdev about such change (the notifiers are called only upon
explicit setting of this option, through the registered netlink interface).

This could lead to situation when Software MDB processing gets disabled,
but this event never gets offloaded to the underlying Hardware.

Fix this by adding a notify message in such case.

Fixes: 147c1e9b90 ("switchdev: bridge: Offload multicast disabled")
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20220215165303.31908-1-oleksandr.mazur@plvision.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-16 20:35:00 -08:00
Tom Rix
7e5b6a5c8c mctp: fix use after free
Clang static analysis reports this problem
route.c:425:4: warning: Use of memory after it is freed
  trace_mctp_key_acquire(key);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
When mctp_key_add() fails, key is freed but then is later
used in trace_mctp_key_acquire().  Add an else statement
to use the key only when mctp_key_add() is successful.

Fixes: 4f9e1ba6de ("mctp: Add tracepoints for tag/key handling")
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-15 14:54:40 +00:00
Zhang Yunkai
9d2d38c35e ipv4: add description about martian source
When multiple containers are running in the environment and multiple
macvlan network port are configured in each container, a lot of martian
source prints will appear after martian_log is enabled. they are almost
the same, and printed by net_warn_ratelimited. Each arp message will
trigger this print on each network port.

Such as:
IPv4: martian source 173.254.95.16 from 173.254.100.109,
on dev eth0
ll header: 00000000: ff ff ff ff ff ff 40 00 ad fe 64 6d
08 06        ......@...dm..
IPv4: martian source 173.254.95.16 from 173.254.100.109,
on dev eth1
ll header: 00000000: ff ff ff ff ff ff 40 00 ad fe 64 6d
08 06        ......@...dm..

There is no description of this kind of source in the RFC1812.

Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-15 14:30:48 +00:00
Jon Maloy
032062f363 tipc: fix wrong publisher node address in link publications
When a link comes up we add its presence to the name table to make it
possible for users to subscribe for link up/down events. However, after
a previous call signature change the binding is wrongly published with
the peer node as publishing node, instead of the own node as it should
be. This has the effect that the command 'tipc name table show' will
list the link binding (service type 2) with node scope and a peer node
as originator, something that obviously is impossible.

We correct this bug here.

Fixes: 50a3499ab8 ("tipc: simplify signature of tipc_namtbl_publish()")
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Link: https://lore.kernel.org/r/20220214013852.2803940-1-jmaloy@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-14 21:18:18 -08:00
Alex Maydanik
0fc95dec09 net: fix documentation for kernel_getsockname
Fixes return value documentation of kernel_getsockname()
and kernel_getpeername() functions.

The previous documentation wrongly specified that the return
value is 0 in case of success, however sock->ops->getname returns
the length of the address in bytes in case of success.

Signed-off-by: Alex Maydanik <alexander.maydanik@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14 14:01:19 +00:00
Eric Dumazet
5891cd5ec4 net_sched: add __rcu annotation to netdev->qdisc
syzbot found a data-race [1] which lead me to add __rcu
annotations to netdev->qdisc, and proper accessors
to get LOCKDEP support.

[1]
BUG: KCSAN: data-race in dev_activate / qdisc_lookup_rcu

write to 0xffff888168ad6410 of 8 bytes by task 13559 on cpu 1:
 attach_default_qdiscs net/sched/sch_generic.c:1167 [inline]
 dev_activate+0x2ed/0x8f0 net/sched/sch_generic.c:1221
 __dev_open+0x2e9/0x3a0 net/core/dev.c:1416
 __dev_change_flags+0x167/0x3f0 net/core/dev.c:8139
 rtnl_configure_link+0xc2/0x150 net/core/rtnetlink.c:3150
 __rtnl_newlink net/core/rtnetlink.c:3489 [inline]
 rtnl_newlink+0xf4d/0x13e0 net/core/rtnetlink.c:3529
 rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594
 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmsg+0x195/0x230 net/socket.c:2496
 __do_sys_sendmsg net/socket.c:2505 [inline]
 __se_sys_sendmsg net/socket.c:2503 [inline]
 __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888168ad6410 of 8 bytes by task 13560 on cpu 0:
 qdisc_lookup_rcu+0x30/0x2e0 net/sched/sch_api.c:323
 __tcf_qdisc_find+0x74/0x3a0 net/sched/cls_api.c:1050
 tc_del_tfilter+0x1c7/0x1350 net/sched/cls_api.c:2211
 rtnetlink_rcv_msg+0x5ba/0x7e0 net/core/rtnetlink.c:5585
 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmsg+0x195/0x230 net/socket.c:2496
 __do_sys_sendmsg net/socket.c:2505 [inline]
 __se_sys_sendmsg net/socket.c:2503 [inline]
 __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e1383-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 470502de5b ("net: sched: unlock rules update API")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14 13:36:36 +00:00
Vladimir Oltean
a2614140dc net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN
mv88e6xxx is special among DSA drivers in that it requires the VTU to
contain the VID of the FDB entry it modifies in
mv88e6xxx_port_db_load_purge(), otherwise it will return -EOPNOTSUPP.

Sometimes due to races this is not always satisfied even if external
code does everything right (first deletes the FDB entries, then the
VLAN), because DSA commits to hardware FDB entries asynchronously since
commit c9eb3e0f87 ("net: dsa: Add support for learning FDB through
notification").

Therefore, the mv88e6xxx driver must close this race condition by
itself, by asking DSA to flush the switchdev workqueue of any FDB
deletions in progress, prior to exiting a VLAN.

Fixes: c9eb3e0f87 ("net: dsa: Add support for learning FDB through notification")
Reported-by: Rafael Richter <rafael.richter@gin.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14 13:31:12 +00:00
Ignat Korchagin
26394fc118 ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
Some time ago 8965779d2c ("ipv6,mcast: always hold idev->lock before mca_lock")
switched ipv6_get_lladdr() to __ipv6_get_lladdr(), which is rcu-unsafe
version. That was OK, because idev->lock was held for these codepaths.

In 88e2ca3080 ("mld: convert ifmcaddr6 to RCU") these external locks were
removed, so we probably need to restore the original rcu-safe call.

Otherwise, we occasionally get a machine crashed/stalled with the following
in dmesg:

[ 3405.966610][T230589] general protection fault, probably for non-canonical address 0xdead00000000008c: 0000 [#1] SMP NOPTI
[ 3405.982083][T230589] CPU: 44 PID: 230589 Comm: kworker/44:3 Tainted: G           O      5.15.19-cloudflare-2022.2.1 #1
[ 3405.998061][T230589] Hardware name: SUPA-COOL-SERV
[ 3406.009552][T230589] Workqueue: mld mld_ifc_work
[ 3406.017224][T230589] RIP: 0010:__ipv6_get_lladdr+0x34/0x60
[ 3406.025780][T230589] Code: 57 10 48 83 c7 08 48 89 e5 48 39 d7 74 3e 48 8d 82 38 ff ff ff eb 13 48 8b 90 d0 00 00 00 48 8d 82 38 ff ff ff 48 39 d7 74 22 <66> 83 78 32 20 77 1b 75 e4 89 ca 23 50 2c 75 dd 48 8b 50 08 48 8b
[ 3406.055748][T230589] RSP: 0018:ffff94e4b3fc3d10 EFLAGS: 00010202
[ 3406.065617][T230589] RAX: dead00000000005a RBX: ffff94e4b3fc3d30 RCX: 0000000000000040
[ 3406.077477][T230589] RDX: dead000000000122 RSI: ffff94e4b3fc3d30 RDI: ffff8c3a31431008
[ 3406.089389][T230589] RBP: ffff94e4b3fc3d10 R08: 0000000000000000 R09: 0000000000000000
[ 3406.101445][T230589] R10: ffff8c3a31430000 R11: 000000000000000b R12: ffff8c2c37887100
[ 3406.113553][T230589] R13: ffff8c3a39537000 R14: 00000000000005dc R15: ffff8c3a31431000
[ 3406.125730][T230589] FS:  0000000000000000(0000) GS:ffff8c3b9fc80000(0000) knlGS:0000000000000000
[ 3406.138992][T230589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3406.149895][T230589] CR2: 00007f0dfea1db60 CR3: 000000387b5f2000 CR4: 0000000000350ee0
[ 3406.162421][T230589] Call Trace:
[ 3406.170235][T230589]  <TASK>
[ 3406.177736][T230589]  mld_newpack+0xfe/0x1a0
[ 3406.186686][T230589]  add_grhead+0x87/0xa0
[ 3406.195498][T230589]  add_grec+0x485/0x4e0
[ 3406.204310][T230589]  ? newidle_balance+0x126/0x3f0
[ 3406.214024][T230589]  mld_ifc_work+0x15d/0x450
[ 3406.223279][T230589]  process_one_work+0x1e6/0x380
[ 3406.232982][T230589]  worker_thread+0x50/0x3a0
[ 3406.242371][T230589]  ? rescuer_thread+0x360/0x360
[ 3406.252175][T230589]  kthread+0x127/0x150
[ 3406.261197][T230589]  ? set_kthread_struct+0x40/0x40
[ 3406.271287][T230589]  ret_from_fork+0x22/0x30
[ 3406.280812][T230589]  </TASK>
[ 3406.288937][T230589] Modules linked in: ... [last unloaded: kheaders]
[ 3406.476714][T230589] ---[ end trace 3525a7655f2f3b9e ]---

Fixes: 88e2ca3080 ("mld: convert ifmcaddr6 to RCU")
Reported-by: David Pinilla Caparros <dpini@cloudflare.com>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14 13:30:37 +00:00
Hangyu Hua
143de8d97d tipc: fix a bit overflow in tipc_crypto_key_rcv()
msg_data_sz return a 32bit value, but size is 16bit. This may lead to a
bit overflow.

Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-13 12:12:25 +00:00
David S. Miller
85d24ad38b wireless fixes for v5.17
Second set of fixes for v5.17. This is the first pull request with
 both driver and stack patches.
 
 Most important here are a regression fix for brcmfmac USB devices and
 an iwlwifi fix for use after free when the firmware was missing. We
 have new maintainers for ath9k and wcn36xx as well as ath6kl is now
 orphaned. Also smaller fixes to iwlwifi and stack.
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmIGLQoRHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZvMSwf/QSGm9vlSC1rfiG/+cy65pg1bY4m7DM1g
 fiZJP3X690jBzi45B6ld3sI47DdycJH5jhyj7A9DIp2sPNGuULaGNjUrV2eQpjjk
 BU2tBOBLy/P9aoyMuRTeF4H2QXMgRXGTVhVF4oydmzOigKyJx1GxOt4tfkqPATYb
 3SEKp2wbwUjGA0K4xCD0LFnJHFcp3XSHFRNLN6QpZz/Ku+BcLq+cUWfHUKqDZFl4
 rNa4BH5xyydFNNURM6K20iKacq/8s5JOxdGAS+OmBGXKyZT/kbxvZDNz3rUsXuhd
 C7MYkQPGeXayfTrJbHwV2CBpaN+hy3KkU0ecOG6FACu9H4hfoJfynA==
 =ioe+
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2022-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

wireless fixes for v5.17

Second set of fixes for v5.17. This is the first pull request with
both driver and stack patches.

Most important here are a regression fix for brcmfmac USB devices and
an iwlwifi fix for use after free when the firmware was missing. We
have new maintainers for ath9k and wcn36xx as well as ath6kl is now
orphaned. Also smaller fixes to iwlwifi and stack.
2022-02-11 14:18:39 +00:00
David S. Miller
525de9a793 Merge ra.kernel.org:/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Add selftest for nft_synproxy, from Florian Westphal.

2) xt_socket destroy path incorrectly disables IPv4 defrag for
   IPv6 traffic (typo), from Eric Dumazet.

3) Fix exit value selftest nft_concat_range.sh, from Hangbin Liu.

4) nft_synproxy disables the IPv4 hooks if the IPv6 hooks fail
   to be registered.

5) disable rp_filter on router in selftest nft_fib.sh, also
   from Hangbin Liu.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11 11:55:08 +00:00
Eric Dumazet
dcd54265c8 drop_monitor: fix data-race in dropmon_net_event / trace_napi_poll_hit
trace_napi_poll_hit() is reading stat->dev while another thread can write
on it from dropmon_net_event()

Use READ_ONCE()/WRITE_ONCE() here, RCU rules are properly enforced already,
we only have to take care of load/store tearing.

BUG: KCSAN: data-race in dropmon_net_event / trace_napi_poll_hit

write to 0xffff88816f3ab9c0 of 8 bytes by task 20260 on cpu 1:
 dropmon_net_event+0xb8/0x2b0 net/core/drop_monitor.c:1579
 notifier_call_chain kernel/notifier.c:84 [inline]
 raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:392
 call_netdevice_notifiers_info net/core/dev.c:1919 [inline]
 call_netdevice_notifiers_extack net/core/dev.c:1931 [inline]
 call_netdevice_notifiers net/core/dev.c:1945 [inline]
 unregister_netdevice_many+0x867/0xfb0 net/core/dev.c:10415
 ip_tunnel_delete_nets+0x24a/0x280 net/ipv4/ip_tunnel.c:1123
 vti_exit_batch_net+0x2a/0x30 net/ipv4/ip_vti.c:515
 ops_exit_list net/core/net_namespace.c:173 [inline]
 cleanup_net+0x4dc/0x8d0 net/core/net_namespace.c:597
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 worker_thread+0x616/0xa70 kernel/workqueue.c:2454
 kthread+0x1bf/0x1e0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30

read to 0xffff88816f3ab9c0 of 8 bytes by interrupt on cpu 0:
 trace_napi_poll_hit+0x89/0x1c0 net/core/drop_monitor.c:292
 trace_napi_poll include/trace/events/napi.h:14 [inline]
 __napi_poll+0x36b/0x3f0 net/core/dev.c:6366
 napi_poll net/core/dev.c:6432 [inline]
 net_rx_action+0x29e/0x650 net/core/dev.c:6519
 __do_softirq+0x158/0x2de kernel/softirq.c:558
 do_softirq+0xb1/0xf0 kernel/softirq.c:459
 __local_bh_enable_ip+0x68/0x70 kernel/softirq.c:383
 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline]
 _raw_spin_unlock_bh+0x33/0x40 kernel/locking/spinlock.c:210
 spin_unlock_bh include/linux/spinlock.h:394 [inline]
 ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline]
 wg_packet_decrypt_worker+0x73c/0x780 drivers/net/wireguard/receive.c:506
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 worker_thread+0x616/0xa70 kernel/workqueue.c:2454
 kthread+0x1bf/0x1e0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30

value changed: 0xffff88815883e000 -> 0x0000000000000000

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 26435 Comm: kworker/0:1 Not tainted 5.17.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: wg-crypt-wg2 wg_packet_decrypt_worker

Fixes: 4ea7e38696 ("dropmon: add ability to detect when hardware dropsrxpackets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11 11:20:32 +00:00
Wen Gu
1de9770d12 net/smc: Avoid overwriting the copies of clcsock callback functions
The callback functions of clcsock will be saved and replaced during
the fallback. But if the fallback happens more than once, then the
copies of these callback functions will be overwritten incorrectly,
resulting in a loop call issue:

clcsk->sk_error_report
 |- smc_fback_error_report() <------------------------------|
     |- smc_fback_forward_wakeup()                          | (loop)
         |- clcsock_callback()  (incorrectly overwritten)   |
             |- smc->clcsk_error_report() ------------------|

So this patch fixes the issue by saving these function pointers only
once in the fallback and avoiding overwriting.

Reported-by: syzbot+4de3c0e8a263e1e499bc@syzkaller.appspotmail.com
Fixes: 341adeec9a ("net/smc: Forward wakeup to smc socket waitqueue after fallback")
Link: https://lore.kernel.org/r/0000000000006d045e05d78776f6@google.com
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11 11:10:29 +00:00
Linus Torvalds
f1baf68e13 Networking fixes for 5.17-rc4, including fixes from netfilter and can.
Current release - new code bugs:
 
  - sparx5: fix get_stat64 out-of-bound access and crash
 
  - smc: fix netdev ref tracker misuse
 
 Previous releases - regressions:
 
  - eth: ixgbevf: require large buffers for build_skb on 82599VF,
    avoid overflows
 
  - eth: ocelot: fix all IP traffic getting trapped to CPU with PTP
    over IP
 
  - bonding: fix rare link activation misses in 802.3ad mode
 
 Previous releases - always broken:
 
  - tcp: fix tcp sock mem accounting in zero-copy corner cases
 
  - remove the cached dst when uncloning an skb dst and its metadata,
    since we only have one ref it'd lead to an UaF
 
  - netfilter:
    - conntrack: don't refresh sctp entries in closed state
    - conntrack: re-init state for retransmitted syn-ack, avoid
      connection establishment getting stuck with strange stacks
    - ctnetlink: disable helper autoassign, avoid it getting lost
    - nft_payload: don't allow transport header access for fragments
 
  - dsa: fix use of devres for mdio throughout drivers
 
  - eth: amd-xgbe: disable interrupts during pci removal
 
  - eth: dpaa2-eth: unregister netdev before disconnecting the PHY
 
  - eth: ice: fix IPIP and SIT TSO offload
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmIFcW0ACgkQMUZtbf5S
 IrsloQ/+LrzlXgYrWf60DEFT4+AfJ20YxeakB+SpT0cZdvylU6NHF/U4rSHdUJRn
 xZiHfhfKzNWV5miJ2wzOuvufUvoz173dtrdgJJmp6G43qfAFyqiqtxrbVuuTQk0G
 C2i5M66zJ2svSj3EO5dYKV8zeydd14eVeoaJqfN5rjMARTmvpT/ssdzw0LTf0aXp
 87CF/WyeH8NyfQxQwPmbGRxDpnV2RqDJYSNdA4kOtDrnQmoKet32rhE6liHeP2jD
 OFQo70QXEMVyIZEh4wT17lMqA4M1zIEQtgrB0NsmVNU3jQnDI4UQ0aA2gHzwK31S
 glORZWGqTGGrhVy9uQBGKUK29RW8vb0D2ojZpi3zd0htWpIqpfFf6wuMAdUwEZag
 mTlZ1Yi6YgzAQEdSALoXLKps1GiHQzQNYPK6fNxoSOxnmoQTMQL9KxU/HB9d8W+L
 hjuYOGDw9vtGyUpNF7lktCQR/sWjFCmevLk97d2mhofFqfHBJYzFHGyd0GgNVzgl
 o3CMWnyiqTVapLRMTFPnEADEXGWq4DYAjRPkfjDCHeITB9Neg9S85Geth+xwfdi1
 cwSu64xFb7k7hiAbnymB3xg5sBQjXHrIQo03GHVymQE5p6XxIhn+5UD9CYHGJ2TF
 7PGyHeJJOZyWGQ/iSZJbo+BpsVGthIgk+9tTKPeTToej0tic27w=
 =jlhJ
 -----END PGP SIGNATURE-----

Merge tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter and can.

Current release - new code bugs:

   - sparx5: fix get_stat64 out-of-bound access and crash

   - smc: fix netdev ref tracker misuse

  Previous releases - regressions:

   - eth: ixgbevf: require large buffers for build_skb on 82599VF, avoid
     overflows

   - eth: ocelot: fix all IP traffic getting trapped to CPU with PTP
     over IP

   - bonding: fix rare link activation misses in 802.3ad mode

  Previous releases - always broken:

   - tcp: fix tcp sock mem accounting in zero-copy corner cases

   - remove the cached dst when uncloning an skb dst and its metadata,
     since we only have one ref it'd lead to an UaF

   - netfilter:
      - conntrack: don't refresh sctp entries in closed state
      - conntrack: re-init state for retransmitted syn-ack, avoid
        connection establishment getting stuck with strange stacks
      - ctnetlink: disable helper autoassign, avoid it getting lost
      - nft_payload: don't allow transport header access for fragments

   - dsa: fix use of devres for mdio throughout drivers

   - eth: amd-xgbe: disable interrupts during pci removal

   - eth: dpaa2-eth: unregister netdev before disconnecting the PHY

   - eth: ice: fix IPIP and SIT TSO offload"

* tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
  net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister
  net: mscc: ocelot: fix mutex lock error during ethtool stats read
  ice: Avoid RTNL lock when re-creating auxiliary device
  ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler
  ice: fix IPIP and SIT TSO offload
  ice: fix an error code in ice_cfg_phy_fec()
  net: mpls: Fix GCC 12 warning
  dpaa2-eth: unregister the netdev before disconnecting from the PHY
  skbuff: cleanup double word in comment
  net: macb: Align the dma and coherent dma masks
  mptcp: netlink: process IPv6 addrs in creating listening sockets
  selftests: mptcp: add missing join check
  net: usb: qmi_wwan: Add support for Dell DW5829e
  vlan: move dev_put into vlan_dev_uninit
  vlan: introduce vlan_dev_free_egress_priority
  ax25: fix UAF bugs of net_device caused by rebinding operation
  net: dsa: fix panic when DSA master device unbinds on shutdown
  net: amd-xgbe: disable interrupts during pci removal
  tipc: rate limit warning for received illegal binding update
  net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE
  ...
2022-02-10 16:01:22 -08:00
Pablo Neira Ayuso
2b4e5fb4d3 netfilter: nft_synproxy: unregister hooks on init error path
Disable the IPv4 hooks if the IPv6 hooks fail to be registered.

Fixes: ad49d86e07 ("netfilter: nf_tables: Add synproxy support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-10 16:33:57 +01:00
Victor Erminpour
c4416f5c2e net: mpls: Fix GCC 12 warning
When building with automatic stack variable initialization, GCC 12
complains about variables defined outside of switch case statements.
Move the variable outside the switch, which silences the warning:

./net/mpls/af_mpls.c:1624:21: error: statement will never be executed [-Werror=switch-unreachable]
  1624 |                 int err;
       |                     ^~~

Signed-off-by: Victor Erminpour <victor.erminpour@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10 15:29:39 +00:00
Tom Rix
58e61e416b skbuff: cleanup double word in comment
Remove the second 'to'.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10 15:11:51 +00:00
Jon Maloy
9aa422ad32 tipc: improve size validations for received domain records
The function tipc_mon_rcv() allows a node to receive and process
domain_record structs from peer nodes to track their views of the
network topology.

This patch verifies that the number of members in a received domain
record does not exceed the limit defined by MAX_MON_DOMAIN, something
that may otherwise lead to a stack overflow.

tipc_mon_rcv() is called from the function tipc_link_proto_rcv(), where
we are reading a 32 bit message data length field into a uint16.  To
avert any risk of bit overflow, we add an extra sanity check for this in
that function.  We cannot see that happen with the current code, but
future designers being unaware of this risk, may introduce it by
allowing delivery of very large (> 64k) sk buffers from the bearer
layer.  This potential problem was identified by Eric Dumazet.

This fixes CVE-2022-0435

Reported-by: Samuel Page <samuel.page@appgate.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Fixes: 35c55c9877 ("tipc: add neighbor monitoring framework")
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Samuel Page <samuel.page@appgate.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-10 05:37:44 -08:00
Kishen Maloor
029744cd4b mptcp: netlink: process IPv6 addrs in creating listening sockets
This change updates mptcp_pm_nl_create_listen_socket() to create
listening sockets bound to IPv6 addresses (where IPv6 is supported).

Fixes: 1729cf186d ("mptcp: create the listening socket for new port")
Acked-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-09 21:44:35 -08:00
Xin Long
d6ff94afd9 vlan: move dev_put into vlan_dev_uninit
Shuang Li reported an QinQ issue by simply doing:

  # ip link add dummy0 type dummy
  # ip link add link dummy0 name dummy0.1 type vlan id 1
  # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2
  # rmmod 8021q

 unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1

When rmmods 8021q, all vlan devs are deleted from their real_dev's vlan grp
and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered
before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links().

When unregisters dummy0.1, dummy0.1.2 is not unregistered in the event of
NETDEV_UNREGISTER, as it's been deleted from dummy0.1's vlan grp. However,
due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in
netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and
release dummy0.1, as it delays dev_put until calling dev->priv_destructor,
vlan_dev_free().

This issue was introduced by Commit 563bcbae3b ("net: vlan: fix a UAF in
vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into
vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before
netdev_wait_allrefs().

Fixes: 563bcbae3b ("net: vlan: fix a UAF in vlan_dev_real_dev()")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09 13:33:39 +00:00
Xin Long
37aa50c539 vlan: introduce vlan_dev_free_egress_priority
This patch is to introduce vlan_dev_free_egress_priority() to
free egress priority for vlan dev, and keep vlan_dev_uninit()
static as .ndo_uninit. It makes the code more clear and safer
when adding new code in vlan_dev_uninit() in the future.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09 13:33:39 +00:00
Duoming Zhou
feef318c85 ax25: fix UAF bugs of net_device caused by rebinding operation
The ax25_kill_by_device() will set s->ax25_dev = NULL and
call ax25_disconnect() to change states of ax25_cb and
sock, if we call ax25_bind() before ax25_kill_by_device().

However, if we call ax25_bind() again between the window of
ax25_kill_by_device() and ax25_dev_device_down(), the values
and states changed by ax25_kill_by_device() will be reassigned.

Finally, ax25_dev_device_down() will deallocate net_device.
If we dereference net_device in syscall functions such as
ax25_release(), ax25_sendmsg(), ax25_getsockopt(), ax25_getname()
and ax25_info_show(), a UAF bug will occur.

One of the possible race conditions is shown below:

      (USE)                   |      (FREE)
ax25_bind()                   |
                              |  ax25_kill_by_device()
ax25_bind()                   |
ax25_connect()                |    ...
                              |  ax25_dev_device_down()
                              |    ...
                              |    dev_put_track(dev, ...) //FREE
ax25_release()                |    ...
  ax25_send_control()         |
    alloc_skb()      //USE    |

the corresponding fail log is shown below:
===============================================================
BUG: KASAN: use-after-free in ax25_send_control+0x43/0x210
...
Call Trace:
  ...
  ax25_send_control+0x43/0x210
  ax25_release+0x2db/0x3b0
  __sock_release+0x6d/0x120
  sock_close+0xf/0x20
  __fput+0x11f/0x420
  ...
Allocated by task 1283:
  ...
  __kasan_kmalloc+0x81/0xa0
  alloc_netdev_mqs+0x5a/0x680
  mkiss_open+0x6c/0x380
  tty_ldisc_open+0x55/0x90
  ...
Freed by task 1969:
  ...
  kfree+0xa3/0x2c0
  device_release+0x54/0xe0
  kobject_put+0xa5/0x120
  tty_ldisc_kill+0x3e/0x80
  ...

In order to fix these UAF bugs caused by rebinding operation,
this patch adds dev_hold_track() into ax25_bind() and
corresponding dev_put_track() into ax25_kill_by_device().

Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09 13:30:07 +00:00
Vladimir Oltean
ee534378f0 net: dsa: fix panic when DSA master device unbinds on shutdown
Rafael reports that on a system with LX2160A and Marvell DSA switches,
if a reboot occurs while the DSA master (dpaa2-eth) is up, the following
panic can be seen:

systemd-shutdown[1]: Rebooting.
Unable to handle kernel paging request at virtual address 00a0000800000041
[00a0000800000041] address between user and kernel address ranges
Internal error: Oops: 96000004 [#1] PREEMPT SMP
CPU: 6 PID: 1 Comm: systemd-shutdow Not tainted 5.16.5-00042-g8f5585009b24 #32
pc : dsa_slave_netdevice_event+0x130/0x3e4
lr : raw_notifier_call_chain+0x50/0x6c
Call trace:
 dsa_slave_netdevice_event+0x130/0x3e4
 raw_notifier_call_chain+0x50/0x6c
 call_netdevice_notifiers_info+0x54/0xa0
 __dev_close_many+0x50/0x130
 dev_close_many+0x84/0x120
 unregister_netdevice_many+0x130/0x710
 unregister_netdevice_queue+0x8c/0xd0
 unregister_netdev+0x20/0x30
 dpaa2_eth_remove+0x68/0x190
 fsl_mc_driver_remove+0x20/0x5c
 __device_release_driver+0x21c/0x220
 device_release_driver_internal+0xac/0xb0
 device_links_unbind_consumers+0xd4/0x100
 __device_release_driver+0x94/0x220
 device_release_driver+0x28/0x40
 bus_remove_device+0x118/0x124
 device_del+0x174/0x420
 fsl_mc_device_remove+0x24/0x40
 __fsl_mc_device_remove+0xc/0x20
 device_for_each_child+0x58/0xa0
 dprc_remove+0x90/0xb0
 fsl_mc_driver_remove+0x20/0x5c
 __device_release_driver+0x21c/0x220
 device_release_driver+0x28/0x40
 bus_remove_device+0x118/0x124
 device_del+0x174/0x420
 fsl_mc_bus_remove+0x80/0x100
 fsl_mc_bus_shutdown+0xc/0x1c
 platform_shutdown+0x20/0x30
 device_shutdown+0x154/0x330
 __do_sys_reboot+0x1cc/0x250
 __arm64_sys_reboot+0x20/0x30
 invoke_syscall.constprop.0+0x4c/0xe0
 do_el0_svc+0x4c/0x150
 el0_svc+0x24/0xb0
 el0t_64_sync_handler+0xa8/0xb0
 el0t_64_sync+0x178/0x17c

It can be seen from the stack trace that the problem is that the
deregistration of the master causes a dev_close(), which gets notified
as NETDEV_GOING_DOWN to dsa_slave_netdevice_event().
But dsa_switch_shutdown() has already run, and this has unregistered the
DSA slave interfaces, and yet, the NETDEV_GOING_DOWN handler attempts to
call dev_close_many() on those slave interfaces, leading to the problem.

The previous attempt to avoid the NETDEV_GOING_DOWN on the master after
dsa_switch_shutdown() was called seems improper. Unregistering the slave
interfaces is unnecessary and unhelpful. Instead, after the slaves have
stopped being uppers of the DSA master, we can now reset to NULL the
master->dsa_ptr pointer, which will make DSA start ignoring all future
notifier events on the master.

Fixes: 0650bf52b3 ("net: dsa: be compatible with masters which unregister on shutdown")
Reported-by: Rafael Richter <rafael.richter@gin.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09 13:21:39 +00:00
Jon Maloy
c7223d6877 tipc: rate limit warning for received illegal binding update
It would be easy to craft a message containing an illegal binding table
update operation. This is handled correctly by the code, but the
corresponding warning printout is not rate limited as is should be.
We fix this now.

Fixes: b97bf3fd8f ("[TIPC] Initial merge")
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09 12:48:22 +00:00
David S. Miller
6d072066ab linux-can-fixes-for-5.17-20220209
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEBsvAIBsPu6mG7thcrX5LkNig010FAmIDdE4THG1rbEBwZW5n
 dXRyb25peC5kZQAKCRCtfkuQ2KDTXdbzB/9aYlSVCqS1tKOS57MmXBam1OrD5M6Y
 R0PuMISj8zUvwKp/nLfTGG6MyPtlxLh5qK5Wu9Eq38jfKlHh64XhFbmZmVwc34ji
 JzZO6znObncaWsDAUSIdv6uDB7OIuGQ94xgUXBvTkXhXnBn9P8kEWPKV6F1jSzGS
 fpE3WVlRLq5DT322B51O/QXn4ET9bNacEIX9Y+teU8YkTqRowokkCwTgGt7PHo7M
 M2f4ojHGUaABzkpyhK7eUOmGcmZYdQF84MrQhkQ+CF1NRzBnulH6NIJr5/PRCmy0
 PWVDfnxBgOjNWSIMwQ9Mwg70GZBEoKlcByYLbU0tkA27CN/1Nwio8RkY
 =21SZ
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-5.17-20220209' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2022-02-09

this is a pull request of 2 patches for net/master.

Oliver Hartkopp contributes 2 fixes for the CAN ISOTP protocol.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09 12:02:28 +00:00
Duoming Zhou
7ec02f5ac8 ax25: fix NPD bug in ax25_disconnect
The ax25_disconnect() in ax25_kill_by_device() is not
protected by any locks, thus there is a race condition
between ax25_disconnect() and ax25_destroy_socket().
when ax25->sk is assigned as NULL by ax25_destroy_socket(),
a NULL pointer dereference bug will occur if site (1) or (2)
dereferences ax25->sk.

ax25_kill_by_device()                | ax25_release()
  ax25_disconnect()                  |   ax25_destroy_socket()
    ...                              |
    if(ax25->sk != NULL)             |     ...
      ...                            |     ax25->sk = NULL;
      bh_lock_sock(ax25->sk); //(1)  |     ...
      ...                            |
      bh_unlock_sock(ax25->sk); //(2)|

This patch moves ax25_disconnect() into lock_sock(), which can
synchronize with ax25_destroy_socket() in ax25_release().

Fail log:
===============================================================
BUG: kernel NULL pointer dereference, address: 0000000000000088
...
RIP: 0010:_raw_spin_lock+0x7e/0xd0
...
Call Trace:
ax25_disconnect+0xf6/0x220
ax25_device_event+0x187/0x250
raw_notifier_call_chain+0x5e/0x70
dev_close_many+0x17d/0x230
rollback_registered_many+0x1f1/0x950
unregister_netdevice_queue+0x133/0x200
unregister_netdev+0x13/0x20
...

Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09 11:55:02 +00:00
Eric Dumazet
75063c9294 netfilter: xt_socket: fix a typo in socket_mt_destroy()
Calling nf_defrag_ipv4_disable() instead of nf_defrag_ipv6_disable()
was probably not the intent.

I found this by code inspection, while chasing a possible issue in TPROXY.

Fixes: de8c12110a ("netfilter: disable defrag once its no longer needed")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-09 11:06:59 +01:00
Oliver Hartkopp
8375dfac4f can: isotp: fix error path in isotp_sendmsg() to unlock wait queue
Commit 43a08c3bda ("can: isotp: isotp_sendmsg(): fix TX buffer concurrent
access in isotp_sendmsg()") introduced a new locking scheme that may render
the userspace application in a locking state when an error is detected.
This issue shows up under high load on simultaneously running isotp channels
with identical configuration which is against the ISO specification and
therefore breaks any reasonable PDU communication anyway.

Fixes: 43a08c3bda ("can: isotp: isotp_sendmsg(): fix TX buffer concurrent access in isotp_sendmsg()")
Link: https://lore.kernel.org/all/20220209073601.25728-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
Cc: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-02-09 08:47:47 +01:00
Oliver Hartkopp
7c759040c1 can: isotp: fix potential CAN frame reception race in isotp_rcv()
When receiving a CAN frame the current code logic does not consider
concurrently receiving processes which do not show up in real world
usage.

Ziyang Xuan writes:

The following syz problem is one of the scenarios. so->rx.len is
changed by isotp_rcv_ff() during isotp_rcv_cf(), so->rx.len equals
0 before alloc_skb() and equals 4096 after alloc_skb(). That will
trigger skb_over_panic() in skb_put().

=======================================================
CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 5.16.0-rc8-syzkaller #0
RIP: 0010:skb_panic+0x16c/0x16e net/core/skbuff.c:113
Call Trace:
 <TASK>
 skb_over_panic net/core/skbuff.c:118 [inline]
 skb_put.cold+0x24/0x24 net/core/skbuff.c:1990
 isotp_rcv_cf net/can/isotp.c:570 [inline]
 isotp_rcv+0xa38/0x1e30 net/can/isotp.c:668
 deliver net/can/af_can.c:574 [inline]
 can_rcv_filter+0x445/0x8d0 net/can/af_can.c:635
 can_receive+0x31d/0x580 net/can/af_can.c:665
 can_rcv+0x120/0x1c0 net/can/af_can.c:696
 __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5465
 __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5579

Therefore we make sure the state changes and data structures stay
consistent at CAN frame reception time by adding a spin_lock in
isotp_rcv(). This fixes the issue reported by syzkaller but does not
affect real world operation.

Fixes: e057dd3fc2 ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/linux-can/d7e69278-d741-c706-65e1-e87623d9a8e8@huawei.com/T/
Link: https://lore.kernel.org/all/20220208200026.13783-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
Reported-by: syzbot+4c63f36709a642f801c5@syzkaller.appspotmail.com
Reported-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-02-09 08:47:25 +01:00
Eric Dumazet
5611a00697 ipmr,ip6mr: acquire RTNL before calling ip[6]mr_free_table() on failure path
ip[6]mr_free_table() can only be called under RTNL lock.

RTNL: assertion failed at net/core/dev.c (10367)
WARNING: CPU: 1 PID: 5890 at net/core/dev.c:10367 unregister_netdevice_many+0x1246/0x1850 net/core/dev.c:10367
Modules linked in:
CPU: 1 PID: 5890 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller-11627-g422ee58dc0ef #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:unregister_netdevice_many+0x1246/0x1850 net/core/dev.c:10367
Code: 0f 85 9b ee ff ff e8 69 07 4b fa ba 7f 28 00 00 48 c7 c6 00 90 ae 8a 48 c7 c7 40 90 ae 8a c6 05 6d b1 51 06 01 e8 8c 90 d8 01 <0f> 0b e9 70 ee ff ff e8 3e 07 4b fa 4c 89 e7 e8 86 2a 59 fa e9 ee
RSP: 0018:ffffc900046ff6e0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888050f51d00 RSI: ffffffff815fa008 RDI: fffff520008dfece
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815f3d6e R11: 0000000000000000 R12: 00000000fffffff4
R13: dffffc0000000000 R14: ffffc900046ff750 R15: ffff88807b7dc000
FS:  00007f4ab736e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee0b4f8990 CR3: 000000001e7d2000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 mroute_clean_tables+0x244/0xb40 net/ipv6/ip6mr.c:1509
 ip6mr_free_table net/ipv6/ip6mr.c:389 [inline]
 ip6mr_rules_init net/ipv6/ip6mr.c:246 [inline]
 ip6mr_net_init net/ipv6/ip6mr.c:1306 [inline]
 ip6mr_net_init+0x3f0/0x4e0 net/ipv6/ip6mr.c:1298
 ops_init+0xaf/0x470 net/core/net_namespace.c:140
 setup_net+0x54f/0xbb0 net/core/net_namespace.c:331
 copy_net_ns+0x318/0x760 net/core/net_namespace.c:475
 create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110
 copy_namespaces+0x391/0x450 kernel/nsproxy.c:178
 copy_process+0x2e0c/0x7300 kernel/fork.c:2167
 kernel_clone+0xe7/0xab0 kernel/fork.c:2555
 __do_sys_clone+0xc8/0x110 kernel/fork.c:2672
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f4ab89f9059
Code: Unable to access opcode bytes at RIP 0x7f4ab89f902f.
RSP: 002b:00007f4ab736e118 EFLAGS: 00000206 ORIG_RAX: 0000000000000038
RAX: ffffffffffffffda RBX: 00007f4ab8b0bf60 RCX: 00007f4ab89f9059
RDX: 0000000020000280 RSI: 0000000020000270 RDI: 0000000040200000
RBP: 00007f4ab8a5308d R08: 0000000020000300 R09: 0000000020000300
R10: 00000000200002c0 R11: 0000000000000206 R12: 0000000000000000
R13: 00007ffc3977cc1f R14: 00007f4ab736e300 R15: 0000000000022000
 </TASK>

Fixes: f243e5a785 ("ipmr,ip6mr: call ip6mr_free_table() on failure path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cong.wang@bytedance.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220208053451.2885398-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-08 20:49:52 -08:00
Linus Torvalds
e6251ab455 NFS Client Bugfixes for Linux 5.17-rc
- Stable Fixes:
   - Fix initialization of nfs_client cl_flags
 
 - Other Fixes:
   - Fix performance issues with uncached readdir calls
   - Fix potential pointer dereferences in rpcrdma_ep_create
   - Fix nfs4_proc_get_locations() kernel-doc comment
   - Fix locking during sunrpc sysfs reads
   - Update my email address in the MAINTAINERS file to my new kernel.org email
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmICusMACgkQ18tUv7Cl
 QOu0dhAA3o1X0b4vjpZldfRSBzf5dL6smzvqiA1gOFSCdpfST7Tp5VWGjZNxgYBP
 wuwJMFuVpo8UQkbRrxo4zmqzZbiukpt7JsS4a3W0cCZ0iNIXtKkE4YIBWs67rUWZ
 z4LQ33ouJJdwQikJqycMbR/4Mg1j7VFzjmbVEM/FRJSv7IbYDRGvs7TqrzTndoeX
 ll0jAHsuUvFfq29WXTMq2PH0AiY+CxLnkvTuLL7QnG+bGVj2psTQi/kqC3R5+qLN
 M3aYDaKfb9SYZ3pdQkZV88IQsNQjlde3B1KWc7vJdXx8eUvxBF9tWU9TC85Y4mdg
 5ZNK/AX3DnICI1AcTq7/8nxUhOH7rEeroWIsA52oPEzG7f9Qkt1hUKzQRg5N7VMA
 VHxLY5i39zOyYnDP/lkcnGaExMefG0imB+ZnYYbmIH70tzhheGYPaBNjwLCU8Jvh
 NDjjEWBvjCR33EyOA/j/ZB/o1Xgu6FJU+oCgd82oNeVlcqDTPYxYWtSZqcIN1yp7
 DUGJ893WpMgFFSAERaMX0JpjdbnH6/u8BvkKKwd6ERGGBqXuMwrdL9DmOpJroqUP
 3gq2l+fFgDbAryvKs8ipneUUGbOiIdP4spiP44Rg/mZQrEajeUyuuFhxqM//Bfe4
 O60nIrVNwIhjzu/ciIe4Z+KVs9KQm34LEtzRxEfp1SMFGrsGhRQ=
 =JA1A
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "Stable Fixes:

   - Fix initialization of nfs_client cl_flags

  Other Fixes:

   - Fix performance issues with uncached readdir calls

   - Fix potential pointer dereferences in rpcrdma_ep_create

   - Fix nfs4_proc_get_locations() kernel-doc comment

   - Fix locking during sunrpc sysfs reads

   - Update my email address in the MAINTAINERS file to my new
     kernel.org email"

* tag 'nfs-for-5.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  SUNRPC: lock against ->sock changing during sysfs read
  MAINTAINERS: Update my email address
  NFS: Fix nfs4_proc_get_locations() kernel-doc comment
  xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_create
  NFS: Fix initialisation of nfs_client cl_flags field
  NFS: Avoid duplicate uncached readdir calls on eof
  NFS: Don't skip directory entries when doing uncached readdir
  NFS: Don't overfill uncached readdir pages
2022-02-08 12:03:07 -08:00
NeilBrown
b49ea673e1 SUNRPC: lock against ->sock changing during sysfs read
->sock can be set to NULL asynchronously unless ->recv_mutex is held.
So it is important to hold that mutex.  Otherwise a sysfs read can
trigger an oops.
Commit 17f09d3f61 ("SUNRPC: Check if the xprt is connected before
handling sysfs reads") appears to attempt to fix this problem, but it
only narrows the race window.

Fixes: 17f09d3f61 ("SUNRPC: Check if the xprt is connected before handling sysfs reads")
Fixes: a8482488a7 ("SUNRPC query transport's source port")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-08 09:14:26 -05:00
Dan Aloni
a9c10b5b3b xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_create
If there are failures then we must not leave the non-NULL pointers with
the error value, otherwise `rpcrdma_ep_destroy` gets confused and tries
free them, resulting in an Oops.

Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-08 09:14:26 -05:00
Eric Dumazet
94fdd7c02a net/smc: use GFP_ATOMIC allocation in smc_pnet_add_eth()
My last patch moved the netdev_tracker_alloc() call to a section
protected by a write_lock().

I should have replaced GFP_KERNEL with GFP_ATOMIC to avoid the infamous:

BUG: sleeping function called from invalid context at include/linux/sched/mm.h:256

Fixes: 28f9222138 ("net/smc: fix ref_tracker issue in smc_pnet_add()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07 12:02:49 +00:00
Eric Dumazet
28f9222138 net/smc: fix ref_tracker issue in smc_pnet_add()
I added the netdev_tracker_alloc() right after ndev was
stored into the newly allocated object:

  new_pe->ndev = ndev;
  if (ndev)
      netdev_tracker_alloc(ndev, &new_pe->dev_tracker, GFP_KERNEL);

But I missed that later, we could end up freeing new_pe,
then calling dev_put(ndev) to release the reference on ndev.

The new_pe->dev_tracker would not be freed.

To solve this issue, move the netdev_tracker_alloc() call to
the point we know for sure new_pe will be kept.

syzbot report (on net-next tree, but the bug is present in net tree)
WARNING: CPU: 0 PID: 6019 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
Modules linked in:
CPU: 0 PID: 6019 Comm: syz-executor.3 Not tainted 5.17.0-rc2-syzkaller-00650-g5a8fb33e5305 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
Code: 1d f4 70 a0 09 31 ff 89 de e8 4d bc 99 fd 84 db 75 e0 e8 64 b8 99 fd 48 c7 c7 20 0c 06 8a c6 05 d4 70 a0 09 01 e8 9e 4e 28 05 <0f> 0b eb c4 e8 48 b8 99 fd 0f b6 1d c3 70 a0 09 31 ff 89 de e8 18
RSP: 0018:ffffc900043b7400 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000040000 RSI: ffffffff815fb318 RDI: fffff52000876e72
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815f507e R11: 0000000000000000 R12: 1ffff92000876e85
R13: 0000000000000000 R14: ffff88805c1c6600 R15: 0000000000000000
FS:  00007f1ef6feb700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2d02b000 CR3: 00000000223f4000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __refcount_dec include/linux/refcount.h:344 [inline]
 refcount_dec include/linux/refcount.h:359 [inline]
 ref_tracker_free+0x53f/0x6c0 lib/ref_tracker.c:119
 netdev_tracker_free include/linux/netdevice.h:3867 [inline]
 dev_put_track include/linux/netdevice.h:3884 [inline]
 dev_put_track include/linux/netdevice.h:3880 [inline]
 dev_put include/linux/netdevice.h:3910 [inline]
 smc_pnet_add_eth net/smc/smc_pnet.c:399 [inline]
 smc_pnet_enter net/smc/smc_pnet.c:493 [inline]
 smc_pnet_add+0x5fc/0x15f0 net/smc/smc_pnet.c:556
 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731
 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
 genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
 genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:725
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: b60645248a ("net/smc: add net device tracker to struct smc_pnetentry")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-06 11:08:03 +00:00
Eric Dumazet
f8d9d93851 tcp: take care of mixed splice()/sendmsg(MSG_ZEROCOPY) case
syzbot found that mixing sendpage() and sendmsg(MSG_ZEROCOPY)
calls over the same TCP socket would again trigger the
infamous warning in inet_sock_destruct()

	WARN_ON(sk_forward_alloc_get(sk));

While Talal took into account a mix of regular copied data
and MSG_ZEROCOPY one in the same skb, the sendpage() path
has been forgotten.

We want the charging to happen for sendpage(), because
pages could be coming from a pipe. What is missing is the
downgrading of pure zerocopy status to make sure
sk_forward_alloc will stay synced.

Add tcp_downgrade_zcopy_pure() helper so that we can
use it from the two callers.

Fixes: 9b65b17db7 ("net: avoid double accounting for pure zerocopy skbs")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Talal Ahmad <talalahmad@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Link: https://lore.kernel.org/r/20220203225547.665114-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-04 20:07:12 -08:00
Linus Torvalds
cff7f2237c A patch to make it possible to disable zero copy path in the messenger
to avoid checksum or authentication tag mismatches and ensuing session
 resets in case the destination buffer isn't guaranteed to be stable.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmH9JWMTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi6GVB/0QZtlzCwL0JVNlF1kro96/Sbyb4kNi
 vUgD9L1RLBBDBuGAgVHgIch3E8KxAwTia0BHWH/kxLAV84RqmpcIwuZiAjrqJaoz
 9JbXmO47+2/lul6YOrzTLDwWzvoMcv/ngUJYbulD0F6oeVqD9Kl3qUkrpf5cy+mJ
 uJpzXDhqrx9A5ruopQRFlx2br1sPp3Jn/45WXejoEUSxnbyKtejK6aZBmatvBgsX
 gtfVqSCQeY+bWXkhkg4ZaYAHRqH1lG5we6FEbB0RIG5gY9ygf1w2OWr33S2qrbTg
 DKz96jQ4nDAsMpvlis1y0IjpxiAY0c1A0B06E0Xxov/d4fNdtAlnaci5
 =pHA5
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "A patch to make it possible to disable zero copy path in the messenger
  to avoid checksum or authentication tag mismatches and ensuing session
  resets in case the destination buffer isn't guaranteed to be stable"

* tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client:
  libceph: optionally use bounce buffer on recv path in crc mode
  libceph: make recv path in secure mode work the same as send path
2022-02-04 09:54:02 -08:00
Johannes Berg
f0a6fd1527 cfg80211: fix race in netlink owner interface destruction
My previous fix here to fix the deadlock left a race where
the exact same deadlock (see the original commit referenced
below) can still happen if cfg80211_destroy_ifaces() already
runs while nl80211_netlink_notify() is still marking some
interfaces as nl_owner_dead.

The race happens because we have two loops here - first we
dev_close() all the netdevs, and then we destroy them. If we
also have two netdevs (first one need only be a wdev though)
then we can find one during the first iteration, close it,
and go to the second iteration -- but then find two, and try
to destroy also the one we didn't close yet.

Fix this by only iterating once.

Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
Fixes: ea6b2098dd ("cfg80211: fix locking in netlink owner interface destruction")
Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20220201130951.22093-1-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-02-04 16:31:44 +01:00
Florian Westphal
d1ca60efc5 netfilter: ctnetlink: disable helper autoassign
When userspace, e.g. conntrackd, inserts an entry with a specified helper,
its possible that the helper is lost immediately after its added:

ctnetlink_create_conntrack
  -> nf_ct_helper_ext_add + assign helper
    -> ctnetlink_setup_nat
      -> ctnetlink_parse_nat_setup
         -> parse_nat_setup -> nfnetlink_parse_nat_setup
	                       -> nf_nat_setup_info
                                 -> nf_conntrack_alter_reply
                                   -> __nf_ct_try_assign_helper

... and __nf_ct_try_assign_helper will zero the helper again.

Set IPS_HELPER bit to bypass auto-assign logic, its unwanted, just like
when helper is assigned via ruleset.

Dropped old 'not strictly necessary' comment, it referred to use of
rcu_assign_pointer() before it got replaced by RCU_INIT_POINTER().

NB: Fixes tag intentionally incorrect, this extends the referenced commit,
but this change won't build without IPS_HELPER introduced there.

Fixes: 6714cf5465 ("netfilter: nf_conntrack: fix explicit helper attachment and NAT")
Reported-by: Pham Thanh Tuyen <phamtyn@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04 05:39:57 +01:00
Florian Westphal
82b72cb946 netfilter: conntrack: re-init state for retransmitted syn-ack
TCP conntrack assumes that a syn-ack retransmit is identical to the
previous syn-ack.  This isn't correct and causes stuck 3whs in some more
esoteric scenarios.  tcpdump to illustrate the problem:

 client > server: Flags [S] seq 1365731894, win 29200, [mss 1460,sackOK,TS val 2083035583 ecr 0,wscale 7]
 server > client: Flags [S.] seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215367629 ecr 2082921663]

Note the invalid/outdated synack ack number.
Conntrack marks this syn-ack as out-of-window/invalid, but it did
initialize the reply direction parameters based on this packets content.

 client > server: Flags [S] seq 1365731894, win 29200, [mss 1460,sackOK,TS val 2083036623 ecr 0,wscale 7]

... retransmit...

 server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215368644 ecr 2082921663]

and another bogus synack. This repeats, then client re-uses for a new
attempt:

client > server: Flags [S], seq 2375731741, win 29200, [mss 1460,sackOK,TS val 2083100223 ecr 0,wscale 7]
server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215430754 ecr 2082921663]

... but still gets a invalid syn-ack.

This repeats until:

 server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215437785 ecr 2082921663]
 server > client: Flags [R.], seq 145824454, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215443451 ecr 2082921663]
 client > server: Flags [S], seq 2375731741, win 29200, [mss 1460,sackOK,TS val 2083115583 ecr 0,wscale 7]
 server > client: Flags [S.], seq 162602410, ack 2375731742, win 65535, [mss 8952,wscale 5,TS val 3215445754 ecr 2083115583]

This syn-ack has the correct ack number, but conntrack flags it as
invalid: The internal state was created from the first syn-ack seen
so the sequence number of the syn-ack is treated as being outside of
the announced window.

Don't assume that retransmitted syn-ack is identical to previous one.
Treat it like the first syn-ack and reinit state.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04 05:39:51 +01:00
Florian Westphal
cc4f9d6203 netfilter: conntrack: move synack init code to helper
It seems more readable to use a common helper in the followup fix rather
than copypaste or goto.

No functional change intended.  The function is only called for syn-ack
or syn in repy direction in case of simultaneous open.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04 05:39:27 +01:00
Florian Westphal
a9e8503def netfilter: nft_payload: don't allow th access for fragments
Loads relative to ->thoff naturally expect that this points to the
transport header, but this is only true if pkt->fragoff == 0.

This has little effect for rulesets with connection tracking/nat because
these enable ip defra. For other rulesets this prevents false matches.

Fixes: 96518518cc ("netfilter: add nftables")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04 05:38:15 +01:00