Commit graph

1200636 commits

Author SHA1 Message Date
Pablo Neira Ayuso
5f68718b34 netfilter: nf_tables: GC transaction API to avoid race with control plane
The set types rhashtable and rbtree use a GC worker to reclaim memory.
From system work queue, in periodic intervals, a scan of the table is
done.

The major caveat here is that the nft transaction mutex is not held.
This causes a race between control plane and GC when they attempt to
delete the same element.

We cannot grab the netlink mutex from the work queue, because the
control plane has to wait for the GC work queue in case the set is to be
removed, so we get following deadlock:

   cpu 1                                cpu2
     GC work                            transaction comes in , lock nft mutex
       `acquire nft mutex // BLOCKS
                                        transaction asks to remove the set
                                        set destruction calls cancel_work_sync()

cancel_work_sync will now block forever, because it is waiting for the
mutex the caller already owns.

This patch adds a new API that deals with garbage collection in two
steps:

1) Lockless GC of expired elements sets on the NFT_SET_ELEM_DEAD_BIT
   so they are not visible via lookup. Annotate current GC sequence in
   the GC transaction. Enqueue GC transaction work as soon as it is
   full. If ruleset is updated, then GC transaction is aborted and
   retried later.

2) GC work grabs the mutex. If GC sequence has changed then this GC
   transaction lost race with control plane, abort it as it contains
   stale references to objects and let GC try again later. If the
   ruleset is intact, then this GC transaction deactivates and removes
   the elements and it uses call_rcu() to destroy elements.

Note that no elements are removed from GC lockless path, the _DEAD bit
is set and pointers are collected. GC catchall does not remove the
elements anymore too. There is a new set->dead flag that is set on to
abort the GC transaction to deal with set->ops->destroy() path which
removes the remaining elements in the set from commit_release, where no
mutex is held.

To deal with GC when mutex is held, which allows safe deactivate and
removal, add sync GC API which releases the set element object via
call_rcu(). This is used by rbtree and pipapo backends which also
perform garbage collection from control plane path.

Since element removal from sets can happen from control plane and
element garbage collection/timeout, it is necessary to keep the set
structure alive until all elements have been deactivated and destroyed.

We cannot do a cancel_work_sync or flush_work in nft_set_destroy because
its called with the transaction mutex held, but the aforementioned async
work queue might be blocked on the very mutex that nft_set_destroy()
callchain is sitting on.

This gives us the choice of ABBA deadlock or UaF.

To avoid both, add set->refs refcount_t member. The GC API can then
increment the set refcount and release it once the elements have been
free'd.

Set backends are adapted to use the GC transaction API in a follow up
patch entitled:

  ("netfilter: nf_tables: use gc transaction API in set backends")

This is joint work with Florian Westphal.

Fixes: cfed7e1b1f ("netfilter: nf_tables: add set garbage collection helpers")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-10 08:25:16 +02:00
Florian Westphal
24138933b9 netfilter: nf_tables: don't skip expired elements during walk
There is an asymmetry between commit/abort and preparation phase if the
following conditions are met:

1. set is a verdict map ("1.2.3.4 : jump foo")
2. timeouts are enabled

In this case, following sequence is problematic:

1. element E in set S refers to chain C
2. userspace requests removal of set S
3. kernel does a set walk to decrement chain->use count for all elements
   from preparation phase
4. kernel does another set walk to remove elements from the commit phase
   (or another walk to do a chain->use increment for all elements from
    abort phase)

If E has already expired in 1), it will be ignored during list walk, so its use count
won't have been changed.

Then, when set is culled, ->destroy callback will zap the element via
nf_tables_set_elem_destroy(), but this function is only safe for
elements that have been deactivated earlier from the preparation phase:
lack of earlier deactivate removes the element but leaks the chain use
count, which results in a WARN splat when the chain gets removed later,
plus a leak of the nft_chain structure.

Update pipapo_get() not to skip expired elements, otherwise flush
command reports bogus ENOENT errors.

Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Fixes: 8d8540c4f5 ("netfilter: nft_set_rbtree: add timeout support")
Fixes: 9d0982927e ("netfilter: nft_hash: add support for timeouts")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-08-09 14:39:28 +02:00
Jakub Kicinski
c5ccff7050 Merge branch 'net-sched-bind-logic-fixes-for-cls_fw-cls_u32-and-cls_route'
valis says:

====================
net/sched Bind logic fixes for cls_fw, cls_u32 and cls_route

Three classifiers (cls_fw, cls_u32 and cls_route) always copy
tcf_result struct into the new instance of the filter on update.

This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.

This patch set fixes this issue in all affected classifiers by no longer
copying the tcf_result struct from the old filter.
====================

Link: https://lore.kernel.org/r/20230729123202.72406-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 20:10:39 -07:00
valis
b80b829e9e net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free
When route4_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.

This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.

Fix this by no longer copying the tcf_result struct from the old filter.

Fixes: 1109c00547 ("net: sched: RCU cls_route")
Reported-by: valis <sec@valis.email>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-4-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 20:10:37 -07:00
valis
76e42ae831 net/sched: cls_fw: No longer copy tcf_result on update to avoid use-after-free
When fw_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.

This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.

Fix this by no longer copying the tcf_result struct from the old filter.

Fixes: e35a8ee599 ("net: sched: fw use RCU")
Reported-by: valis <sec@valis.email>
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-3-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 20:10:36 -07:00
valis
3044b16e7c net/sched: cls_u32: No longer copy tcf_result on update to avoid use-after-free
When u32_change() is called on an existing filter, the whole
tcf_result struct is always copied into the new instance of the filter.

This causes a problem when updating a filter bound to a class,
as tcf_unbind_filter() is always called on the old instance in the
success path, decreasing filter_cnt of the still referenced class
and allowing it to be deleted, leading to a use-after-free.

Fix this by no longer copying the tcf_result struct from the old filter.

Fixes: de5df63228 ("net: sched: cls_u32 changes to knode must appear atomic to readers")
Reported-by: valis <sec@valis.email>
Reported-by: M A Ramdhan <ramdhan@starlabs.sg>
Signed-off-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: M A Ramdhan <ramdhan@starlabs.sg>
Link: https://lore.kernel.org/r/20230729123202.72406-2-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 20:10:36 -07:00
Michal Schmidt
611e1b016c octeon_ep: initialize mbox mutexes
The two mbox-related mutexes are destroyed in octep_ctrl_mbox_uninit(),
but the corresponding mutex_init calls were missing.
A "DEBUG_LOCKS_WARN_ON(lock->magic != lock)" warning was emitted with
CONFIG_DEBUG_MUTEXES on.

Initialize the two mutexes in octep_ctrl_mbox_init().

Fixes: 577f0d1b1c ("octeon_ep: add separate mailbox command and response queues")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230729151516.24153-1-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 14:31:26 -07:00
Jakub Kicinski
37b61cda9c bnxt: don't handle XDP in netpoll
Similarly to other recently fixed drivers make sure we don't
try to access XDP or page pool APIs when NAPI budget is 0.
NAPI budget of 0 may mean that we are in netpoll.

This may result in running software IRQs in hard IRQ context,
leading to deadlocks or crashes.

To make sure bnapi->tx_pkts don't get wiped without handling
the events, move clearing the field into the handler itself.
Remember to clear tx_pkts after reset (bnxt_enable_napi())
as it's technically possible that netpoll will accumulate
some tx_pkts and then a reset will happen, leaving tx_pkts
out of sync with reality.

Fixes: 322b87ca55 ("bnxt_en: add page_pool support")
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20230728205020.2784844-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 14:28:39 -07:00
Rafal Rogalski
4b31fd4d77 ice: Fix RDMA VSI removal during queue rebuild
During qdisc create/delete, it is necessary to rebuild the queue
of VSIs. An error occurred because the VSIs created by RDMA were
still active.

Added check if RDMA is active. If yes, it disallows qdisc changes
and writes a message in the system logs.

Fixes: 348048e724 ("ice: Implement iidc operations")
Signed-off-by: Rafal Rogalski <rafalx.rogalski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230728171243.2446101-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 14:28:21 -07:00
Edward Cree
55c1528f9b sfc: fix field-spanning memcpy in selftest
Add a struct_group for the whole packet body so we can copy it in one
 go without triggering FORTIFY_SOURCE complaints.

Fixes: cf60ed4696 ("sfc: use padding to fix alignment in loopback test")
Fixes: 30c24dd87f ("sfc: siena: use padding to fix alignment in loopback test")
Fixes: 1186c6b31e ("sfc: falcon: use padding to fix alignment in loopback test")
Reviewed-by: Andy Moreton <andy.moreton@amd.com>
Tested-by: Andy Moreton <andy.moreton@amd.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230728165528.59070-1-edward.cree@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 14:27:53 -07:00
Martin Kohn
d4480c9bb9 net: usb: qmi_wwan: add Quectel EM05GV2
Add support for Quectel EM05GV2 (G=global) with vendor ID
0x2c7c and product ID 0x030e

Enabling DTR on this modem was necessary to ensure stable operation.
Patch for usb: serial: option: is also in progress.

T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  2 Spd=480  MxCh= 0
D:  Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=2c7c ProdID=030e Rev= 3.18
S:  Manufacturer=Quectel
S:  Product=Quectel EM05-G
C:* #Ifs= 5 Cfg#= 1 Atr=a0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E:  Ad=83(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E:  Ad=85(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E:  Ad=87(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E:  Ad=89(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
E:  Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms

Signed-off-by: Martin Kohn <m.kohn@welotec.com>
Link: https://lore.kernel.org/r/AM0PR04MB57648219DE893EE04FA6CC759701A@AM0PR04MB5764.eurprd04.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-31 14:09:38 -07:00
Duoming Zhou
1e7417c188 net: usb: lan78xx: reorder cleanup operations to avoid UAF bugs
The timer dev->stat_monitor can schedule the delayed work dev->wq and
the delayed work dev->wq can also arm the dev->stat_monitor timer.

When the device is detaching, the net_device will be deallocated. but
the net_device private data could still be dereferenced in delayed work
or timer handler. As a result, the UAF bugs will happen.

One racy situation is shown below:

      (Thread 1)                 |      (Thread 2)
lan78xx_stat_monitor()           |
 ...                             |  lan78xx_disconnect()
 lan78xx_defer_kevent()          |    ...
  ...                            |    cancel_delayed_work_sync(&dev->wq);
  schedule_delayed_work()        |    ...
  (wait some time)               |    free_netdev(net); //free net_device
  lan78xx_delayedwork()          |
  //use net_device private data  |
  dev-> //use                    |

Although we use cancel_delayed_work_sync() to cancel the delayed work
in lan78xx_disconnect(), it could still be scheduled in timer handler
lan78xx_stat_monitor().

Another racy situation is shown below:

      (Thread 1)                |      (Thread 2)
lan78xx_delayedwork             |
 mod_timer()                    |  lan78xx_disconnect()
                                |   cancel_delayed_work_sync()
 (wait some time)               |   if (timer_pending(&dev->stat_monitor))
             	                |       del_timer_sync(&dev->stat_monitor);
 lan78xx_stat_monitor()         |   ...
  lan78xx_defer_kevent()        |   free_netdev(net); //free
   //use net_device private data|
   dev-> //use                  |

Although we use del_timer_sync() to delete the timer, the function
timer_pending() returns 0 when the timer is activated. As a result,
the del_timer_sync() will not be executed and the timer could be
re-armed.

In order to mitigate this bug, We use timer_shutdown_sync() to shutdown
the timer and then use cancel_delayed_work_sync() to cancel the delayed
work. As a result, the net_device could be deallocated safely.

What's more, the dev->flags is set to EVENT_DEV_DISCONNECT in
lan78xx_disconnect(). But it could still be set to EVENT_STAT_UPDATE
in lan78xx_stat_monitor(). So this patch put the set_bit() behind
timer_shutdown_sync().

Fixes: 77dfff5bb7 ("lan78xx: Fix race condition in disconnect handling")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-31 09:58:29 +01:00
Rafał Miłecki
8469c7f547 dt-bindings: net: mediatek,net: fixup MAC binding
1. Use unevaluatedProperties
It's needed to allow ethernet-controller.yaml properties work correctly.

2. Drop unneeded phy-handle/phy-mode

3. Don't require phy-handle
Some SoCs may use fixed link.

For in-kernel MT7621 DTS files this fixes following errors:
arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@0: 'fixed-link' does not match any of the regexes: 'pinctrl-[0-9]+'
        From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml
arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@0: 'phy-handle' is a required property
        From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml
arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@1: 'fixed-link' does not match any of the regexes: 'pinctrl-[0-9]+'
        From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml
arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@1: 'phy-handle' is a required property
        From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-31 09:16:04 +01:00
Kuniyuki Iwashima
e739718444 net/sched: taprio: Limit TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME to INT_MAX.
syzkaller found zero division error [0] in div_s64_rem() called from
get_cycle_time_elapsed(), where sched->cycle_time is the divisor.

We have tests in parse_taprio_schedule() so that cycle_time will never
be 0, and actually cycle_time is not 0 in get_cycle_time_elapsed().

The problem is that the types of divisor are different; cycle_time is
s64, but the argument of div_s64_rem() is s32.

syzkaller fed this input and 0x100000000 is cast to s32 to be 0.

  @TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME={0xc, 0x8, 0x100000000}

We use s64 for cycle_time to cast it to ktime_t, so let's keep it and
set max for cycle_time.

While at it, we prevent overflow in setup_txtime() and add another
test in parse_taprio_schedule() to check if cycle_time overflows.

Also, we add a new tdc test case for this issue.

[0]:
divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 1 PID: 103 Comm: kworker/1:3 Not tainted 6.5.0-rc1-00330-g60cc1f7d0605 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:div_s64_rem include/linux/math64.h:42 [inline]
RIP: 0010:get_cycle_time_elapsed net/sched/sch_taprio.c:223 [inline]
RIP: 0010:find_entry_to_transmit+0x252/0x7e0 net/sched/sch_taprio.c:344
Code: 3c 02 00 0f 85 5e 05 00 00 48 8b 4c 24 08 4d 8b bd 40 01 00 00 48 8b 7c 24 48 48 89 c8 4c 29 f8 48 63 f7 48 99 48 89 74 24 70 <48> f7 fe 48 29 d1 48 8d 04 0f 49 89 cc 48 89 44 24 20 49 8d 85 10
RSP: 0018:ffffc90000acf260 EFLAGS: 00010206
RAX: 177450e0347560cf RBX: 0000000000000000 RCX: 177450e0347560cf
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000100000000
RBP: 0000000000000056 R08: 0000000000000000 R09: ffffed10020a0934
R10: ffff8880105049a7 R11: ffff88806cf3a520 R12: ffff888010504800
R13: ffff88800c00d800 R14: ffff8880105049a0 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0edf84f0e8 CR3: 000000000d73c002 CR4: 0000000000770ee0
PKRU: 55555554
Call Trace:
 <TASK>
 get_packet_txtime net/sched/sch_taprio.c:508 [inline]
 taprio_enqueue_one+0x900/0xff0 net/sched/sch_taprio.c:577
 taprio_enqueue+0x378/0xae0 net/sched/sch_taprio.c:658
 dev_qdisc_enqueue+0x46/0x170 net/core/dev.c:3732
 __dev_xmit_skb net/core/dev.c:3821 [inline]
 __dev_queue_xmit+0x1b2f/0x3000 net/core/dev.c:4169
 dev_queue_xmit include/linux/netdevice.h:3088 [inline]
 neigh_resolve_output net/core/neighbour.c:1552 [inline]
 neigh_resolve_output+0x4a7/0x780 net/core/neighbour.c:1532
 neigh_output include/net/neighbour.h:544 [inline]
 ip6_finish_output2+0x924/0x17d0 net/ipv6/ip6_output.c:135
 __ip6_finish_output+0x620/0xaa0 net/ipv6/ip6_output.c:196
 ip6_finish_output net/ipv6/ip6_output.c:207 [inline]
 NF_HOOK_COND include/linux/netfilter.h:292 [inline]
 ip6_output+0x206/0x410 net/ipv6/ip6_output.c:228
 dst_output include/net/dst.h:458 [inline]
 NF_HOOK.constprop.0+0xea/0x260 include/linux/netfilter.h:303
 ndisc_send_skb+0x872/0xe80 net/ipv6/ndisc.c:508
 ndisc_send_ns+0xb5/0x130 net/ipv6/ndisc.c:666
 addrconf_dad_work+0xc14/0x13f0 net/ipv6/addrconf.c:4175
 process_one_work+0x92c/0x13a0 kernel/workqueue.c:2597
 worker_thread+0x60f/0x1240 kernel/workqueue.c:2748
 kthread+0x2fe/0x3f0 kernel/kthread.c:389
 ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308
 </TASK>
Modules linked in:

Fixes: 4cfd5779bd ("taprio: Add support for txtime-assist mode")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Co-developed-by: Eric Dumazet <edumazet@google.com>
Co-developed-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-31 09:12:27 +01:00
David S. Miller
37e3cecb4c Merge branch 'net-data-races'
Eric Dumazet says:

====================
net: annotate data-races

This series was inspired by a syzbot/KCSAN report.

This will later also permit some optimizations,
like not having to lock the socket while reading/writing
some of its fields.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 18:13:41 +01:00
Eric Dumazet
8bf43be799 net: annotate data-races around sk->sk_priority
sk_getsockopt() runs locklessly. This means sk->sk_priority
can be read while other threads are changing its value.

Other reads also happen without socket lock being held.

Add missing annotations where needed.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 18:13:41 +01:00
Eric Dumazet
e5f0d2dd3c net: add missing data-race annotation for sk_ll_usec
In a prior commit I forgot that sk_getsockopt() reads
sk->sk_ll_usec without holding a lock.

Fixes: 0dbffbb533 ("net: annotate data race around sk_ll_usec")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 18:13:41 +01:00
Eric Dumazet
11695c6e96 net: add missing data-race annotations around sk->sk_peek_off
sk_getsockopt() runs locklessly, thus we need to annotate the read
of sk->sk_peek_off.

While we are at it, add corresponding annotations to sk_set_peek_off()
and unix_set_peek_off().

Fixes: b9bb53f383 ("sock: convert sk_peek_offset functions to WRITE_ONCE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 18:13:41 +01:00
Eric Dumazet
3c5b4d69c3 net: annotate data-races around sk->sk_mark
sk->sk_mark is often read while another thread could change the value.

Fixes: 4a19ec5800 ("[NET]: Introducing socket mark socket option.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 18:13:41 +01:00
Eric Dumazet
b4b5532530 net: add missing READ_ONCE(sk->sk_rcvbuf) annotation
In a prior commit, I forgot to change sk_getsockopt()
when reading sk->sk_rcvbuf locklessly.

Fixes: ebb3b78db7 ("tcp: annotate sk->sk_rcvbuf lockless reads")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 18:13:41 +01:00
Eric Dumazet
74bc084327 net: add missing READ_ONCE(sk->sk_sndbuf) annotation
In a prior commit, I forgot to change sk_getsockopt()
when reading sk->sk_sndbuf locklessly.

Fixes: e292f05e0d ("tcp: annotate sk->sk_sndbuf lockless reads")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 18:13:41 +01:00
Eric Dumazet
285975dd67 net: annotate data-races around sk->sk_{rcv|snd}timeo
sk_getsockopt() runs without locks, we must add annotations
to sk->sk_rcvtimeo and sk->sk_sndtimeo.

In the future we might allow fetching these fields before
we lock the socket in TCP fast path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 18:13:41 +01:00
Eric Dumazet
e6d12bdb43 net: add missing READ_ONCE(sk->sk_rcvlowat) annotation
In a prior commit, I forgot to change sk_getsockopt()
when reading sk->sk_rcvlowat locklessly.

Fixes: eac66402d1 ("net: annotate sk->sk_rcvlowat lockless reads")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 18:13:41 +01:00
Eric Dumazet
ea7f45ef77 net: annotate data-races around sk->sk_max_pacing_rate
sk_getsockopt() runs locklessly. This means sk->sk_max_pacing_rate
can be read while other threads are changing its value.

Fixes: 62748f32d5 ("net: introduce SO_MAX_PACING_RATE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 18:13:41 +01:00
Eric Dumazet
c76a032889 net: annotate data-race around sk->sk_txrehash
sk_getsockopt() runs locklessly. This means sk->sk_txrehash
can be read while other threads are changing its value.

Other locations were handled in commit cb6cd2cec7
("tcp: Change SYN ACK retransmit behaviour to account for rehash")

Fixes: 26859240e4 ("txhash: Add socket option to control TX hash rethink behavior")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Akhmat Karakotov <hmukos@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 18:13:41 +01:00
Eric Dumazet
fe11fdcb42 net: annotate data-races around sk->sk_reserved_mem
sk_getsockopt() runs locklessly. This means sk->sk_reserved_mem
can be read while other threads are changing its value.

Add missing annotations where they are needed.

Fixes: 2bb2f5fb21 ("net: add new socket option SO_RESERVE_MEM")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 18:13:40 +01:00
Richard Gobert
7938cd1543 net: gro: fix misuse of CB in udp socket lookup
This patch fixes a misuse of IP{6}CB(skb) in GRO, while calling to
`udp6_lib_lookup2` when handling udp tunnels. `udp6_lib_lookup2` fetch the
device from CB. The fix changes it to fetch the device from `skb->dev`.
l3mdev case requires special attention since it has a master and a slave
device.

Fixes: a6024562ff ("udp: Add GRO functions to UDP socket")
Reported-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 17:10:27 +01:00
Konstantin Khorenko
e346e231b4 qed: Fix scheduling in a tasklet while getting stats
Here we've got to a situation when tasklet called usleep_range() in PTT
acquire logic, thus welcome to the "scheduling while atomic" BUG().

  BUG: scheduling while atomic: swapper/24/0/0x00000100

   [<ffffffffb41c6199>] schedule+0x29/0x70
   [<ffffffffb41c5512>] schedule_hrtimeout_range_clock+0xb2/0x150
   [<ffffffffb41c55c3>] schedule_hrtimeout_range+0x13/0x20
   [<ffffffffb41c3bcf>] usleep_range+0x4f/0x70
   [<ffffffffc08d3e58>] qed_ptt_acquire+0x38/0x100 [qed]
   [<ffffffffc08eac48>] _qed_get_vport_stats+0x458/0x580 [qed]
   [<ffffffffc08ead8c>] qed_get_vport_stats+0x1c/0xd0 [qed]
   [<ffffffffc08dffd3>] qed_get_protocol_stats+0x93/0x100 [qed]
                        qed_mcp_send_protocol_stats
            case MFW_DRV_MSG_GET_LAN_STATS:
            case MFW_DRV_MSG_GET_FCOE_STATS:
            case MFW_DRV_MSG_GET_ISCSI_STATS:
            case MFW_DRV_MSG_GET_RDMA_STATS:
   [<ffffffffc08e36d8>] qed_mcp_handle_events+0x2d8/0x890 [qed]
                        qed_int_assertion
                        qed_int_attentions
   [<ffffffffc08d9490>] qed_int_sp_dpc+0xa50/0xdc0 [qed]
   [<ffffffffb3aa7623>] tasklet_action+0x83/0x140
   [<ffffffffb41d9125>] __do_softirq+0x125/0x2bb
   [<ffffffffb41d560c>] call_softirq+0x1c/0x30
   [<ffffffffb3a30645>] do_softirq+0x65/0xa0
   [<ffffffffb3aa78d5>] irq_exit+0x105/0x110
   [<ffffffffb41d8996>] do_IRQ+0x56/0xf0

Fix this by making caller to provide the context whether it could be in
atomic context flow or not when getting stats from QED driver.
QED driver based on the context provided decide to schedule out or not
when acquiring the PTT BAR window.

We faced the BUG_ON() while getting vport stats, but according to the
code same issue could happen for fcoe and iscsi statistics as well, so
fixing them too.

Fixes: 6c75424612 ("qed: Add support for NCSI statistics.")
Fixes: 1e128c8129 ("qed: Add support for hardware offloaded FCoE.")
Fixes: 2f2b2614e8 ("qed: Provide iSCSI statistics to management")
Cc: Sudarsana Kalluru <skalluru@marvell.com>
Cc: David Miller <davem@davemloft.net>
Cc: Manish Chopra <manishc@marvell.com>

Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 17:09:18 +01:00
Lukasz Majewski
8d7ae22ae9 net: dsa: microchip: KSZ9477 register regmap alignment to 32 bit boundaries
The commit (SHA1: 5c844d57aa) provided code
to apply "Module 6: Certain PHY registers must be written as pairs instead
of singly" errata for KSZ9477 as this chip for certain PHY registers
(0xN120 to 0xN13F, N=1,2,3,4,5) must be accesses as 32 bit words instead
of 16 or 8 bit access.
Otherwise, adjacent registers (no matter if reserved or not) are
overwritten with 0x0.

Without this patch some registers (e.g. 0x113c or 0x1134) required for 32
bit access are out of valid regmap ranges.

As a result, following error is observed and KSZ9477 is not properly
configured:

ksz-switch spi1.0: can't rmw 32bit reg 0x113c: -EIO
ksz-switch spi1.0: can't rmw 32bit reg 0x1134: -EIO
ksz-switch spi1.0 lan1 (uninitialized): failed to connect to PHY: -EIO
ksz-switch spi1.0 lan1 (uninitialized): error -5 setting up PHY for tree 0, switch 0, port 0

The solution is to modify regmap_reg_range to allow accesses with 4 bytes
boundaries.

Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 17:00:40 +01:00
Thierry Reding
a0b1b2055b net: stmmac: tegra: Properly allocate clock bulk data
The clock data is an array of struct clk_bulk_data, so make sure to
allocate enough memory.

Fixes: d8ca113724 ("net: stmmac: tegra: Add MGBE support")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-29 16:59:51 +01:00
Chengfeng Ye
56c6be35fc mISDN: hfcpci: Fix potential deadlock on &hc->lock
As &hc->lock is acquired by both timer _hfcpci_softirq() and hardirq
hfcpci_int(), the timer should disable irq before lock acquisition
otherwise deadlock could happen if the timmer is preemtped by the hadr irq.

Possible deadlock scenario:
hfcpci_softirq() (timer)
    -> _hfcpci_softirq()
    -> spin_lock(&hc->lock);
        <irq interruption>
        -> hfcpci_int()
        -> spin_lock(&hc->lock); (deadlock here)

This flaw was found by an experimental static analysis tool I am developing
for irq-related deadlock.

The tentative patch fixes the potential deadlock by spin_lock_irq()
in timer.

Fixes: b36b654a7e ("mISDN: Create /sys/class/mISDN")
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Link: https://lore.kernel.org/r/20230727085619.7419-1-dg573847474@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 18:49:28 -07:00
Jamal Hadi Salim
e68409db99 net: sched: cls_u32: Fix match key mis-addressing
A match entry is uniquely identified with an "address" or "path" in the
form of: hashtable ID(12b):bucketid(8b):nodeid(12b).

When creating table match entries all of hash table id, bucket id and
node (match entry id) are needed to be either specified by the user or
reasonable in-kernel defaults are used. The in-kernel default for a table id is
0x800(omnipresent root table); for bucketid it is 0x0. Prior to this fix there
was none for a nodeid i.e. the code assumed that the user passed the correct
nodeid and if the user passes a nodeid of 0 (as Mingi Cho did) then that is what
was used. But nodeid of 0 is reserved for identifying the table. This is not
a problem until we dump. The dump code notices that the nodeid is zero and
assumes it is referencing a table and therefore references table struct
tc_u_hnode instead of what was created i.e match entry struct tc_u_knode.

Ming does an equivalent of:
tc filter add dev dummy0 parent 10: prio 1 handle 0x1000 \
protocol ip u32 match ip src 10.0.0.1/32 classid 10:1 action ok

Essentially specifying a table id 0, bucketid 1 and nodeid of zero
Tableid 0 is remapped to the default of 0x800.
Bucketid 1 is ignored and defaults to 0x00.
Nodeid was assumed to be what Ming passed - 0x000

dumping before fix shows:
~$ tc filter ls dev dummy0 parent 10:
filter protocol ip pref 1 u32 chain 0
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor -30591

Note that the last line reports a table instead of a match entry
(you can tell this because it says "ht divisor...").
As a result of reporting the wrong data type (misinterpretting of struct
tc_u_knode as being struct tc_u_hnode) the divisor is reported with value
of -30591. Ming identified this as part of the heap address
(physmap_base is 0xffff8880 (-30591 - 1)).

The fix is to ensure that when table entry matches are added and no
nodeid is specified (i.e nodeid == 0) then we get the next available
nodeid from the table's pool.

After the fix, this is what the dump shows:
$ tc filter ls dev dummy0 parent 10:
filter protocol ip pref 1 u32 chain 0
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1
filter protocol ip pref 1 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 10:1 not_in_hw
  match 0a000001/ffffffff at 12
	action order 1: gact action pass
	 random type none pass val 0
	 index 1 ref 1 bind 1

Reported-by: Mingi Cho <mgcho.minic@gmail.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20230726135151.416917-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 18:05:04 -07:00
Eugen Hristev
5416d7925e dt-bindings: net: rockchip-dwmac: fix {tx|rx}-delay defaults/range in schema
The range and the defaults are specified in the description instead of
being specified in the schema.
Fix it by adding the default value in the `default` field and specifying
the range as `minimum` and `maximum`.

Fixes: b331b8ef86 ("dt-bindings: net: convert rockchip-dwmac to json-schema")
Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-28 10:56:40 +01:00
Jakub Kicinski
4a0822608e mlx5-fixes-2023-07-26
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmTBkJkACgkQSD+KveBX
 +j75KQf/Z2jvvcoH5aXCunGgF6+cbZGQ7KRIuxChgeHH1bYofNEAIiG4MmJEk7yi
 8fs9ZJwAvQSZ3L7/VgeBrQ7l/qypIdg40QrA2b0vUUqLvObHbtM3sRcXGXkFmIL9
 6gMiUsnhTeyBV4R3c7ViqPeEvPh0R7SLvvYd1QLemfJuWG9F7v8Tx87I0fVU9U3n
 jqRpIcQjVg/o4ZlW0Jyh1BPqE9INPdmTTUiBt3fmfbLtbP8xRVcLpabeXyd2tDLl
 yIXettLF97GwyzkMXu8zAx7mjRFZvbz7Tb2/g7sWNC7UlZmd4ho4jUHCTFuIUMwh
 0+AVBW7RxhgL+VjXsGcPhSn8gnb1rw==
 =MY4I
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2023-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2023-07-26

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2023-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Unregister devlink params in case interface is down
  net/mlx5: DR, Fix peer domain namespace setting
  net/mlx5: fs_chains: Fix ft prio if ignore_flow_level is not supported
  net/mlx5e: kTLS, Fix protection domain in use syndrome when devlink reload
  net/mlx5: Bridge, set debugfs access right to root-only
  net/mlx5e: xsk: Fix crash on regular rq reactivation
  net/mlx5e: xsk: Fix invalid buffer access for legacy rq
  net/mlx5e: Move representor neigh cleanup to profile cleanup_tx
  net/mlx5e: Fix crash moving to switchdev mode when ntuple offload is set
  net/mlx5e: Don't hold encap tbl lock if there is no encap action
  net/mlx5: Honor user input for migratable port fn attr
  net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer()
  net/mlx5: fix potential memory leak in mlx5e_init_rep_rx
  net/mlx5: DR, fix memory leak in mlx5dr_cmd_create_reformat_ctx
  net/mlx5e: fix double free in macsec_fs_tx_create_crypto_table_groups
====================

Link: https://lore.kernel.org/r/20230726213206.47022-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 20:03:41 -07:00
Yuanjun Gong
dadc5b86cc net: dsa: fix value check in bcm_sf2_sw_probe()
in bcm_sf2_sw_probe(), check the return value of clk_prepare_enable()
and return the error code if clk_prepare_enable() returns an
unexpected value.

Fixes: e9ec5c3bd2 ("net: dsa: bcm_sf2: request and handle clocks")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20230726170506.16547-1-ruc_gongyuanjun@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 20:02:54 -07:00
Eric Dumazet
4d50e50045 net: flower: fix stack-out-of-bounds in fl_set_key_cfm()
Typical misuse of

	nla_parse_nested(array, XXX_MAX, ...);

array must be declared as

	struct nlattr *array[XXX_MAX + 1];

v2: Based on feedbacks from Ido Schimmel and Zahari Doychev,
I also changed TCA_FLOWER_KEY_CFM_OPT_MAX and cfm_opt_policy
definitions.

syzbot reported:

BUG: KASAN: stack-out-of-bounds in __nla_validate_parse+0x136/0x2bd0 lib/nlattr.c:588
Write of size 32 at addr ffffc90003a0ee20 by task syz-executor296/5014

CPU: 0 PID: 5014 Comm: syz-executor296 Not tainted 6.5.0-rc2-syzkaller-00307-gd192f5382581 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:364 [inline]
print_report+0x163/0x540 mm/kasan/report.c:475
kasan_report+0x175/0x1b0 mm/kasan/report.c:588
kasan_check_range+0x27e/0x290 mm/kasan/generic.c:187
__asan_memset+0x23/0x40 mm/kasan/shadow.c:84
__nla_validate_parse+0x136/0x2bd0 lib/nlattr.c:588
__nla_parse+0x40/0x50 lib/nlattr.c:700
nla_parse_nested include/net/netlink.h:1262 [inline]
fl_set_key_cfm+0x1e3/0x440 net/sched/cls_flower.c:1718
fl_set_key+0x2168/0x6620 net/sched/cls_flower.c:1884
fl_tmplt_create+0x1fe/0x510 net/sched/cls_flower.c:2666
tc_chain_tmplt_add net/sched/cls_api.c:2959 [inline]
tc_ctl_chain+0x131d/0x1ac0 net/sched/cls_api.c:3068
rtnetlink_rcv_msg+0x82b/0xf50 net/core/rtnetlink.c:6424
netlink_rcv_skb+0x1df/0x430 net/netlink/af_netlink.c:2549
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x7c3/0x990 net/netlink/af_netlink.c:1365
netlink_sendmsg+0xa2a/0xd60 net/netlink/af_netlink.c:1914
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x592/0x890 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2577
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f54c6150759
Code: 48 83 c4 28 c3 e8 d7 19 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe06c30578 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f54c619902d RCX: 00007f54c6150759
RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003
RBP: 00007ffe06c30590 R08: 0000000000000000 R09: 00007ffe06c305f0
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f54c61c35f0
R13: 00007ffe06c30778 R14: 0000000000000001 R15: 0000000000000001
</TASK>

The buggy address belongs to stack of task syz-executor296/5014
and is located at offset 32 in frame:
fl_set_key_cfm+0x0/0x440 net/sched/cls_flower.c:374

This frame has 1 object:
[32, 56) 'nla_cfm_opt'

The buggy address belongs to the virtual mapping at
[ffffc90003a08000, ffffc90003a11000) created by:
copy_process+0x5c8/0x4290 kernel/fork.c:2330

Fixes: 7cfffd5fed ("net: flower: add support for matching cfm fields")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Zahari Doychev <zdoychev@maxlinear.com>
Link: https://lore.kernel.org/r/20230726145815.943910-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 20:01:29 -07:00
Jakub Kicinski
fa46722666 MAINTAINERS: stmmac: retire Giuseppe Cavallaro
I tried to get stmmac maintainers to be more active by agreeing with
them off-list on a review rotation. I pinged Peppe 3 times over 2 weeks
during his "shift month", no reviews are flowing.

All the contributions are much appreciated! But stmmac is quite
active, we need participating maintainers :(

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230726151120.1649474-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 17:20:23 -07:00
Russell King (Oracle)
9945c1fb03 net: dsa: fix older DSA drivers using phylink
Older DSA drivers that do not provide an dsa_ops adjust_link method end
up using phylink. Unfortunately, a recent phylink change that requires
its supported_interfaces bitmap to be filled breaks these drivers
because the bitmap remains empty.

Rather than fixing each driver individually, fix it in the core code so
we have a sensible set of defaults.

Reported-by: Sergei Antonov <saproj@gmail.com>
Fixes: de5c9bf40c ("net: phylink: require supported_interfaces to be filled")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com> # dsa_loop
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/E1qOflM-001AEz-D3@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 17:19:46 -07:00
Lin Ma
d73ef2d69c rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length
There are totally 9 ndo_bridge_setlink handlers in the current kernel,
which are 1) bnxt_bridge_setlink, 2) be_ndo_bridge_setlink 3)
i40e_ndo_bridge_setlink 4) ice_bridge_setlink 5)
ixgbe_ndo_bridge_setlink 6) mlx5e_bridge_setlink 7)
nfp_net_bridge_setlink 8) qeth_l2_bridge_setlink 9) br_setlink.

By investigating the code, we find that 1-7 parse and use nlattr
IFLA_BRIDGE_MODE but 3 and 4 forget to do the nla_len check. This can
lead to an out-of-attribute read and allow a malformed nlattr (e.g.,
length 0) to be viewed as a 2 byte integer.

To avoid such issues, also for other ndo_bridge_setlink handlers in the
future. This patch adds the nla_len check in rtnl_bridge_setlink and
does an early error return if length mismatches. To make it works, the
break is removed from the parsing for IFLA_BRIDGE_FLAGS to make sure
this nla_for_each_nested iterates every attribute.

Fixes: b1edc14a3f ("ice: Implement ice_bridge_getlink and ice_bridge_setlink")
Fixes: 51616018dd ("i40e: Add support for getlink, setlink ndo ops")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230726075314.1059224-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27 17:14:01 -07:00
Linus Torvalds
57012c5753 Networking fixes for 6.5-rc4, including fixes from can, netfilter
Current release - regressions:
 
   - core: fix splice_to_socket() for O_NONBLOCK socket
 
   - af_unix: fix fortify_panic() in unix_bind_bsd().
 
   - can: raw: fix lockdep issue in raw_release()
 
 Previous releases - regressions:
 
   - tcp: reduce chance of collisions in inet6_hashfn().
 
   - netfilter: skip immediate deactivate in _PREPARE_ERROR
 
   - tipc: stop tipc crypto on failure in tipc_node_create
 
   - eth: igc: fix kernel panic during ndo_tx_timeout callback
 
   - eth: iavf: fix potential deadlock on allocation failure
 
 Previous releases - always broken:
 
   - ipv6: fix bug where deleting a mngtmpaddr can create a new temporary address
 
   - eth: ice: fix memory management in ice_ethtool_fdir.c
 
   - eth: hns3: fix the imp capability bit cannot exceed 32 bits issue
 
   - eth: vxlan: calculate correct header length for GPE
 
   - eth: stmmac: apply redundant write work around on 4.xx too
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmTCbIwSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkbGIP/RBALM+vg1ZpPWMUXRtjcdvuBqWFB2jB
 GsAfOj1PpHhusHx/CCyxl80oCtkmnLW3dE9HdoZJ6FYwxTYfhfwDhoPy02QOJ0OQ
 yy4xbtrczFekBoQECEzUHT+0oBTZXoU7eR+3LOhx5IGNnP2zMX8rQkbnjU21dahq
 Kkqo0Ir2L7VxGck67WDOaMAxJukO/WFB97KFsTJATwkbiQxhpw07NJi1fV4SzeFQ
 WXKfe7MiXBXmq53QWLScbUxRAcq3kduDNl0UCpz+L9ks5kayVJ3MOHkEfnJ5LIAQ
 dL4IJO6ugNj2FSZb9ulw6Kj3ZAjKXbrSWE0gHzU3vO8g6uqs4/yjz0uzbFSffiNs
 mbwASGxYRb48JO22Hn92xHNz9Wjpc1TXzLABp5dA2ykEqzw+XJ39qP0LUndVLlAW
 UBAKNK9w5+8UprN6HQpFq4pTlXN3Tr/WCGzRsB1x4rNVIoYHn5Y1VMtM8IZRODf3
 VcEenHg7k8SP8q4aFknmCueHdXWI4Rc66W4pUbcmyqDfH/+Xl4Q9qXxnH0a/SUx8
 3gxAfKCjFhnCqsXvlvHxwexY4TSN05jE+y5ZjQH0xSKkOFZsr5Qch1h75q5IGo2b
 /d87HwCP5eWPArR8eIl8WiONA94wbjWma04y65KsnoLtRz2iKZHXsei2jP0UtCHn
 zK3gbyXHq+iW
 =rx5z
 -----END PGP SIGNATURE-----

Merge tag 'net-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from can, netfilter.

  Current release - regressions:

   - core: fix splice_to_socket() for O_NONBLOCK socket

   - af_unix: fix fortify_panic() in unix_bind_bsd().

   - can: raw: fix lockdep issue in raw_release()

  Previous releases - regressions:

   - tcp: reduce chance of collisions in inet6_hashfn().

   - netfilter: skip immediate deactivate in _PREPARE_ERROR

   - tipc: stop tipc crypto on failure in tipc_node_create

   - eth: igc: fix kernel panic during ndo_tx_timeout callback

   - eth: iavf: fix potential deadlock on allocation failure

  Previous releases - always broken:

   - ipv6: fix bug where deleting a mngtmpaddr can create a new
     temporary address

   - eth: ice: fix memory management in ice_ethtool_fdir.c

   - eth: hns3: fix the imp capability bit cannot exceed 32 bits issue

   - eth: vxlan: calculate correct header length for GPE

   - eth: stmmac: apply redundant write work around on 4.xx too"

* tag 'net-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
  tipc: stop tipc crypto on failure in tipc_node_create
  af_unix: Terminate sun_path when bind()ing pathname socket.
  tipc: check return value of pskb_trim()
  benet: fix return value check in be_lancer_xmit_workarounds()
  virtio-net: fix race between set queues and probe
  net/sched: mqprio: Add length check for TCA_MQPRIO_{MAX/MIN}_RATE64
  splice, net: Fix splice_to_socket() for O_NONBLOCK socket
  net: fec: tx processing does not call XDP APIs if budget is 0
  mptcp: more accurate NL event generation
  selftests: mptcp: join: only check for ip6tables if needed
  tools: ynl-gen: fix parse multi-attr enum attribute
  tools: ynl-gen: fix enum index in _decode_enum(..)
  netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID
  netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR
  netfilter: nft_set_rbtree: fix overlap expiration walk
  igc: Fix Kernel Panic during ndo_tx_timeout callback
  net: dsa: qca8k: fix mdb add/del case with 0 VID
  net: dsa: qca8k: fix broken search_and_del
  net: dsa: qca8k: fix search_and_insert wrong handling of new rule
  net: dsa: qca8k: enable use_single_write for qca8xxx
  ...
2023-07-27 12:27:37 -07:00
Linus Torvalds
bc168790de soundwire fixes for 6.4
- Core fix for enumeration completion
  - Qualcomm driver fix to update status
  - AMD driver fix for probe error check
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmTCWH4ACgkQfBQHDyUj
 g0cfjA//Q8QwotUsvQmTCmY5wSlfp/Rbh2onqOKU9OMszwlKOmS7nK8E7Yj8SXSH
 XmMDJ2GpB/TuVRMlNxJ0tz5gD8kp61411diDXA27qcKaVCmoFE7cC5jfqwTHobFD
 Rk8+2K8QSBksEM7MFSxBhs+E2Fdv9qurtKx2qxkzCRrPv7jVgP4bRvZKln/MXi5r
 5Y4zGaY0TeZmwy6sv1xtCNqfw69yc5DWtrFrVhuSTnXOY6T+yPQiPblWUFdi4877
 cwE9utQqv73GPOdAw7WUUsixhp1aiwdB/MZVPLcr3JAmFtiYGO3/NDwidwTHnjG1
 raOsynFOATybqxL0152e51TdOflYWXrun+IYBvh33iJnoRuKvNZqZ0jUhRXmwNi2
 4KOr3t6UuCN4tAedT0hjwfpjna8uQaWEqd90yZG6ZNKHK5VXlacqa9dF62+dtrsT
 EGEonXZ+A0uvNyWq+zMWGknle8gxNe1uxCs8cIkDnLFB1tcuE+ejMbJscQt6NVOe
 X7pR8qUAePZLyaNBApmE54BItIve00GTDrTHcdchZSVVYX/mSCalywuzKB2k1Bl/
 N8Dhws+IzbnnDUsr8jYY9TJDUg+TNqn1+KAIRjTzoMnno75Dh1zJm/t3ARw+xvkZ
 EDyf9LtMaKMoZuhVhrk/yFuTznLGJvVo6Z5GEVDe5NIDEmBGG5g=
 =+KVr
 -----END PGP SIGNATURE-----

Merge tag 'soundwire-6.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire

Pull soundwire fixes from Vinod Koul:

 - Core fix for enumeration completion

 - Qualcomm driver fix to update status

 - AMD driver fix for probe error check

* tag 'soundwire-6.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
  soundwire: amd: Fix a check for errors in probe()
  soundwire: qcom: update status correctly with mask
  soundwire: fix enumeration completion
2023-07-27 12:07:41 -07:00
Linus Torvalds
53c8621b9e phy fixes for 6.5
- Driver fixes for
    - Out of bound fix for hisilicon phy
    - Qualcomm synopsis femto phy for keeping clock enabled during suspend
      and enabling ref clocks
    - Mediatek driver fixes for upper limit test and error code
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmTCVkUACgkQfBQHDyUj
 g0cwAg/+KSYSB1//gIqKQDEzCztwGOeyXPBedF7x/hsxmd0PgJKTJjPKhVMvwDhs
 7Nfk/TR5LcEfZvVUrwQ2kykZTCR03Kqn7egSOq+W+vx57zVyLpQqxV0Gm79LEPPk
 w7ncXf//b2pQ5yRI164skzCWcB6BkpyTaUlH/jZdWXB8FXGkxbdMLDhwmU2Xd5YU
 49Jv3fH5Poym92RRALGZoFJg/jZxmniX/sV8NXqNVkND2mwKuz9psH/iCWh2+ofU
 1TZYyvwfdODCdR3+kYDXY46uUttPy/xeh2vEHCKYlfr/Zp0pbnLM0/jBD3o/1x2C
 /Dgk1eQjMoZNNUgLtaIv2FjSdQRNPX0u8Ml9iLvoFNgQR5LB1gzTHDmtwIRtOHpN
 BInIWWkoGgEgOjxybzPkRTCXhWS/eErAK1K4ZXh24xQfkivBH3YGiDiMpG3eR4Wy
 4rurv/UxDKqqvCdNkweiYCI41N4ThyzG6qr8fcat2a9ZZmdqlvc9lBFlCVXdxmQj
 1p4ZAwriGmo4zFP1/jxTKk1wU/peEh7hzBa8dsyTP77LYO9adcFi+OVnkNM24CJh
 w9cO4EqDl01rZ30vHbh1J1b1yRX3Cwa1+JGHMnGg96g1dJb+mC+1xU6nZ60sMcew
 NKnFeANY94SGEeN2Bn4BDaIcFCZ1UBfcxs0PHeaGO+81s7tj5j8=
 =nyhr
 -----END PGP SIGNATURE-----

Merge tag 'phy-fixes-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy

Pull phy fixes from Vinod Koul:

 - Out of bound fix for hisilicon phy

 - Qualcomm synopsis femto phy for keeping clock enabled during suspend
   and enabling ref clocks

 - Mediatek driver fixes for upper limit test and error code

* tag 'phy-fixes-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: hisilicon: Fix an out of bounds check in hisi_inno_phy_probe()
  phy: qcom-snps-femto-v2: use qcom_snps_hsphy_suspend/resume error code
  phy: qcom-snps-femto-v2: properly enable ref clock
  phy: qcom-snps-femto-v2: keep cfg_ahb_clk enabled during runtime suspend
  phy: mediatek: hdmi: mt8195: fix prediv bad upper limit test
  phy: phy-mtk-dp: Fix an error code in probe()
2023-07-27 11:52:41 -07:00
Linus Torvalds
64de76ce8e for-6.5-rc3-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmTCTAsACgkQxWXV+ddt
 WDvhYhAAluWrfM2ZzhY/tdDeKUNpf0NIAFGZIV4QP/2E43yIPC2+xMPqW/AnBnIP
 k28gOhgoH7LBP/cr0IrFHMz8Glges3cHz1UxFjJZgjiU3mAA0mgkIttPpzms7vqi
 3SVUxL2bJkebJy53nOpZcHlrcWveg+q0hTUslquCYBb3dA4gb61HBwzA2e0wKFeB
 wYw/gQtEy3TkHQPAxVjUF28ASoaroNKsE9QjfLZV0FDn0u0zBFxqpqj7bFUay++i
 sG3nPVZsqKcgIX7sUSwrpv4XAFu8fHz+GAQqCNqTxKCJ0ZZzsgzJtKs+12rv7dZC
 EvRt0jEt+DgwvmEy7j250TEbcI9rMaQuny8yt2j9sNKH/m9bW0BjptCtwghDoL89
 0D6qqicHbA+dJNq8/kDyxV6xC2Git2Ck0fpOfiU7YzhAFECZc/DkidvXa1keMUay
 usspO+YHOjDtlq0zJ0xixbxCseJfrj4habieVKZ/CnAvb84082ZiLcMxFqop/ewB
 WHKNB0O2+P78xoa7/Be6tp/w1HaaW8ZHvkPicD9d4khKJrXAKLNc/Xny4OqRT14z
 sWWaFuNjC7kIUT15EAQNj0wgymA7XcTL9gM1uuSO95PN+M3j4CleApzEvR3dn9FX
 gmoxuwVfVsJKKcwo6WFByqzu03kuSladEFasSHQAJbh3jyU9LUY=
 =Y/po
 -----END PGP SIGNATURE-----

Merge tag 'for-6.5-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fix accounting of global block reserve size when block group tree is
   enabled

 - the async discard has been enabled in 6.2 unconditionally, but for
   zoned mode it does not make that much sense to do it asynchronously
   as the zones are reset as needed

 - error handling and proper error value propagation fixes

* tag 'for-6.5-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: check for commit error at btrfs_attach_transaction_barrier()
  btrfs: check if the transaction was aborted at btrfs_wait_for_commit()
  btrfs: remove BUG_ON()'s in add_new_free_space()
  btrfs: account block group tree when calculating global reserve size
  btrfs: zoned: do not enable async discard
2023-07-27 11:44:08 -07:00
Linus Torvalds
379e66711b memblock: reset memblock.reserved to system init state to prevent UAF
A call to memblock_free() or memblock_phys_free() issued after memblock
 data is discarded will result in use after free in
 memblock_isolate_range().
 
 When CONFIG_KASAN is enabled, this will cause a panic early in boot.
 Without CONFIG_KASAN, there is a chance that memblock_isolate_range() might
 scribble on memory that is now in use by somebody else.
 
 Avoid those issues by making sure that memblock_discard points
 memblock.reserved.regions back at the static buffer.
 
 If memblock_free() or memblock_phys_free() is called after memblock memory
 is discarded, that will print a warning in memblock_remove_region().
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmTB94cQHHJwcHRAa2Vy
 bmVsLm9yZwAKCRA5A4Ymyw79kesHB/4rNvGFGEI8LFxooARLt8glcv0Hn7oJ+z3L
 Xyczw1ZkglT3DEYsoY78bSriddWPqrV3wWkr+p2NYXPBJWgQZ6t3DRZviqzXcj2l
 Ew2XwLAfT6Vay1eqEFfJJvkGg27QLhnmJPnjDzCWweiXUaR5xOESwKCBmZBWeXUU
 t5EFJMIXLVEoBDLGW5kk+Q4RZDqhU/sJWDqf4ciWQ5vDS8OFTr56hfth7T8XoMxm
 BPlC21+cEJUWrbb1gAJUMbIERTzvYg8odZqSAESlHyNyDEtYjyLce5W6HA6zHK+H
 2gqiti+Pd1OyHbJUc1lN7iRTE8FJ7DQcBr6H9sk81Po5af02Ky7m
 =FRx8
 -----END PGP SIGNATURE-----

Merge tag 'fixes-2023-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock fix from Mike Rapoport:
 "A call to memblock_free() or memblock_phys_free() issued after
  memblock data is discarded will result in use after free in
  memblock_isolate_range().

  Avoid those issues by making sure that memblock_discard points
  memblock.reserved.regions back at the static buffer"

* tag 'fixes-2023-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  mm,memblock: reset memblock.reserved to system init state to prevent UAF
2023-07-27 11:37:34 -07:00
Jann Horn
657b514695 mm: lock_vma_under_rcu() must check vma->anon_vma under vma lock
lock_vma_under_rcu() tries to guarantee that __anon_vma_prepare() can't
be called in the VMA-locked page fault path by ensuring that
vma->anon_vma is set.

However, this check happens before the VMA is locked, which means a
concurrent move_vma() can concurrently call unlink_anon_vmas(), which
disassociates the VMA's anon_vma.

This means we can get UAF in the following scenario:

  THREAD 1                   THREAD 2
  ========                   ========
  <page fault>
    lock_vma_under_rcu()
      rcu_read_lock()
      mas_walk()
      check vma->anon_vma

                             mremap() syscall
                               move_vma()
                                vma_start_write()
                                 unlink_anon_vmas()
                             <syscall end>

    handle_mm_fault()
      __handle_mm_fault()
        handle_pte_fault()
          do_pte_missing()
            do_anonymous_page()
              anon_vma_prepare()
                __anon_vma_prepare()
                  find_mergeable_anon_vma()
                    mas_walk() [looks up VMA X]

                             munmap() syscall (deletes VMA X)

                    reusable_anon_vma() [called on freed VMA X]

This is a security bug if you can hit it, although an attacker would
have to win two races at once where the first race window is only a few
instructions wide.

This patch is based on some previous discussion with Linus Torvalds on
the security list.

Cc: stable@vger.kernel.org
Fixes: 5e31275cc9 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-27 11:13:22 -07:00
Fedor Pchelkin
de52e17326 tipc: stop tipc crypto on failure in tipc_node_create
If tipc_link_bc_create() fails inside tipc_node_create() for a newly
allocated tipc node then we should stop its tipc crypto and free the
resources allocated with a call to tipc_crypto_start().

As the node ref is initialized to one to that point, just put the ref on
tipc_link_bc_create() error case that would lead to tipc_node_free() be
eventually executed and properly clean the node and its crypto resources.

Found by Linux Verification Center (linuxtesting.org).

Fixes: cb8092d70a ("tipc: move bc link creation back to tipc_node_create")
Suggested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20230725214628.25246-1-pchelkin@ispras.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-27 11:45:05 +02:00
Kuniyuki Iwashima
ecb4534b6a af_unix: Terminate sun_path when bind()ing pathname socket.
kernel test robot reported slab-out-of-bounds access in strlen(). [0]

Commit 06d4c8a808 ("af_unix: Fix fortify_panic() in unix_bind_bsd().")
removed unix_mkname_bsd() call in unix_bind_bsd().

If sunaddr->sun_path is not terminated by user and we don't enable
CONFIG_INIT_STACK_ALL_ZERO=y, strlen() will do the out-of-bounds access
during file creation.

Let's go back to strlen()-with-sockaddr_storage way and pack all 108
trickiness into unix_mkname_bsd() with bold comments.

[0]:
BUG: KASAN: slab-out-of-bounds in strlen (lib/string.c:?)
Read of size 1 at addr ffff000015492777 by task fortify_strlen_/168

CPU: 0 PID: 168 Comm: fortify_strlen_ Not tainted 6.5.0-rc1-00333-g3329b603ebba #16
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace (arch/arm64/kernel/stacktrace.c:235)
 show_stack (arch/arm64/kernel/stacktrace.c:242)
 dump_stack_lvl (lib/dump_stack.c:107)
 print_report (mm/kasan/report.c:365 mm/kasan/report.c:475)
 kasan_report (mm/kasan/report.c:590)
 __asan_report_load1_noabort (mm/kasan/report_generic.c:378)
 strlen (lib/string.c:?)
 getname_kernel (./include/linux/fortify-string.h:? fs/namei.c:226)
 kern_path_create (fs/namei.c:3926)
 unix_bind (net/unix/af_unix.c:1221 net/unix/af_unix.c:1324)
 __sys_bind (net/socket.c:1792)
 __arm64_sys_bind (net/socket.c:1801)
 invoke_syscall (arch/arm64/kernel/syscall.c:? arch/arm64/kernel/syscall.c:52)
 el0_svc_common (./include/linux/thread_info.h:127 arch/arm64/kernel/syscall.c:147)
 do_el0_svc (arch/arm64/kernel/syscall.c:189)
 el0_svc (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:648)
 el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:?)
 el0t_64_sync (arch/arm64/kernel/entry.S:591)

Allocated by task 168:
 kasan_set_track (mm/kasan/common.c:45 mm/kasan/common.c:52)
 kasan_save_alloc_info (mm/kasan/generic.c:512)
 __kasan_kmalloc (mm/kasan/common.c:383)
 __kmalloc (mm/slab_common.c:? mm/slab_common.c:998)
 unix_bind (net/unix/af_unix.c:257 net/unix/af_unix.c:1213 net/unix/af_unix.c:1324)
 __sys_bind (net/socket.c:1792)
 __arm64_sys_bind (net/socket.c:1801)
 invoke_syscall (arch/arm64/kernel/syscall.c:? arch/arm64/kernel/syscall.c:52)
 el0_svc_common (./include/linux/thread_info.h:127 arch/arm64/kernel/syscall.c:147)
 do_el0_svc (arch/arm64/kernel/syscall.c:189)
 el0_svc (./arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:648)
 el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:?)
 el0t_64_sync (arch/arm64/kernel/entry.S:591)

The buggy address belongs to the object at ffff000015492700
 which belongs to the cache kmalloc-128 of size 128
The buggy address is located 0 bytes to the right of
 allocated 119-byte region [ffff000015492700, ffff000015492777)

The buggy address belongs to the physical page:
page:00000000aeab52ba refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x55492
anon flags: 0x3fffc0000000200(slab|node=0|zone=0|lastcpupid=0xffff)
page_type: 0xffffffff()
raw: 03fffc0000000200 ffff0000084018c0 fffffc00003d0e00 0000000000000005
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff000015492600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff000015492680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff000015492700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 07 fc
                                                             ^
 ffff000015492780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff000015492800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 06d4c8a808 ("af_unix: Fix fortify_panic() in unix_bind_bsd().")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/netdev/202307262110.659e5e8-oliver.sang@intel.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230726190828.47874-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-27 11:36:55 +02:00
Yuanjun Gong
e46e06ffc6 tipc: check return value of pskb_trim()
goto free_skb if an unexpected result is returned by pskb_tirm()
in tipc_crypto_rcv_complete().

Fixes: fc1b6d6de2 ("tipc: introduce TIPC encryption & authentication")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/20230725064810.5820-1-ruc_gongyuanjun@163.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-27 10:46:05 +02:00
Yuanjun Gong
5c85f70657 benet: fix return value check in be_lancer_xmit_workarounds()
in be_lancer_xmit_workarounds(), it should go to label 'tx_drop'
if an unexpected value is returned by pskb_trim().

Fixes: 93040ae5cc ("be2net: Fix to trim skb for padded vlan packets to workaround an ASIC Bug")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Link: https://lore.kernel.org/r/20230725032726.15002-1-ruc_gongyuanjun@163.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-27 10:31:38 +02:00
Jakub Kicinski
ff0df20827 netfilter pull request 2023-07-26
-----BEGIN PGP SIGNATURE-----
 
 iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmTBN7QNHGZ3QHN0cmxl
 bi5kZQAKCRBwkajZrV/2AMg4D/wLs+Nm4XZmvz3ZOtgbHrw3xSbkLgJ563cCGwHw
 1k4/726FrfbVHqMvOWgHNVdmsVCcw0jeLVddlIN3QmbMlxn2YUwQ16nRg6SUMeKZ
 u/9KsI0wLr9zGKJxiUDWhe+7An3oY5G/7uFngJZ1M/vbGTxrl1GqOEU7bMvkm/G/
 08z8/ho9Qv8CbPUfn4ZcYfxDCPKTB74O0UZiItHDAp+tQcD729xvTZ+AGOPe+632
 YifWe7FzY+SUNu/sesr8kyeMU0UPsxETO7pnvgn5PJ3osLDQB1mxj+Kpa5YSuQ4H
 3gO2S6T07iQZ74//aUTzexEzss4HNsdXKDPcTfXyyiZPJsfVkZRuKTcuA16bGt0l
 zCVzCz2Aj5brZEjYhFlmgCnWzlnWDNGHJF7WBxF9UAHoEqXQa3h4im2RTj9/8RvZ
 ZRXCZnmL+UIr5k3m5NzXYDM7vWxHvavNbKRl594XVq1fI6GdSGGQ9SOyPkKFUYQt
 BYxDsbg9RY/piZt7vH+YQfmjuQ8sCkxqPEqJvSzy6U/TVtTOdsjfCCeijSrEIlvg
 i/B+P24GI2dZW09trHfVVcn5YQc8/HkzCr029BcmfSu9+4+GeTmjMArSJfO1hThd
 d3IpsNye9OgdhxsR75kZusCoZpRfuEtySKdkfdzvvAWLqPh8nNzhIzGx6WHmcTt6
 TYYquA==
 =4wB/
 -----END PGP SIGNATURE-----

Merge tag 'nf-23-07-26' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter fixes for net

1. On-demand overlap detection in 'rbtree' set can cause memory leaks.
   This is broken since 6.2.

2. An earlier fix in 6.4 to address an imbalance in refcounts during
   transaction error unwinding was incomplete, from Pablo Neira.

3. Disallow adding a rule to a deleted chain, also from Pablo.
   Broken since 5.9.

* tag 'nf-23-07-26' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID
  netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR
  netfilter: nft_set_rbtree: fix overlap expiration walk
====================

Link: https://lore.kernel.org/r/20230726152524.26268-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-26 22:18:00 -07:00