Commit graph

76094 commits

Author SHA1 Message Date
Paolo Abeni
4934446297 linux-can-next-for-6.9-20240220
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEUEC6huC2BN0pvD5fKDiiPnotvG8FAmXUZiITHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAoOKI+ei28b8KLB/9MKkUjbbBh9nXezyWdXnulj5jpHWlJ
 Xa7Sz7e+Gw5HbpK1/RF3Mb3/uf5D+DTMa2jjUJhezGCugW6ugoFapDC1bJxdafIN
 pAZQG/7EYi4TqHEO3/aS5sMh3pISs29COnmHHdQCYfyTMZPKGcDkJuwa7POhHhR1
 zrjavD0N2ihBfhoadlT+GQ9QYu+JyWnjrB27hSznsktW9Jeju1u6F9nvOXn60aZU
 e7QXgsKe94YXLEed3hj7buPAIirY+tLKIpbw7TtJJwk6EBnnK17S+2wydR0N7yWK
 SSsaKJxZCiiaoYkl9chkKTyqh2I3qa/HsxUrFY3TGx5VMhWLiiH/r5eI
 =pIjg
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-6.9-20240220' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2024-02-20

this is a pull request of 9 patches for net-next/master.

The first patch is by Francesco Dolcini and removes a redundant check
for pm_clock_support from the m_can driver.

Martin Hundebøll contributes 3 patches to the m_can/tcan4x5x driver to
allow resume upon RX of a CAN frame.

3 patches by Srinivas Goud add support for ECC statistics to the
xilinx_can driver.

The last 2 patches are by Oliver Hartkopp and me; they target the CAN RAW
protocol and fix an error in the getsockopt() for CAN-XL introduced in
the previous pull request to net-next (linux-can-next-for-6.9-20240213).

linux-can-next-for-6.9-20240220

* tag 'linux-can-next-for-6.9-20240220' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: raw: raw_getsockopt(): reduce scope of err
  can: raw: fix getsockopt() for new CAN_RAW_XL_VCID_OPTS
  can: xilinx_can: Add ethtool stats interface for ECC errors
  can: xilinx_can: Add ECC support
  dt-bindings: can: xilinx_can: Add 'xlnx,has-ecc' optional property
  can: tcan4x5x: support resuming from rx interrupt signal
  can: m_can: allow keeping the transceiver running in suspend
  dt-bindings: can: tcan4x5x: Document the wakeup-source flag
  can: m_can: remove redundant check for pm_clock_support
====================

Link: https://lore.kernel.org/r/20240220085130.2936533-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-20 15:32:45 +01:00
Eric Dumazet
5d4cc87414 net: reorganize "struct sock" fields
Last major reorg happened in commit 9115e8cd2a ("net: reorganize
struct sock for better data locality")

Since then, many changes have been done.

Before SO_PEEK_OFF support is added to TCP, we need
to move sk_peek_off to a better location.

It is time to make another pass, and add six groups,
without explicit alignment.

- sock_write_rx (following sk_refcnt) read-write fields in rx path.
- sock_read_rx read-mostly fields in rx path.
- sock_read_rxtx read-mostly fields in both rx and tx paths.
- sock_write_rxtx read-write fields in both rx and tx paths.
- sock_write_tx read-write fields in tx paths.
- sock_read_tx read-mostly fields in tx paths.
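
A standalone sketch of the zero-size marker technique used to delimit such
groups, with offsetof() giving each group's span (macro and field names are
illustrative, not the kernel's exact helpers):

    #include <stddef.h>

    /* Zero-size arrays mark the group boundaries without adding any bytes. */
    #define group_begin(g)  unsigned char __group_begin_##g[0]
    #define group_end(g)    unsigned char __group_end_##g[0]

    struct sock_stub {
        group_begin(write_rx);
        int sk_drops;
        int sk_peek_off;
        group_end(write_rx);
        group_begin(read_rx);
        void *sk_rx_dst;
        group_end(read_rx);
    };

    /* A group's span is the distance between its markers, handy for
     * compile-time asserts that hot fields stay within a few cache lines. */
    #define group_size(type, g) \
        (offsetof(type, __group_end_##g) - offsetof(type, __group_begin_##g))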

Results on TCP_RR benchmarks seem to show a gain (4 to 5 %).

It is possible UDP needs a change, because sk_peek_off
shares a cache line with sk_receive_queue.
If this is the case, we can exchange roles of sk->sk_receive
and up->reader_queue queues.

After this change, we have the following layout:

struct sock {
	struct sock_common         __sk_common;          /*     0  0x88 */
	/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
	__u8                       __cacheline_group_begin__sock_write_rx[0]; /*  0x88     0 */
	atomic_t                   sk_drops;             /*  0x88   0x4 */
	__s32                      sk_peek_off;          /*  0x8c   0x4 */
	struct sk_buff_head        sk_error_queue;       /*  0x90  0x18 */
	struct sk_buff_head        sk_receive_queue;     /*  0xa8  0x18 */
	/* --- cacheline 3 boundary (192 bytes) --- */
	struct {
		atomic_t           rmem_alloc;           /*  0xc0   0x4 */
		int                len;                  /*  0xc4   0x4 */
		struct sk_buff *   head;                 /*  0xc8   0x8 */
		struct sk_buff *   tail;                 /*  0xd0   0x8 */
	} sk_backlog;                                    /*  0xc0  0x18 */
	struct {
		atomic_t                   rmem_alloc;           /*     0   0x4 */
		int                        len;                  /*   0x4   0x4 */
		struct sk_buff *           head;                 /*   0x8   0x8 */
		struct sk_buff *           tail;                 /*  0x10   0x8 */

		/* size: 24, cachelines: 1, members: 4 */
		/* last cacheline: 24 bytes */
	};

	__u8                       __cacheline_group_end__sock_write_rx[0]; /*  0xd8     0 */
	__u8                       __cacheline_group_begin__sock_read_rx[0]; /*  0xd8     0 */
	rcu *                      sk_rx_dst;            /*  0xd8   0x8 */
	int                        sk_rx_dst_ifindex;    /*  0xe0   0x4 */
	u32                        sk_rx_dst_cookie;     /*  0xe4   0x4 */
	unsigned int               sk_ll_usec;           /*  0xe8   0x4 */
	unsigned int               sk_napi_id;           /*  0xec   0x4 */
	u16                        sk_busy_poll_budget;  /*  0xf0   0x2 */
	u8                         sk_prefer_busy_poll;  /*  0xf2   0x1 */
	u8                         sk_userlocks;         /*  0xf3   0x1 */
	int                        sk_rcvbuf;            /*  0xf4   0x4 */
	rcu *                      sk_filter;            /*  0xf8   0x8 */
	/* --- cacheline 4 boundary (256 bytes) --- */
	union {
		rcu *              sk_wq;                /* 0x100   0x8 */
		struct socket_wq * sk_wq_raw;            /* 0x100   0x8 */
	};                                               /* 0x100   0x8 */
	union {
		rcu *                      sk_wq;                /*     0   0x8 */
		struct socket_wq *         sk_wq_raw;            /*     0   0x8 */
	};

	void                       (*sk_data_ready)(struct sock *); /* 0x108   0x8 */
	long                       sk_rcvtimeo;          /* 0x110   0x8 */
	int                        sk_rcvlowat;          /* 0x118   0x4 */
	__u8                       __cacheline_group_end__sock_read_rx[0]; /* 0x11c     0 */
	__u8                       __cacheline_group_begin__sock_read_rxtx[0]; /* 0x11c     0 */
	int                        sk_err;               /* 0x11c   0x4 */
	struct socket *            sk_socket;            /* 0x120   0x8 */
	struct mem_cgroup *        sk_memcg;             /* 0x128   0x8 */
	rcu *                      sk_policy[2];         /* 0x130  0x10 */
	/* --- cacheline 5 boundary (320 bytes) --- */
	__u8                       __cacheline_group_end__sock_read_rxtx[0]; /* 0x140     0 */
	__u8                       __cacheline_group_begin__sock_write_rxtx[0]; /* 0x140     0 */
	socket_lock_t              sk_lock;              /* 0x140  0x20 */
	u32                        sk_reserved_mem;      /* 0x160   0x4 */
	int                        sk_forward_alloc;     /* 0x164   0x4 */
	u32                        sk_tsflags;           /* 0x168   0x4 */
	__u8                       __cacheline_group_end__sock_write_rxtx[0]; /* 0x16c     0 */
	__u8                       __cacheline_group_begin__sock_write_tx[0]; /* 0x16c     0 */
	int                        sk_write_pending;     /* 0x16c   0x4 */
	atomic_t                   sk_omem_alloc;        /* 0x170   0x4 */
	int                        sk_sndbuf;            /* 0x174   0x4 */
	int                        sk_wmem_queued;       /* 0x178   0x4 */
	refcount_t                 sk_wmem_alloc;        /* 0x17c   0x4 */
	/* --- cacheline 6 boundary (384 bytes) --- */
	unsigned long              sk_tsq_flags;         /* 0x180   0x8 */
	union {
		struct sk_buff *   sk_send_head;         /* 0x188   0x8 */
		struct rb_root     tcp_rtx_queue;        /* 0x188   0x8 */
	};                                               /* 0x188   0x8 */
	union {
		struct sk_buff *           sk_send_head;         /*     0   0x8 */
		struct rb_root             tcp_rtx_queue;        /*     0   0x8 */
	};

	struct sk_buff_head        sk_write_queue;       /* 0x190  0x18 */
	u32                        sk_dst_pending_confirm; /* 0x1a8   0x4 */
	u32                        sk_pacing_status;     /* 0x1ac   0x4 */
	struct page_frag           sk_frag;              /* 0x1b0  0x10 */
	/* --- cacheline 7 boundary (448 bytes) --- */
	struct timer_list          sk_timer;             /* 0x1c0  0x28 */

	/* XXX last struct has 4 bytes of padding */

	unsigned long              sk_pacing_rate;       /* 0x1e8   0x8 */
	atomic_t                   sk_zckey;             /* 0x1f0   0x4 */
	atomic_t                   sk_tskey;             /* 0x1f4   0x4 */
	__u8                       __cacheline_group_end__sock_write_tx[0]; /* 0x1f8     0 */
	__u8                       __cacheline_group_begin__sock_read_tx[0]; /* 0x1f8     0 */
	unsigned long              sk_max_pacing_rate;   /* 0x1f8   0x8 */
	/* --- cacheline 8 boundary (512 bytes) --- */
	long                       sk_sndtimeo;          /* 0x200   0x8 */
	u32                        sk_priority;          /* 0x208   0x4 */
	u32                        sk_mark;              /* 0x20c   0x4 */
	rcu *                      sk_dst_cache;         /* 0x210   0x8 */
	netdev_features_t          sk_route_caps;        /* 0x218   0x8 */
	u16                        sk_gso_type;          /* 0x220   0x2 */
	u16                        sk_gso_max_segs;      /* 0x222   0x2 */
	unsigned int               sk_gso_max_size;      /* 0x224   0x4 */
	gfp_t                      sk_allocation;        /* 0x228   0x4 */
	u32                        sk_txhash;            /* 0x22c   0x4 */
	u8                         sk_pacing_shift;      /* 0x230   0x1 */
	bool                       sk_use_task_frag;     /* 0x231   0x1 */
	__u8                       __cacheline_group_end__sock_read_tx[0]; /* 0x232     0 */
	u8                         sk_gso_disabled:1;    /* 0x232: 0 0x1 */
	u8                         sk_kern_sock:1;       /* 0x232:0x1 0x1 */
	u8                         sk_no_check_tx:1;     /* 0x232:0x2 0x1 */
	u8                         sk_no_check_rx:1;     /* 0x232:0x3 0x1 */

	/* XXX 4 bits hole, try to pack */

	u8                         sk_shutdown;          /* 0x233   0x1 */
	u16                        sk_type;              /* 0x234   0x2 */
	u16                        sk_protocol;          /* 0x236   0x2 */
	unsigned long              sk_lingertime;        /* 0x238   0x8 */
	/* --- cacheline 9 boundary (576 bytes) --- */
	struct proto *             sk_prot_creator;      /* 0x240   0x8 */
	rwlock_t                   sk_callback_lock;     /* 0x248   0x8 */
	int                        sk_err_soft;          /* 0x250   0x4 */
	u32                        sk_ack_backlog;       /* 0x254   0x4 */
	u32                        sk_max_ack_backlog;   /* 0x258   0x4 */
	kuid_t                     sk_uid;               /* 0x25c   0x4 */
	spinlock_t                 sk_peer_lock;         /* 0x260   0x4 */
	int                        sk_bind_phc;          /* 0x264   0x4 */
	struct pid *               sk_peer_pid;          /* 0x268   0x8 */
	const struct cred  *       sk_peer_cred;         /* 0x270   0x8 */
	ktime_t                    sk_stamp;             /* 0x278   0x8 */
	/* --- cacheline 10 boundary (640 bytes) --- */
	int                        sk_disconnects;       /* 0x280   0x4 */
	u8                         sk_txrehash;          /* 0x284   0x1 */
	u8                         sk_clockid;           /* 0x285   0x1 */
	u8                         sk_txtime_deadline_mode:1; /* 0x286: 0 0x1 */
	u8                         sk_txtime_report_errors:1; /* 0x286:0x1 0x1 */
	u8                         sk_txtime_unused:6;   /* 0x286:0x2 0x1 */

	/* XXX 1 byte hole, try to pack */

	void *                     sk_user_data;         /* 0x288   0x8 */
	void *                     sk_security;          /* 0x290   0x8 */
	struct sock_cgroup_data    sk_cgrp_data;         /* 0x298   0x8 */
	void                       (*sk_state_change)(struct sock *); /* 0x2a0   0x8 */
	void                       (*sk_write_space)(struct sock *); /* 0x2a8   0x8 */
	void                       (*sk_error_report)(struct sock *); /* 0x2b0   0x8 */
	int                        (*sk_backlog_rcv)(struct sock *, struct sk_buff *); /* 0x2b8   0x8 */
	/* --- cacheline 11 boundary (704 bytes) --- */
	void                       (*sk_destruct)(struct sock *); /* 0x2c0   0x8 */
	rcu *                      sk_reuseport_cb;      /* 0x2c8   0x8 */
	rcu *                      sk_bpf_storage;       /* 0x2d0   0x8 */
	struct callback_head       sk_rcu __attribute__((__aligned__(8))); /* 0x2d8  0x10 */
	netns_tracker              ns_tracker;           /* 0x2e8   0x8 */

	/* size: 752, cachelines: 12, members: 105 */
	/* sum members: 749, holes: 1, sum holes: 1 */
	/* sum bitfield members: 12 bits, bit holes: 1, sum bit holes: 4 bits */
	/* paddings: 1, sum paddings: 4 */
	/* forced alignments: 1 */
	/* last cacheline: 48 bytes */
};

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240216162006.2342759-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-20 12:01:45 +01:00
Colin Ian King
465c1abcb6 net: tcp: Remove redundant initialization of variable len
The variable len is initialized with a value that is never read; an
if statement then assigns it in both of its paths.
The initialization is redundant and can be removed.
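
A minimal standalone sketch of the dead-store pattern the clang analyzer
flags (illustrative only, not the actual net/ipv4/tcp_ao.c code):

    int example(int flag)
    {
        int len = 0;        /* dead store: this value is never read */

        if (flag)
            len = 16;       /* both branches overwrite len ... */
        else
            len = 32;       /* ... so the initializer is redundant */

        return len;
    }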

Cleans up clang scan build warning:
net/ipv4/tcp_ao.c:512:11: warning: Value stored to 'len' during its
initialization is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://lore.kernel.org/r/20240216125443.2107244-1-colin.i.king@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-20 11:40:15 +01:00
Marc Kleine-Budde
00bf80c437 can: raw: raw_getsockopt(): reduce scope of err
Reduce the scope of the variable "err" to the individual cases. This
is to avoid the mistake of setting "err" in the mistaken belief that
it will be evaluated later.
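
A minimal illustration of the scoping change (plain C sketch, not the actual
raw_getsockopt() code):

    #include <errno.h>

    static int fetch_filters(void);

    static int get_option(int optname)
    {
        switch (optname) {
        case 1: {
            int err = fetch_filters();  /* err exists only in this case */

            if (err)
                return err;
            break;
        }
        default:
            return -EINVAL;             /* no stale err left to misuse */
        }
        return 0;
    }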

Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20240220-raw-setsockopt-v1-1-7d34cb1377fc@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-02-20 09:40:46 +01:00
Mina Almasry
21d2e6737c net: add netmem to skb_frag_t
Use struct netmem* instead of page in skb_frag_t. Currently struct
netmem* is always a struct page underneath, but the abstraction
allows efforts to add support for skb frags not backed by pages.

There is unfortunately 1 instance where the skb_frag_t is assumed to be
exactly a bio_vec in kcm. For this case, WARN_ON_ONCE and return an error
before doing the cast.

Add skb[_frag]_fill_netmem_*() and skb_add_rx_frag_netmem() helpers so
that the API can be used to create netmem skbs.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-20 09:22:58 +01:00
Oliver Hartkopp
c8fba5d6df can: raw: fix getsockopt() for new CAN_RAW_XL_VCID_OPTS
The code for the CAN_RAW_XL_VCID_OPTS getsockopt() was incompletely adapted
from the CAN_RAW_FILTER getsockopt().

Add the missing put_user() and return statements.

Flagged by Smatch.
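
For reference, the fixed option follows the same copy-out pattern as the
long-standing CAN_RAW_FILTER getsockopt() it was adapted from; a minimal
userspace sketch of that established pattern (error handling trimmed,
filter count illustrative):

    #include <stdio.h>
    #include <sys/socket.h>
    #include <linux/can.h>
    #include <linux/can/raw.h>

    int dump_filters(int s)
    {
        struct can_filter rfilter[16];
        socklen_t len = sizeof(rfilter);

        if (getsockopt(s, SOL_CAN_RAW, CAN_RAW_FILTER, rfilter, &len) < 0)
            return -1;

        printf("filters returned: %zu\n", len / sizeof(struct can_filter));
        return 0;
    }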

Fixes: c83c22ec14 ("can: canxl: add virtual CAN network identifier support")
Reported-by: Simon Horman <horms@kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20240219200021.12113-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-02-20 08:20:42 +01:00
Breno Leitao
74293ea1c4 net: sysfs: Do not create sysfs for non BQL device
Creation of sysfs entries is expensive, mainly for workloads that
constantly create netdevs and netns.

Do not create BQL sysfs entries for devices that don't need them,
basically those that do not have a real queue, i.e., devices that have
NETIF_F_LLTX and IFF_NO_QUEUE, such as the `lo` interface.

This will remove the /sys/class/net/eth0/queues/tx-X/byte_queue_limits/
directory for these devices.

In the example below, eth0 has the `byte_queue_limits` directory but not
`lo`.

	# ls /sys/class/net/lo/queues/tx-0/
	traffic_class  tx_maxrate  tx_timeout  xps_cpus  xps_rxqs

	# ls /sys/class/net/eth0/queues/tx-0/byte_queue_limits/
	hold_time  inflight  limit  limit_max  limit_min

This also removes the #ifdefs, since we can use netdev_uses_bql() to
check if the config is enabled (as suggested by Jakub).
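
A simplified sketch of the "does this device need BQL sysfs?" check the
patch relies on (stand-in types and flag values, not the kernel's
definitions):

    /* Stand-ins for the kernel flags; values are illustrative only. */
    #define NETIF_F_LLTX    (1u << 0)
    #define IFF_NO_QUEUE    (1u << 1)

    struct net_device_stub {
        unsigned int features;
        unsigned int priv_flags;
    };

    /* Devices without a real queue do not need byte_queue_limits entries. */
    static int uses_bql(const struct net_device_stub *dev)
    {
        return !(dev->features & NETIF_F_LLTX) &&
               !(dev->priv_flags & IFF_NO_QUEUE);
    }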

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240216094154.3263843-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-19 12:30:44 -08:00
Lorenzo Bianconi
f853fa5c54 net: page_pool: fix recycle stats for system page_pool allocator
Use global percpu page_pool_recycle_stats counter for system page_pool
allocator instead of allocating a separate percpu variable for each
(also percpu) page pool instance.

Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/87f572425e98faea3da45f76c3c68815c01a20ee.1708075412.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-19 12:30:27 -08:00
Alexander Lobakin
56ef27e3ab page_pool: disable direct recycling based on pool->cpuid on destroy
Now that direct recycling is performed based on pool->cpuid when set,
memory leaks are possible:

1. A pool is destroyed.
2. Alloc cache is emptied (it's done only once).
3. pool->cpuid is still set.
4. napi_pp_put_page() does direct recycling based on pool->cpuid.
5. Now alloc cache is not empty, but it won't ever be freed.

In order to avoid that, rewrite pool->cpuid to -1 when unlinking NAPI to
make sure no direct recycling will be possible after emptying the cache.
This involves a bit of overhead as pool->cpuid now must be accessed
via READ_ONCE() to avoid partial reads.
Rename page_pool_unlink_napi() -> page_pool_disable_direct_recycling()
to reflect what it actually does and unexport it.
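
A small standalone sketch of the disable-then-check idea, using C11 atomics
in place of the kernel's READ_ONCE()/WRITE_ONCE() (names and types are
illustrative):

    #include <stdatomic.h>

    struct pool_stub {
        _Atomic int cpuid;    /* -1 means "direct recycling disabled" */
    };

    /* Called when unlinking the NAPI instance / destroying the pool. */
    static void disable_direct_recycling(struct pool_stub *pool)
    {
        atomic_store_explicit(&pool->cpuid, -1, memory_order_relaxed);
    }

    /* Hot path: recycle directly only if we still own this CPU's pool. */
    static int can_recycle_directly(struct pool_stub *pool, int this_cpu)
    {
        return atomic_load_explicit(&pool->cpuid, memory_order_relaxed) == this_cpu;
    }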

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20240215113905.96817-1-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-19 11:48:00 -08:00
Kees Cook
1e63e5a813 net: sched: Annotate struct tc_pedit with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct tc_pedit.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.
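
A standalone sketch of the annotation pattern and the "set the count first"
rule (stand-in macro and struct, not the real struct tc_pedit):

    #include <stdlib.h>

    /* Stand-in: the kernel's __counted_by(m) expands to the compiler's
     * counted_by attribute when available; a no-op here so the sketch builds. */
    #define __counted_by(m)

    struct pedit_stub {
        int nkeys;                          /* element count */
        int keys[] __counted_by(nkeys);     /* bounded flexible array */
    };

    static struct pedit_stub *pedit_alloc(int n)
    {
        struct pedit_stub *p = malloc(sizeof(*p) + n * sizeof(p->keys[0]));

        if (p)
            p->nkeys = n;   /* set the count before keys[] is accessed */
        return p;
    }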

Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-19 10:58:24 +00:00
Breno Leitao
ea7f3cfaa5 net: bql: allow the config to be disabled
It is impossible to disable BQL individually today, since there is no
prompt for the Kconfig entry, so BQL is always enabled if SYSFS is
enabled.

Create a prompt entry for BQL so that it can be enabled or disabled at
build time independently of SYSFS.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-18 10:19:21 +00:00
Geert Uytterhoeven
21bd52ea38 tcp: Spelling s/curcuit/circuit/
Fix a misspelling of "circuit".

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-16 10:12:00 +00:00
Alexander Gordeev
2210c5485e net/iucv: fix virtual vs physical address confusion
Fix virtual vs physical address confusion. This does not fix a bug
since virtual and physical address spaces are currently the same.

Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-16 08:45:17 +00:00
Arınç ÜNAL
ae94dc25fd net: dsa: remove OF-based MDIO bus registration from DSA core
The code block under the "!ds->user_mii_bus && ds->ops->phy_read" check
in dsa_switch_setup() populates ds->user_mii_bus. Using
ds->user_mii_bus is inappropriate when the MDIO bus of the switch is
described in the device tree [1].

For this reason, use this code block only for switches [with MDIO bus]
probed on platform_data, and for switches probed on OF whose MDIO bus isn't
described in the device tree. Therefore, remove OF-based MDIO bus
registration, as it's useless for these cases.

These subdrivers, which control switches [with MDIO bus] probed on OF, will
lose the ability to register the MDIO bus in the OF-based way:

drivers/net/dsa/b53/b53_common.c
drivers/net/dsa/lan9303-core.c
drivers/net/dsa/vitesse-vsc73xx-core.c

These subdrivers let the DSA core driver register the bus:
- ds->ops->phy_read() and ds->ops->phy_write() are present.
- ds->user_mii_bus is not populated.

The commit fe7324b932 ("net: dsa: OF-ware slave_mii_bus") which brought
OF-based MDIO bus registration on the DSA core driver is reasonably recent
and, in this time frame, there have been no device trees in the Linux
repository that started describing the MDIO bus, or dt-bindings defining
the MDIO bus for the switches these subdrivers control. So I don't expect
any devices to be affected.

The logic we encourage is that all subdrivers should register the switch
MDIO bus on their own [2]. And, for subdrivers which control switches [with
MDIO bus] probed on OF, this logic must be followed to support all cases
properly:

No switch MDIO bus defined: Populate ds->user_mii_bus, register the MDIO
bus, and set the interrupts for PHYs if "interrupt-controller" is defined at
the switch node. This case should only be covered for the switches whose
dt-bindings documentation didn't document the MDIO bus from the
start. This is to keep supporting the device trees that do not describe the
MDIO bus even though the MDIO bus is being used nonetheless.

Switch MDIO bus defined: Don't populate ds->user_mii_bus, register the MDIO
bus, set the interrupts for PHYs if ["interrupt-controller" is defined at
the switch node and "interrupts" is defined at the PHY nodes under the
switch MDIO bus node].

Switch MDIO bus defined but explicitly disabled: If the device tree says
status = "disabled" for the MDIO bus, we shouldn't need an MDIO bus at all.
Instead, just exit as early as possible and do not call any MDIO API.

After all subdrivers that control switches with MDIO buses are made to
register the MDIO buses on their own, we will be able to get rid of
dsa_switch_ops :: phy_read() and :: phy_write(), and the code block for
registering the MDIO bus on the DSA core driver.

Link: https://lore.kernel.org/netdev/20231213120656.x46fyad6ls7sqyzv@skbuf/ [1]
Link: https://lore.kernel.org/netdev/20240103184459.dcbh57wdnlox6w7d@skbuf/ [2]
Suggested-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Acked-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240213-for-netnext-dsa-mdio-bus-v2-1-0ff6f4823a9e@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15 21:30:02 -08:00
Jakub Kicinski
fc906e7922 Merge branch 'for-thermal-genetlink-family-bind-unbind-callbacks'
Stanislaw Gruszka says:

====================
thermal/netlink/intel_hfi: Enable HFI feature only when required

The patchset introduces new genetlink family bind/unbind callbacks
and thermal/netlink notifications, which allow drivers to send netlink
multicast events based on the presence of actual user-space consumers.
This functionality optimizes resource usage by allowing features to be
disabled when not needed.

v1: https://lore.kernel.org/linux-pm/20240131120535.933424-1-stanislaw.gruszka@linux.intel.com//
v2: https://lore.kernel.org/linux-pm/20240206133605.1518373-1-stanislaw.gruszka@linux.intel.com/
v3: https://lore.kernel.org/linux-pm/20240209120625.1775017-1-stanislaw.gruszka@linux.intel.com/
====================

Link: https://lore.kernel.org/r/20240212161615.161935-1-stanislaw.gruszka@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15 17:49:24 -08:00
Stanislaw Gruszka
3de21a8990 genetlink: Add per family bind/unbind callbacks
Add genetlink family bind()/unbind() callbacks invoked when a
multicast group is added to/removed from a netlink client socket via the
setsockopt() or bind() syscall.

They can be used to track whether consumers of netlink multicast messages
emerge or disappear. Thus, a client implementing the callbacks can now
send events only when there are active consumers, preventing unnecessary
work when none exist.
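
A generic standalone sketch of the consumer-counting idea these callbacks
enable (plain C, not the genetlink API itself; names are illustrative):

    /* Run expensive notification work only while at least one consumer
     * is bound to the multicast group. */
    struct mcgrp_stub {
        int consumers;
        void (*enable)(void);
        void (*disable)(void);
    };

    static void on_bind(struct mcgrp_stub *g)
    {
        if (g->consumers++ == 0)
            g->enable();    /* first listener appeared */
    }

    static void on_unbind(struct mcgrp_stub *g)
    {
        if (--g->consumers == 0)
            g->disable();   /* last listener went away */
    }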

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240212161615.161935-2-stanislaw.gruszka@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15 17:49:16 -08:00
Jakub Kicinski
73be9a3aab Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

net/core/dev.c
  9f30831390 ("net: add rcu safety to rtnl_prop_list_size()")
  723de3ebef ("net: free altname using an RCU callback")

net/unix/garbage.c
  11498715f2 ("af_unix: Remove io_uring code for GC.")
  25236c91b5 ("af_unix: Fix task hung while purging oob_skb in GC.")

drivers/net/ethernet/renesas/ravb_main.c
  ed4adc0720 ("net: ravb: Count packets instead of descriptors in GbEth RX path")
  c2da940857 ("ravb: Add Rx checksum offload support for GbEth")

net/mptcp/protocol.c
  bdd70eb689 ("mptcp: drop the push_pending field")
  28e5c13805 ("mptcp: annotate lockless accesses around read-mostly fields")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15 16:20:04 -08:00
Linus Torvalds
4f5e5092fd Including fixes from can, wireless and netfilter.
Current release - regressions:
 
  - af_unix: fix task hung while purging oob_skb in GC
 
  - pds_core: do not try to run health-thread in VF path
 
 Current release - new code bugs:
 
  - sched: act_mirred: don't zero blockid when net device is being deleted
 
 Previous releases - regressions:
 
  - netfilter:
    - nat: restore default DNAT behavior
    - nf_tables: fix bidirectional offload, broken when unidirectional
      offload support was added
 
  - openvswitch: limit the number of recursions from action sets
 
  - eth: i40e: do not allow untrusted VF to remove administratively
    set MAC address
 
 Previous releases - always broken:
 
  - tls: fix races and bugs in use of async crypto
 
  - mptcp: prevent data races on some of the main socket fields,
    fix races in fastopen handling
 
  - dpll: fix possible deadlock during netlink dump operation
 
  - dsa: lan966x: fix crash when adding interface under a lag
    when some of the ports are disabled
 
  - can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
 
 Misc:
 
  - handful of fixes and reliability improvements for selftests
 
  - fix sysfs documentation missing net/ in paths
 
  - finish the work of squashing the missing MODULE_DESCRIPTION()
    warnings in networking
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmXOQ6AACgkQMUZtbf5S
 IrsUrBAAhFMdcrJwLO73+ODfix4okmpOVPLvnW8DxsT46F9Uex3oP2mR7W5CtSp9
 yr10n5Ce2rjRUu8T5D5XGkg0dHFFF887Ngs3PLxaZTEb13UcfxANZ+jjyyVB8XPf
 HEODBqzJuFBkh4/qSY2/VEDjQW57JopyVVitC9ktF7yhJbZfFfEEf68L0DYqijF4
 MzsGgcHenm2UuunOppp7S5yoWRHgl0IPr6Stz0Dw/AacqJrGl0sicuobTARvcGXP
 G/0nLDerbcr+JhbgQUmKX3t3hxxwG9zyJmgyuX285NTPQagbGvYM5gQHLREdAwLF
 8N2r2uoD0cPv00PQee/7/kfepLOiIkKthX9YEutT4fjOqtQ/CwSForXDqe7oI3rs
 +KCMDn3LN/JECu9i8zUJUxdt2LBy0TPu7XrgZZuXbOEnAIKBjFQc59dtBE1Z2ROJ
 r10Q4aR0xjaQ1yErl+mu/WP7zQpJTJb0PQCuy8zSYl3b64cbyJb+UqpLcXaizY8G
 cT6XlTEpRvP21ULxU71/UyBLnYNX3msDTlfZRs2gVZEC1dt4WuM55BZmCl+mMvEd
 nuAkaPyp61EiUNSVx+eeZ5r91qFuwDo+pPyAta4PNNEzeVx2CZI0RzeFrrFzJevB
 DigB69R85zs8lhDJEC129GDNgGZpbQOttEA5GzVYFFsoxBS1ygk=
 =YRod
 -----END PGP SIGNATURE-----

Merge tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from can, wireless and netfilter.

  Current release - regressions:

   - af_unix: fix task hung while purging oob_skb in GC

   - pds_core: do not try to run health-thread in VF path

  Current release - new code bugs:

   - sched: act_mirred: don't zero blockid when net device is being
     deleted

  Previous releases - regressions:

   - netfilter:
      - nat: restore default DNAT behavior
      - nf_tables: fix bidirectional offload, broken when unidirectional
        offload support was added

   - openvswitch: limit the number of recursions from action sets

   - eth: i40e: do not allow untrusted VF to remove administratively set
     MAC address

  Previous releases - always broken:

   - tls: fix races and bugs in use of async crypto

   - mptcp: prevent data races on some of the main socket fields, fix
     races in fastopen handling

   - dpll: fix possible deadlock during netlink dump operation

   - dsa: lan966x: fix crash when adding interface under a lag when some
     of the ports are disabled

   - can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock

  Misc:

   - a handful of fixes and reliability improvements for selftests

   - fix sysfs documentation missing net/ in paths

   - finish the work of squashing the missing MODULE_DESCRIPTION()
     warnings in networking"

* tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
  net: fill in MODULE_DESCRIPTION()s for missing arcnet
  net: fill in MODULE_DESCRIPTION()s for mdio_devres
  net: fill in MODULE_DESCRIPTION()s for ppp
  net: fill in MODULE_DESCRIPTION()s for fddik/skfp
  net: fill in MODULE_DESCRIPTION()s for plip
  net: fill in MODULE_DESCRIPTION()s for ieee802154/fakelb
  net: fill in MODULE_DESCRIPTION()s for xen-netback
  net: ravb: Count packets instead of descriptors in GbEth RX path
  pppoe: Fix memory leak in pppoe_sendmsg()
  net: sctp: fix skb leak in sctp_inq_free()
  net: bcmasp: Handle RX buffer allocation failure
  net-timestamp: make sk_tskey more predictable in error path
  selftests: tls: increase the wait in poll_partial_rec_async
  ice: Add check for lport extraction to LAG init
  netfilter: nf_tables: fix bidirectional offload regression
  netfilter: nat: restore default DNAT behavior
  netfilter: nft_set_pipapo: fix missing : in kdoc
  igc: Remove temporary workaround
  igb: Fix string truncation warnings in igb_set_fw_version
  can: netlink: Fix TDCO calculation using the old data bittiming
  ...
2024-02-15 11:39:27 -08:00
Dmitry Antipov
4e45170d9a net: sctp: fix skb leak in sctp_inq_free()
In case of GSO, 'chunk->skb' pointer may point to an entry from
fraglist created in 'sctp_packet_gso_append()'. To avoid freeing a
random fraglist entry (and so undefined behavior and/or a memory
leak), introduce the 'sctp_inq_chunk_free()' helper to ensure that
'chunk->skb' is set to 'chunk->head_skb' (i.e. fraglist head)
before calling 'sctp_chunk_free()', and use the aforementioned
helper in 'sctp_inq_pop()' as well.
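
A simplified sketch of the helper's idea (stand-in types, not the exact
SCTP code): always point back at the fraglist head before freeing.

    struct sk_buff_stub;
    struct chunk_stub {
        struct sk_buff_stub *skb;       /* may point into a GSO fraglist */
        struct sk_buff_stub *head_skb;  /* fraglist head, if any */
    };

    void chunk_free(struct chunk_stub *chunk);  /* the usual free routine */

    static void inq_chunk_free(struct chunk_stub *chunk)
    {
        if (chunk->head_skb)
            chunk->skb = chunk->head_skb;   /* free the head, not an entry */
        chunk_free(chunk);
    }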

Reported-by: syzbot+8bb053b5d63595ab47db@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?id=0d8351bbe54fd04a492c2daab0164138db008042
Fixes: 90017accff ("sctp: Add GSO support")
Suggested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20240214082224.10168-1-dmantipov@yandex.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15 07:34:52 -08:00
Alex Henrie
f4bcbf360a net: ipv6/addrconf: clamp preferred_lft to the minimum required
If the preferred lifetime was less than the minimum required lifetime,
ipv6_create_tempaddr would error out without creating any new address.
On my machine and network, this error happened immediately with the
preferred lifetime set to 5 seconds or less, after a few minutes with
the preferred lifetime set to 6 seconds, and not at all with the
preferred lifetime set to 7 seconds. During my investigation, I found a
Stack Exchange post from another person who seems to have had the same
problem: They stopped getting new addresses if they lowered the
preferred lifetime below 3 seconds, and they didn't really know why.

The preferred lifetime is a preference, not a hard requirement. The
kernel does not strictly forbid new connections on a deprecated address,
nor does it guarantee that the address will be disposed of the instant
its total valid lifetime expires. So rather than disable IPv6 privacy
extensions altogether if the minimum required lifetime swells above the
preferred lifetime, it is more in keeping with the user's intent to
increase the temporary address's lifetime to the minimum necessary for
the current network conditions.

With these fixes, setting the preferred lifetime to 5 or 6 seconds "just
works" because the extra fraction of a second is practically
unnoticeable. It's even possible to reduce the time before deprecation
to 1 or 2 seconds by setting /proc/sys/net/ipv6/conf/*/regen_min_advance
and /proc/sys/net/ipv6/conf/*/dad_transmits to 0. I realize that that is
a pretty niche use case, but I know at least one person who would gladly
sacrifice performance and convenience to be sure that they are getting
the maximum possible level of privacy.

Link: https://serverfault.com/a/1031168/310447
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15 15:34:40 +01:00
Alex Henrie
a5fcea2d2f net: ipv6/addrconf: introduce a regen_min_advance sysctl
In RFC 8981, REGEN_ADVANCE cannot be less than 2 seconds, and the RFC
does not permit the creation of temporary addresses with lifetimes
shorter than that:

> When processing a Router Advertisement with a
> Prefix Information option carrying a prefix for the purposes of
> address autoconfiguration (i.e., the A bit is set), the host MUST
> perform the following steps:

> 5.  A temporary address is created only if this calculated preferred
>     lifetime is greater than REGEN_ADVANCE time units.

However, some users want to change their IPv6 address as frequently as
possible regardless of the RFC's arbitrary minimum lifetime. For the
benefit of those users, add a regen_min_advance sysctl parameter that
can be set to below or above 2 seconds.
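
A hypothetical usage example, assuming the value is in seconds and borrowing
the proc path mentioned in the companion patch (interface name is
illustrative):

	# allow regeneration closer than RFC 8981's 2-second floor
	echo 1 > /proc/sys/net/ipv6/conf/eth0/regen_min_advance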

Link: https://datatracker.ietf.org/doc/html/rfc8981
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15 15:34:40 +01:00
Alex Henrie
2aa8f155b0 net: ipv6/addrconf: ensure that regen_advance is at least 2 seconds
RFC 8981 defines REGEN_ADVANCE as follows:

REGEN_ADVANCE = 2 + (TEMP_IDGEN_RETRIES * DupAddrDetectTransmits * RetransTimer / 1000)

Thus, allowing it to be less than 2 seconds is technically a protocol
violation.
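
For example, with common defaults (TEMP_IDGEN_RETRIES = 3,
DupAddrDetectTransmits = 1, RetransTimer = 1000 ms), REGEN_ADVANCE =
2 + (3 * 1 * 1000 / 1000) = 5 seconds, which is consistent with the 5-6
second threshold observed in the first patch of this series.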

Link: https://datatracker.ietf.org/doc/html/rfc8981#name-defined-protocol-parameters
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15 15:34:40 +01:00
Shigeru Yoshida
984328c765 tipc: Cleanup tipc_nl_bearer_add() error paths
Consolidate the error paths of tipc_nl_bearer_add() under the common label
if the function holds rtnl_lock.

Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/20240213134058.386123-1-syoshida@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15 13:18:19 +01:00
Jason Xing
d25f32722f tcp: no need to use acceptable for conn_request
Since tcp_conn_request() always returns zero, there is no need to
keep the dead code. Remove it then.

Link: https://lore.kernel.org/netdev/CANn89iJwx9b2dUGUKFSV3PF=kN5o+kxz3A_fHZZsOS4AnXhBNw@mail.gmail.com/
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240213131205.4309-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15 12:57:11 +01:00
Paolo Abeni
d74b23d0c2 netfilter pull request 24-02-15
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmXNTiwACgkQ1V2XiooU
 IORw4RAAmr6WYYKyKL9TLXtdxp2c5Aj2BClIrMS/mtLBT9RKjxvL5/m2ePFCvz7N
 /i7Om+dquZ4m5bS8Dk6MO61fhaKEmNWYigvfIYs4fc4qYj5WTV6XMzhY2lCRIgns
 UQXZ0zbb2+BbmsXL/izYcXwM3VMp2l8PLhb/OeGtUtLDMZXF+INXrn3krYLc3TxS
 4UEeLiCwxy8hgGCyS1w73GctfkznQ5vd2Zb6sD6TJ0pG1H4LmhxGDaQPMEtR9DaV
 l+gxC9+Igw6r1Gmv9c1QZ//dvw4Jb+0ZuYEifeD/xqT//M56AKh8UB1/Nil6Kazq
 r/VroMxQcuTJIPcx72F14U94M6r1BVRDIpBjVcpWBCrWjkgaJZkl2tcwfmn8Cihb
 GWRy0zGbYoBynlsseSQUWvfJBGn0D8aFCaoroHYkFfg67Gj8aom5/hIuP2OblN3a
 d+9VQ9FbEkoddv/JAF0Dp6+VVPi6DRxUOj8zC9+Ynl/+AMtx8xZ9B4yUf3n8pEag
 7+OWDEnVHV7aFyfSeBETUQOPLSi+k4wpvp02QilbKIJ8s7Pp4v9KKw3CvHD59nrI
 Ci9Z7PhWICoh+cZXYgradZVbyoJ6iRv2LskG/RlRpHxilZ5os+pcOiUR7dEARX05
 tPRLagMiHsMsy7lsYhe+YBKtYZ1FMxGU+5p63hpkSDUVvOoV+R4=
 =G4r8
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following batch contains Netfilter fixes for net:

1) Missing : in kdoc field in nft_set_pipapo.

2) Restore default DNAT behavior when a DNAT rule is configured via
   iptables with different port ranges, from Kyle Swenson.

3) Restore flowtable hardware offload for bidirectional flows
   by setting NF_FLOW_HW_BIDIRECTIONAL flag, from Felix Fietkau.

netfilter pull request 24-02-15

* tag 'nf-24-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: fix bidirectional offload regression
  netfilter: nat: restore default DNAT behavior
  netfilter: nft_set_pipapo: fix missing : in kdoc
====================

Link: https://lore.kernel.org/r/20240214233818.7946-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15 12:48:56 +01:00
Paolo Abeni
f3ac28e1f8 linux-can-fixes-for-6.8-20240214
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEUEC6huC2BN0pvD5fKDiiPnotvG8FAmXMwlUTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAoOKI+ei28bwDmCACBeVNV2d9mL8AwNoaIiUmOHF8LsclP
 NsRSl4rz/TMDFgO2tX9oUQGLsZG0YTSqJ5dF3qI7zjskBlTBJX0y4fByvQAQ6mU9
 ZhwZMBz3JSS+tuZFIMWqHW1yq2TXoTnx1IzIM5f+D83LWqtP5Jto15lw1Ratrtat
 taZwGwR10cEWO0IFNUx+4c5SGa+gGbEBdr7UBlJU1MdZ9fzo+ByV/H6JrfY1qqEj
 DvraQm/oNCVrSP5dVr1s+0Kqnh1X1ff+6JWs5q2CJDN7E+Ai2cOxrEd2/JP7GANG
 S0UIqH744z3kJDSE+GuQjxF4vbXqX3qfKIP4Q+EYlNvs0oskIQ5ebCsW
 =So6Y
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-6.8-20240214' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2024-02-14

this is a pull request of 3 patches for net/master.

The first patch is by Ziqi Zhao and targets the CAN J1939 protocol; it
fixes a potential deadlock by replacing a spinlock with an rwlock.

Oleksij Rempel's patch adds a missing spin_lock_bh() to prevent a
potential Use-After-Free in the CAN J1939's
setsockopt(SO_J1939_FILTER).

Maxime Jayat contributes a patch to fix the transceiver delay
compensation (TDCO) calculation, which is needed for higher CAN-FD bit
rates (usually 2Mbit/s).

* tag 'linux-can-fixes-for-6.8-20240214' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: netlink: Fix TDCO calculation using the old data bittiming
  can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)
  can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
====================

Link: https://lore.kernel.org/r/20240214140348.2412776-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15 12:31:23 +01:00
Vadim Fedorenko
488b6d91b0 net-timestamp: make sk_tskey more predictable in error path
When SOF_TIMESTAMPING_OPT_ID is used to disambiguate timestamped datagrams,
sk_tskey can become unpredictable if any error happens
during sendmsg(). Move the increment later in the code and decrement
sk_tskey in the error path. This solution is still racy in case of multiple
threads doing sendmsg() over the very same socket in parallel, but it still
makes the error path much more predictable.
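
A standalone illustration of the reserve-then-roll-back pattern, using C11
atomics in place of the kernel primitives (not the actual sock code):

    #include <stdatomic.h>

    static _Atomic unsigned int tskey;

    int do_send(unsigned int key);      /* stand-in for the actual transmit */

    int send_one(void)
    {
        unsigned int key = atomic_fetch_add(&tskey, 1);     /* reserve an id */

        if (do_send(key) < 0) {
            atomic_fetch_sub(&tskey, 1);    /* give it back in the error path */
            return -1;
        }
        return 0;
    }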

Fixes: 09c2d251b7 ("net-timestamp: add key to disambiguate concurrent datagrams")
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240213110428.1681540-1-vadfed@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15 12:04:04 +01:00
Jakub Kicinski
63a3dd6e62 Valentine's day edition, with just a few fixes because
that's how we love it ;-)
 
 iwlwifi:
  - correct A3 in A-MSDUs
  - fix crash when operating as AP and running out of station
    slots to use
  - clear link ID to correct some later checks against it
  - fix error codes in SAR table loading
  - fix error path in PPAG table read
 
 mac80211:
  - reload a pointer after SKB may have changed
    (only in certain monitor inject mode scenarios)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmXNCSMACgkQ10qiO8sP
 aAA79A//SAXDwnnfJDa+F57aqFFSQSs+y+4D01NgWsJkVSHVF9JJMowsCvWZ2lhz
 NXaBtONTzwMjDVxnMaEQgqBMNH7HXWzxqi7twvDCbHYFPyJzInWcCPokpsfQ9/Tc
 5n0yPcUuUwDdO1I06CxAdvBU26I9nMIvI353DuhdGRaZdH85isTDgW35T2G3TD3w
 ratlmHoIYUe7cjvhJs/6p4R7quBSLT74mqIs00l3mtlyRKhdVGR+Tl3YwCfmUb+F
 V/vQo13O04QnC2QOzEAz//PUj1Rm9XXCaiWQHKs8QyVM4opFQADhrKLRQjkqu/3p
 KOaJPxJEr2NnTuuFWyfj78k+zV8tMSvfXcRwVPO/ZtXow6CtYtV4h09FK8xdpkJK
 rkrQ06Up111sS8uDJrzWRlREBM/JTOIZHkLGF7ZkQK3ICVZPi1vGg8MbQjuM8lnd
 Oc95eOn4BTC0lua3L65f/C/UQpSXr+vqKTq+xOsybxnWmLJBcFSWOIqeaLJTblsi
 YiZwowlpxoFC/UCEzTsSTRKbjETb590oyJqeg0pchdUT50x9ZiBfo094sdovrKqE
 eJDiiDiXWPIB1Cf+ic8iP6T6C0Qsv8zq+GZtyIMZ0ZAkywdUOTMNst8UA9LRstx4
 AlvMRfOM9aJhSDmvDk/Nheff9mjIsJYjZ+U09wLRXOJO1Yse4VU=
 =h/fM
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2024-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Valentine's day edition, with just a few fixes because
that's how we love it ;-)

iwlwifi:
 - correct A3 in A-MSDUs
 - fix crash when operating as AP and running out of station
   slots to use
 - clear link ID to correct some later checks against it
 - fix error codes in SAR table loading
 - fix error path in PPAG table read

mac80211:
 - reload a pointer after SKB may have changed
   (only in certain monitor inject mode scenarios)

* tag 'wireless-2024-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: iwlwifi: mvm: fix a crash when we run out of stations
  wifi: iwlwifi: uninitialized variable in iwl_acpi_get_ppag_table()
  wifi: iwlwifi: Fix some error codes
  wifi: iwlwifi: clear link_id in time_event
  wifi: iwlwifi: mvm: use correct address 3 in A-MSDU
  wifi: mac80211: reload info pointer in ieee80211_tx_dequeue()
====================

Link: https://lore.kernel.org/r/20240214184326.132813-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-14 17:32:37 -08:00
Felix Fietkau
84443741fa netfilter: nf_tables: fix bidirectional offload regression
Commit 8f84780b84 ("netfilter: flowtable: allow unidirectional rules")
made unidirectional flow offload possible, while completely ignoring (and
breaking) bidirectional flow offload for nftables.
Add the missing flag that was left out as an exercise for the reader :)

Cc: Vlad Buslov <vladbu@nvidia.com>
Fixes: 8f84780b84 ("netfilter: flowtable: allow unidirectional rules")
Reported-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-15 00:20:00 +01:00
Kyle Swenson
0f1ae2821f netfilter: nat: restore default DNAT behavior
When a DNAT rule is configured via iptables with different port ranges,

iptables -t nat -A PREROUTING -p tcp -d 10.0.0.2 -m tcp --dport 32000:32010
-j DNAT --to-destination 192.168.0.10:21000-21010

we seem to be DNATing to some random port on the LAN side. While this is
expected if --random is passed to the iptables command, it is not
expected without passing --random.  The expected behavior (and the
observed behavior prior to the commit in the "Fixes" tag) is the traffic
will be DNAT'd to 192.168.0.10:21000 unless there is a tuple collision
with that destination.  In that case, we expect the traffic to be
DNAT'd to 192.168.0.10:21001 instead, and so on until the end of
the range.

This patch intends to restore the behavior observed prior to the "Fixes"
tag.

Fixes: 6ed5943f87 ("netfilter: nat: remove l4 protocol port rovers")
Signed-off-by: Kyle Swenson <kyle.swenson@est.tech>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-15 00:20:00 +01:00
Pablo Neira Ayuso
f6374a82fc netfilter: nft_set_pipapo: fix missing : in kdoc
Add missing : in kdoc field names.

Fixes: 8683f4b995 ("nft_set_pipapo: Prepare for vectorised implementation: helpers")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-15 00:17:45 +01:00
Oleksij Rempel
efe7cf8280 can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)
Lock jsk->sk to prevent UAF when setsockopt(..., SO_J1939_FILTER, ...)
modifies jsk->filters while receiving packets.
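
For context, a minimal userspace sketch of the setsockopt() call that races
with packet reception here (illustrative PGN values, no error handling; the
in-kernel fix is to take the socket lock around the filter update):

    #include <string.h>
    #include <sys/socket.h>
    #include <linux/can.h>
    #include <linux/can/j1939.h>

    int install_filter(int sock)
    {
        struct j1939_filter filt;

        memset(&filt, 0, sizeof(filt));
        filt.pgn = 0x0ea00;         /* example PGN */
        filt.pgn_mask = 0x3ffff;    /* match the full 18-bit PGN */

        return setsockopt(sock, SOL_CAN_J1939, SO_J1939_FILTER,
                          &filt, sizeof(filt));
    }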

The following trace was seen on an affected system:
 ==================================================================
 BUG: KASAN: slab-use-after-free in j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
 Read of size 4 at addr ffff888012144014 by task j1939/350

 CPU: 0 PID: 350 Comm: j1939 Tainted: G        W  OE      6.5.0-rc5 #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
 Call Trace:
  print_report+0xd3/0x620
  ? kasan_complete_mode_report_info+0x7d/0x200
  ? j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
  kasan_report+0xc2/0x100
  ? j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
  __asan_load4+0x84/0xb0
  j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
  j1939_sk_recv+0x20b/0x320 [can_j1939]
  ? __kasan_check_write+0x18/0x20
  ? __pfx_j1939_sk_recv+0x10/0x10 [can_j1939]
  ? j1939_simple_recv+0x69/0x280 [can_j1939]
  ? j1939_ac_recv+0x5e/0x310 [can_j1939]
  j1939_can_recv+0x43f/0x580 [can_j1939]
  ? __pfx_j1939_can_recv+0x10/0x10 [can_j1939]
  ? raw_rcv+0x42/0x3c0 [can_raw]
  ? __pfx_j1939_can_recv+0x10/0x10 [can_j1939]
  can_rcv_filter+0x11f/0x350 [can]
  can_receive+0x12f/0x190 [can]
  ? __pfx_can_rcv+0x10/0x10 [can]
  can_rcv+0xdd/0x130 [can]
  ? __pfx_can_rcv+0x10/0x10 [can]
  __netif_receive_skb_one_core+0x13d/0x150
  ? __pfx___netif_receive_skb_one_core+0x10/0x10
  ? __kasan_check_write+0x18/0x20
  ? _raw_spin_lock_irq+0x8c/0xe0
  __netif_receive_skb+0x23/0xb0
  process_backlog+0x107/0x260
  __napi_poll+0x69/0x310
  net_rx_action+0x2a1/0x580
  ? __pfx_net_rx_action+0x10/0x10
  ? __pfx__raw_spin_lock+0x10/0x10
  ? handle_irq_event+0x7d/0xa0
  __do_softirq+0xf3/0x3f8
  do_softirq+0x53/0x80
  </IRQ>
  <TASK>
  __local_bh_enable_ip+0x6e/0x70
  netif_rx+0x16b/0x180
  can_send+0x32b/0x520 [can]
  ? __pfx_can_send+0x10/0x10 [can]
  ? __check_object_size+0x299/0x410
  raw_sendmsg+0x572/0x6d0 [can_raw]
  ? __pfx_raw_sendmsg+0x10/0x10 [can_raw]
  ? apparmor_socket_sendmsg+0x2f/0x40
  ? __pfx_raw_sendmsg+0x10/0x10 [can_raw]
  sock_sendmsg+0xef/0x100
  sock_write_iter+0x162/0x220
  ? __pfx_sock_write_iter+0x10/0x10
  ? __rtnl_unlock+0x47/0x80
  ? security_file_permission+0x54/0x320
  vfs_write+0x6ba/0x750
  ? __pfx_vfs_write+0x10/0x10
  ? __fget_light+0x1ca/0x1f0
  ? __rcu_read_unlock+0x5b/0x280
  ksys_write+0x143/0x170
  ? __pfx_ksys_write+0x10/0x10
  ? __kasan_check_read+0x15/0x20
  ? fpregs_assert_state_consistent+0x62/0x70
  __x64_sys_write+0x47/0x60
  do_syscall_64+0x60/0x90
  ? do_syscall_64+0x6d/0x90
  ? irqentry_exit+0x3f/0x50
  ? exc_page_fault+0x79/0xf0
  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

 Allocated by task 348:
  kasan_save_stack+0x2a/0x50
  kasan_set_track+0x29/0x40
  kasan_save_alloc_info+0x1f/0x30
  __kasan_kmalloc+0xb5/0xc0
  __kmalloc_node_track_caller+0x67/0x160
  j1939_sk_setsockopt+0x284/0x450 [can_j1939]
  __sys_setsockopt+0x15c/0x2f0
  __x64_sys_setsockopt+0x6b/0x80
  do_syscall_64+0x60/0x90
  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

 Freed by task 349:
  kasan_save_stack+0x2a/0x50
  kasan_set_track+0x29/0x40
  kasan_save_free_info+0x2f/0x50
  __kasan_slab_free+0x12e/0x1c0
  __kmem_cache_free+0x1b9/0x380
  kfree+0x7a/0x120
  j1939_sk_setsockopt+0x3b2/0x450 [can_j1939]
  __sys_setsockopt+0x15c/0x2f0
  __x64_sys_setsockopt+0x6b/0x80
  do_syscall_64+0x60/0x90
  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Reported-by: Sili Luo <rootlab@huawei.com>
Suggested-by: Sili Luo <rootlab@huawei.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20231020133814.383996-1-o.rempel@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-02-14 13:53:03 +01:00
Ziqi Zhao
6cdedc18ba can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
The following 3 locks would race against each other, causing the
deadlock situation in the Syzbot bug report:

- j1939_socks_lock
- active_session_list_lock
- sk_session_queue_lock

A reasonable fix is to change j1939_socks_lock to an rwlock, since in
the rare situations where a write lock is required for the linked list
that j1939_socks_lock is protecting, the code does not attempt to
acquire any more locks. This would break the circular lock dependency,
where, for example, the current thread already locks j1939_socks_lock
and attempts to acquire sk_session_queue_lock, and at the same time,
another thread attempts to acquire j1939_socks_lock while holding
sk_session_queue_lock.
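
A standalone illustration of the resulting locking pattern, using pthread
rwlocks in place of the kernel primitives (names are illustrative):

    #include <pthread.h>

    static pthread_rwlock_t socks_lock = PTHREAD_RWLOCK_INITIALIZER;

    /* Receive path: many readers; callbacks may take further per-socket locks. */
    void for_each_sock(void (*cb)(void *arg), void *arg)
    {
        pthread_rwlock_rdlock(&socks_lock);
        cb(arg);            /* walk the socket list, deliver to matches */
        pthread_rwlock_unlock(&socks_lock);
    }

    /* Rare list updates: take the write side and acquire nothing else. */
    void add_sock(void)
    {
        pthread_rwlock_wrlock(&socks_lock);
        /* link the new socket into the list */
        pthread_rwlock_unlock(&socks_lock);
    }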

NOTE: This patch alone does not fix the unregister_netdevice bug
reported by Syzbot; instead, it solves a deadlock situation to prepare
for one or more further patches to actually fix the Syzbot bug, which
appears to be a reference counting problem within the j1939 codebase.

Reported-by: <syzbot+1591462f226d9cbf0564@syzkaller.appspotmail.com>
Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/all/20230721162226.8639-1-astrajoan@yahoo.com
[mkl: remove unrelated newline change]
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-02-14 13:53:03 +01:00
Eric Dumazet
1b3ef46cb7 net: remove dev_base_lock
dev_base_lock is not needed anymore; all remaining users also hold RTNL.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14 11:20:14 +00:00
Eric Dumazet
e51b962438 net: remove dev_base_lock from register_netdevice() and friends.
RTNL already protects writes to dev->reg_state, so we no longer need to hold
dev_base_lock to protect the readers.

unlist_netdevice() second argument can be removed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14 11:20:14 +00:00
Eric Dumazet
2dd4d828d6 net: remove dev_base_lock from do_setlink()
We hold RTNL here, and dev->link_mode readers already
are using READ_ONCE().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14 11:20:14 +00:00
Eric Dumazet
6a2968ee1e net: add netdev_set_operstate() helper
dev_base_lock is going away; add a netdev_set_operstate() helper
so that hsr does not have to know core internals.

Remove the dev_base_lock acquisition from rfc2863_policy().

v3: use an "unsigned int" for dev->operstate,
    so that try_cmpxchg() can work on all arches.
        ( https://lore.kernel.org/oe-kbuild-all/202402081918.OLyGaea3-lkp@intel.com/ )
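
A standalone sketch of a lock-free state update on a plain unsigned int
using C11 compare-exchange, analogous to what a try_cmpxchg() loop does
(illustrative, not the kernel helper):

    #include <stdatomic.h>

    static _Atomic unsigned int operstate;

    void set_operstate(unsigned int newstate)
    {
        unsigned int old = atomic_load(&operstate);

        do {
            if (old == newstate)
                return;     /* nothing to change */
        } while (!atomic_compare_exchange_weak(&operstate, &old, newstate));
    }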

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14 11:20:13 +00:00
Eric Dumazet
e154bb7a6e net-sysfs: convert netstat_show() to RCU
dev_get_stats() can be called under RCU, so there is no need
to acquire dev_base_lock.

Change the dev_isalive() comment to reflect that we no longer use
dev_base_lock from net/core/net-sysfs.c.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14 11:20:13 +00:00
Eric Dumazet
004d138364 net-sysfs: convert dev->operstate reads to lockless ones
operstate_show() acquires dev_base_lock only to read dev->operstate,
so the lock acquisition can be omitted.

Annotate accesses to dev->operstate.

Writers still acquire dev_base_lock for mutual exclusion.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14 11:20:13 +00:00
Eric Dumazet
c7d52737e7 net-sysfs: use dev_addr_sem to remove races in address_show()
Using dev_base_lock does not prevent reading garbage.

Use dev_addr_sem instead.

v4: place dev_addr_sem extern in net/core/dev.h (Jakub Kicinski)
 Link: https://lore.kernel.org/netdev/20240212175845.10f6680a@kernel.org/

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14 11:20:13 +00:00
Eric Dumazet
12692e3df2 net-sysfs: convert netdev_show() to RCU
Make clear dev_isalive() can be called with RCU protection.

Then convert netdev_show() to RCU, to remove dev_base_lock
dependency.

Also add RCU to broadcast_show().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14 11:20:13 +00:00
Eric Dumazet
4d42b37def net: convert dev->reg_state to u8
Prepares things so that dev->reg_state reads can be lockless,
by adding WRITE_ONCE() on write side.

READ_ONCE()/WRITE_ONCE() do not support bitfields.
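
A tiny illustration of why the bitfield had to become a full byte
(stand-in struct, not the real struct net_device): the ONCE accessors need
the field's address, and C forbids taking the address of a bitfield.

    struct dev_stub {
        unsigned int some_flag:1;   /* &stub.some_flag is not valid C */
        unsigned char reg_state;    /* a full u8 can be read/written with
                                     * READ_ONCE()/WRITE_ONCE() */
    };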

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14 11:20:13 +00:00
Eric Dumazet
a6473fe9b6 dev: annotate accesses to dev->link
The following patch will read dev->link locklessly;
annotate the write from do_setlink().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14 11:20:13 +00:00
Eric Dumazet
f694eee9e1 ip_tunnel: annotate data-races around t->parms.link
t->parms.link is read locklessly; annotate these reads
and the corresponding writes accordingly.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14 11:20:13 +00:00
Eric Dumazet
1c07dbb0cc net: annotate data-races around dev->name_assign_type
name_assign_type_show() runs locklessly, so we should annotate
accesses to dev->name_assign_type.

An alternative would be to grab the devnet_rename_sem semaphore
from name_assign_type_show(), but this would not bring
more accuracy.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14 11:20:13 +00:00
Dmitry Antipov
6cf9ff4633 net: smc: fix spurious error message from __sock_release()
Commit 67f562e3e1 ("net/smc: transfer fasync_list in case of fallback")
leaves the socket's fasync list pointer within a container socket as well.
When the latter is destroyed, '__sock_release()' warns about its non-empty
fasync list, which is a dangling pointer to the previously freed fasync list
of the underlying TCP socket. Fix this spurious warning by nullifying the
fasync list of the container socket.

Fixes: 67f562e3e1 ("net/smc: transfer fasync_list in case of fallback")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14 10:56:02 +00:00
David S. Miller
e1a00373e1 linux-can-next-for-6.9-20240213
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEUEC6huC2BN0pvD5fKDiiPnotvG8FAmXLUSQTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAoOKI+ei28b6VMB/0eqFcC233/c60/7iEbxXTGG52qs4mc
 4LeTs57+4Msfibq7M81ZzBuZoMqFluFELunYT5gDPXgnSn4AWXyCv9ciYCW8vort
 Z/2wcSNUMdOIbmKZhdc96gnqXuE6fNMx/eYTsn34HBkMkM7BfxZSIH3pZsys+eGw
 JrVwhT2aBVKG5ji4YPZF/RuqHwuM00GLMs9G9GR6yw9JiCwI1n+Jjru/6zwJprpi
 NAyLhJGgvgp+twLID2jH2Gy6Mqs/ZrXMyxPMqycbYOtZ4oQJOfTkg1SXzT/J3GsY
 VFWvhGWrADSx7CnISuS9VXsoWpe5nZ7yMhFBOtKME3Gh3qmhQegPIMY3
 =w4J5
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-6.9-20240213' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
linux-can-next-for-6.9-20240213

this is a pull request of 23 patches for net-next/master.

The first patch is by Nicolas Maier and targets the CAN Broadcast
Manager (bcm), it adds message flags to distinguish between own local
and remote traffic.

Oliver Hartkopp contributes a patch for the CAN ISOTP protocol that
adds dynamic flow control parameters.

Stefan Mätje's patch series adds support for the esd PCIe/402 CAN
interface family.

Markus Schneider-Pargmann contributes 14 patches for the m_can driver to
optimize it for the SPI-attached tcan4x5x controller.

A patch by Vincent Mailhol replaces Wolfgang Grandegger with Vincent
Mailhol as the CAN drivers Co-Maintainer.

Jimmy Assarsson's patch adds support for the Kvaser M.2 PCIe 4xCAN
adapter.

A patch by Daniil Dulov removes a redundant NULL check in the softing
driver.

Oliver Hartkopp contributes a patch to add CANXL virtual CAN network
identifier support.

A patch by myself removes Naga Sureshkumar Relli as the maintainer of
the xilinx_can driver, as their email bounces.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14 10:00:35 +00:00
Lorenzo Bianconi
27accb3cc0 veth: rely on skb_pp_cow_data utility routine
Rely on skb_pp_cow_data utility routine and remove duplicated code.

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/029cc14cce41cb242ee7efdcf32acc81f1ce4e9f.1707729884.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13 19:22:30 -08:00
Lorenzo Bianconi
e6d5dbdd20 xdp: add multi-buff support for xdp running in generic mode
Similar to native xdp, do not always linearize the skb in
netif_receive_generic_xdp routine but create a non-linear xdp_buff to be
processed by the eBPF program. This allows adding multi-buffer support
for xdp running in generic mode.

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/1044d6412b1c3e95b40d34993fd5f37cd2f319fd.1707729884.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13 19:22:30 -08:00
Lorenzo Bianconi
4d2bb0bfe8 xdp: rely on skb pointer reference in do_xdp_generic and netif_receive_generic_xdp
Rely on a reference to the skb pointer instead of the skb pointer itself
in the do_xdp_generic and netif_receive_generic_xdp routine signatures.
This is a preliminary patch to add multi-buff support for xdp running in
generic mode, where we will need to reallocate the skb to avoid
linearization and make the new skb visible to the do_xdp_generic()
caller.
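
A minimal standalone illustration of the double-pointer pattern being
introduced (stand-in types, not the kernel code):

    struct buf {
        int len;
    };

    struct buf *grow(struct buf *b);    /* may return a freshly allocated buffer */

    static int process(struct buf **pbuf)
    {
        if ((*pbuf)->len > 64) {
            struct buf *bigger = grow(*pbuf);

            if (!bigger)
                return -1;
            *pbuf = bigger;     /* the caller now sees the new buffer */
        }
        return 0;
    }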

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/c09415b1f48c8620ef4d76deed35050a7bddf7c2.1707729884.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13 19:22:30 -08:00