Commit graph

1218366 commits

Author SHA1 Message Date
Heng Guo
b4a11b2033 net: fix IPSTATS_MIB_OUTPKGS increment in OutForwDatagrams.
Reproduce environment:
network with 3 VM linuxs is connected as below:
VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3
VM1: eth0 ip: 192.168.122.207 MTU 1500
VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500
VM3: eth0 ip: 192.168.123.240 MTU 1500

Reproduce:
VM1 send 1400 bytes UDP data to VM3 using tools scapy with flags=0.
scapy command:
send(IP(dst="192.168.123.240",flags=0)/UDP()/str('0'*1400),count=1,
inter=1.000000)

Result:
Before IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
  ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
  OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
  FragOKs FragFails FragCreates
Ip: 1 64 11 0 3 4 0 0 4 7 0 0 0 0 0 0 0 0 0
......
----------------------------------------------------------------------
After IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
  ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
  OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
  FragOKs FragFails FragCreates
Ip: 1 64 12 0 3 5 0 0 4 8 0 0 0 0 0 0 0 0 0
......
----------------------------------------------------------------------
"ForwDatagrams" increase from 4 to 5 and "OutRequests" also increase
from 7 to 8.

Issue description and patch:
IPSTATS_MIB_OUTPKTS("OutRequests") is counted with IPSTATS_MIB_OUTOCTETS
("OutOctets") in ip_finish_output2().
According to RFC 4293, it is "OutOctets" counted with "OutTransmits" but
not "OutRequests". "OutRequests" does not include any datagrams counted
in "ForwDatagrams".
ipSystemStatsOutOctets OBJECT-TYPE
    DESCRIPTION
           "The total number of octets in IP datagrams delivered to the
            lower layers for transmission.  Octets from datagrams
            counted in ipIfStatsOutTransmits MUST be counted here.
ipSystemStatsOutRequests OBJECT-TYPE
    DESCRIPTION
           "The total number of IP datagrams that local IP user-
            protocols (including ICMP) supplied to IP in requests for
            transmission.  Note that this counter does not include any
            datagrams counted in ipSystemStatsOutForwDatagrams.
So do patch to define IPSTATS_MIB_OUTPKTS to "OutTransmits" and add
IPSTATS_MIB_OUTREQUESTS for "OutRequests".
Add IPSTATS_MIB_OUTREQUESTS counter in __ip_local_out() for ipv4 and add
IPSTATS_MIB_OUT counter in ip6_finish_output2() for ipv6.

Test result with patch:
Before IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
  ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
  OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
  FragOKs FragFails FragCreates OutTransmits
Ip: 1 64 9 0 5 1 0 0 3 3 0 0 0 0 0 0 0 0 0 4
......
root@qemux86-64:~# cat /proc/net/netstat
......
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts
  OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets
  InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts
  InECT0Pkts InCEPkts ReasmOverlaps
IpExt: 0 0 0 0 0 0 2976 1896 0 0 0 0 0 9 0 0 0 0
----------------------------------------------------------------------
After IP data is sent.
----------------------------------------------------------------------
root@qemux86-64:~# cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
  ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
  OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
  FragOKs FragFails FragCreates OutTransmits
Ip: 1 64 10 0 5 2 0 0 3 3 0 0 0 0 0 0 0 0 0 5
......
root@qemux86-64:~# cat /proc/net/netstat
......
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts
  OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets
  InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts
  InECT0Pkts InCEPkts ReasmOverlaps
IpExt: 0 0 0 0 0 0 4404 3324 0 0 0 0 0 10 0 0 0 0
----------------------------------------------------------------------
"ForwDatagrams" increase from 1 to 2 and "OutRequests" is keeping 3.
"OutTransmits" increase from 4 to 5 and "OutOctets" increase 1428.

Signed-off-by: Heng Guo <heng.guo@windriver.com>
Reviewed-by: Kun Song <Kun.Song@windriver.com>
Reviewed-by: Filip Pudak <filip.pudak@windriver.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 12:01:00 +01:00
David S. Miller
095c3ea6fd Merge branch 'ksz886x-forced-link-modes'
Oleksij Rempel says:

====================
fix forced link mode for KSZ886X switches

changes v3:
- squash patch 1 and 2
- use genphy_config_aneg() instead of genphy_setup_forced()

changes v2:
- address kernel test robot warning
- change comment explaining clearing of KSZ886X_CTRL_FORCE_LINK bit
- s/PHY we create/PHY will create/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:50:46 +01:00
Oleksij Rempel
510f02febb net: phy: micrel: Fix forced link mode for KSZ886X switches
Address a link speed detection issue in KSZ886X PHY driver when in
forced link mode. Previously, link partners like "ASIX AX88772B"
with KSZ8873 could fall back to 10Mbit instead of configured 100Mbit.

The issue arises as KSZ886X PHY continues sending Fast Link Pulses (FLPs)
even with autonegotiation off, misleading link partners in autoneg mode,
leading to incorrect link speed detection.

Now, when autonegotiation is disabled, the driver sets the link state
forcefully using KSZ886X_CTRL_FORCE_LINK bit. This action, beyond just
disabling autonegotiation, makes the PHY state more reliably detected by
link partners using parallel detection, thus fixing the link speed
misconfiguration.

With autonegotiation enabled, link state is not forced, allowing proper
autonegotiation process participation.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Divya Koppera <divya.koppera@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:50:46 +01:00
Oleksij Rempel
f600bb612b net: dsa: microchip: ksz8: Enable MIIM PHY Control reg access
Provide access to MIIM PHY Control register (Reg. 31) through
ksz8_r_phy_ctrl() and ksz8_w_phy_ctrl() functions. Necessary for
upcoming micrel.c patch to address forced link mode configuration.

Closes: https://lore.kernel.org/oe-kbuild-all/202310112224.iYgvjBUy-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:50:46 +01:00
David S. Miller
c051857154 Merge branch 'mlxsw-lag-table-allocation'
Petr Machata says:

====================
mlxsw: Move allocation of LAG table to the driver

PGT is an in-HW table that maps addresses to sets of ports. Then when some
HW process needs a set of ports as an argument, instead of embedding the
actual set in the dynamic configuration, what gets configured is the
address referencing the set. The HW then works with the appropriate PGT
entry.

Within the PGT is placed a LAG table. That is a contiguous block of PGT
memory where each entry describes which ports are members of the
corresponding LAG port.

The PGT is split to two parts: one managed by the FW, and one managed by
the driver. Historically, the FW part included also the LAG table, referred
to as FW LAG mode. Giving the responsibility for placement of the LAG table
to the driver, referred to as SW LAG mode, makes the whole system more
flexible. The FW currently supports both FW and SW LAG modes. To shed
complexity, the FW should in the future only support SW LAG mode.

Hence this patchset, where support for placement of LAG is added to mlxsw.

There are FW versions out there that do not support SW LAG mode, and on
Spectrum-1 in particular, there is no plan to support it at all. mlxsw will
therefore have to support both modes of operation.

Another aspect is that at least on Spectrum-1, there are FW versions out
there that claim to support driver-placed LAG table, but then reject or
ignore configurations enabling the same. The driver thus has to have a say
in whether an attempt to configure SW LAG mode should even be done.

The feature is therefore expressed in terms of "does the driver prefer SW
LAG mode?", and "what LAG mode the PCI module managed to configure the FW
with". This is unlike current flood mode configuration, where the driver
can give a strict value, and that's what gets configured. But it gives a
chance to the driver to determine whether LAG mode should be enabled at
all.

The "does the driver prefer SW LAG mode?" bit is expressed as a boolean
lag_mode_prefer_sw. The reason for this is largely another feature that
will be introduced in a follow-up patchset: support for CFF flood mode. The
driver currently requires that the FW be configured with what is called
controlled flood mode. But on capable systems, CFF would be preferred. So
there are two values in flight: the preferred flood mode, and the fallback.
This could be expressed with an array of flood modes ordered by preference,
but that looks like an overkill in comparison. This flag/value model is
then reused for LAG mode as well, except the fallback value is absent and
implied to be FW, because there are no other values to choose from.

The patchset progresses as follows:

- Patches #1 to #5 adjust reg.h and cmd.h with new register fields,
  constants and remarks.

- Patches #6 and #7 add the ability to request SW LAG mode and to query the
  LAG mode that was actually negotiated. This is where the abovementioned
  lag_mode_prefer_sw flag is added.

- Patches #7 to #9 generalize PGT allocations to make it possible to
  allocate the LAG table, which is done in patch #10.

- In patch #11, toggle lag_mode_prefer_sw on Spectrum-2 and above, which
  makes the newly-added code live.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:51 +01:00
Petr Machata
b46c1f3f5e mlxsw: spectrum: Set SW LAG mode on Spectrum>1
On Spectrum-2, Spectrum-3 and Spectrum-4 machines, request SW
responsibility for placement of the LAG table.

On Spectrum-1, some FW versions claim to support lag_mode field despite
quietly ignoring any settings made to that field. Thus refrain from
attempting to configure lag_mode on those systems at all.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:50 +01:00
Petr Machata
c678972580 mlxsw: spectrum: Allocate LAG table when in SW LAG mode
In this patch, if the LAG mode is SW, allocate the LAG table and configure
SGCR to indicate where it was allocated.

We use the default "DDD" (for dynamic data duplication) layout of the LAG
table. In the DDD mode, the membership information for each LAG is copied
in 8 PGT entries. This is done for performance reasons. The LAG table then
needs to be allocated on an address aligned to 8. Deal with this by
moving the LAG init ahead so that the LAG table is allocated at address 0.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:50 +01:00
Petr Machata
8c893abd64 mlxsw: spectrum_pgt: Generalize PGT allocation
PGT blocks are allocated through the function
mlxsw_sp_pgt_mid_alloc_range(). The interface assumes that the caller knows
which piece of PGT exactly they want to get. That was fine while the FID
code was the only client allocating blocks of PGT. However for SW-allocated
LAG table, there will be an additional client: mlxsw_sp_lag_init(). The
interface should therefore be changed to not require particular
coordinates, but to take just the requested size, allocate the block
wherever, and give back the PGT address.

In this patch, change the interface accordingly. Initialize FID family's
pgt_base from the result of the PGT allocation (note that mlxsw makes a
copy of the family structure, so what gets initialized is not actually the
global structure). Drop the now-unnecessary pgt_base initializations and
the corresponding defines.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:50 +01:00
Petr Machata
f5e293f993 mlxsw: spectrum_fid: Allocate PGT for the whole FID family in one go
PGT blocks are allocated through the function
mlxsw_sp_pgt_mid_alloc_range(). The interface assumes that the caller knows
which piece of PGT exactly they want to get. That was fine while the FID
code was the only client allocating blocks of PGT. However for SW-allocated
LAG table, there will be an additional client: mlxsw_sp_lag_init(). The
interface should therefore be changed to not require particular
coordinates, but to take just the requested size, allocate the block
wherever, and give back the PGT address.

The current FID mode has one place where PGT address can be stored: the FID
family's pgt_base. The allocation scheme should therefore be changed from
allocating a block per FID flood table, to allocating a block per FID
family.

Do just that in this patch.

The per-family allocation is going to be useful for another related feature
as well: the CFF mode.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:50 +01:00
Petr Machata
daee7aaba8 mlxsw: pci: Permit toggling LAG mode
Add to struct mlxsw_config_profile a field lag_mode_prefer_sw for the
driver to indicate that SW LAG mode should be configured if possible. Add
to the PCI module code to set lag_mode as appropriate.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:49 +01:00
Petr Machata
b2e9b1fe8c mlxsw: core, pci: Add plumbing related to LAG mode
lag_mode describes where the responsibility for LAG table placement lies:
SW or FW. The bus module determines whether LAG is supported, can configure
it if it is, and knows what (if any) configuration has been applied.
Therefore add a bus callback to determine the configured LAG mode. Also add
to core an API to query it.

The LAG mode is for now kept at the default value of 0 for FW-managed. The
code to actually toggle it will be added later.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:49 +01:00
Petr Machata
8eabd10cdc mlxsw: cmd: Add QUERY_FW.lag_mode_support
Add QUERY_FW.lag_mode_support, which determines whether
CONFIG_PROFILE.lag_mode is available.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:49 +01:00
Petr Machata
eb26a59232 mlxsw: cmd: Add CONFIG_PROFILE.{set_, }lag_mode
Add CONFIG_PROFILE.lag_mode, which serves for moving responsibility for
placement of the LAG table from FW to SW. Whether lag_mode should be
configured is determined by CONFIG_PROFILE.set_lag_mode, which also add.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:49 +01:00
Petr Machata
be9ed47d3f mlxsw: cmd: Fix omissions in CONFIG_PROFILE field names in comments
A number of CONFIG_PROFILE fields' comments refer to a field named like
cmd_mbox_config_* instead of cmd_mbox_config_profile_*. Correct these
omissions.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:49 +01:00
Petr Machata
cf0a86e8ce mlxsw: reg: Add SGCR.lag_lookup_pgt_base
Add SGCR.lag_lookup_pgt_base, which is used for configuring the base
address of the LAG table within the PGT table for cases when the driver
is responsible for the table placement.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:49 +01:00
Petr Machata
66eaaa8541 mlxsw: reg: Drop SGCR.llb
SGCR, Switch General Configuration Register, has not been used since commit
b0d80c013b ("mlxsw: Remove Mellanox SwitchX-2 ASIC support"). We will
need the register again shortly, so instead of dropping it and
reintroducing again, just drop the sole unused field.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:47:48 +01:00
David S. Miller
fd533a7ac7 Merge branch 'netlink-auto-integers'
Jakub Kicinski says:

====================
netlink: add variable-length / auto integers

Add netlink support for "common" / variable-length / auto integers
which are carried at the message level as either 4B or 8B depending
on the exact value. This saves space and will hopefully decrease
the number of instances where we realize that we needed more bits
after uAPI is set is stone. It also loosens the alignment requirements,
avoiding the need for padding.

This mini-series is a fuller version of the previous RFC:
https://lore.kernel.org/netdev/20121204.130914.1457976839967676240.davem@davemloft.net/
No user included here. I have tested (and will use) it
in the upcoming page pool API but the assumption is that
it will be widely applicable. So sending without a user.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:43:36 +01:00
Jakub Kicinski
7d4caf54d2 netlink: specs: add support for auto-sized scalars
Support uint / sint types in specs and YNL.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:43:35 +01:00
Jakub Kicinski
374d345d9b netlink: add variable-length / auto integers
We currently push everyone to use padding to align 64b values
in netlink. Un-padded nla_put_u64() doesn't even exist any more.

The story behind this possibly start with this thread:
https://lore.kernel.org/netdev/20121204.130914.1457976839967676240.davem@davemloft.net/
where DaveM was concerned about the alignment of a structure
containing 64b stats. If user space tries to access such struct
directly:

	struct some_stats *stats = nla_data(attr);
	printf("A: %llu", stats->a);

lack of alignment may become problematic for some architectures.
These days we most often put every single member in a separate
attribute, meaning that the code above would use a helper like
nla_get_u64(), which can deal with alignment internally.
Even for arches which don't have good unaligned access - access
aligned to 4B should be pretty efficient.
Kernel and well known libraries deal with unaligned input already.

Padded 64b is quite space-inefficient (64b + pad means at worst 16B
per attr vs 32b which takes 8B). It is also more typing:

    if (nla_put_u64_pad(rsp, NETDEV_A_SOMETHING_SOMETHING,
                        value, NETDEV_A_SOMETHING_PAD))

Create a new attribute type which will use 32 bits at netlink
level if value is small enough (probably most of the time?),
and (4B-aligned) 64 bits otherwise. Kernel API is just:

    if (nla_put_uint(rsp, NETDEV_A_SOMETHING_SOMETHING, value))

Calling this new type "just" sint / uint with no specific size
will hopefully also make people more comfortable with using it.
Currently telling people "don't use u8, you may need the bits,
and netlink will round up to 4B, anyway" is the #1 comment
we give to newcomers.

In terms of netlink layout it looks like this:

         0       4       8       12      16
32b:     [nlattr][ u32  ]
64b:     [  pad ][nlattr][     u64      ]
uint(32) [nlattr][ u32  ]
uint(64) [nlattr][     u64      ]

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:43:35 +01:00
Jakub Kicinski
e1b347c5f7 tools: ynl-gen: make the mnl_type() method public
uint/sint support will add more logic to mnl_type(),
deduplicate it and make it more accessible.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:43:35 +01:00
David S. Miller
a10f9bfe93 Merge branch 'devlink-errors-fmsg'
Przemek Kitszel says:

====================
devlink: retain error in struct devlink_fmsg

Extend devlink fmsg to retain error (patch 1),
so drivers could omit error checks after devlink_fmsg_*() (patches 2-10),
and finally enforce future uses to follow this practice by change to
return void (patch 11)

Note that it was compile tested only.

bloat-o-meter for whole series:
add/remove: 8/18 grow/shrink: 23/40 up/down: 2017/-5833 (-3816)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:34:51 +01:00
Przemek Kitszel
0050629cd3 devlink: convert most of devlink_fmsg_*() to return void
Since struct devlink_fmsg retains error by now (see 1st patch of this
series), there is no longer need to keep returning it in each call.

This is a separate commit to allow per-driver conversion to stop using
those return values.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:34:51 +01:00
Przemek Kitszel
3465915e99 staging: qlge: devlink health: use retained error fmsg API
Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:34:50 +01:00
Przemek Kitszel
18256cb2d4 qed: devlink health: use retained error fmsg API
Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:34:50 +01:00
Przemek Kitszel
d17f98bf7c net/mlx5: devlink health: use retained error fmsg API
Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:34:50 +01:00
Przemek Kitszel
1d434b4849 mlxsw: core: devlink health: use retained error fmsg API
Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:34:50 +01:00
Przemek Kitszel
d8cf03fca3 octeontx2-af: devlink health: use retained error fmsg API
Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:34:50 +01:00
Przemek Kitszel
aca7734d94 hinic: devlink health: use retained error fmsg API
Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:34:50 +01:00
Przemek Kitszel
074e1b4221 bnxt_en: devlink health: use retained error fmsg API
Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:34:49 +01:00
Przemek Kitszel
47957bb3f7 pds_core: devlink health: use retained error fmsg API
Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:34:49 +01:00
Przemek Kitszel
20f9b9d385 netdevsim: devlink health: use retained error fmsg API
Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:34:49 +01:00
Przemek Kitszel
db80d3b255 devlink: retain error in struct devlink_fmsg
Retain error value in struct devlink_fmsg, to relieve drivers from
checking it after each call.
Note that fmsg is an in-memory builder/buffer of formatted message,
so it's not the case that half baked message was sent somewhere.

We could find following scheme in multiple drivers:
  err = devlink_fmsg_obj_nest_start(fmsg);
  if (err)
  	return err;
  err = devlink_fmsg_string_pair_put(fmsg, "src", src);
  if (err)
  	return err;
  err = devlink_fmsg_something(fmsg, foo, bar);
  if (err)
	return err;
  // and so on...
  err = devlink_fmsg_obj_nest_end(fmsg);

With retaining error API that translates to:
  devlink_fmsg_obj_nest_start(fmsg);
  devlink_fmsg_string_pair_put(fmsg, "src", src);
  devlink_fmsg_something(fmsg, foo, bar);
  // and so on...
  devlink_fmsg_obj_nest_end(fmsg);

What means we check error just when is time to send.

Possible error scenarios are developer error (API misuse) and memory
exhaustion, both cases are good candidates to choose readability
over fastest possible exit.

Note that this patch keeps returning errors, to allow per-driver conversion
to the new API, but those are not needed at this point already.

This commit itself is an illustration of benefits for the dev-user,
more of it will be in separate commits of the series.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:34:49 +01:00
Jakub Kicinski
7ce6936045 Merge branch 'tools-ynl-gen-support-full-range-of-min-max-checks'
Jakub Kicinski says:

====================
tools: ynl-gen: support full range of min/max checks

YNL code gen currently supports only very simple range checks
within the range of s16. Add support for full range of u64 / s64
which is good to have, and will be even more important with uint / sint.
====================

Link: https://lore.kernel.org/r/20231018163917.2514503-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 15:54:58 -07:00
Jakub Kicinski
f9bc3cbc20 tools: ynl-gen: support limit names
Support the use of symbolic names like s8-min or u32-max in checks
to make writing specs less painful.

Link: https://lore.kernel.org/r/20231018163917.2514503-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 15:54:56 -07:00
Jakub Kicinski
668c1ac828 tools: ynl-gen: support full range of min/max checks for integer values
Extend the support to full range of min/max checks.
None of the existing YNL families required complex integer validation.

The support is less than trivial, because we try to keep struct nla_policy
tiny the min/max members it holds in place are s16. Meaning we can only
express checks in range of s16. For larger ranges we need to define
a structure and link it in the policy.

Link: https://lore.kernel.org/r/20231018163917.2514503-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 15:54:56 -07:00
Jakub Kicinski
ee0a4cfcbd tools: ynl-gen: track attribute use
For range validation we'll need to know if any individual
attribute is used on input (i.e. whether we will generate
a policy for it). Track this information.

Link: https://lore.kernel.org/r/20231018163917.2514503-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 15:54:56 -07:00
Dan Carpenter
75a384ceda ptp: prevent string overflow
The ida_alloc_max() function can return up to INT_MAX so this buffer is
not large enough.  Also use snprintf() for extra safety.

Fixes: 403376ddb4 ("ptp: add debugfs interface to see applied channel masks")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/d4b1a995-a0cb-4125-aa1d-5fd5044aba1d@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 15:51:52 -07:00
Jakub Kicinski
041c3466f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

net/mac80211/key.c
  02e0e426a2 ("wifi: mac80211: fix error path key leak")
  2a8b665e6b ("wifi: mac80211: remove key_mtx")
  7d6904bf26 ("Merge wireless into wireless-next")
https://lore.kernel.org/all/20231012113648.46eea5ec@canb.auug.org.au/

Adjacent changes:

drivers/net/ethernet/ti/Kconfig
  a602ee3176 ("net: ethernet: ti: Fix mixed module-builtin object")
  98bdeae950 ("net: cpmac: remove driver to prepare for platform removal")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 13:29:01 -07:00
Linus Torvalds
ce55c22ec8 Including fixes from bluetooth, netfilter, WiFi.
Feels like an up-tick in regression fixes, mostly for older releases.
 The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as fairly
 clear-cut user reported regressions. The mlx5 DMA bug was causing strife
 for 390x folks. The fixes themselves are not particularly scary, tho.
 No open investigations / outstanding reports at the time of writing.
 
 Current release - regressions:
 
  - eth: mlx5: perform DMA operations in the right locations,
    make devices usable on s390x, again
 
  - sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve,
    previous fix of rejecting invalid config broke some scripts
 
  - rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock
 
  - revert "ethtool: Fix mod state of verbose no_mask bitset",
    needs more work
 
 Current release - new code bugs:
 
  - tcp: fix listen() warning with v4-mapped-v6 address
 
 Previous releases - regressions:
 
  - tcp: allow tcp_disconnect() again when threads are waiting,
    it was denied to plug a constant source of bugs but turns out
    .NET depends on it
 
  - eth: mlx5: fix double-free if buffer refill fails under OOM
 
  - revert "net: wwan: iosm: enable runtime pm support for 7560",
    it's causing regressions and the WWAN team at Intel disappeared
 
  - tcp: tsq: relax tcp_small_queue_check() when rtx queue contains
    a single skb, fix single-stream perf regression on some devices
 
 Previous releases - always broken:
 
  - Bluetooth:
    - fix issues in legacy BR/EDR PIN code pairing
    - correctly bounds check and pad HCI_MON_NEW_INDEX name
 
  - netfilter:
    - more fixes / follow ups for the large "commit protocol" rework,
      which went in as a fix to 6.5
    - fix null-derefs on netlink attrs which user may not pass in
 
  - tcp: fix excessive TLP and RACK timeouts from HZ rounding
    (bless Debian for keeping HZ=250 alive)
 
  - net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent
    letting frankenstein UDP super-frames from getting into the stack
 
  - net: fix interface altnames when ifc moves to a new namespace
 
  - eth: qed: fix the size of the RX buffers
 
  - mptcp: avoid sending RST when closing the initial subflow
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmUxawUACgkQMUZtbf5S
 Irt75w/+MC2ilnFQhfoQqpCxL2uHWOKzhenUz2PoNKR60OKgLLmTq8YBYcpyCVAQ
 wBQaNzNu1/yjF5BG7aNS/j0suGeJsOfhfoahQvXlLaat9NuDqxTpaeoT5FZ7eNQw
 RZ8+CJug3BRDV0TkaJH9UDVC/nfJTsnsGWNIhNXYGPsuveqAUun+xrnN8ZbvZIrn
 6D9rMF+u9SdVO+ANCquXBC7+CWEWiJS1ljUrU7BRNiv/9FSlnPQtjOdpuKleeBO8
 4usMS7TezHNgRdiAKC8GjSGUiIkIIMJT4y4wuczBEQAD4Pkki9UpBrui97ozOj7h
 W4N7UOuPlUBIardvKNoYz9rZyiFXBcPPm0GruHiuCqpxyqmzoFgv2XJsb/6KfzNn
 Dyro+lvh8smtbFHvFqiwaNu5y8ucfClaowvR4gjSe2KcB7hIpwNkh6vWC6OMGJK3
 hiKHnDrnXBQMbnP1YfiJ4feLmm3UYCG8eFdv/ULZT0a9TzZ7fKfzAfywUwD+/O8Y
 +S28Hr9srdDCHO7ih/gF3Wq9wtnnLy8QEkpt7cpXjRDj0uWH8JkHU3YEIGF2814Y
 LNVGmX9y6RcgrHNp03K1PmUcgAzhTTuV9QRoQKEucuBT5AK9ALDQ8YomnWDWDgrp
 UOdJPi1RUTsmqslADF15wZ9W5Ki/cDnUJsE4HU/MtnM2w95C49Q=
 =z1F6
 -----END PGP SIGNATURE-----

Merge tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, netfilter, WiFi.

  Feels like an up-tick in regression fixes, mostly for older releases.
  The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as
  fairly clear-cut user reported regressions. The mlx5 DMA bug was
  causing strife for 390x folks. The fixes themselves are not
  particularly scary, tho. No open investigations / outstanding reports
  at the time of writing.

  Current release - regressions:

   - eth: mlx5: perform DMA operations in the right locations, make
     devices usable on s390x, again

   - sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner
     curve, previous fix of rejecting invalid config broke some scripts

   - rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock

   - revert "ethtool: Fix mod state of verbose no_mask bitset", needs
     more work

  Current release - new code bugs:

   - tcp: fix listen() warning with v4-mapped-v6 address

  Previous releases - regressions:

   - tcp: allow tcp_disconnect() again when threads are waiting, it was
     denied to plug a constant source of bugs but turns out .NET depends
     on it

   - eth: mlx5: fix double-free if buffer refill fails under OOM

   - revert "net: wwan: iosm: enable runtime pm support for 7560", it's
     causing regressions and the WWAN team at Intel disappeared

   - tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a
     single skb, fix single-stream perf regression on some devices

  Previous releases - always broken:

   - Bluetooth:
      - fix issues in legacy BR/EDR PIN code pairing
      - correctly bounds check and pad HCI_MON_NEW_INDEX name

   - netfilter:
      - more fixes / follow ups for the large "commit protocol" rework,
        which went in as a fix to 6.5
      - fix null-derefs on netlink attrs which user may not pass in

   - tcp: fix excessive TLP and RACK timeouts from HZ rounding (bless
     Debian for keeping HZ=250 alive)

   - net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent
     letting frankenstein UDP super-frames from getting into the stack

   - net: fix interface altnames when ifc moves to a new namespace

   - eth: qed: fix the size of the RX buffers

   - mptcp: avoid sending RST when closing the initial subflow"

* tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
  Revert "ethtool: Fix mod state of verbose no_mask bitset"
  selftests: mptcp: join: no RST when rm subflow/addr
  mptcp: avoid sending RST when closing the initial subflow
  mptcp: more conservative check for zero probes
  tcp: check mptcp-level constraints for backlog coalescing
  selftests: mptcp: join: correctly check for no RST
  net: ti: icssg-prueth: Fix r30 CMDs bitmasks
  selftests: net: add very basic test for netdev names and namespaces
  net: move altnames together with the netdevice
  net: avoid UAF on deleted altname
  net: check for altname conflicts when changing netdev's netns
  net: fix ifname in netlink ntf during netns move
  net: ethernet: ti: Fix mixed module-builtin object
  net: phy: bcm7xxx: Add missing 16nm EPHY statistics
  ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr
  tcp_bpf: properly release resources on error paths
  net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve
  net: mdio-mux: fix C45 access returning -EIO after API change
  tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb
  octeon_ep: update BQL sent bytes before ringing doorbell
  ...
2023-10-19 12:08:18 -07:00
Linus Torvalds
74e9347ebc LoongArch fixes for v6.6-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmUwqw8WHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImeu57EACCgpJMG4fO+aVb6jQRdHmwny2K
 SIfcYoLi7wu7UGqP5SXJANsYwh1JXyCdi8m6jmCxfyVzu/JkmszG0W8eJBgrBrra
 sqKE83dpti+v7G7GmB2dmAYxnkbdwDCaI0cnIsOaSOen0gYu5zQ9ImUhVyNrIgTS
 xyf60kLqmJ1VSlFUCB5ju2zz3+oN6nlBWvItI6kpWnsU6f5BnT61eWFEmQl9N5Q5
 pFG4k3AUN7+Ppw3qtlWmj0YjXMkrefK3ffDXmpE5UUmdbqxTiCiOliZymhtgo04s
 mAvUS9gzJskGhith1+3WcNpQUn/Whtll7eteiYP0eM2kfe9MLehNSYdSfrH0V3IZ
 EThmiz2q+xwPERnxSSBLLBnWABZwOo0ENjb2F/lgpsBuMyVgzR+C/EmF8DWkkCqg
 O/GoQrbI+GXWX+gv+js9W4XoDhUKVwcfePqMX+dONgOBB8F8pFd1d/gL1Hcbt2HC
 P1rLNkH2CN49mCfuCWQAA/KsgTeTMfu/spRRIKJ4Fl7uTWcjZabqTujX/iAAgFD1
 mR/l73Fz2nH9WKH7Uq47XlvSQGCAOf0uez3BYjjRPpzkJh2NNS/QhxJRROXogTne
 foSzN0qVGzNr1Z98Iyv9JSEQvY1MzB+rVuZJwN0Z05+4CFtqAFN10+FT7Cp8XE1S
 JsETlmF/dGagDprLyg==
 =RreE
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-fixes-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai ChenL
 "Fix 4-level pagetable building, disable WUC for pgprot_writecombine()
  like ioremap_wc(), use correct annotation for exception handlers, and
  a trivial cleanup"

* tag 'loongarch-fixes-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: Disable WUC for pgprot_writecombine() like ioremap_wc()
  LoongArch: Replace kmap_atomic() with kmap_local_page() in copy_user_highpage()
  LoongArch: Export symbol invalid_pud_table for modules building
  LoongArch: Use SYM_CODE_* to annotate exception handlers
2023-10-19 11:02:28 -07:00
Linus Torvalds
54fb58aec4 slab fixes for 6.6-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmUxOtMACgkQu+CwddJF
 iJqwmgf/UYBHiwWoGXOuLCDBvm2Z4GwQyGH347IVooDpY28/NoWqeGeYbllvqj9k
 2EhP5zWLT945nt4L2Si98hn0ikqZ4DyljrHv/zpQShqKII7Ddh3o3j5r60Cbhyql
 9HAhei2I2lhiVLQBuQ2MQxFYeXW38s1XRL9/OY6nph8tYC2GvafVei3a0YHiT5J3
 CwkWQs+TV8R5RSFVE52htPV2PWx2+ZI5apyEeu9yuXZr22RuRsAgZM27QjakNrGI
 bFdShWQwnWSf/MmtuEtqXwBw4dpvLyG9gqVaeMZUydrW7OotR9kLvnL2yhoMvNb6
 7bq+cJniwLj31uYRHgIALzzaVjA88Q==
 =hkq/
 -----END PGP SIGNATURE-----

Merge tag 'slab-fixes-for-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:

 - stable fix to prevent kernel warnings with KASAN_HW_TAGS on arm64
   due to improperly resolved kmalloc alignment restrictions (Catalin
   Marinas)

* tag 'slab-fixes-for-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign()
2023-10-19 10:53:31 -07:00
Linus Torvalds
189b756271 seccomp fix for v6.6-rc7
- Fix seccomp_unotify perf benchmark for 32-bit (Jiri Slaby)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmUwfdQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJixtD/98vbEvwKMPxQxTH17Y2XMA7uSg
 8/WV6reVxE+DCalieqO14XRP/I++88W5A+HJSxZ+C9Q1WFXdsGINr3XMcihWZIc1
 IkzJ7t+LTTn4Y8MxC0m7LyIqpRIVYTkKW6RwrUbsZj0+vrpXtSWz6Gy7Oz7YXvah
 Y3DHVyFpi8FNK5cEz0cmBIITE0CCR/eQuiMD1057esudruWCFiKKGCZN6hPdNTlE
 pzCYGKNlA3eXOzsW/N2qAIj2nlnEJCMpP92vnvpRhcNIa+MKCzAELSi73QScG/ja
 lrcvCWWpWmR5sSNhA2dWbamR/S6GrEtwo16J4Sg/sYqHrywsr0vVGjEHLzkpf8Ef
 zxW5mGXl78dL+zoq6woyOzt/ztQ1kqVbSXF0VMMhVjUQhTSkvivr6LX+mwvNJYim
 KLAaJmWgPj6wm9agpQ/hukNKaENiA6sDTb+wpcWKdrWl958DoSY8zl3ZdJXWSxnE
 EWXLRtPioMP9BR/9U63Fr2U/m7dXKXHRfn5J2+0WBdq4MsuPtwkIs6SAaec0AKMS
 jJnRcu/b1ub00WitnJZk8v4jFfHVernFVkUGhFrgdXzwF7hXaMvsAPK6wfg2MNpe
 9aRPS4Pf8xgF6Bz7XqNrw3qgNmQwZ5kn/ucuwSjfOc2BcwSKjYGVqwXUQ8hbQdNK
 HtrskXY14BxtSLo1Gg==
 =Ir3f
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp fix from Kees Cook:

 - Fix seccomp_unotify perf benchmark for 32-bit (Jiri Slaby)

* tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  perf/benchmark: fix seccomp_unotify benchmark for 32-bit
2023-10-19 10:10:14 -07:00
Linus Torvalds
ea1cc20cd4 v6.6-rc7.vfs.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZTD6IQAKCRCRxhvAZXjc
 opXLAQC9X+ECnGUAOy/kvOrEBkBb7G4BuZ8XsrnL976riVNp0gEA85LaJV9Ow7Xk
 51k/1ujhYkglQbCsa0zo+mI4ueE3wAQ=
 =Dqrj
 -----END PGP SIGNATURE-----

Merge tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fix from Christian Brauner:
 "An openat() call from io_uring triggering an audit call can apparently
  cause the refcount of struct filename to be incremented from multiple
  threads concurrently during async execution, triggering a refcount
  underflow and hitting a BUG_ON(). That bug has been lurking around
  since at least v5.16 apparently.

  Switch to an atomic counter to fix that. The underflow check is
  downgraded from a BUG_ON() to a WARN_ON_ONCE() but we could easily
  remove that check altogether tbh"

* tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  audit,io_uring: io_uring openat triggers audit reference count underflow
2023-10-19 09:37:41 -07:00
Kory Maincent
524515020f Revert "ethtool: Fix mod state of verbose no_mask bitset"
This reverts commit 108a36d07c.

It was reported that this fix breaks the possibility to remove existing WoL
flags. For example:
~$ ethtool lan2
...
        Supports Wake-on: pg
        Wake-on: d
...
~$ ethtool -s lan2 wol gp
~$ ethtool lan2
...
        Wake-on: pg
...
~$ ethtool -s lan2 wol d
~$ ethtool lan2
...
        Wake-on: pg
...

This worked correctly before this commit because we were always updating
a zero bitmap (since commit 6699170376 ("ethtool: fix application of
verbose no_mask bitset"), that is) so that the rest was left zero
naturally. But now the 1->0 change (old_val is true, bit not present in
netlink nest) no longer works.

Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Closes: https://lore.kernel.org/netdev/20231019095140.l6fffnszraeb6iiw@lion.mk-sys.cz/
Cc: stable@vger.kernel.org
Fixes: 108a36d07c ("ethtool: Fix mod state of verbose no_mask bitset")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/r/20231019-feature_ptp_bitset_fix-v1-1-70f3c429a221@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 09:27:12 -07:00
Linus Torvalds
f69d00d12f driver ntfs3 for linux 6.6
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEh0DEKNP0I9IjwfWEqbAzH4MkB7YFAmUwyyAACgkQqbAzH4Mk
 B7aQsg/9FohvuET2YOqYlmtgbq1YjFlWkVxoNB2Be4gxKEqRcFN4V/O8PrOD0ZBX
 KH8ASWGp9EKQlVKucAPUhSaZMv4u2HaKTVyRnH5KpUmNG7rC7EzRgYDY4xA5RfjL
 aIxgVOhaSibFBGfys2KoS8MHFgZ9mcCWZZ1154ueSkZfX9ghytZcql5i4Ahi2OGQ
 u+aa/tkhoIS9ZKWp/uIz4R+fDClrG8UoiWmJ1LoOHQ5YM3enboOld6Zz1CjbC9yN
 ldInX7klL9aHRf3SoAeRVCyL23e4GB5R4+QmBhZQemhwy0jF0/7E0kCmCK0wYFWn
 3t1M/+i0S7o7rI+utTA2mTuZzxnNPMXJsn2TnpH9vv1o633WOlL0mrvsM7Z/NJDz
 VShPinQ9eJGvZQ9kYuMfZODtK0BP7OV8mRi7aXinzHucG+ISB2fBGxsnA0lX9EfT
 fHix1oNI33lfazVzaStJ3hs/n9KXwP/9P49rBcLh75O2t3TBKc2T84qpywNx+W6S
 9GDMV5V2FNWf/hMCL8pcTtab3MvpbeU5DqKnxW1D1vu2vdGiHVum/Aq26aE5W7nl
 EcE5pjKppdNZgk3Qwj190vym7d8lbK4Esua6ieNM/qvWb7hAGoZO7Tjcypvj+YMY
 W4aiJTW/jULPT2tkUvLvHg/r5DFHi30v6eCeTCrPfivxFRIAzBQ=
 =ScrJ
 -----END PGP SIGNATURE-----

Merge tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3

Pull ntfs3 fixes from Konstantin Komarov:

 - memory leak

 - some logic errors, NULL dereferences

 - some code was refactored

 - more sanity checks

* tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3:
  fs/ntfs3: Avoid possible memory leak
  fs/ntfs3: Fix directory element type detection
  fs/ntfs3: Fix possible null-pointer dereference in hdr_find_e()
  fs/ntfs3: Fix OOB read in ntfs_init_from_boot
  fs/ntfs3: fix panic about slab-out-of-bounds caused by ntfs_list_ea()
  fs/ntfs3: Fix NULL pointer dereference on error in attr_allocate_frame()
  fs/ntfs3: Fix possible NULL-ptr-deref in ni_readpage_cmpr()
  fs/ntfs3: Do not allow to change label if volume is read-only
  fs/ntfs3: Add more info into /proc/fs/ntfs3/<dev>/volinfo
  fs/ntfs3: Refactoring and comments
  fs/ntfs3: Fix alternative boot searching
  fs/ntfs3: Allow repeated call to ntfs3_put_sbi
  fs/ntfs3: Use inode_set_ctime_to_ts instead of inode_set_ctime
  fs/ntfs3: Fix shift-out-of-bounds in ntfs_fill_super
  fs/ntfs3: fix deadlock in mark_as_free_ex
  fs/ntfs3: Add more attributes checks in mi_enum_attr()
  fs/ntfs3: Use kvmalloc instead of kmalloc(... __GFP_NOWARN)
  fs/ntfs3: Write immediately updated ntfs state
  fs/ntfs3: Add ckeck in ni_update_parent()
2023-10-19 09:10:18 -07:00
Jakub Kicinski
1c1f14f92b Merge branch 'mptcp-fixes-for-v6-6'
Mat Martineau says:

====================
mptcp: Fixes for v6.6

Patch 1 corrects the logic for MP_JOIN tests where 0 RSTs are expected.

Patch 2 ensures MPTCP packets are not incorrectly coalesced in the TCP
backlog queue.

Patch 3 avoids a zero-window probe and associated WARN_ON_ONCE() in an
expected MPTCP reinjection scenario.

Patches 4 & 5 allow an initial MPTCP subflow to be closed cleanly
instead of always sending RST. Associated selftest is updated.
====================

Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-0-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 09:10:02 -07:00
Matthieu Baerts
2cfaa8b3b7 selftests: mptcp: join: no RST when rm subflow/addr
Recently, we noticed that some RST were wrongly generated when removing
the initial subflow.

This patch makes sure RST are not sent when removing any subflows or any
addresses.

Fixes: c2b2ae3925 ("mptcp: handle correctly disconnect() failures")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-5-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 09:10:00 -07:00
Geliang Tang
14c56686a6 mptcp: avoid sending RST when closing the initial subflow
When closing the first subflow, the MPTCP protocol unconditionally
calls tcp_disconnect(), which in turn generates a reset if the subflow
is established.

That is unexpected and different from what MPTCP does with MPJ
subflows, where resets are generated only on FASTCLOSE and other edge
scenarios.

We can't reuse for the first subflow the same code in place for MPJ
subflows, as MPTCP clean them up completely via a tcp_close() call,
while must keep the first subflow socket alive for later re-usage, due
to implementation constraints.

This patch adds a new helper __mptcp_subflow_disconnect() that
encapsulates, a logic similar to tcp_close, issuing a reset only when
the MPTCP_CF_FASTCLOSE flag is set, and performing a clean shutdown
otherwise.

Fixes: c2b2ae3925 ("mptcp: handle correctly disconnect() failures")
Cc: stable@vger.kernel.org
Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-4-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 09:10:00 -07:00
Paolo Abeni
72377ab2d6 mptcp: more conservative check for zero probes
Christoph reported that the MPTCP protocol can find the subflow-level
write queue unexpectedly not empty while crafting a zero-window probe,
hitting a warning:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 188 at net/mptcp/protocol.c:1312 mptcp_sendmsg_frag+0xc06/0xe70
Modules linked in:
CPU: 0 PID: 188 Comm: kworker/0:2 Not tainted 6.6.0-rc2-g1176aa719d7a #47
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:mptcp_sendmsg_frag+0xc06/0xe70 net/mptcp/protocol.c:1312
RAX: 47d0530de347ff6a RBX: 47d0530de347ff6b RCX: ffff8881015d3c00
RDX: ffff8881015d3c00 RSI: 47d0530de347ff6b RDI: 47d0530de347ff6b
RBP: 47d0530de347ff6b R08: ffffffff8243c6a8 R09: ffffffff82042d9c
R10: 0000000000000002 R11: ffffffff82056850 R12: ffff88812a13d580
R13: 0000000000000001 R14: ffff88812b375e50 R15: ffff88812bbf3200
FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000695118 CR3: 0000000115dfc001 CR4: 0000000000170ef0
Call Trace:
 <TASK>
 __subflow_push_pending+0xa4/0x420 net/mptcp/protocol.c:1545
 __mptcp_push_pending+0x128/0x3b0 net/mptcp/protocol.c:1614
 mptcp_release_cb+0x218/0x5b0 net/mptcp/protocol.c:3391
 release_sock+0xf6/0x100 net/core/sock.c:3521
 mptcp_worker+0x6e8/0x8f0 net/mptcp/protocol.c:2746
 process_scheduled_works+0x341/0x690 kernel/workqueue.c:2630
 worker_thread+0x3a7/0x610 kernel/workqueue.c:2784
 kthread+0x143/0x180 kernel/kthread.c:388
 ret_from_fork+0x4d/0x60 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:304
 </TASK>

The root cause of the issue is that expectations are wrong: e.g. due
to MPTCP-level re-injection we can hit the critical condition.

Explicitly avoid the zero-window probe when the subflow write queue
is not empty and drop the related warnings.

Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/444
Fixes: f70cad1085 ("mptcp: stop relying on tcp_tx_skb_cache")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-3-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 09:10:00 -07:00
Paolo Abeni
6db8a37dfc tcp: check mptcp-level constraints for backlog coalescing
The MPTCP protocol can acquire the subflow-level socket lock and
cause the tcp backlog usage. When inserting new skbs into the
backlog, the stack will try to coalesce them.

Currently, we have no check in place to ensure that such coalescing
will respect the MPTCP-level DSS, and that may cause data stream
corruption, as reported by Christoph.

Address the issue by adding the relevant admission check for coalescing
in tcp_add_backlog().

Note the issue is not easy to reproduce, as the MPTCP protocol tries
hard to avoid acquiring the subflow-level socket lock.

Fixes: 648ef4b886 ("mptcp: Implement MPTCP receive path")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/420
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-2-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 09:10:00 -07:00