Commit graph

1136939 commits

Author SHA1 Message Date
David Howells
1fc4fa2ac9 rxrpc: Fix congestion management
rxrpc has a problem in its congestion management in that it saves the
congestion window size (cwnd) from one call to another, but if this is 0 at
the time is saved, then the next call may not actually manage to ever
transmit anything.

To this end:

 (1) Don't save cwnd between calls, but rather reset back down to the
     initial cwnd and re-enter slow-start if data transmission is idle for
     more than an RTT.

 (2) Preserve ssthresh instead, as that is a handy estimate of pipe
     capacity.  Knowing roughly when to stop slow start and enter
     congestion avoidance can reduce the tendency to overshoot and drop
     larger amounts of packets when probing.

In future, cwind growth also needs to be constrained when the window isn't
being filled due to being application limited.

Reported-by: Simon Wilkinson <sxw@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells
6869ddb87d rxrpc: Remove the rxtx ring
The Rx/Tx ring is no longer used, so remove it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells
d57a3a1516 rxrpc: Save last ACK's SACK table rather than marking txbufs
Improve the tracking of which packets need to be transmitted by saving the
last ACK packet that we receive that has a populated soft-ACK table rather
than marking packets.  Then we can step through the soft-ACK table and look
at the packets we've transmitted beyond that to determine which packets we
might want to retransmit.

We also look at the highest serial number that has been acked to try and
guess which packets we've transmitted the peer is likely to have seen.  If
necessary, we send a ping to retrieve that number.

One downside that might be a problem is that we can't then compare the
previous acked/unacked state so easily in rxrpc_input_soft_acks() - which
is a potential problem for the slow-start algorithm.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells
4e76bd406d rxrpc: Remove call->lock
call->lock is no longer necessary, so remove it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells
a4ea4c4776 rxrpc: Don't use a ring buffer for call Tx queue
Change the way the Tx queueing works to make the following ends easier to
achieve:

 (1) The filling of packets, the encryption of packets and the transmission
     of packets can be handled in parallel by separate threads, rather than
     rxrpc_sendmsg() allocating, filling, encrypting and transmitting each
     packet before moving onto the next one.

 (2) Get rid of the fixed-size ring which sets a hard limit on the number
     of packets that can be retained in the ring.  This allows the number
     of packets to increase without having to allocate a very large ring or
     having variable-sized rings.

     [Note: the downside of this is that it's then less efficient to locate
     a packet for retransmission as we then have to step through a list and
     examine each buffer in the list.]

 (3) Allow the filler/encrypter to run ahead of the transmission window.

 (4) Make it easier to do zero copy UDP from the packet buffers.

 (5) Make it easier to do zero copy from userspace to the packet buffers -
     and thence to UDP (only if for unauthenticated connections).

To that end, the following changes are made:

 (1) Use the new rxrpc_txbuf struct instead of sk_buff for keeping packets
     to be transmitted in.  This allows them to be placed on multiple
     queues simultaneously.  An sk_buff isn't really necessary as it's
     never passed on to lower-level networking code.

 (2) Keep the transmissable packets in a linked list on the call struct
     rather than in a ring.  As a consequence, the annotation buffer isn't
     used either; rather a flag is set on the packet to indicate ackedness.

 (3) Use the RXRPC_CALL_TX_LAST flag to indicate that the last packet to be
     transmitted has been queued.  Add RXRPC_CALL_TX_ALL_ACKED to indicate
     that all packets up to and including the last got hard acked.

 (4) Wire headers are now stored in the txbuf rather than being concocted
     on the stack and they're stored immediately before the data, thereby
     allowing zerocopy of a single span.

 (5) Don't bother with instant-resend on transmission failure; rather,
     leave it for a timer or an ACK packet to trigger.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells
5d7edbc923 rxrpc: Get rid of the Rx ring
Get rid of the Rx ring and replace it with a pair of queues instead.  One
queue gets the packets that are in-sequence and are ready for processing by
recvmsg(); the other queue gets the out-of-sequence packets for addition to
the first queue as the holes get filled.

The annotation ring is removed and replaced with a SACK table.  The SACK
table has the bits set that correspond exactly to the sequence number of
the packet being acked.  The SACK ring is copied when an ACK packet is
being assembled and rotated so that the first ACK is in byte 0.

Flow control handling is altered so that packets that are moved to the
in-sequence queue are hard-ACK'd even before they're consumed - and then
the Rx window size in the ACK packet (rsize) is shrunk down to compensate
(even going to 0 if the window is full).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells
d4d02d8bb5 rxrpc: Clone received jumbo subpackets and queue separately
Split up received jumbo packets into separate skbuffs by cloning the
original skbuff for each subpacket and setting the offset and length of the
data in that subpacket in the skbuff's private data.  The subpackets are
then placed on the recvmsg queue separately.  The security class then gets
to revise the offset and length to remove its metadata.

If we fail to clone a packet, we just drop it and let the peer resend it.
The original packet gets used for the final subpacket.

This should make it easier to handle parallel decryption of the subpackets.
It also simplifies the handling of lost or misordered packets in the
queuing/buffering loop as the possibility of overlapping jumbo packets no
longer needs to be considered.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells
faf92e8d53 rxrpc: Split the rxrpc_recvmsg tracepoint
Split the rxrpc_recvmsg tracepoint so that the tracepoints that are about
data packet processing (and which have extra pieces of information) are
separate from the tracepoint that shows the general flow of recvmsg().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells
530403d9ba rxrpc: Clean up ACK handling
Clean up the rxrpc_propose_ACK() function.  If deferred PING ACK proposal
is split out, it's only really needed for deferred DELAY ACKs.  All other
ACKs, bar terminal IDLE ACK are sent immediately.  The deferred IDLE ACK
submission can be handled by conversion of a DELAY ACK into an IDLE ACK if
there's nothing to be SACK'd.

Also, because there's a delay between an ACK being generated and being
transmitted, it's possible that other ACKs of the same type will be
generated during that interval.  Apart from the ACK time and the serial
number responded to, most of the ACK body, including window and SACK
parameters, are not filled out till the point of transmission - so we can
avoid generating a new ACK if there's one pending that will cover the SACK
data we need to convey.

Therefore, don't propose a new DELAY or IDLE ACK for a call if there's one
already pending.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells
72f0c6fb05 rxrpc: Allocate ACK records at proposal and queue for transmission
Allocate rxrpc_txbuf records for ACKs and put onto a queue for the
transmitter thread to dispatch.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells
02a1935640 rxrpc: Define rxrpc_txbuf struct to carry data to be transmitted
Define a struct, rxrpc_txbuf, to carry data to be transmitted instead of a
socket buffer so that it can be placed onto multiple queues at once.  This
also allows the data buffer to be in the same allocation as the internal
data.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells
a11e6ff961 rxrpc: Remove call->tx_phase
Remove call->tx_phase as it's only ever set.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells
27f699ccb8 rxrpc: Remove the flags from the rxrpc_skb tracepoint
Remove the flags from the rxrpc_skb tracepoint as we're no longer going to
be using this for the transmission buffers and so marking which are
transmission buffers isn't going to be necessary.

Note that this also remove the rxrpc skb flag that indicates if this is a
transmission buffer and so the count is not updated for the moment.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells
23b237f325 rxrpc: Remove unnecessary header inclusions
Remove a bunch of unnecessary header inclusions.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells
ed472b0c87 rxrpc: Call udp_sendmsg() directly
Call udp_sendmsg() and udpv6_sendmsg() directly rather than calling
kernel_sendmsg() as the latter assumes we want a kvec-class iterator.
However, zerocopy explicitly doesn't work with such an iterator.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells
b6c66c4324 rxrpc: Use the core ICMP/ICMP6 parsers
Make rxrpc_encap_rcv_err() pass the ICMP/ICMP6 skbuff to ip_icmp_error() or
ipv6_icmp_error() as appropriate to do the parsing rather than trying to do
it in rxrpc.

This pushes an error report onto the UDP socket's error queue and calls
->sk_error_report() from which point rxrpc can pick it up.

It would be preferable to steal the packet directly from ip*_icmp_error()
rather than letting it get queued, but this is probably good enough.

Also note that __udp4_lib_err() calls sk_error_report() twice in some
cases.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells
42fb06b391 net: Change the udp encap_err_rcv to allow use of {ip,ipv6}_icmp_error()
Change the udp encap_err_rcv signature to match ip_icmp_error() and
ipv6_icmp_error() so that those can be used from the called function and
export them.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
2022-11-08 16:42:28 +00:00
David Howells
8889a711f9 rxrpc: Fix ack.bufferSize to be 0 when generating an ack
ack.bufferSize should be set to 0 when generating an ack.

Fixes: 8d94aa381d ("rxrpc: Calls shouldn't hold socket refs")
Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:15 +00:00
David Howells
f7fa52421f rxrpc: Record stats for why the REQUEST-ACK flag is being set
Record stats for why the REQUEST-ACK flag is being set.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:15 +00:00
David Howells
f2a676d100 rxrpc: Record statistics about ACK types
Record statistics about the different types of ACKs that have been
transmitted and received and the number of ACKs that have been filled out
and transmitted or that have been skipped.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:15 +00:00
David Howells
b015424695 rxrpc: Add stats procfile and DATA packet stats
Add a procfile, /proc/net/rxrpc/stats, to display some statistics about
what rxrpc has been doing.  Writing a blank line to the stats file will
clear the increment-only counters.  Allocated resource counters don't get
cleared.

Add some counters to count various things about DATA packets, including the
number created, transmitted and retransmitted and the number received, the
number of ACK-requests markings and the number of jumbo packets received.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:15 +00:00
David Howells
589a0c1e0a rxrpc: Track highest acked serial
Keep track of the highest DATA serial number that has been acked by the
peer for future purposes.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:15 +00:00
David Howells
334dfbfc5a rxrpc: Split call timer-expiration from call timer-set tracepoint
Split the tracepoint for call timer-set to separate out the call
timer-expiration event

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:15 +00:00
David Howells
4d843be56b rxrpc: Trace setting of the request-ack flag
Add a tracepoint to log why the request-ack flag is set on an outgoing DATA
packet, allowing debugging as to why.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:15 +00:00
David Howells
c3d96f690a net, proc: Provide PROC_FS=n fallback for proc_create_net_single_write()
Provide a CONFIG_PROC_FS=n fallback for proc_create_net_single_write().

Also provide a fallback for proc_create_net_data_write().

Fixes: 564def7176 ("proc: Add a way to make network proc files writable")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
2022-11-08 16:42:15 +00:00
Dmitry Torokhov
f8f797f35a nfc: s3fwrn5: use devm_clk_get_optional_enabled() helper
Because we enable the clock immediately after acquiring it in probe,
we can combine the 2 operations and use devm_clk_get_optional_enabled()
helper.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 11:27:59 +01:00
David S. Miller
92c7076e4a Merge branch 'txgbe'
Jiawen Wu says:

====================
net: WangXun txgbe ethernet driver

This patch series adds support for WangXun 10 gigabit NIC, to initialize
hardware, set mac address, and register netdev.

Change log:
v6: address comments:
    Jakub Kicinski: check with scripts/kernel-doc
v5: address comments:
    Jakub Kicinski: clean build with W=1 C=1
v4: address comments:
    Andrew Lunn: https://lore.kernel.org/all/YzXROBtztWopeeaA@lunn.ch/
v3: address comments:
    Andrew Lunn: remove hw function ops, reorder functions, use BIT(n)
                 for register bit offset, move the same code of txgbe
                 and ngbe to libwx
v2: address comments:
    Andrew Lunn: https://lore.kernel.org/netdev/YvRhld5rD%2FxgITEg@lunn.ch/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 11:25:53 +01:00
Jiawen Wu
d21d2c7f58 net: txgbe: Set MAC address and register netdev
Add MAC address related operations, and register netdev.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 11:25:53 +01:00
Jiawen Wu
b08012568e net: txgbe: Reset hardware
Reset and initialize the hardware by configuring the MAC layer.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 11:25:53 +01:00
Jiawen Wu
a34b3e6ed8 net: txgbe: Store PCI info
Get PCI config space info, set LAN id and check flash status.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 11:25:53 +01:00
David S. Miller
957ed5e712 Merge branch 'tcp-plb'
Mubashir Adnan Qureshi says:

====================
net: Add PLB functionality to TCP

This patch series adds PLB (Protective Load Balancing) to TCP and hooks
it up to DCTCP. PLB is disabled by default and can be enabled using
relevant sysctls and support from underlying CC.

PLB (Protective Load Balancing) is a host based mechanism for load
balancing across switch links. It leverages congestion signals(e.g. ECN)
from transport layer to randomly change the path of the connection
experiencing congestion. PLB changes the path of the connection by
changing the outgoing IPv6 flow label for IPv6 connections (implemented
in Linux by calling sk_rethink_txhash()). Because of this implementation
mechanism, PLB can currently only work for IPv6 traffic. For more
information, see the SIGCOMM 2022 paper:
  https://doi.org/10.1145/3544216.3544226
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 10:47:42 +01:00
Mubashir Adnan Qureshi
71fc704768 tcp: add rcv_wnd and plb_rehash to TCP_INFO
rcv_wnd can be useful to diagnose TCP performance where receiver window
becomes the bottleneck. rehash reports the PLB and timeout triggered
rehash attempts by the TCP connection.

Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 10:47:42 +01:00
Mubashir Adnan Qureshi
29c1c44646 tcp: add u32 counter in tcp_sock and an SNMP counter for PLB
A u32 counter is added to tcp_sock for counting the number of PLB
triggered rehashes for a TCP connection. An SNMP counter is also
added to count overall PLB triggered rehash events for a host. These
counters are hooked up to PLB implementation for DCTCP.

TCP_NLA_REHASH is added to SCM_TIMESTAMPING_OPT_STATS that reports
the rehash attempts triggered due to PLB or timeouts. This gives
a historical view of sustained congestion or timeouts experienced
by the TCP connection.

Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 10:47:42 +01:00
Mubashir Adnan Qureshi
c30f8e0b04 tcp: add support for PLB in DCTCP
PLB support is added to TCP DCTCP code. As DCTCP uses ECN as the
congestion signal, PLB also uses ECN to make decisions whether to change
the path or not upon sustained congestion.

Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 10:47:42 +01:00
Mubashir Adnan Qureshi
1a91bb7c3e tcp: add PLB functionality for TCP
Congestion control algorithms track PLB state and cause the connection
to trigger a path change when either of the 2 conditions is satisfied:

- No packets are in flight and (# consecutive congested rounds >=
  sysctl_tcp_plb_idle_rehash_rounds)
- (# consecutive congested rounds >= sysctl_tcp_plb_rehash_rounds)

A round (RTT) is marked as congested when congestion signal
(ECN ce_ratio) over an RTT is greater than sysctl_tcp_plb_cong_thresh.
In the event of RTO, PLB (via tcp_write_timeout()) triggers a path
change and disables congestion-triggered path changes for random time
between (sysctl_tcp_plb_suspend_rto_sec, 2*sysctl_tcp_plb_suspend_rto_sec)
to avoid hopping onto the "connectivity blackhole". RTO-triggered
path changes can still happen during this cool-off period.

Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 10:47:42 +01:00
Mubashir Adnan Qureshi
bd456f283b tcp: add sysctls for TCP PLB parameters
PLB (Protective Load Balancing) is a host based mechanism for load
balancing across switch links. It leverages congestion signals(e.g. ECN)
from transport layer to randomly change the path of the connection
experiencing congestion. PLB changes the path of the connection by
changing the outgoing IPv6 flow label for IPv6 connections (implemented
in Linux by calling sk_rethink_txhash()). Because of this implementation
mechanism, PLB can currently only work for IPv6 traffic. For more
information, see the SIGCOMM 2022 paper:
  https://doi.org/10.1145/3544216.3544226

This commit adds new sysctl knobs and sets their default values for
TCP PLB.

Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 10:47:42 +01:00
David S. Miller
7f86cf50cf Merge branch 'mxl-gpy-MDI-X'
Raju Lakkaraju says:

====================
net: phy: mxl-gpy: Add MDI-X

This patch series add the MDI-X feature to GPY211 PHYs and
Also Change return type to gpy_update_interface() function
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 10:35:51 +01:00
Raju Lakkaraju
fd8825cd8c net: phy: mxl-gpy: Add PHY Auto/MDI/MDI-X set driver for GPY211 chips
Add support for MDI-X status and configuration for GPY211 chips

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 10:35:51 +01:00
Raju Lakkaraju
7a495dde27 net: phy: mxl-gpy: Change gpy_update_interface() function return type
gpy_update_interface() is called from gpy_read_status() which does
return error codes. gpy_read_status() would benefit from returning
-EINVAL, etc.

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28 10:35:51 +01:00
Jakub Kicinski
12dee519d4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

1) Move struct nft_payload_set definition to .c file where it is
   only used.

2) Shrink transport and inner header offset fields in the nft_pktinfo
   structure to 16-bits, from Florian Westphal.

3) Get rid of nft_objref Kbuild toggle, make it built-in into
   nf_tables. This expression is used to instantiate conntrack helpers
   in nftables. After removing the conntrack helper auto-assignment
   toggle it this feature became more important so move it to the nf_tables
   core module. Also from Florian.

4) Extend the existing function to calculate payload inner header offset
   to deal with the GRE and IPIP transport protocols.

6) Add inner expression support for nf_tables. This new expression
   provides a packet parser for tunneled packets which uses a userspace
   description of the expected inner headers. The inner expression
   invokes the payload expression (via direct call) to match on the
   inner header protocol fields using the inner link, network and
   transport header offsets.

   An example of the bytecode generated from userspace to match on
   IP source encapsulated in a VxLAN packet:

   # nft --debug=netlink add rule netdev x y udp dport 4789 vxlan ip saddr 1.2.3.4
     netdev x y
       [ meta load l4proto => reg 1 ]
       [ cmp eq reg 1 0x00000011 ]
       [ payload load 2b @ transport header + 2 => reg 1 ]
       [ cmp eq reg 1 0x0000b512 ]
       [ inner type vxlan hdrsize 8 flags f [ meta load protocol => reg 1 ] ]
       [ cmp eq reg 1 0x00000008 ]
       [ inner type vxlan hdrsize 8 flags f [ payload load 4b @ network header + 12 => reg 1 ] ]
       [ cmp eq reg 1 0x04030201 ]

7) Store inner link, network and transport header offsets in percpu
   area to parse inner packet header once only. Matching on a different
   tunnel type invalidates existing offsets in the percpu area and it
   invokes the inner tunnel parser again.

8) Add support for inner meta matching. This support for
   NFTA_META_PROTOCOL, which specifies the inner ethertype, and
   NFT_META_L4PROTO, which specifies the inner transport protocol.

9) Extend nft_inner to parse GENEVE optional fields to calculate the
   link layer offset.

10) Update inner expression so tunnel offset points to GRE header
    to normalize tunnel header handling. This also allows to perform
    different interpretations of the GRE header from userspace.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nft_inner: set tunnel offset to GRE header offset
  netfilter: nft_inner: add geneve support
  netfilter: nft_meta: add inner match support
  netfilter: nft_inner: add percpu inner context
  netfilter: nft_inner: support for inner tunnel header matching
  netfilter: nft_payload: access ipip payload for inner offset
  netfilter: nft_payload: access GRE payload via inner offset
  netfilter: nft_objref: make it builtin
  netfilter: nf_tables: reduce nft_pktinfo by 8 bytes
  netfilter: nft_payload: move struct nft_payload_set definition where it belongs
====================

Link: https://lore.kernel.org/r/20221026132227.3287-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27 20:41:05 -07:00
Yang Li
148b811c77 net: dpaa2-eth: Simplify bool conversion
./drivers/net/ethernet/freescale/dpaa2/dpaa2-xsk.c:453:42-47: WARNING: conversion to bool not needed here

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2577
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221026051824.38730-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27 20:34:31 -07:00
Jakub Kicinski
3e8e4aa5da Merge branch 'ionic-vf-attr-replay-and-other-updates'
Shannon Nelson says:

====================
ionic: VF attr replay and other updates

For better VF management when a FW update restart or a FW crash recover is
detected, the PF now will replay any user specified VF attributes to be
sure the FW hasn't lost them in the restart.

Newer FW offers more packet processing offloads, so we now support them in
the driver.

A small refactor of the Rx buffer fill cleans a bit of code and will help
future work on buffer caching.
====================

Link: https://lore.kernel.org/r/20221026143744.11598-1-snelson@pensando.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27 20:34:19 -07:00
Neel Patel
e55f0f5bef ionic: refactor use of ionic_rx_fill()
The same pre-work code is used before each call to
ionic_rx_fill(), so bring it in and make it a part of
the routine.

Signed-off-by: Neel Patel <neel@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27 20:34:14 -07:00
Neel Patel
cad478c7c3 ionic: enable tunnel offloads
Support stateless offloads for GRE, VXLAN, GENEVE, IPXIP4
and IPXIP6 when the FW supports them.

Signed-off-by: Neel Patel <neel@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27 20:34:13 -07:00
Shannon Nelson
f43a96d91d ionic: new ionic device identity level and VF start control
A new ionic dev_cmd is added to the interface in ionic_if.h,
with a new capabilities field in the ionic device identity to
signal its availability in the FW.  The identity level code is
incremented to '2' to show support for this new capabilities
bitfield.

If the driver has indicated with the new identity level that
it has the VF_CTRL command, newer FW will wait for the start
command before starting the VFs after a FW update or crash
recovery.

This patch updates the driver to make use of the new VF start
control in fw_up path to be sure that the PF has set the user
attributes on the VF before the FW allows the VFs to restart.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27 20:34:13 -07:00
Shannon Nelson
23e884a253 ionic: only save the user set VF attributes
Report the current FW values for the VF attributes, but don't
save the FW values locally, only save the vf attributes that
are given to us from the user.  This allows us to replay user
data, and doesn't end up confusing things like "who set the
mac address".

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27 20:34:13 -07:00
Shannon Nelson
db28adf9af ionic: replay VF attributes after fw crash recovery
The VF attributes that the user has set into the FW through
the PF can be lost over a FW crash recovery.  Much like we
already replay the PF mac/vlan filters, we now add a replay
in the recovery path to be sure the FW has the up-to-date
VF configurations.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27 20:34:12 -07:00
Jakub Kicinski
31f1aa4f74 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c
  2871edb32f ("can: kvaser_usb: Fix possible completions during init_completion")
  abb8670938 ("can: kvaser_usb_leaf: Ignore stale bus-off after start")
  8d21f5927a ("can: kvaser_usb_leaf: Fix improved state not being reported")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27 16:56:36 -07:00
Linus Torvalds
2375886721 Including fixes from 802.15.4 (Zigbee et al.).
Current release - regressions:
 
  - ipa: fix bugs in the register conversion for IPA v3.1 and v3.5.1
 
 Current release - new code bugs:
 
  - mptcp: fix abba deadlock on fastopen
 
  - eth: stmmac: rk3588: allow multiple gmac controllers in one system
 
 Previous releases - regressions:
 
  - ip: rework the fix for dflt addr selection for connected nexthop
 
  - net: couple more fixes for misinterpreting bits in struct page after
    the signature was added
 
 Previous releases - always broken:
 
  - ipv6: ensure sane device mtu in tunnels
 
  - openvswitch: switch from WARN to pr_warn on a user-triggerable path
 
  - ethtool: eeprom: fix null-deref on genl_info in dump
 
  - ieee802154: more return code fixes for corner cases in dgram_sendmsg
 
  - mac802154: fix link-quality-indicator recording
 
  - eth: mlx5: fixes for IPsec, PTP timestamps, OvS and conntrack offload
 
  - eth: fec: limit register access on i.MX6UL
 
  - eth: bcm4908_enet: update TX stats after actual transmission
 
  - can: rcar_canfd: improve IRQ handling for RZ/G2L
 
 Misc:
 
  - genetlink: piggy back on the newly added resv_op_start to enforce
    more sanity checks on new commands
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmNa2CIACgkQMUZtbf5S
 IrsEDhAAsqvsIqhnwaDuvzTpdz/l2ZiLyRixue+Z5Q88/LkSYC7SRMjh70TzbYEj
 ENbB+hzGt9zDYIga1+vtLU13rENiI+3V0Pr5eOK9jVV2KBwQmgj1PatjlLhfQ8aa
 q9c/dg3YqKFcsLjHpCZC1O3imDEU+Wt1XV+N2tuoOhJ1QVPSemjSVUEgIP+qLTD7
 cXd+bWpcEXq/X0jkptElGsCM4RHxuN9MCcQDoGfdyoGEmXDi17BmmJEVu4LWdamg
 bPlky2uerFBtuUyK3jSvsoTI0VHwcxAr/MSmMxwcRGMr/smy/1UIKfehSJUOXFsr
 XeN4pfgezqPvl4l7LjC0xx83zg1UffKGhkGuu47MS3A8rS+zSo9CEH993owOb5Ty
 ZH5ZhBsdS6wchCbM15eqEby2ATYh/pYf8gNEBYfItsj2QuIPoqt8h19yQ4Gu1eX2
 1w1RpDJH0SyD02hsmfRWKzjehHNbNM+cQ2+prVazhXuSmhGxTOqWsirv6mThlfm6
 IEuG62d0VOYFoRBKxTV27S57QyfT0/+uMyu7UjDX5lieJGXvN6wGH7UlOUDBC5j/
 4GhW8Li4hxskxv292S8nvwANAOY02wWaunVsEtLYwB+7erkPDISUkiUjdxi4Uc7W
 yfxqbhW70Yd9sDEoKXGRsQ21nl82ZBeUIWPx/xLr+F6PuKdvUHo=
 =g5TW
 -----END PGP SIGNATURE-----

Merge tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from 802.15.4 (Zigbee et al).

  Current release - regressions:

   - ipa: fix bugs in the register conversion for IPA v3.1 and v3.5.1

  Current release - new code bugs:

   - mptcp: fix abba deadlock on fastopen

   - eth: stmmac: rk3588: allow multiple gmac controllers in one system

  Previous releases - regressions:

   - ip: rework the fix for dflt addr selection for connected nexthop

   - net: couple more fixes for misinterpreting bits in struct page
     after the signature was added

  Previous releases - always broken:

   - ipv6: ensure sane device mtu in tunnels

   - openvswitch: switch from WARN to pr_warn on a user-triggerable path

   - ethtool: eeprom: fix null-deref on genl_info in dump

   - ieee802154: more return code fixes for corner cases in
     dgram_sendmsg

   - mac802154: fix link-quality-indicator recording

   - eth: mlx5: fixes for IPsec, PTP timestamps, OvS and conntrack
     offload

   - eth: fec: limit register access on i.MX6UL

   - eth: bcm4908_enet: update TX stats after actual transmission

   - can: rcar_canfd: improve IRQ handling for RZ/G2L

  Misc:

   - genetlink: piggy back on the newly added resv_op_start to enforce
     more sanity checks on new commands"

* tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
  net: enetc: survive memory pressure without crashing
  kcm: do not sense pfmemalloc status in kcm_sendpage()
  net: do not sense pfmemalloc status in skb_append_pagefrags()
  net/mlx5e: Fix macsec sci endianness at rx sa update
  net/mlx5e: Fix wrong bitwise comparison usage in macsec_fs_rx_add_rule function
  net/mlx5e: Fix macsec rx security association (SA) update/delete
  net/mlx5e: Fix macsec coverity issue at rx sa update
  net/mlx5: Fix crash during sync firmware reset
  net/mlx5: Update fw fatal reporter state on PCI handlers successful recover
  net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed
  net/mlx5e: TC, Reject forwarding from internal port to internal port
  net/mlx5: Fix possible use-after-free in async command interface
  net/mlx5: ASO, Create the ASO SQ with the correct timestamp format
  net/mlx5e: Update restore chain id for slow path packets
  net/mlx5e: Extend SKB room check to include PTP-SQ
  net/mlx5: DR, Fix matcher disconnect error flow
  net/mlx5: Wait for firmware to enable CRS before pci_restore_state
  net/mlx5e: Do not increment ESN when updating IPsec ESN state
  netdevsim: remove dir in nsim_dev_debugfs_init() when creating ports dir failed
  netdevsim: fix memory leak in nsim_drv_probe() when nsim_dev_resources_register() failed
  ...
2022-10-27 13:36:59 -07:00
Linus Torvalds
7dd257d02e execve fixes for v6.1-rc3
- Fix an ancient signal action copy race. (Bernd Edlinger)
 
 - Fix a memory leak in ELF loader, when under memory pressure. (Li Zetao)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmNa1xEWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJoLqD/927ZXWxVLQ0GygmNz3xSEZh+5c
 34flrZv4LUDQPw1rNXycWx2D5MQv5MehrpsMvF+11pu/M1EP3e3+R3bngFeFXtBo
 12ov3yEloe6yA8bOPPWEDB1fU8K7C9aODKMcJOoWFCk20g7uQGYS8+GCUGhLxjHs
 mZn5U8OuEGGvn4QuGknIps+Ddca2SHuJ7jBtsw8NVjuvtWcAhlw9PYNbLTJEgBzU
 0zsfK68idMpQHDPvWMmoRcwAXn3kiVzc3wKeR9Zdx9q2NyDIS+OxgynEAc3fM2rf
 ag19+Epn6GUGPMakS/zJNQS0wCA4+pJi60Z+Hlddy0WNUocg55uHd0zY7xcT3s75
 rsPtbTeabOrtzQMf7lSpsn5OUeCDJjc3KcZIlmILaZaVXUZv+jvysRwH7CRdDNNS
 gM2j9nu87I8TbSPXbY79KutvucfKAl88iWxRgFqnzyqzRYLWahwWSKsiVubH7OoU
 kUYdDdPmiZh7XAqTFUsMF4++wyx/PAwU7RdYuxaUvHZd6PT8J92AqIisPwRT9ojL
 oqLpgRoeYX3JY7aDyvBjYan2IKfIPhB0WZF9vCeHVoTXoEy/LVZeWVNoBXyO6ILl
 BYzBAjp5oJRLbJYVtjI4/gkDizdtpAu8YYRYX36TUvBAkFqpGYn9dvySpMGl24uJ
 g3IEqTj/kajeZleHnQ==
 =dHXB
 -----END PGP SIGNATURE-----

Merge tag 'execve-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve fixes from Kees Cook:

 - Fix an ancient signal action copy race (Bernd Edlinger)

 - Fix a memory leak in ELF loader, when under memory pressure (Li
   Zetao)

* tag 'execve-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  fs/binfmt_elf: Fix memory leak in load_elf_binary()
  exec: Copy oldsighand->action under spin-lock
2022-10-27 13:16:36 -07:00