linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-02 07:04:24 +00:00

Author	SHA1	Message	Date
Matthieu Baerts	1c0b0ee264	selftests: mptcp: userspace: refactor asserts Instead of having a long list of conditions to check, it is possible to give a list of variable names to compare with their 'e_XXX' version. This will ease the introduction of the following commit which will print which condition has failed (if any). Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-26 13:33:30 +01:00
Matthieu Baerts	f790ae03db	selftests: mptcp: userspace: print titles This script is running a few tests after having setup the environment. Printing titles helps understand what is being tested. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-26 13:33:30 +01:00
Matthieu Baerts	40c71f763f	mptcp: userspace pm: use a single point of exit Like in all other functions in this file, a single point of exit is used when extra operations are needed: unlock, decrement refcount, etc. There is no functional change for the moment but it is better to do the same here to make sure all cleanups are done in case of intermediate errors. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-26 13:33:30 +01:00
Paolo Abeni	ad3493746e	selftests: mptcp: add test-cases for mixed v4/v6 subflows Note that we can't guess the listener family anymore based on the client target address: always use IPv6. The fullmesh flag with endpoints from different families is also validated here. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-26 13:33:30 +01:00
Matthieu Baerts	7e9740e0e8	mptcp: propagate sk_ipv6only to subflows Usually, attributes are propagated to subflows as well. Here, if subflows are created by other ways than the MPTCP path-manager, it is important to make sure they are in v6 if it is asked by the userspace. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-26 13:33:30 +01:00
Paolo Abeni	b9d69db87f	mptcp: let the in-kernel PM use mixed IPv4 and IPv6 addresses Currently the in-kernel PM arbitrary enforces that created subflow's family must match the main MPTCP socket while the RFC allows mixing IPv4 and IPv6 subflows. This patch changes the in-kernel PM logic to create subflows matching the currently selected source (or destination) address. IPv4 sockets can pick only IPv4 addresses (and v4 mapped in v6), while IPv6 sockets not restricted to V6ONLY can pick either IPv4 and IPv6 addresses as long as the source and destination matches. A helper, previously introduced is used to ease family matching checks, taking care of IPv4 vs IPv4-mapped-IPv6 vs IPv6 only addresses. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/269 Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-26 13:33:30 +01:00
Jamie Bainbridge	d0941130c9	icmp: Add counters for rate limits There are multiple ICMP rate limiting mechanisms: * Global limits: net.ipv4.icmp_msgs_burst/icmp_msgs_per_sec * v4 per-host limits: net.ipv4.icmp_ratelimit/ratemask * v6 per-host limits: net.ipv6.icmp_ratelimit/ratemask However, when ICMP output is limited, there is no way to tell which limit has been hit or even if the limits are responsible for the lack of ICMP output. Add counters for each of the cases above. As we are within local_bh_disable(), use the __INC stats variant. Example output: # nstat -sz "RateLimit" IcmpOutRateLimitGlobal 134 0.0 IcmpOutRateLimitHost 770 0.0 Icmp6OutRateLimitHost 84 0.0 Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Suggested-by: Abhishek Rawal <rawal.abhishek92@gmail.com> Link: https://lore.kernel.org/r/273b32241e6b7fdc5c609e6f5ebc68caf3994342.1674605770.git.jamie.bainbridge@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-26 10:52:18 +01:00
Paolo Abeni	9f92752788	Merge branch 'adding-sparx5-is0-vcap-support' Steen Hegelund says: ==================== Adding Sparx5 IS0 VCAP support This provides the Ingress Stage 0 (IS0) VCAP (Versatile Content-Aware Processor) support for the Sparx5 platform. The IS0 VCAP (also known in the datasheet as CLM) is a classifier VCAP that mainly extracts frame information to metadata that follows the frame in the Sparx5 processing flow all the way to the egress port. The IS0 VCAP has 4 lookups and they are accessible with a TC chain id: - chain 1000000: IS0 Lookup 0 - chain 1100000: IS0 Lookup 1 - chain 1200000: IS0 Lookup 2 - chain 1300000: IS0 Lookup 3 - chain 1400000: IS0 Lookup 4 - chain 1500000: IS0 Lookup 5 Each of these lookups have their own port keyset configuration that decides which keys will be used for matching on which traffic type. The IS0 VCAP has these traffic classifications: - IPv4 frames - IPv6 frames - Unicast MPLS frames (ethertype = 0x8847) - Multicast MPLS frames (ethertype = 0x8847) - Other frame types than MPLS, IPv4 and IPv6 The IS0 VCAP has an action that allows setting the value of a PAG (Policy Association Group) key field in the frame metadata, and this can be used for matching in an IS2 VCAP rule. This allow rules in the IS0 VCAP to be linked to rules in the IS2 VCAP. The linking is exposed by using the TC "goto chain" action with an offset from the IS2 chain ids. As an example a "goto chain 8000001" will use a PAG value of 1 to chain to a rule in IS2 Lookup 0. ==================== Link: https://lore.kernel.org/r/20230124104511.293938-1-steen.hegelund@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-26 10:08:05 +01:00
Steen Hegelund	52df82cc91	net: microchip: sparx5: Add support for IS0 VCAP CVLAN TC keys This adds support for parsing and matching on the CVLAN tags in the Sparx5 IS0 VCAP. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-26 10:07:44 +01:00
Steen Hegelund	63e3564507	net: microchip: sparx5: Add support for IS0 VCAP ethernet protocol types This allows the IS0 VCAP to have its own list of supported ethernet protocol types matching what is supported by the VCAPs port lookup classification. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-26 10:07:44 +01:00
Steen Hegelund	81e164c4ae	net: microchip: sparx5: Add automatic selection of VCAP rule actionset With more than one possible actionset in a VCAP instance, the VCAP API will now use the actions in a VCAP rule to select the actionset that fits these actions the best possible way. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-26 10:07:44 +01:00
Steen Hegelund	88bd9ea70b	net: microchip: sparx5: Add TC filter chaining support for IS0 and IS2 VCAPs This allows rules to be chained between VCAP instances, e.g. from IS0 Lookup 0 to IS0 Lookup 1, or from one of the IS0 Lookups to one of the IS2 Lookups. Chaining from an IS2 Lookup to another IS2 Lookup is not supported in the hardware. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-26 10:07:44 +01:00
Steen Hegelund	542e6e2c20	net: microchip: sparx5: Add TC support for IS0 VCAP This enables the TC command to use the Sparx5 IS0 VCAP Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-26 10:07:44 +01:00
Steen Hegelund	7306fcd17c	net: microchip: sparx5: Add actionset type id information to rule This adds the actionset type id to the rule information. This is needed as we now have more than one actionset in a VCAP instance (IS0). Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-26 10:07:44 +01:00
Steen Hegelund	545609fd4e	net: microchip: sparx5: Add IS0 VCAP keyset configuration for Sparx5 This adds the IS0 VCAP port keyset configuration for Sparx5 and also updates the debugFS support to show the keyset configuration. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-26 10:07:44 +01:00
Steen Hegelund	f274a659fb	net: microchip: sparx5: Add IS0 VCAP model and updated KUNIT VCAP model This provides the IS0 (Ingress Stage 0) or CLM VCAP model for Sparx5. This VCAP provides classification actions for Sparx5. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-26 10:07:44 +01:00
Jakub Kicinski	3f17e16f38	Merge branch 'add-ip_local_port_range-socket-option' Jakub Sitnicki says: ==================== Add IP_LOCAL_PORT_RANGE socket option This patch set is a follow up to the "How to share IPv4 addresses by partitioning the port space" talk given at LPC 2022 [1]. Please see patch #1 for the motivation & the use case description. Patch #2 adds tests exercising the new option in various scenarios. Documentation ------------- Proposed update to the ip(7) man-page: IP_LOCAL_PORT_RANGE (since Linux X.Y) Set or get the per-socket default local port range. This option can be used to clamp down the global local port range, defined by the ip_local_port_range /proc interface described below, for a given socket. The option takes an uint32_t value with the high 16 bits set to the upper range bound, and the low 16 bits set to the lower range bound. Range bounds are inclusive. The 16-bit values should be in host byte order. The lower bound has to be less than the upper bound when both bounds are not zero. Otherwise, setting the option fails with EINVAL. If either bound is outside of the global local port range, or is zero, then that bound has no effect. To reset the setting, pass zero as both the upper and the lower bound. Interaction with SELinux bind() hook ------------------------------------ SELinux bind() hook - selinux_socket_bind() - performs a permission check if the requested local port number lies outside of the netns ephemeral port range. The proposed socket option cannot be used change the ephemeral port range to extend beyond the per-netns port range, as set by net.ipv4.ip_local_port_range. Hence, there is no interaction with SELinux, AFAICT. RFC -> v1 RFC: https://lore.kernel.org/netdev/20220912225308.93659-1-jakub@cloudflare.com/ * Allow either the high bound or the low bound, or both, to be zero * Add getsockopt support * Add selftests Links: ------ [1]: https://lpc.events/event/16/contributions/1349/ ==================== Link: https://lore.kernel.org/r/20221221-sockopt-port-range-v6-0-be255cc0e51f@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-01-25 22:45:02 -08:00
Jakub Sitnicki	ae5439658c	selftests/net: Cover the IP_LOCAL_PORT_RANGE socket option Exercise IP_LOCAL_PORT_RANGE socket option in various scenarios: 1. pass invalid values to setsockopt 2. pass a range outside of the per-netns port range 3. configure a single-port range 4. exhaust a configured multi-port range 5. check interaction with late-bind (IP_BIND_ADDRESS_NO_PORT) 6. set then get the per-socket port range Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-01-25 22:45:00 -08:00
Jakub Sitnicki	91d0b78c51	inet: Add IP_LOCAL_PORT_RANGE socket option Users who want to share a single public IP address for outgoing connections between several hosts traditionally reach for SNAT. However, SNAT requires state keeping on the node(s) performing the NAT. A stateless alternative exists, where a single IP address used for egress can be shared between several hosts by partitioning the available ephemeral port range. In such a setup: 1. Each host gets assigned a disjoint range of ephemeral ports. 2. Applications open connections from the host-assigned port range. 3. Return traffic gets routed to the host based on both, the destination IP and the destination port. An application which wants to open an outgoing connection (connect) from a given port range today can choose between two solutions: 1. Manually pick the source port by bind()'ing to it before connect()'ing the socket. This approach has a couple of downsides: a) Search for a free port has to be implemented in the user-space. If the chosen 4-tuple happens to be busy, the application needs to retry from a different local port number. Detecting if 4-tuple is busy can be either easy (TCP) or hard (UDP). In TCP case, the application simply has to check if connect() returned an error (EADDRNOTAVAIL). That is assuming that the local port sharing was enabled (REUSEADDR) by all the sockets. # Assume desired local port range is 60_000-60_511 s = socket(AF_INET, SOCK_STREAM) s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1) s.bind(("192.0.2.1", 60_000)) s.connect(("1.1.1.1", 53)) # Fails only if 192.0.2.1:60000 -> 1.1.1.1:53 is busy # Application must retry with another local port In case of UDP, the network stack allows binding more than one socket to the same 4-tuple, when local port sharing is enabled (REUSEADDR). Hence detecting the conflict is much harder and involves querying sock_diag and toggling the REUSEADDR flag [1]. b) For TCP, bind()-ing to a port within the ephemeral port range means that no connecting sockets, that is those which leave it to the network stack to find a free local port at connect() time, can use the this port. IOW, the bind hash bucket tb->fastreuse will be 0 or 1, and the port will be skipped during the free port search at connect() time. 2. Isolate the app in a dedicated netns and use the use the per-netns ip_local_port_range sysctl to adjust the ephemeral port range bounds. The per-netns setting affects all sockets, so this approach can be used only if: - there is just one egress IP address, or - the desired egress port range is the same for all egress IP addresses used by the application. For TCP, this approach avoids the downsides of (1). Free port search and 4-tuple conflict detection is done by the network stack: system("sysctl -w net.ipv4.ip_local_port_range='60000 60511'") s = socket(AF_INET, SOCK_STREAM) s.setsockopt(SOL_IP, IP_BIND_ADDRESS_NO_PORT, 1) s.bind(("192.0.2.1", 0)) s.connect(("1.1.1.1", 53)) # Fails if all 4-tuples 192.0.2.1:60000-60511 -> 1.1.1.1:53 are busy For UDP this approach has limited applicability. Setting the IP_BIND_ADDRESS_NO_PORT socket option does not result in local source port being shared with other connected UDP sockets. Hence relying on the network stack to find a free source port, limits the number of outgoing UDP flows from a single IP address down to the number of available ephemeral ports. To put it another way, partitioning the ephemeral port range between hosts using the existing Linux networking API is cumbersome. To address this use case, add a new socket option at the SOL_IP level, named IP_LOCAL_PORT_RANGE. The new option can be used to clamp down the ephemeral port range for each socket individually. The option can be used only to narrow down the per-netns local port range. If the per-socket range lies outside of the per-netns range, the latter takes precedence. UAPI-wise, the low and high range bounds are passed to the kernel as a pair of u16 values in host byte order packed into a u32. This avoids pointer passing. PORT_LO = 40_000 PORT_HI = 40_511 s = socket(AF_INET, SOCK_STREAM) v = struct.pack("I", PORT_HI << 16 \| PORT_LO) s.setsockopt(SOL_IP, IP_LOCAL_PORT_RANGE, v) s.bind(("127.0.0.1", 0)) s.getsockname() # Local address between ("127.0.0.1", 40_000) and ("127.0.0.1", 40_511), # if there is a free port. EADDRINUSE otherwise. [1] https://github.com/cloudflare/cloudflare-blog/blob/232b432c1d57/2022-02-connectx/connectx.py#L116 Reviewed-by: Marek Majkowski <marek@cloudflare.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-01-25 22:45:00 -08:00
Randy Dunlap	6a7a2c18a9	net: Kconfig: fix spellos Fix spelling in net/ Kconfig files. (reported by codespell) Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Cc: coreteam@netfilter.org Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Link: https://lore.kernel.org/r/20230124181724.18166-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-01-25 22:39:56 -08:00
Vladimir Oltean	f5be9caf7b	net: ethtool: fix NULL pointer dereference in pause_prepare_data() In the following call path: ethnl_default_dumpit -> ethnl_default_dump_one -> ctx->ops->prepare_data -> pause_prepare_data struct genl_info *info will be passed as NULL, and pause_prepare_data() dereferences it while getting the extended ack pointer. To avoid that, just set the extack to NULL if "info" is NULL, since the netlink extack handling messages know how to deal with that. The pattern "info ? info->extack : NULL" is present in quite a few other "prepare_data" implementations, so it's clear that it's a more general problem to be dealt with at a higher level, but the code should have at least adhered to the current conventions to avoid the NULL dereference. Fixes: `04692c9020` ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reported-by: syzbot+9d44aae2720fc40b8474@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>	2023-01-25 09:57:41 +00:00
Vladimir Oltean	c96de13632	net: ethtool: fix NULL pointer dereference in stats_prepare_data() In the following call path: ethnl_default_dumpit -> ethnl_default_dump_one -> ctx->ops->prepare_data -> stats_prepare_data struct genl_info *info will be passed as NULL, and stats_prepare_data() dereferences it while getting the extended ack pointer. To avoid that, just set the extack to NULL if "info" is NULL, since the netlink extack handling messages know how to deal with that. The pattern "info ? info->extack : NULL" is present in quite a few other "prepare_data" implementations, so it's clear that it's a more general problem to be dealt with at a higher level, but the code should have at least adhered to the current conventions to avoid the NULL dereference. Fixes: `04692c9020` ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-01-25 09:56:31 +00:00
David S. Miller	99db6fb043	Merge branch 's390-ism-generalized-interface' Jan Karcher says: ==================== drivers/s390/net/ism: Add generalized interface Previously, there was no clean separation between SMC-D code and the ISM device driver.This patch series addresses the situation to make ISM available for uses outside of SMC-D. In detail: SMC-D offers an interface via struct smcd_ops, which only the ISM module implements so far. However, there is no real separation between the smcd and ism modules, which starts right with the ISM device initialization, which calls directly into the SMC-D code. This patch series introduces a new API in the ISM module, which allows registration of arbitrary clients via include/linux/ism.h: struct ism_client. Furthermore, it introduces a "pure" struct ism_dev (i.e. getting rid of dependencies on SMC-D in the device structure), and adds a number of API calls for data transfers via ISM (see ism_register_dmb() & friends). Still, the ISM module implements the SMC-D API, and therefore has a number of internal helper functions for that matter. Note that the ISM API is consciously kept thin for now (as compared to the SMC-D API calls), as a number of API calls are only used with SMC-D and hardly have any meaningful usage beyond SMC-D, e.g. the VLAN-related calls. v1 -> v2: Removed s390x dependency which broke config for other archs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-01-25 09:46:49 +00:00
Stefan Raspl	8c81ba2034	net/smc: De-tangle ism and smc device initialization The struct device for ISM devices was part of struct smcd_dev. Move to struct ism_dev, provide a new API call in struct smcd_ops, and convert existing SMCD code accordingly. Furthermore, remove struct smcd_dev from struct ism_dev. This is the final part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-01-25 09:46:49 +00:00
Stefan Raspl	820f21009f	s390/ism: Consolidate SMC-D-related code The ism module had SMC-D-specific code sprinkled across the entire module. We are now consolidating the SMC-D-specific parts into the latter parts of the module, so it becomes more clear what code is intended for use with ISM, and which parts are glue code for usage in the context of SMC-D. This is the fourth part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-01-25 09:46:49 +00:00
Stefan Raspl	9de4df7b6b	net/smc: Separate SMC-D and ISM APIs We separate the code implementing the struct smcd_ops API in the ISM device driver from the functions that may be used by other exploiters of ISM devices. Note: We start out small, and don't offer the whole breadth of the ISM device for public use, as many functions are specific to or likely only ever used in the context of SMC-D. This is the third part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-01-25 09:46:48 +00:00
Stefan Raspl	8747716f39	net/smc: Register SMC-D as ISM client Register the smc module with the new ism device driver API. This is the second part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-01-25 09:46:48 +00:00
Stefan Raspl	89e7d2ba61	net/ism: Add new API for client registration Add a new API that allows other drivers to concurrently access ISM devices. To do so, we introduce a new API that allows other modules to register for ISM device usage. Furthermore, we move the GID to struct ism, where it belongs conceptually, and rename and relocate struct smcd_event to struct ism_event. This is the first part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-01-25 09:46:48 +00:00
Stefan Raspl	1baedb13f1	s390/ism: Introduce struct ism_dmb Conceptually, a DMB is a structure that belongs to ISM devices. However, SMC currently 'owns' this structure. So future exploiters of ISM devices would be forced to include SMC headers to work - which is just weird. Therefore, we switch ISM to struct ism_dmb, introduce a new public header with the definition (will be populated with further API calls later on), and, add a thin wrapper to please SMC. Since structs smcd_dmb and ism_dmb are identical, we can simply convert between the two for now. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-01-25 09:46:48 +00:00
Stefan Raspl	462502ff9a	net/ism: Add missing calls to disable bus-mastering Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-01-25 09:46:48 +00:00
Stefan Raspl	c40bff4132	net/smc: Terminate connections prior to device removal Removing an ISM device prior to terminating its associated connections doesn't end well. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-01-25 09:46:48 +00:00
Parav Pandit	d067111586	virtio-net: Reduce debug name field size to 16 bytes virtio queue index can be maximum of 65535. 16 bytes are enough to store the vq name with the existing string prefix. With this change, send queue struct saves 24 bytes and receive queue saves whole cache line worth 64 bytes per structure due to saving in alignment bytes. Pahole results before: pahole -s drivers/net/virtio_net.o \| \ grep -e "send_queue" -e "receive_queue" send_queue 1112 0 receive_queue 1280 1 Pahole results after: pahole -s drivers/net/virtio_net.o \| \ grep -e "send_queue" -e "receive_queue" send_queue 1088 0 receive_queue 1216 1 Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-01-25 09:27:13 +00:00
Jakub Kicinski	4373a023e0	devlink: remove a dubious assumption in fmsg dumping Build bot detects that err may be returned uninitialized in devlink_fmsg_prepare_skb(). This is not really true because all fmsgs users should create at least one outer nest, and therefore fmsg can't be completely empty. That said the assumption is not trivial to confirm, so let's follow the bots advice, anyway. This code does not seem to have changed since its inception in commit `1db64e8733` ("devlink: Add devlink formatted message (fmsg) API") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230124035231.787381-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-01-24 20:31:35 -08:00
Vladimir Oltean	28113cfada	net: mscc: ocelot: fix incorrect verify_enabled reporting in ethtool get_mm() We don't read the verify_enabled variable from hardware in the MAC Merge layer state GET operation, instead we always leave it set to "false". The user may think something is wrong if they set verify_enabled to true, then read it back and see it's still false, even though the configuration took place. Fixes: `6505b68056` ("net: mscc: ocelot: add MAC Merge layer support for VSC9959") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230123184538.3420098-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-01-24 18:34:20 -08:00
James Hershaw	74b4f1739d	nfp: flower: change get/set_eeprom logic and enable for flower reps The changes in this patch are as follows: - Alter the logic of get/set_eeprom functions to use the helper function nfp_app_from_netdev() which handles differentiating between an nfp_net and a nfp_repr. This allows us to get an agnostic backpointer to the pdev. - Enable the various eeprom commands by adding the 'get_eeprom_len', 'get_eeprom', 'set_eeprom' callbacks to the nfp_port_ethtool_ops struct. This allows the eeprom commands to work on representor interfaces, similar to a previous patch which added it to the vnics. Currently these are being used to configure persistent MAC addresses for the physical ports on the nfp. Signed-off-by: James Hershaw <james.hershaw@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230123134135.293278-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-01-24 18:19:12 -08:00
Guillaume Nault	90317bcdbd	ipv6: Make ip6_route_output_flags_noref() static. This function is only used in net/ipv6/route.c and has no reason to be visible outside of it. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/50706db7f675e40b3594d62011d9363dce32b92e.1674495822.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-01-24 18:12:52 -08:00
Jakub Kicinski	ec8f7d495b	netlink: fix spelling mistake in dump size assert Commit `2c7bc10d0f` ("netlink: add macro for checking dump ctx size") misspelled the name of the assert as asset, missing an R. Reported-by: Ido Schimmel <idosch@idosch.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230123222224.732338-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-01-24 16:29:11 -08:00
Paolo Abeni	c554520f2c	Merge branch 'netlink-protocol-specs' Jakub Kicinski says: ==================== Netlink protocol specs I think the Netlink proto specs are far along enough to merge. Filling in all attribute types and quirks will be an ongoing effort but we have enough to cover FOU so it's somewhat complete. I fully intend to continue polishing the code but at the same time I'd like to start helping others base their work on the specs (e.g. DPLL) and need to start working on some new families myself. That's the progress / motivation for merging. The RFC [1] has more of a high level blurb, plus I created a lot of documentation, I'm not going to repeat it here. There was also the talk at LPC [2]. [1] https://lore.kernel.org/all/20220811022304.583300-1-kuba@kernel.org/ [2] https://youtu.be/9QkXIQXkaQk?t=2562 v2: https://lore.kernel.org/all/20220930023418.1346263-1-kuba@kernel.org/ v3: https://lore.kernel.org/all/20230119003613.111778-1-kuba@kernel.org/1 v4: - spec improvements (patch 2) - Python cleanup (patch 3) - rename auto-gen files and use the right comment style ==================== Link: https://lore.kernel.org/r/20230120175041.342573-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-24 11:02:03 +01:00
Jakub Kicinski	e4b48ed460	tools: ynl: add a completely generic client Add a CLI sample which can take in arbitrary request in JSON format, convert it to Netlink and do the inverse for output. It's meant as a development tool primarily and perhaps for selftests which need to tickle netlink in a special way. Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-24 10:58:11 +01:00
Jakub Kicinski	1d562c32e4	net: fou: use policy and operation tables generated from the spec Generate and plug in the spec-based tables. A little bit of renaming is needed in the FOU code. Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-24 10:58:11 +01:00
Jakub Kicinski	08d323234d	net: fou: rename the source for linking We'll need to link two objects together to form the fou module. This means the source can't be called fou, the build system expects fou.o to be the combined object. Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-24 10:58:11 +01:00
Jakub Kicinski	3a330496ba	net: fou: regenerate the uAPI from the spec Regenerate the FOU uAPI header from the YAML spec. The flags now come before attributes which use them, and the comments for type disappear (coders should look at the spec instead). Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-24 10:58:11 +01:00
Jakub Kicinski	4eb77b4ecd	netlink: add a proto specification for FOU FOU has a reasonably modern Genetlink family. Add a spec. Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-24 10:58:11 +01:00
Jakub Kicinski	be5bea1cc0	net: add basic C code generators for Netlink Code generators to turn Netlink specs into C code. I'm definitely not proud of it. The main generator is in Python, there's a bash script to regen all code-gen'ed files in tree after making spec changes. Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-24 10:58:11 +01:00
Jakub Kicinski	e616c07ca5	netlink: add schemas for YAML specs Add schemas for Netlink spec files. As described in the docs we have 4 "protocols" or compatibility levels, and each one comes with its own schema, but the more general / legacy schemas are superset of more modern ones: genetlink is the smallest followed by genetlink-c and genetlink-legacy. There is no schema for raw netlink, yet, I haven't found the time.. I don't know enough jsonschema to do inheritance or something but the repetition is not too bad. I hope. Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-24 10:58:11 +01:00
Jakub Kicinski	9d6a65079c	docs: add more netlink docs (incl. spec docs) Add documentation about the upcoming Netlink protocol specs. Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-24 10:58:11 +01:00
Paolo Abeni	d961bee454	Merge branch 'net-sched-use-the-backlog-for-nested-mirred-ingress' Davide Caratti says: ==================== net/sched: use the backlog for nested mirred ingress TC mirred has a protection against excessive stack growth, but that protection doesn't really guarantee the absence of recursion, nor it guards against loops. Patch 1/2 rewords "recursion" to "nesting" to make this more clear. We can leverage on this existing mechanism to prevent TCP / SCTP from doing soft lock-up in some specific scenarios that uses mirred egress->ingress: patch 2 changes mirred so that the networking backlog is used for nested mirred ingress actions. ==================== Link: https://lore.kernel.org/r/cover.1674233458.git.dcaratti@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-24 10:30:56 +01:00
Davide Caratti	ca22da2fbd	act_mirred: use the backlog for nested calls to mirred ingress William reports kernel soft-lockups on some OVS topologies when TC mirred egress->ingress action is hit by local TCP traffic [1]. The same can also be reproduced with SCTP (thanks Xin for verifying), when client and server reach themselves through mirred egress to ingress, and one of the two peers sends a "heartbeat" packet (from within a timer). Enqueueing to backlog proved to fix this soft lockup; however, as Cong noticed [2], we should preserve - when possible - the current mirred behavior that counts as "overlimits" any eventual packet drop subsequent to the mirred forwarding action [3]. A compromise solution might use the backlog only when tcf_mirred_act() has a nest level greater than one: change tcf_mirred_forward() accordingly. Also, add a kselftest that can reproduce the lockup and verifies TC mirred ability to account for further packet drops after TC mirred egress->ingress (when the nest level is 1). [1] https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/ [2] https://lore.kernel.org/netdev/Y0w%2FWWY60gqrtGLp@pop-os.localdomain/ [3] such behavior is not guaranteed: for example, if RPS or skb RX timestamping is enabled on the mirred target device, the kernel can defer receiving the skb and return NET_RX_SUCCESS inside tcf_mirred_forward(). Reported-by: William Zhao <wizhao@redhat.com> CC: Xin Long <lucien.xin@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-24 10:30:54 +01:00
Davide Caratti	78dcdffe04	net/sched: act_mirred: better wording on protection against excessive stack growth with commit `e2ca070f89` ("net: sched: protect against stack overflow in TC act_mirred"), act_mirred protected itself against excessive stack growth using per_cpu counter of nested calls to tcf_mirred_act(), and capping it to MIRRED_RECURSION_LIMIT. However, such protection does not detect recursion/loops in case the packet is enqueued to the backlog (for example, when the mirred target device has RPS or skb timestamping enabled). Change the wording from "recursion" to "nesting" to make it more clear to readers. CC: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-24 10:30:54 +01:00
Paolo Abeni	5cf6c22b5b	Merge branch 'fix-cpts-release-action-in-am65-cpts-driver' Siddharth Vadapalli says: ==================== Fix CPTS release action in am65-cpts driver Delete unreachable code in am65_cpsw_init_cpts() function, which was Reported-by: Leon Romanovsky <leon@kernel.org> at: https://lore.kernel.org/r/Y8aHwSnVK9+sAb24@unreal Remove the devm action associated with am65_cpts_release() and invoke the function directly on the cleanup and exit paths. v4: https://lore.kernel.org/r/20230120044201.357950-1-s-vadapalli@ti.com/ v3: https://lore.kernel.org/r/20230118095439.114222-1-s-vadapalli@ti.com/ v2: https://lore.kernel.org/r/20230116044517.310461-1-s-vadapalli@ti.com/ v1: https://lore.kernel.org/r/20230113104816.132815-1-s-vadapalli@ti.com/ ==================== Link: https://lore.kernel.org/r/20230120070731.383729-1-s-vadapalli@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-01-24 10:08:54 +01:00

1 2 3 4 5 ...

1155093 commits