linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-08-22 00:40:03 +00:00

Author	SHA1	Message	Date
Daniel Borkmann	39c536acc3	Merge branch 'xdp-ice-mbuf' Alexander Lobakin says: ==================== The set grew from the poor performance of %BPF_F_TEST_XDP_LIVE_FRAMES when the ice-backed device is a sender. Initially there were around 3.3 Mpps / thread, while I have 5.5 on skb-based pktgen ... After fixing 0005 (0004 is a prereq for it) first (strange thing nobody noticed that earlier), I started catching random OOMs. This is how 0002 (and partially 0001) appeared. 0003 is a suggestion from Maciej to not waste time on refactoring dead lines. 0006 is a "cherry on top" to get away with the final 6.7 Mpps. 4.5 of 6 are fixes, but only the first three are tagged, since it then starts being tricky. I may backport them manually later on. TL;DR for the series is that shortcuts are good, but only as long as they don't make the driver miss important things. %XDP_TX is purely driver-local, however .ndo_xdp_xmit() is not, and sometimes assumptions can be unsafe there. With that series and also one core code patch[0], "live frames" and xdp-trafficgen are now safe'n'fast on ice (probably more to come). [0] https://lore.kernel.org/all/20230209172827.874728-1-alexandr.lobakin@intel.com ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2023-02-13 19:13:18 +01:00
Alexander Lobakin	ad07f29b9c	ice: Micro-optimize .ndo_xdp_xmit() path After the recent mbuf changes, ice_xmit_xdp_ring() became a 3-liner. It makes no sense to keep it global in a different file than its caller. Move it just next to the sole call site and mark static. Also, it doesn't need a full xdp_convert_frame_to_buff(). Save several cycles and fill only the fields used by __ice_xmit_xdp_ring() later on. Finally, since it doesn't modify @xdpf anyhow, mark the argument const to save some more (whole -11 bytes of .text! :D). Thanks to 1 jump less and less calcs as well, this yields as many as 6.7 Mpps per queue. `xdp.data_hard_start = xdpf` is fully intentional again (see xdp_convert_buff_to_frame()) and just works when there are no source device's driver issues. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20230210170618.1973430-7-alexandr.lobakin@intel.com	2023-02-13 19:13:13 +01:00
Alexander Lobakin	055d092068	ice: Fix freeing XDP frames backed by Page Pool As already mentioned, freeing any &xdp_frame via page_frag_free() is wrong, as it assumes the frame is backed by either an order-0 page or a page with no "patrons" behind them, while in fact frames backed by Page Pool can be redirected to a device, which's driver doesn't use it. Keep storing a pointer to the raw buffer and then freeing it unconditionally via page_frag_free() for %XDP_TX frames, but introduce a separate type in the enum for frames coming through .ndo_xdp_xmit(), and free them via xdp_return_frame_bulk(). Note that saving xdpf as xdp_buff->data_hard_start is intentional and is always true when everything is configured properly. After this change, %XDP_REDIRECT from a Page Pool based driver to ice becomes zero-alloc as it should be and horrendous 3.3 Mpps / queue turn into 6.6, hehe. Let it go with no "Fixes:" tag as it spans across good 5+ commits and can't be trivially backported. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20230210170618.1973430-6-alexandr.lobakin@intel.com	2023-02-13 19:13:12 +01:00
Alexander Lobakin	aa1d3faf71	ice: Robustify cleaning/completing XDP Tx buffers When queueing frames from a Page Pool for redirecting to a device backed by the ice driver, `perf top` shows heavy load on page_alloc() and page_frag_free(), despite that on a properly working system it must be fully or at least almost zero-alloc. The problem is in fact a bit deeper and raises from how ice cleans up completed Tx buffers. The story so far: when cleaning/freeing the resources related to a particular completed Tx frame (skbs, DMA mappings etc.), ice uses some heuristics only without setting any type explicitly (except for dummy Flow Director packets, which are marked via ice_tx_buf::tx_flags). This kinda works, but only up to some point. For example, currently ice assumes that each frame coming to __ice_xmit_xdp_ring(), is backed by either plain order-0 page or plain page frag, while it may also be backed by Page Pool or any other possible memory models introduced in future. This means any &xdp_frame must be freed properly via xdp_return_frame() family with no assumptions. In order to do that, the whole heuristics must be replaced with setting the Tx buffer/frame type explicitly, just how it's always been done via an enum. Let us reuse 16 bits from ::tx_flags -- 1 bit-and instr won't hurt much -- especially given that sometimes there was a check for %ICE_TX_FLAGS_DUMMY_PKT, which is now turned from a flag to an enum member. The rest of the changes is straightforward and most of it is just a conversion to rely now on the type set in &ice_tx_buf rather than to some secondary properties. For now, no functional changes intended, the change only prepares the ground for starting freeing XDP frames properly next step. And it must be done atomically/synchronously to not break stuff. Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20230210170618.1973430-5-alexandr.lobakin@intel.com	2023-02-13 19:13:12 +01:00
Alexander Lobakin	923096b5ce	ice: Remove two impossible branches on XDP Tx cleaning The tagged commit started sending %XDP_TX frames from XSk Rx ring directly without converting it to an &xdp_frame. However, when XSk is enabled on a queue pair, it has its separate Tx cleaning functions, so neither ice_clean_xdp_irq() nor ice_unmap_and_free_tx_buf() ever happens there. Remove impossible branches in order to reduce the diffstat of the upcoming change. Fixes: `a24b4c6e9a` ("ice: xsk: Do not convert to buff to frame for XDP_TX") Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20230210170618.1973430-4-alexandr.lobakin@intel.com	2023-02-13 19:13:12 +01:00
Alexander Lobakin	0bd939b60c	ice: Fix XDP Tx ring overrun Sometimes, under heavy XDP Tx traffic, e.g. when using XDP traffic generator (%BPF_F_TEST_XDP_LIVE_FRAMES), the machine can catch OOM due to the driver not freeing all of the pages passed to it by .ndo_xdp_xmit(). Turned out that during the development of the tagged commit, the check, which ensures that we have a free descriptor to queue a frame, moved into the branch happening only when a buffer has frags. Otherwise, we only run a cleaning cycle, but don't check anything. ATST, there can be situations when the driver gets new frames to send, but there are no buffers that can be cleaned/completed and the ring has no free slots. It's very rare, but still possible (> 6.5 Mpps per ring). The driver then fills the next buffer/descriptor, effectively overwriting the data, which still needs to be freed. Restore the check after the cleaning routine to make sure there is a slot to queue a new frame. When there are frags, there still will be a separate check that we can place all of them, but if the ring is full, there's no point in wasting any more time. (minor: make `!ready_frames` unlikely since it happens ~1-2 times per billion of frames) Fixes: `3246a10752` ("ice: Add support for XDP multi-buffer on Tx side") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20230210170618.1973430-3-alexandr.lobakin@intel.com	2023-02-13 19:13:12 +01:00
Alexander Lobakin	bc4db83470	ice: fix ice_tx_ring:: Xdp_tx_active underflow xdp_tx_active is used to indicate whether an XDP ring has any %XDP_TX frames queued to shortcut processing Tx cleaning for XSk-enabled queues. When !XSk, it simply indicates whether the ring has any queued frames in general. It gets increased on each frame placed onto the ring and counts the whole frame, not each frag. However, currently it gets decremented in ice_clean_xdp_tx_buf(), which is called per each buffer, i.e. per each frag. Thus, on completing multi-frag frames, an underflow happens. Move the decrement to the outer function and do it once per frame, not buf. Also, do that on the stack and update the ring counter after the loop is done to save several cycles. XSk rings are fine since there are no frags at the moment. Fixes: `3246a10752` ("ice: Add support for XDP multi-buffer on Tx side") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20230210170618.1973430-2-alexandr.lobakin@intel.com	2023-02-13 19:13:12 +01:00
Ilya Leoshkevich	0b07572447	selftests/bpf: Fix out-of-srctree build Building BPF selftests out of srctree fails with: make: *** No rule to make target '/linux-build//ima_setup.sh', needed by 'ima_setup.sh'. Stop. The culprit is the rule that defines convenient shorthands like "make test_progs", which builds $(OUTPUT)/test_progs. These shorthands make sense only for binaries that are built though; scripts that live in the source tree do not end up in $(OUTPUT). Therefore drop $(TEST_PROGS) and $(TEST_PROGS_EXTENDED) from the rule. The issue exists for a while, but it became a problem only after commit `d68ae4982c` ("selftests/bpf: Install all required files to run selftests"), which added dependencies on these scripts. Fixes: `03dcb78460` ("selftests/bpf: Add simple per-test targets to Makefile") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230208231211.283606-1-iii@linux.ibm.com	2023-02-13 18:14:50 +01:00
Alan Maguire	0243d3dfe2	bpf: Add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25 v1.25 of pahole supports filtering out functions with multiple inconsistent function prototypes or optimized-out parameters from the BTF representation. These present problems because there is no additional info in BTF saying which inconsistent prototype matches which function instance to help guide attachment, and functions with optimized-out parameters can lead to incorrect assumptions about register contents. So for now, filter out such functions while adding BTF representations for functions that have "."-suffixes (foo.isra.0) but not optimized-out parameters. This patch assumes that below linked changes land in pahole for v1.25. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/1675790102-23037-1-git-send-email-alan.maguire@oracle.com Link: https://lore.kernel.org/bpf/1675949331-27935-1-git-send-email-alan.maguire@oracle.com	2023-02-13 17:50:39 +01:00
Alexei Starovoitov	ab86cf337a	Merge branch 'bpf, mm: introduce cgroup.memory=nobpf' Yafang Shao says: ==================== The bpf memory accouting has some known problems in contianer environment, - The container memory usage is not consistent if there's pinned bpf program After the container restart, the leftover bpf programs won't account to the new generation, so the memory usage of the container is not consistent. This issue can be resolved by introducing selectable memcg, but we don't have an agreement on the solution yet. See also the discussions at https://lwn.net/Articles/905150/ . - The leftover non-preallocated bpf map can't be limited The leftover bpf map will be reparented, and thus it will be limited by the parent, rather than the container itself. Furthermore, if the parent is destroyed, it be will limited by its parent's parent, and so on. It can also be resolved by introducing selectable memcg. - The memory dynamically allocated in bpf prog is charged into root memcg only Nowdays the bpf prog can dynamically allocate memory, for example via bpf_obj_new(), but it only allocate from the global bpf_mem_alloc pool, so it will charge into root memcg only. That needs to be addressed by a new proposal. So let's give the container user an option to disable bpf memory accouting. The idea of "cgroup.memory=nobpf" is originally by Tejun[1]. [1]. https://lwn.net/ml/linux-mm/YxjOawzlgE458ezL@slm.duckdns.org/ Changes, v1->v2: - squash patches (Roman) - commit log improvement in patch #2. (Johannes) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-02-10 19:00:13 -08:00
Yafang Shao	bf39650824	bpf: allow to disable bpf prog memory accounting We can simply disable the bpf prog memory accouting by not setting the GFP_ACCOUNT. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-5-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-02-10 18:59:57 -08:00
Yafang Shao	ee53cbfb1e	bpf: allow to disable bpf map memory accounting We can simply set root memcg as the map's memcg to disable bpf memory accounting. bpf_map_area_alloc is a little special as it gets the memcg from current rather than from the map, so we need to disable GFP_ACCOUNT specifically for it. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-4-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-02-10 18:59:57 -08:00
Yafang Shao	ddef81b5fd	bpf: use bpf_map_kvcalloc in bpf_local_storage Introduce new helper bpf_map_kvcalloc() for the memory allocation in bpf_local_storage(). Then the allocation will charge the memory from the map instead of from current, though currently they are the same thing as it is only used in map creation path now. By charging map's memory into the memcg from the map, it will be more clear. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-3-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-02-10 18:59:56 -08:00
Yafang Shao	b6c1a8af5b	mm: memcontrol: add new kernel parameter cgroup.memory=nobpf Add new kernel parameter cgroup.memory=nobpf to allow user disable bpf memory accounting. This is a preparation for the followup patch. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-2-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-02-10 18:59:56 -08:00
Daniel Borkmann	7e2a9ebe81	docs, bpf: Ensure IETF's BPF mailing list gets copied for ISA doc changes Given BPF is increasingly being used beyond just the Linux kernel, with implementations in NICs and other hardware, Windows, etc, there is an ongoing effort to document and standardize parts of the existing BPF infrastructure such as its ISA. As "source of truth" we decided some time ago to rely on the in-tree documentation, in particular, starting out with the Documentation/bpf/instruction-set.rst as a base for later RFC drafts on the ISA. Therefore, we want to ensure that changes to that document have bpf@ietf.org in Cc, so add a MAINTAINERS file entry with a section on documents related to standardization efforts. For now, this only relates to instruction-set.rst, and later additional files will be added. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Dave Thaler <dthaler@microsoft.com> Cc: bpf@ietf.org Link: https://datatracker.ietf.org/doc/bofreq-thaler-bpf-ebpf/ Link: https://lore.kernel.org/r/57619c0dd8e354d82bf38745f99405e3babdc970.1676068387.git.daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-02-10 18:58:21 -08:00
Jakub Kicinski	de42873367	bpf-next-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY+bZrwAKCRDbK58LschI gzi4AP4+TYo0jnSwwkrOoN9l4f5VO9X8osmj3CXfHBv7BGWVxAD/WnvA3TDZyaUd agIZTkRs6BHF9He8oROypARZxTeMLwM= =nO1C -----END PGP SIGNATURE----- Daniel Borkmann says: ==================== pull-request: bpf-next 2023-02-11 We've added 96 non-merge commits during the last 14 day(s) which contain a total of 152 files changed, 4884 insertions(+), 962 deletions(-). There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c between commit `5b246e533d` ("ice: split probe into smaller functions") from the net-next tree and commit `66c0e13ad2` ("drivers: net: turn on XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev() is otherwise there a 2nd time, and add XDP features to the existing ice_cfg_netdev() one: [...] ice_set_netdev_features(netdev); netdev->xdp_features = NETDEV_XDP_ACT_BASIC \| NETDEV_XDP_ACT_REDIRECT \| NETDEV_XDP_ACT_XSK_ZEROCOPY; ice_set_ops(netdev); [...] Stephen's merge conflict mail: https://lore.kernel.org/bpf/20230207101951.21a114fa@canb.auug.org.au/ The main changes are: 1) Add support for BPF trampoline on s390x which finally allows to remove many test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich. 2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski. 3) Add capability to export the XDP features supported by the NIC. Along with that, add a XDP compliance test tool, from Lorenzo Bianconi & Marek Majtyka. 4) Add __bpf_kfunc tag for marking kernel functions as kfuncs, from David Vernet. 5) Add a deep dive documentation about the verifier's register liveness tracking algorithm, from Eduard Zingerman. 6) Fix and follow-up cleanups for resolve_btfids to be compiled as a host program to avoid cross compile issues, from Jiri Olsa & Ian Rogers. 7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted when testing on different NICs, from Jesper Dangaard Brouer. 8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang. 9) Extend libbpf to add an option for when the perf buffer should wake up, from Jon Doron. 10) Follow-up fix on xdp_metadata selftest to just consume on TX completion, from Stanislav Fomichev. 11) Extend the kfuncs.rst document with description on kfunc lifecycle & stability expectations, from David Vernet. 12) Fix bpftool prog profile to skip attaching to offline CPUs, from Tonghao Zhang. ==================== Link: https://lore.kernel.org/r/20230211002037.8489-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-10 17:51:27 -08:00
Randy Dunlap	d12f9ad028	Documentation: isdn: correct spelling Correct spelling problems for Documentation/isdn/ as reported by codespell. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230209071400.31476-9-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-10 16:28:13 -08:00
Jakub Kicinski	33c6ce4a4c	Merge branch 'net-move-more-duplicate-code-of-ovs-and-tc-conntrack-into-nf_conntrack_ovs' Xin Long says: ==================== net: move more duplicate code of ovs and tc conntrack into nf_conntrack_ovs We've moved some duplicate code into nf_nat_ovs in: "net: eliminate the duplicate code in the ct nat functions of ovs and tc" This patchset addresses more code duplication in the conntrack of ovs and tc then creates nf_conntrack_ovs for them, and four functions will be extracted and moved into it: nf_ct_handle_fragments() nf_ct_skb_network_trim() nf_ct_helper() nf_ct_add_helper() ==================== Link: https://lore.kernel.org/r/cover.1675810210.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-10 16:23:05 -08:00
Xin Long	0785407e78	net: extract nf_ct_handle_fragments to nf_conntrack_ovs Now handle_fragments() in OVS and TC have the similar code, and this patch removes the duplicate code by moving the function to nf_conntrack_ovs. Note that skb_clear_hash(skb) or skb->ignore_df = 1 should be done only when defrag returns 0, as it does in other places in kernel. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-10 16:23:03 -08:00
Xin Long	558d95e7e1	net: sched: move frag check and tc_skb_cb update out of handle_fragments This patch has no functional changes and just moves frag check and tc_skb_cb update out of handle_fragments, to make it easier to move the duplicate code from handle_fragments() into nf_conntrack_ovs later. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-10 16:23:03 -08:00
Xin Long	1b83bf4489	openvswitch: move key and ovs_cb update out of handle_fragments This patch has no functional changes and just moves key and ovs_cb update out of handle_fragments, and skb_clear_hash() and skb->ignore_df change into handle_fragments(), to make it easier to move the duplicate code from handle_fragments() into nf_conntrack_ovs later. Note that it changes to pass info->family to handle_fragments() instead of key for the packet type check, as info->family is set according to key->eth.type in ovs_ct_copy_action() when creating the action. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-10 16:23:03 -08:00
Xin Long	67fc5d7ffb	net: extract nf_ct_skb_network_trim function to nf_conntrack_ovs There are almost the same code in ovs_skb_network_trim() and tcf_ct_skb_network_trim(), this patch extracts them into a function nf_ct_skb_network_trim() and moves the function to nf_conntrack_ovs. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-10 16:23:03 -08:00
Xin Long	c0c3ab63de	net: create nf_conntrack_ovs for ovs and tc use Similar to nf_nat_ovs created by Commit `ebddb14049` ("net: move the nat function to nf_nat_ovs for ovs and tc"), this patch is to create nf_conntrack_ovs to get these functions shared by OVS and TC only. There are nf_ct_helper() and nf_ct_add_helper() from nf_conntrak_helper in this patch, and will be more in the following patches. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-10 16:23:03 -08:00
Ilya Leoshkevich	17bcd27a08	libbpf: Fix alen calculation in libbpf_nla_dump_errormsg() The code assumes that everything that comes after nlmsgerr are nlattrs. When calculating their size, it does not account for the initial nlmsghdr. This may lead to accessing uninitialized memory. Fixes: `bbf48c18ee` ("libbpf: add error reporting in XDP") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230210001210.395194-8-iii@linux.ibm.com	2023-02-10 15:27:22 -08:00
Ilya Leoshkevich	202702e890	selftests/bpf: Attach to fopen()/fclose() in attach_probe malloc() and free() may be completely replaced by sanitizers, use fopen() and fclose() instead. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230210001210.395194-7-iii@linux.ibm.com	2023-02-10 15:21:27 -08:00
Ilya Leoshkevich	907300c7a6	selftests/bpf: Attach to fopen()/fclose() in uprobe_autoattach malloc() and free() may be completely replaced by sanitizers, use fopen() and fclose() instead. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230210001210.395194-6-iii@linux.ibm.com	2023-02-10 15:21:27 -08:00
Ilya Leoshkevich	24a87b477c	selftests/bpf: Forward SAN_CFLAGS and SAN_LDFLAGS to runqslower and libbpf To get useful results from the Memory Sanitizer, all code running in a process needs to be instrumented. When building tests with other sanitizers, it's not strictly necessary, but is also helpful. So make sure runqslower and libbpf are compiled with SAN_CFLAGS and linked with SAN_LDFLAGS. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230210001210.395194-5-iii@linux.ibm.com	2023-02-10 15:21:27 -08:00
Ilya Leoshkevich	0589d16475	selftests/bpf: Split SAN_CFLAGS and SAN_LDFLAGS Memory Sanitizer requires passing different options to CFLAGS and LDFLAGS: besides the mandatory -fsanitize=memory, one needs to pass header and library paths, and passing -L to a compilation step triggers -Wunused-command-line-argument. So introduce a separate variable for linker flags. Use $(SAN_CFLAGS) as a default in order to avoid complicating the ASan usage. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230210001210.395194-4-iii@linux.ibm.com	2023-02-10 15:21:27 -08:00
Ilya Leoshkevich	585bf4640e	tools: runqslower: Add EXTRA_CFLAGS and EXTRA_LDFLAGS support This makes it possible to add sanitizer flags. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230210001210.395194-3-iii@linux.ibm.com	2023-02-10 15:21:26 -08:00
Ilya Leoshkevich	795deb3f97	selftests/bpf: Quote host tools Using HOSTCC="ccache clang" breaks building the tests, since, when it's forwarded to e.g. bpftool, the child make sees HOSTCC=ccache and "clang" is considered a target. Fix by quoting it, and also HOSTLD and HOSTAR for consistency. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230210001210.395194-2-iii@linux.ibm.com	2023-02-10 15:21:26 -08:00
Jakub Kicinski	025a785ff0	net: skbuff: drop the word head from skb cache skbuff_head_cache is misnamed (perhaps for historical reasons?) because it does not hold heads. Head is the buffer which skb->data points to, and also where shinfo lives. struct sk_buff is a metadata structure, not the head. Eric recently added skb_small_head_cache (which allocates actual head buffers), let that serve as an excuse to finally clean this up :) Leave the user-space visible name intact, it could possibly be uAPI. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-10 09:10:28 +00:00
David S. Miller	01a7ee36e7	Merge branch 'net-ipa-GSI' Alex Elder says: ==================== net: ipa: prepare for GSI register updtaes An upcoming series (or two) will convert the definitions of GSI registers used by IPA so they use the "IPA reg" mechanism to specify register offsets and their fields. This will simplify implementing the fairly large number of changes required in GSI registers to support more than 32 GSI channels (introduced in IPA v5.0). A few minor problems and inconsistencies were found, and they're fixed here. The last three patches in this series change the "ipa_reg" code to separate the IPA-specific part (the base virtual address, basically) from the generic register part, and the now- generic code is renamed to use just "reg_" or "REG_" as a prefix rather than "ipa_reg" or "IPA_REG_". ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-10 08:06:33 +00:00
Alex Elder	f1470fd790	net: ipa: generalize register field functions Rename functions related to register fields so they don't appear to be IPA-specific, and move their definitions into "reg.h": ipa_reg_fmask() -> reg_fmask() ipa_reg_bit() -> reg_bit() ipa_reg_field_max() -> reg_field_max() ipa_reg_encode() -> reg_encode() ipa_reg_decode() -> reg_decode() Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-10 08:06:32 +00:00
Alex Elder	fc4cecf706	net: ipa: generalize register offset functions Rename ipa_reg_offset() to be reg_offset() and move its definition to "reg.h". Rename ipa_reg_n_offset() to be reg_n_offset() also. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-10 08:06:32 +00:00
Alex Elder	81772e444d	net: ipa: start generalizing "ipa_reg" IPA register definitions have evolved with each new version. The changes required to support more than 32 endpoints in IPA v5.0 made it best to define a unified mechanism for defining registers and their fields. GSI register definitions, meanwhile, have remained fairly stable. And even as the total number of IPA endpoints goes beyond 32, the number of GSI channels on a given EE that underly endpoints still remains 32 or less. Despite that, GSI v3.0 (which is used with IPA v5.0) extends the number of channels (and events) it supports to be about 256, and as a result, many GSI register definitions must change significantly. To address this, we'll use the same "ipa_reg" mechanism to define the GSI registers. As a first step in generalizing the "ipa_reg" to also support GSI registers, isolate the definitions of the "ipa_reg" and "ipa_regs" structure types (and some supporting macros) into a new header file, and remove the "ipa_" and "IPA_" from symbol names. Separate the IPA register ID validity checking from the generic check that a register ID is in range. Aside from that, this is intended to have no functional effect on the code. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-10 08:06:32 +00:00
Alex Elder	0ec573ef2a	net: ipa: GSI register cleanup Move some static inline function definitions out of "gsi_reg.h" and into "gsi.c", which is the only place they're used. Rename them so their names identify the register they're associated with. Move the gsi_channel_type enumerated type definition below the offset and field definitions for the CH_C_CNTXT_0 register where it's used. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-10 08:06:32 +00:00
Alex Elder	c5ebba75c7	net: ipa: use bitmasks for GSI IRQ values There are seven GSI interrupt types that can be signaled by a single GSI IRQ. These are represented in a bitmask, and the gsi_irq_type_id enumerated type defines what each bit position represents. Similarly, the global and general GSI interrupt types each has a set of conditions it signals, and both types have an enumerated type that defines which bit that represents each condition. When used, these enumerated values are passed as an argument to BIT() in all cases. So clean up the code a little bit by defining the enumerated type values as one-bit masks rather than bit positions. Rename gsi_general_id to be gsi_general_irq_id for consistency. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-10 08:06:32 +00:00
Alex Elder	d86603e940	net: ipa: tighten up IPA register validity checking When checking the validity of an IPA register ID, compare it against all possible ipa_reg_id values. Rename the function ipa_reg_id_valid() to be specific about what's being checked. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-10 08:06:32 +00:00
Alex Elder	3aac8ec1c0	net: ipa: add some new IPA versions Soon IPA v5.0+ will be supported, and when that happens we will be able to enable support for the SDX65 (IPA v5.0), SM8450 (IPA v5.1), and SM8550 (IPA v5.5). Fix the comment about the GSI version used for IPA v3.1. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-10 08:06:31 +00:00
Alex Elder	38028e6f39	net: ipa: get rid of ipa->reg_addr The reg_addr field in the IPA structure is set but never used. Get rid of it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-10 08:06:31 +00:00
Alex Elder	2df181f09c	net: ipa: generic command param fix Starting at IPA v4.11, the GSI_GENERIC_COMMAND GSI register got a new PARAMS field. The code that encodes a value into that field sets it unconditionally, which is wrong. We currently only provide 0 as the field's value, so this error has no real effect. Still, it's a bug, so let's fix it. Fix an (unrelated) incorrect comment as well. Fields in the ERROR_LOG GSI register actually are defined for IPA versions prior to v3.5.1. Fixes: `fe68c43ce3` ("net: ipa: support enhanced channel flow control") Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-10 08:06:31 +00:00
Horatiu Vultur	47400aaea4	net: microchip: vcap: Add tc flower keys for lan966x Add the following TC flower filter keys to lan966x for IS2: - ipv4_addr (sip and dip) - ipv6_addr (sip and dip) - control (IPv4 fragments) - portnum (tcp and udp port numbers) - basic (L3 and L4 protocol) - vlan (outer vlan tag info) - tcp (tcp flags) - ip (tos field) As the parsing of these keys is similar between lan966x and sparx5, move the code in a separate file to be shared by these 2 chips. And put the specific parsing outside of the common functions. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-10 08:03:09 +00:00
David S. Miller	21119e2c6e	rxrpc development -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmPjdqgACgkQ+7dXa6fL C2tbWw/9F6e5SAUKDMTk0LafoGweg/4/cUqAyVXib379RO3XesYPrtcSJ4qhd7P+ F+VA5xCHA8uGRpLVgkDtsgJuuEqcHbxHFN2dBiVLWnH+xnDXc9qAORG9P+M3QLsX ev5zTOXFVkcSFo7ZZjwuC2aTLdR3+z2/apFwtqoxIOOs9SCg+JonZDXDod2nf4Qn B8SB03kz+qQTYslaaYiCnffK25V0fPQDfAQy+WpOw+klPGHLuYMjZnTh3l++Qs7J A1RlZ5t15MpSP73KhKLV3uvmulnXhCi4AwLlLrTcv/ET8+xb78umht6Mz8oxx4Dq lDko3TWeGIrKkNKYmStFo8tYp2PCZgam78F+pC2NtrW1IIKBUYI2Vw3+qAn3XKFi 4UVN7ClRam0ROgRmwZwSt81Abbf2+qioBBSksXlDywbfaX/1HYVha2Ep/gI/YTcB QVs3acc3rtOsplLjHDdHK1p7l8tDEjPJPiW36L8Gwi7JhcC8YAP/n8oQm/fNi0eK nnK9VCbEoIpKi/8psdC4/BRI4LlGJ/KgLrj3uR5sx/Zbs3lIw2gqJOVltwEKjkBt YlnsH+gStbQ1I+tq8icSLNw45f/+WCgf3A4/eNjGHZnGJsmo4lXPGq8+5ZxboAhe CJ1/vSSuKA8jsgJNDlBhwFJenILJBUvw4FlP9JfFi3qqhOUO5/I= =1nmF -----END PGP SIGNATURE----- Merge tag 'rxrpc-next-20230208' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc development Here are some miscellaneous changes for rxrpc: (1) Use consume_skb() rather than kfree_skb_reason(). (2) Fix unnecessary waking when poking and already-poked call. (3) Add ack.rwind to the rxrpc_tx_ack tracepoint as this indicates how many incoming DATA packets we're telling the peer that we are currently willing to accept on this call. (4) Reduce duplicate ACK transmission. We send ACKs to let the peer know that we're increasing the receive window (ack.rwind) as we consume packets locally. Normal ACK transmission is triggered in three places and that leads to duplicates being sent. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-10 08:00:05 +00:00
Clément Léger	dc8c413201	net: pcs: rzn1-miic: remove unused struct members and use miic variable Remove unused bulk clocks struct from the miic state and use an already existing miic variable in miic_config(). Signed-off-by: Clément Léger <clement.leger@bootlin.com> Link: https://lore.kernel.org/r/20230208161249.329631-1-clement.leger@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-09 22:47:16 -08:00
Andy Shevchenko	5c72b4c644	openvswitch: Use string_is_terminated() helper Use string_is_terminated() helper instead of cpecific memchr() call. This shows better the intention of the call. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230208133153.22528-3-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-09 22:30:24 -08:00
Andy Shevchenko	d4545bf9c3	genetlink: Use string_is_terminated() helper Use string_is_terminated() helper instead of cpecific memchr() call. This shows better the intention of the call. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230208133153.22528-2-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-09 22:30:24 -08:00
Andy Shevchenko	f1db99c07b	string_helpers: Move string_is_valid() to the header Move string_is_valid() to the header for wider use. While at it, rename to string_is_terminated() to be precise about its semantics. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230208133153.22528-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-09 22:30:24 -08:00
Horatiu Vultur	a136391ae4	net: micrel: Cable Diagnostics feature for lan8841 PHY Add support for cable diagnostics in lan8841 PHY. It has the same registers layout as lan8814 PHY, therefore reuse the functionality. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230208114406.1666671-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-09 22:29:52 -08:00
Huanhuan Wang	436396f26d	nfp: support IPsec offloading for NFP3800 Add IPsec offloading support for NFP3800. Include data plane and control plane. Data plane: add IPsec packet process flow in NFP3800 datapath (NFDk). Control plane: add an algorithm support distinction flow in xfrm hook function xdo_dev_state_add(), as NFP3800 has a different set of IPsec algorithm support. This matches existing support for the NFP6000/NFP4000 and their NFD3 datapath. In addition, fixup the md_bytes calculation for NFD3 datapath to make sure the two datapahts are keept in sync. Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230208091000.4139974-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-09 22:22:19 -08:00
Jakub Kicinski	2894d35309	Merge branch 'net-introduce-rps_default_mask' Paolo Abeni says: ==================== net: introduce rps_default_mask Real-time setups try hard to ensure proper isolation between time critical applications and e.g. network processing performed by the network stack in softirq and RPS is used to move the softirq activity away from the isolated core. If the network configuration is dynamic, with netns and devices routinely created at run-time, enforcing the correct RPS setting on each newly created device allowing to transient bad configuration became complex. Additionally, when multi-queue devices are involved, configuring rps in user-space on each queue easily becomes very expensive, e.g. some setups use veths with per cpu queues. These series try to address the above, introducing a new sysctl knob: rps_default_mask. The new sysctl entry allows configuring a netns-wide RPS mask, to be enforced since receive queue creation time without any fourther per device configuration required. Additionally, a simple self-test is introduced to check the rps_default_mask behavior. ==================== Link: https://lore.kernel.org/r/cover.1675789134.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-09 17:45:57 -08:00

1 2 3 4 5 ...

1156816 commits