linux-stable

Commit Graph

Author	SHA1	Message	Date
Daniel T. Lee	e7e6c774f5	samples/bpf: convert to vmlinux.h with tracing programs This commit replaces separate headers with a single vmlinux.h to tracing programs. Thanks to that, we no longer need to define the argument structure for tracing programs directly. For example, argument for the sched_switch tracpepoint (sched_switch_args) can be replaced with the vmlinux.h provided trace_event_raw_sched_switch. Additional defines have been added to the BPF program either directly or through the inclusion of net_shared.h. Defined values are PERF_MAX_STACK_DEPTH, IFNAMSIZ constants and __stringify() macro. This change enables the BPF program to access internal structures with BTF generated "vmlinux.h" header. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230818090119.477441-3-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-08-21 15:39:09 -07:00
Daniel T. Lee	34f6e38f58	samples/bpf: fix warning with ignored-attributes Currently, compiling the bpf programs will result the warning with the ignored attribute as follows. This commit fixes the warning by adding cf-protection option. In file included from ./arch/x86/include/asm/linkage.h:6: ./arch/x86/include/asm/ibt.h:77:8: warning: 'nocf_check' attribute ignored; use -fcf-protection to enable the attribute [-Wignored-attributes] extern __noendbr u64 ibt_save(bool disable); ^ ./arch/x86/include/asm/ibt.h:32:34: note: expanded from macro '__noendbr' ^ Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230818090119.477441-2-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-08-21 15:39:09 -07:00
Alexei Starovoitov	5bebd3e3a3	Merge branch 'remove-unnecessary-synchronizations-in-cpumap' Hou Tao says: ==================== Remove unnecessary synchronizations in cpumap From: Hou Tao <houtao1@huawei.com> Hi, This is the formal patchset to remove unnecessary synchronizations in cpu-map after address comments and collect Rvb tags from Toke Høiland-Jørgensen (Big thanks to Toke). Patch #1 removes the unnecessary rcu_barrier() when freeing bpf_cpu_map_entry and replaces it by queue_rcu_work(). Patch #2 removes the unnecessary call_rcu() and queue_work() when destroying cpu-map and does the freeing directly. Test the patchset by using xdp_redirect_cpu and virtio-net. Both xdp-mode and skb-mode have been exercised and no issues were reported. As ususal, comments and suggestions are always welcome. Change Log: v1: * address comments from Toke Høiland-Jørgensen * add Rvb tags from Toke Høiland-Jørgensen * update outdated comment in cpu_map_delete_elem() RFC: https://lore.kernel.org/bpf/20230728023030.1906124-1-houtao@huaweicloud.com ==================== Link: https://lore.kernel.org/r/20230816045959.358059-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-08-21 15:21:16 -07:00
Hou Tao	c2e42ddf26	bpf, cpumask: Clean up bpf_cpu_map_entry directly in cpu_map_free After synchronous_rcu(), both the dettached XDP program and xdp_do_flush() are completed, and the only user of bpf_cpu_map_entry will be cpu_map_kthread_run(), so instead of calling __cpu_map_entry_replace() to stop kthread and cleanup entry after a RCU grace period, do these things directly. Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20230816045959.358059-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-08-21 15:21:16 -07:00
Hou Tao	8f8500a247	bpf, cpumap: Use queue_rcu_work() to remove unnecessary rcu_barrier() As for now __cpu_map_entry_replace() uses call_rcu() to wait for the inflight xdp program to exit the RCU read critical section, and then launch kworker cpu_map_kthread_stop() to call kthread_stop() to flush all pending xdp frames or skbs. But it is unnecessary to use rcu_barrier() in cpu_map_kthread_stop() to wait for the completion of __cpu_map_entry_free(), because rcu_barrier() will wait for all pending RCU callbacks and cpu_map_kthread_stop() only needs to wait for the completion of a specific __cpu_map_entry_free(). So use queue_rcu_work() to replace call_rcu(), schedule_work() and rcu_barrier(). queue_rcu_work() will queue a __cpu_map_entry_free() kworker after a RCU grace period. Because __cpu_map_entry_free() is running in a kworker context, so it is OK to do all of these freeing procedures include kthread_stop() in it. After the update, there is no need to do reference-counting for bpf_cpu_map_entry, because bpf_cpu_map_entry is freed directly in __cpu_map_entry_free(), so just remove it. Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20230816045959.358059-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-08-21 15:21:16 -07:00
Yonghong Song	0a55264cf9	selftests/bpf: Fix a selftest compilation error When building the kernel and selftest with clang compiler (llvm17 or llvm18), I hit the following compilation failure: In file included from progs/test_lwt_redirect.c:3: In file included from /usr/include/linux/ip.h:21: In file included from /usr/include/asm/byteorder.h:5: In file included from /usr/include/linux/byteorder/little_endian.h:13: /usr/include/linux/swab.h:136:8: error: unknown type name '__always_inline' 136 \| static __always_inline unsigned long __swab(const unsigned long y) \| ^ /usr/include/linux/swab.h:171:8: error: unknown type name '__always_inline' 171 \| static __always_inline __u16 __swab16p(const __u16 *p) ... bpf_helpers.h file provided a definition for __always_inline. Putting 'ip.h' after 'bpf_helpers.h' fixed the issue. Fixes: `43a7c3ef8a` ("selftests/bpf: Add lwt_xmit tests for BPF_REDIRECT") Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230818174312.1883381-1-yonghong.song@linux.dev Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-08-18 12:19:22 -07:00
Dave Marchevsky	63ae8eb2c5	selftests/bpf: Add CO-RE relocs kfunc flavors tests This patch adds selftests that exercise kfunc flavor relocation functionality added in the previous patch. The actual kfunc defined in kernel/bpf/helpers.c is: struct task_struct bpf_task_acquire(struct task_struct p) The following relocation behaviors are checked: struct task_struct bpf_task_acquire___one(struct task_struct name) * Should succeed despite differing param name struct task_struct bpf_task_acquire___two(struct task_struct p, void ctx) Should fail because there is no two-param bpf_task_acquire struct task_struct bpf_task_acquire___three(void ctx) * Should fail because, despite vmlinux's bpf_task_acquire having one param, the types don't match Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230817225353.2570845-2-davemarchevsky@fb.com	2023-08-18 18:12:59 +02:00
Dave Marchevsky	5964a223f5	libbpf: Support triple-underscore flavors for kfunc relocation The function signature of kfuncs can change at any time due to their intentional lack of stability guarantees. As kfuncs become more widely used, BPF program writers will need facilities to support calling different versions of a kfunc from a single BPF object. Consider this simplified example based on a real scenario we ran into at Meta: /* initial kfunc signature / int some_kfunc(void ptr) /* Oops, we need to add some flag to modify behavior. No problem, change the kfunc. flags = 0 retains original behavior / int some_kfunc(void ptr, long flags) If the initial version of the kfunc is deployed on some portion of the fleet and the new version on the rest, a fleetwide service that uses some_kfunc will currently need to load different BPF programs depending on which some_kfunc is available. Luckily CO-RE provides a facility to solve a very similar problem, struct definition changes, by allowing program writers to declare my_struct___old and my_struct___new, with ___suffix being considered a 'flavor' of the non-suffixed name and being ignored by bpf_core_type_exists and similar calls. This patch extends the 'flavor' facility to the kfunc extern relocation process. BPF program writers can now declare extern int some_kfunc___old(void ptr) extern int some_kfunc___new(void ptr, int flags) then test which version of the kfunc exists with bpf_ksym_exists. Relocation and verifier's dead code elimination will work in concert as expected, allowing this pattern: if (bpf_ksym_exists(some_kfunc___old)) some_kfunc___old(ptr); else some_kfunc___new(ptr, 0); Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Vernet <void@manifault.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20230817225353.2570845-1-davemarchevsky@fb.com	2023-08-18 18:12:30 +02:00
Helge Deller	b6594a17ec	bpf/tests: Enhance output on error and fix typos If a testcase returns a wrong (unexpected) value, print the expected and returned value in hex notation in addition to the decimal notation. This is very useful in tests which bit-shift hex values left or right and helped me a lot while developing the JIT compiler for the hppa architecture. Additionally fix two typos: dowrd -> dword, tall calls -> tail calls. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/ZN6ZAAVoWZpsD1Jf@p100	2023-08-18 17:08:42 +02:00
Yan Zhai	6c77997bc6	selftests/bpf: Add lwt_xmit tests for BPF_REROUTE There is no lwt test case for BPF_REROUTE yet. Add test cases for both normal and abnormal situations. The abnormal situation is set up with an fq qdisc on the reroute target device. Without proper fixes, overflow this qdisc queue limit (to trigger a drop) would panic the kernel. Signed-off-by: Yan Zhai <yan@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/62c8ddc1e924269dcf80d2e8af1a1e632cee0b3a.1692326837.git.yan@cloudflare.com	2023-08-18 16:05:27 +02:00
Yan Zhai	43a7c3ef8a	selftests/bpf: Add lwt_xmit tests for BPF_REDIRECT There is no lwt_xmit test case for BPF_REDIRECT yet. Add test cases for both normal and abnormal situations. For abnormal test cases, devices are set down or have its carrier set down. Without proper fixes, BPF_REDIRECT to either ingress or egress of such device would panic the kernel. Signed-off-by: Yan Zhai <yan@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/96bf435243641939d9c9da329fab29cb45f7df22.1692326837.git.yan@cloudflare.com	2023-08-18 16:05:26 +02:00
Yan Zhai	a171fbec88	lwt: Check LWTUNNEL_XMIT_CONTINUE strictly LWTUNNEL_XMIT_CONTINUE is implicitly assumed in ip(6)_finish_output2, such that any positive return value from a xmit hook could cause unexpected continue behavior, despite that related skb may have been freed. This could be error-prone for future xmit hook ops. One of the possible errors is to return statuses of dst_output directly. To make the code safer, redefine LWTUNNEL_XMIT_CONTINUE value to distinguish from dst_output statuses and check the continue condition explicitly. Fixes: `3a0af8fd61` ("bpf: BPF for lightweight tunnel infrastructure") Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Yan Zhai <yan@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/96b939b85eda00e8df4f7c080f770970a4c5f698.1692326837.git.yan@cloudflare.com	2023-08-18 16:05:26 +02:00
Yan Zhai	29b22badb7	lwt: Fix return values of BPF xmit ops BPF encap ops can return different types of positive values, such like NET_RX_DROP, NET_XMIT_CN, NETDEV_TX_BUSY, and so on, from function skb_do_redirect and bpf_lwt_xmit_reroute. At the xmit hook, such return values would be treated implicitly as LWTUNNEL_XMIT_CONTINUE in ip(6)_finish_output2. When this happens, skbs that have been freed would continue to the neighbor subsystem, causing use-after-free bug and kernel crashes. To fix the incorrect behavior, skb_do_redirect return values can be simply discarded, the same as tc-egress behavior. On the other hand, bpf_lwt_xmit_reroute returns useful errors to local senders, e.g. PMTU information. Thus convert its return values to avoid the conflict with LWTUNNEL_XMIT_CONTINUE. Fixes: `3a0af8fd61` ("bpf: BPF for lightweight tunnel infrastructure") Reported-by: Jordan Griege <jgriege@cloudflare.com> Suggested-by: Martin KaFai Lau <martin.lau@linux.dev> Suggested-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/0d2b878186cfe215fec6b45769c1cd0591d3628d.1692326837.git.yan@cloudflare.com	2023-08-18 16:05:26 +02:00
Xu Kuohai	5f6395fd06	selftests/bpf: Enable cpu v4 tests for arm64 Enable CPU v4 instruction tests for arm64. Below are the test results from BPF test_progs selftests: # ./test_progs -t ldsx_insn,verifier_sdiv,verifier_movsx,verifier_ldsx,verifier_gotol,verifier_bswap #115/1 ldsx_insn/map_val and probed_memory:OK #115/2 ldsx_insn/ctx_member_sign_ext:OK #115/3 ldsx_insn/ctx_member_narrow_sign_ext:OK #115 ldsx_insn:OK #302/1 verifier_bswap/BSWAP, 16:OK #302/2 verifier_bswap/BSWAP, 16 @unpriv:OK #302/3 verifier_bswap/BSWAP, 32:OK #302/4 verifier_bswap/BSWAP, 32 @unpriv:OK #302/5 verifier_bswap/BSWAP, 64:OK #302/6 verifier_bswap/BSWAP, 64 @unpriv:OK #302 verifier_bswap:OK #316/1 verifier_gotol/gotol, small_imm:OK #316/2 verifier_gotol/gotol, small_imm @unpriv:OK #316 verifier_gotol:OK #324/1 verifier_ldsx/LDSX, S8:OK #324/2 verifier_ldsx/LDSX, S8 @unpriv:OK #324/3 verifier_ldsx/LDSX, S16:OK #324/4 verifier_ldsx/LDSX, S16 @unpriv:OK #324/5 verifier_ldsx/LDSX, S32:OK #324/6 verifier_ldsx/LDSX, S32 @unpriv:OK #324/7 verifier_ldsx/LDSX, S8 range checking, privileged:OK #324/8 verifier_ldsx/LDSX, S16 range checking:OK #324/9 verifier_ldsx/LDSX, S16 range checking @unpriv:OK #324/10 verifier_ldsx/LDSX, S32 range checking:OK #324/11 verifier_ldsx/LDSX, S32 range checking @unpriv:OK #324 verifier_ldsx:OK #335/1 verifier_movsx/MOV32SX, S8:OK #335/2 verifier_movsx/MOV32SX, S8 @unpriv:OK #335/3 verifier_movsx/MOV32SX, S16:OK #335/4 verifier_movsx/MOV32SX, S16 @unpriv:OK #335/5 verifier_movsx/MOV64SX, S8:OK #335/6 verifier_movsx/MOV64SX, S8 @unpriv:OK #335/7 verifier_movsx/MOV64SX, S16:OK #335/8 verifier_movsx/MOV64SX, S16 @unpriv:OK #335/9 verifier_movsx/MOV64SX, S32:OK #335/10 verifier_movsx/MOV64SX, S32 @unpriv:OK #335/11 verifier_movsx/MOV32SX, S8, range_check:OK #335/12 verifier_movsx/MOV32SX, S8, range_check @unpriv:OK #335/13 verifier_movsx/MOV32SX, S16, range_check:OK #335/14 verifier_movsx/MOV32SX, S16, range_check @unpriv:OK #335/15 verifier_movsx/MOV32SX, S16, range_check 2:OK #335/16 verifier_movsx/MOV32SX, S16, range_check 2 @unpriv:OK #335/17 verifier_movsx/MOV64SX, S8, range_check:OK #335/18 verifier_movsx/MOV64SX, S8, range_check @unpriv:OK #335/19 verifier_movsx/MOV64SX, S16, range_check:OK #335/20 verifier_movsx/MOV64SX, S16, range_check @unpriv:OK #335/21 verifier_movsx/MOV64SX, S32, range_check:OK #335/22 verifier_movsx/MOV64SX, S32, range_check @unpriv:OK #335/23 verifier_movsx/MOV64SX, S16, R10 Sign Extension:OK #335/24 verifier_movsx/MOV64SX, S16, R10 Sign Extension @unpriv:OK #335 verifier_movsx:OK #347/1 verifier_sdiv/SDIV32, non-zero imm divisor, check 1:OK #347/2 verifier_sdiv/SDIV32, non-zero imm divisor, check 1 @unpriv:OK #347/3 verifier_sdiv/SDIV32, non-zero imm divisor, check 2:OK #347/4 verifier_sdiv/SDIV32, non-zero imm divisor, check 2 @unpriv:OK #347/5 verifier_sdiv/SDIV32, non-zero imm divisor, check 3:OK #347/6 verifier_sdiv/SDIV32, non-zero imm divisor, check 3 @unpriv:OK #347/7 verifier_sdiv/SDIV32, non-zero imm divisor, check 4:OK #347/8 verifier_sdiv/SDIV32, non-zero imm divisor, check 4 @unpriv:OK #347/9 verifier_sdiv/SDIV32, non-zero imm divisor, check 5:OK #347/10 verifier_sdiv/SDIV32, non-zero imm divisor, check 5 @unpriv:OK #347/11 verifier_sdiv/SDIV32, non-zero imm divisor, check 6:OK #347/12 verifier_sdiv/SDIV32, non-zero imm divisor, check 6 @unpriv:OK #347/13 verifier_sdiv/SDIV32, non-zero imm divisor, check 7:OK #347/14 verifier_sdiv/SDIV32, non-zero imm divisor, check 7 @unpriv:OK #347/15 verifier_sdiv/SDIV32, non-zero imm divisor, check 8:OK #347/16 verifier_sdiv/SDIV32, non-zero imm divisor, check 8 @unpriv:OK #347/17 verifier_sdiv/SDIV32, non-zero reg divisor, check 1:OK #347/18 verifier_sdiv/SDIV32, non-zero reg divisor, check 1 @unpriv:OK #347/19 verifier_sdiv/SDIV32, non-zero reg divisor, check 2:OK #347/20 verifier_sdiv/SDIV32, non-zero reg divisor, check 2 @unpriv:OK #347/21 verifier_sdiv/SDIV32, non-zero reg divisor, check 3:OK #347/22 verifier_sdiv/SDIV32, non-zero reg divisor, check 3 @unpriv:OK #347/23 verifier_sdiv/SDIV32, non-zero reg divisor, check 4:OK #347/24 verifier_sdiv/SDIV32, non-zero reg divisor, check 4 @unpriv:OK #347/25 verifier_sdiv/SDIV32, non-zero reg divisor, check 5:OK #347/26 verifier_sdiv/SDIV32, non-zero reg divisor, check 5 @unpriv:OK #347/27 verifier_sdiv/SDIV32, non-zero reg divisor, check 6:OK #347/28 verifier_sdiv/SDIV32, non-zero reg divisor, check 6 @unpriv:OK #347/29 verifier_sdiv/SDIV32, non-zero reg divisor, check 7:OK #347/30 verifier_sdiv/SDIV32, non-zero reg divisor, check 7 @unpriv:OK #347/31 verifier_sdiv/SDIV32, non-zero reg divisor, check 8:OK #347/32 verifier_sdiv/SDIV32, non-zero reg divisor, check 8 @unpriv:OK #347/33 verifier_sdiv/SDIV64, non-zero imm divisor, check 1:OK #347/34 verifier_sdiv/SDIV64, non-zero imm divisor, check 1 @unpriv:OK #347/35 verifier_sdiv/SDIV64, non-zero imm divisor, check 2:OK #347/36 verifier_sdiv/SDIV64, non-zero imm divisor, check 2 @unpriv:OK #347/37 verifier_sdiv/SDIV64, non-zero imm divisor, check 3:OK #347/38 verifier_sdiv/SDIV64, non-zero imm divisor, check 3 @unpriv:OK #347/39 verifier_sdiv/SDIV64, non-zero imm divisor, check 4:OK #347/40 verifier_sdiv/SDIV64, non-zero imm divisor, check 4 @unpriv:OK #347/41 verifier_sdiv/SDIV64, non-zero imm divisor, check 5:OK #347/42 verifier_sdiv/SDIV64, non-zero imm divisor, check 5 @unpriv:OK #347/43 verifier_sdiv/SDIV64, non-zero imm divisor, check 6:OK #347/44 verifier_sdiv/SDIV64, non-zero imm divisor, check 6 @unpriv:OK #347/45 verifier_sdiv/SDIV64, non-zero reg divisor, check 1:OK #347/46 verifier_sdiv/SDIV64, non-zero reg divisor, check 1 @unpriv:OK #347/47 verifier_sdiv/SDIV64, non-zero reg divisor, check 2:OK #347/48 verifier_sdiv/SDIV64, non-zero reg divisor, check 2 @unpriv:OK #347/49 verifier_sdiv/SDIV64, non-zero reg divisor, check 3:OK #347/50 verifier_sdiv/SDIV64, non-zero reg divisor, check 3 @unpriv:OK #347/51 verifier_sdiv/SDIV64, non-zero reg divisor, check 4:OK #347/52 verifier_sdiv/SDIV64, non-zero reg divisor, check 4 @unpriv:OK #347/53 verifier_sdiv/SDIV64, non-zero reg divisor, check 5:OK #347/54 verifier_sdiv/SDIV64, non-zero reg divisor, check 5 @unpriv:OK #347/55 verifier_sdiv/SDIV64, non-zero reg divisor, check 6:OK #347/56 verifier_sdiv/SDIV64, non-zero reg divisor, check 6 @unpriv:OK #347/57 verifier_sdiv/SMOD32, non-zero imm divisor, check 1:OK #347/58 verifier_sdiv/SMOD32, non-zero imm divisor, check 1 @unpriv:OK #347/59 verifier_sdiv/SMOD32, non-zero imm divisor, check 2:OK #347/60 verifier_sdiv/SMOD32, non-zero imm divisor, check 2 @unpriv:OK #347/61 verifier_sdiv/SMOD32, non-zero imm divisor, check 3:OK #347/62 verifier_sdiv/SMOD32, non-zero imm divisor, check 3 @unpriv:OK #347/63 verifier_sdiv/SMOD32, non-zero imm divisor, check 4:OK #347/64 verifier_sdiv/SMOD32, non-zero imm divisor, check 4 @unpriv:OK #347/65 verifier_sdiv/SMOD32, non-zero imm divisor, check 5:OK #347/66 verifier_sdiv/SMOD32, non-zero imm divisor, check 5 @unpriv:OK #347/67 verifier_sdiv/SMOD32, non-zero imm divisor, check 6:OK #347/68 verifier_sdiv/SMOD32, non-zero imm divisor, check 6 @unpriv:OK #347/69 verifier_sdiv/SMOD32, non-zero reg divisor, check 1:OK #347/70 verifier_sdiv/SMOD32, non-zero reg divisor, check 1 @unpriv:OK #347/71 verifier_sdiv/SMOD32, non-zero reg divisor, check 2:OK #347/72 verifier_sdiv/SMOD32, non-zero reg divisor, check 2 @unpriv:OK #347/73 verifier_sdiv/SMOD32, non-zero reg divisor, check 3:OK #347/74 verifier_sdiv/SMOD32, non-zero reg divisor, check 3 @unpriv:OK #347/75 verifier_sdiv/SMOD32, non-zero reg divisor, check 4:OK #347/76 verifier_sdiv/SMOD32, non-zero reg divisor, check 4 @unpriv:OK #347/77 verifier_sdiv/SMOD32, non-zero reg divisor, check 5:OK #347/78 verifier_sdiv/SMOD32, non-zero reg divisor, check 5 @unpriv:OK #347/79 verifier_sdiv/SMOD32, non-zero reg divisor, check 6:OK #347/80 verifier_sdiv/SMOD32, non-zero reg divisor, check 6 @unpriv:OK #347/81 verifier_sdiv/SMOD64, non-zero imm divisor, check 1:OK #347/82 verifier_sdiv/SMOD64, non-zero imm divisor, check 1 @unpriv:OK #347/83 verifier_sdiv/SMOD64, non-zero imm divisor, check 2:OK #347/84 verifier_sdiv/SMOD64, non-zero imm divisor, check 2 @unpriv:OK #347/85 verifier_sdiv/SMOD64, non-zero imm divisor, check 3:OK #347/86 verifier_sdiv/SMOD64, non-zero imm divisor, check 3 @unpriv:OK #347/87 verifier_sdiv/SMOD64, non-zero imm divisor, check 4:OK #347/88 verifier_sdiv/SMOD64, non-zero imm divisor, check 4 @unpriv:OK #347/89 verifier_sdiv/SMOD64, non-zero imm divisor, check 5:OK #347/90 verifier_sdiv/SMOD64, non-zero imm divisor, check 5 @unpriv:OK #347/91 verifier_sdiv/SMOD64, non-zero imm divisor, check 6:OK #347/92 verifier_sdiv/SMOD64, non-zero imm divisor, check 6 @unpriv:OK #347/93 verifier_sdiv/SMOD64, non-zero imm divisor, check 7:OK #347/94 verifier_sdiv/SMOD64, non-zero imm divisor, check 7 @unpriv:OK #347/95 verifier_sdiv/SMOD64, non-zero imm divisor, check 8:OK #347/96 verifier_sdiv/SMOD64, non-zero imm divisor, check 8 @unpriv:OK #347/97 verifier_sdiv/SMOD64, non-zero reg divisor, check 1:OK #347/98 verifier_sdiv/SMOD64, non-zero reg divisor, check 1 @unpriv:OK #347/99 verifier_sdiv/SMOD64, non-zero reg divisor, check 2:OK #347/100 verifier_sdiv/SMOD64, non-zero reg divisor, check 2 @unpriv:OK #347/101 verifier_sdiv/SMOD64, non-zero reg divisor, check 3:OK #347/102 verifier_sdiv/SMOD64, non-zero reg divisor, check 3 @unpriv:OK #347/103 verifier_sdiv/SMOD64, non-zero reg divisor, check 4:OK #347/104 verifier_sdiv/SMOD64, non-zero reg divisor, check 4 @unpriv:OK #347/105 verifier_sdiv/SMOD64, non-zero reg divisor, check 5:OK #347/106 verifier_sdiv/SMOD64, non-zero reg divisor, check 5 @unpriv:OK #347/107 verifier_sdiv/SMOD64, non-zero reg divisor, check 6:OK #347/108 verifier_sdiv/SMOD64, non-zero reg divisor, check 6 @unpriv:OK #347/109 verifier_sdiv/SMOD64, non-zero reg divisor, check 7:OK #347/110 verifier_sdiv/SMOD64, non-zero reg divisor, check 7 @unpriv:OK #347/111 verifier_sdiv/SMOD64, non-zero reg divisor, check 8:OK #347/112 verifier_sdiv/SMOD64, non-zero reg divisor, check 8 @unpriv:OK #347/113 verifier_sdiv/SDIV32, zero divisor:OK #347/114 verifier_sdiv/SDIV32, zero divisor @unpriv:OK #347/115 verifier_sdiv/SDIV64, zero divisor:OK #347/116 verifier_sdiv/SDIV64, zero divisor @unpriv:OK #347/117 verifier_sdiv/SMOD32, zero divisor:OK #347/118 verifier_sdiv/SMOD32, zero divisor @unpriv:OK #347/119 verifier_sdiv/SMOD64, zero divisor:OK #347/120 verifier_sdiv/SMOD64, zero divisor @unpriv:OK #347 verifier_sdiv:OK Summary: 6/166 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Florent Revest <revest@chromium.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Florent Revest <revest@chromium.org> Link: https://lore.kernel.org/bpf/20230815154158.717901-8-xukuohai@huaweicloud.com	2023-08-18 15:46:42 +02:00
Xu Kuohai	68b18191fe	bpf, arm64: Support signed div/mod instructions Add JIT for signed div/mod instructions. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Florent Revest <revest@chromium.org> Acked-by: Florent Revest <revest@chromium.org> Link: https://lore.kernel.org/bpf/20230815154158.717901-7-xukuohai@huaweicloud.com	2023-08-18 15:46:35 +02:00
Xu Kuohai	c32b6ee514	bpf, arm64: Support 32-bit offset jmp instruction Add support for 32-bit offset jmp instructions. Given the arm64 direct jump range is +-128MB, which is large enough for BPF prog, jumps beyond this range are not supported. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Florent Revest <revest@chromium.org> Acked-by: Florent Revest <revest@chromium.org> Link: https://lore.kernel.org/bpf/20230815154158.717901-6-xukuohai@huaweicloud.com	2023-08-18 15:46:18 +02:00
Xu Kuohai	1104247f3f	bpf, arm64: Support unconditional bswap Add JIT support for unconditional bswap instructions. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Florent Revest <revest@chromium.org> Acked-by: Florent Revest <revest@chromium.org> Link: https://lore.kernel.org/bpf/20230815154158.717901-5-xukuohai@huaweicloud.com	2023-08-18 15:46:09 +02:00
Xu Kuohai	bb0a1d6b49	bpf, arm64: Support sign-extension mov instructions Add JIT support for BPF sign-extension mov instructions with arm64 SXTB/SXTH/SXTW instructions. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Florent Revest <revest@chromium.org> Acked-by: Florent Revest <revest@chromium.org> Link: https://lore.kernel.org/bpf/20230815154158.717901-4-xukuohai@huaweicloud.com	2023-08-18 15:45:58 +02:00
Xu Kuohai	cc88f540da	bpf, arm64: Support sign-extension load instructions Add JIT support for sign-extension load instructions. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Florent Revest <revest@chromium.org> Acked-by: Florent Revest <revest@chromium.org> Link: https://lore.kernel.org/bpf/20230815154158.717901-3-xukuohai@huaweicloud.com	2023-08-18 15:45:49 +02:00
Xu Kuohai	6c9f86d363	arm64: insn: Add encoders for LDRSB/LDRSH/LDRSW To support BPF sign-extend load instructions, add encoders for LDRSB/LDRSH/LDRSW. LDRSB/LDRSH/LDRSW (immediate) is encoded as follows: 3 2 2 2 2 1 0 0 0 7 6 4 2 0 5 0 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ \| sz\|1 1 1\|0\|0 1\|opc\| imm12 \| Rn \| Rt \| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ LDRSB/LDRSH/LDRSW (register) is encoded as follows: 3 2 2 2 2 2 1 1 1 1 0 0 0 7 6 4 2 1 6 3 2 0 5 0 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ \| sz\|1 1 1\|0\|0 0\|opc\|1\| Rm \| opt \|S\|1 0\| Rn \| Rt \| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ where: - sz indicates whether 8-bit, 16-bit or 32-bit data is to be loaded - opc opc[1] (bit 23) is always 1 and opc[0] == 1 indicates regsize is 32-bit. Since BPF signed load instructions always exend the sign bit to bit 63 regardless of whether it loads an 8-bit, 16-bit or 32-bit data. So only 64-bit register size is required. That is, it's sufficient to set field opc fixed to 0x2. - opt Indicates whether to sign extend the offset register Rm and the effective bits of Rm. We set opt to 0x7 (SXTX) since we'll use Rm as a sgined 64-bit value in BPF. - S Optional only when opt field is 0x3 (LSL) In short, the above fields are encoded to the values listed below. sz opc opt S LDRSB (immediate) 0x0 0x2 na na LDRSH (immediate) 0x1 0x2 na na LDRSW (immediate) 0x2 0x2 na na LDRSB (register) 0x0 0x2 0x7 0 LDRSH (register) 0x1 0x2 0x7 0 LDRSW (register) 0x2 0x2 0x7 0 Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Florent Revest <revest@chromium.org> Acked-by: Florent Revest <revest@chromium.org> Link: https://lore.kernel.org/bpf/20230815154158.717901-2-xukuohai@huaweicloud.com	2023-08-18 15:45:34 +02:00
Jakub Kicinski	c2e5f4fd11	Merge branch 'netconsole-enable-compile-time-configuration' Breno Leitao says: ==================== netconsole: Enable compile time configuration Enable netconsole features to be set at compilation time. Create two Kconfig options that allow users to set extended logs and release prepending features at compilation time. The first patch de-duplicates the initialization code, and the second patch adds the support in the de-duplicated code, avoiding touching two different functions with the same change. ==================== Link: https://lore.kernel.org/r/20230811093158.1678322-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-17 19:25:44 -07:00
Breno Leitao	fad361a2ee	netconsole: Enable compile time configuration Enable netconsole features to be set at compilation time. Create two Kconfig options that allow users to set extended logs and release prepending features at compilation time. Right now, the user needs to pass command line parameters to netconsole, such as "+"/"r" to enable extended logs and version prepending features. With these two options, the user could set the default values for the features at compile time, and don't need to pass it in the command line to get them enabled, simplifying the command line. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230811093158.1678322-3-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-17 19:25:42 -07:00
Breno Leitao	b0a9e2c9a9	netconsole: Create a allocation helper De-duplicate the initialization and allocation code for struct netconsole_target. The same allocation and initialization code is duplicated in two different places in the netconsole subsystem, when the netconsole target is initialized by command line parameters (alloc_param_target()), and dynamically by sysfs (make_netconsole_target()). Create a helper function, and call it from the two different functions. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230811093158.1678322-2-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-17 19:25:42 -07:00
Justin Stitt	f3add6dec3	net: mdio: fix -Wvoid-pointer-to-enum-cast warning When building with clang 18 I see the following warning: \| drivers/net/mdio/mdio-xgene.c:338:13: warning: cast to smaller integer \| type 'enum xgene_mdio_id' from 'const void ' [-Wvoid-pointer-to-enum-cast] \| 338 \| mdio_id = (enum xgene_mdio_id)of_id->data; This is due to the fact that `of_id->data` is a void while `enum xgene_mdio_id` has the size of an int. This leads to truncation and possible data loss. Link: https://github.com/ClangBuiltLinux/linux/issues/1910 Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230815-void-drivers-net-mdio-mdio-xgene-v1-1-5304342e0659@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-17 19:25:40 -07:00
Jakub Kicinski	0c2d8227ba	Merge branch 'netem-use-a-seeded-prng-for-loss-and-corruption-events' François Michel says: ==================== netem: use a seeded PRNG for loss and corruption events In order to reproduce bugs or performance evaluation of network protocols and applications, it is useful to have reproducible test suites and tools. This patch adds a way to specify a PRNG seed through the TCA_NETEM_PRNG_SEED attribute for generating netem loss and corruption events. Initializing the qdisc with the same seed leads to the exact same loss and corruption patterns. If no seed is explicitly specified, the qdisc generates a random seed using get_random_u64(). This patch can be and has been tested using tc from the following iproute2-next fork: https://github.com/francoismichel/iproute2-next For instance, setting the seed 42424242 on the loopback with a loss rate of 10% will systematically drop the 5th, 12th and 24th packet when sending 25 packets. ==================== Link: https://lore.kernel.org/r/20230815092348.1449179-1-francois.michel@uclouvain.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-17 19:15:08 -07:00
François Michel	3cad70bc74	netem: use seeded PRNG for correlated loss events Use prandom_u32_state() instead of get_random_u32() to generate the correlated loss events of netem. Signed-off-by: François Michel <francois.michel@uclouvain.be> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Link: https://lore.kernel.org/r/20230815092348.1449179-4-francois.michel@uclouvain.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-17 19:15:06 -07:00
François Michel	9c87b2aecc	netem: use a seeded PRNG for generating random losses Use prandom_u32_state() instead of get_random_u32() to generate the random loss events of netem. The state of the prng is part of the prng attribute of struct netem_sched_data. Signed-off-by: François Michel <francois.michel@uclouvain.be> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Link: https://lore.kernel.org/r/20230815092348.1449179-3-francois.michel@uclouvain.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-17 19:15:05 -07:00
François Michel	4072d97ddc	netem: add prng attribute to netem_sched_data Add prng attribute to struct netem_sched_data and allows setting the seed of the PRNG through netlink using the new TCA_NETEM_PRNG_SEED attribute. The PRNG attribute is not actually used yet. Signed-off-by: François Michel <francois.michel@uclouvain.be> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Link: https://lore.kernel.org/r/20230815092348.1449179-2-francois.michel@uclouvain.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-17 19:15:05 -07:00
Jialin Zhang	a5e5b2cd47	net: ena: Use pci_dev_id() to simplify the code PCI core API pci_dev_id() can be used to get the BDF number for a pci device. We don't need to compose it manually. Use pci_dev_id() to simplify the code a little bit. Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shay Agroskin <shayagr@amazon.com> Link: https://lore.kernel.org/r/20230815024248.3519068-1-zhangjialin11@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-17 19:13:09 -07:00
Ziyang Xuan	b2f8323364	tun: add __exit annotations to module exit func tun_cleanup() Add missing __exit annotations to module exit func tun_cleanup(). Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230814083000.3893589-1-william.xuanziyang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-17 19:11:10 -07:00
Jakub Kicinski	f54a2a132a	bpf-next-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZN0eNgAKCRDbK58LschI gwhhAQCwbrEgA3LslDlk22eqyfRH04D+9d7Kc3ISQssyjlr9swD+NfwfDvYqopwj Dp67QkHdluixf2/NMPTEvg/CA4mlmww= =4BwF -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-08-16 We've added 17 non-merge commits during the last 6 day(s) which contain a total of 20 files changed, 1179 insertions(+), 37 deletions(-). The main changes are: 1) Add a BPF hook in sys_socket() to change the protocol ID from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy applications, from Geliang Tang. 2) Follow-up/fallout fix from the SO_REUSEPORT + bpf_sk_assign work to fix a splat on non-fullsock sks in inet[6]_steal_sock, from Lorenz Bauer. 3) Improvements to struct_ops links to avoid forcing presence of update/validate callbacks. Also add bpf_struct_ops fields documentation, from David Vernet. 4) Ensure libbpf sets close-on-exec flag on gzopen, from Marco Vedovati. 5) Several new tcx selftest additions and bpftool link show support for tcx and xdp links, from Daniel Borkmann. 6) Fix a smatch warning on uninitialized symbol in bpf_perf_link_fill_kprobe, from Yafang Shao. 7) BPF selftest fixes e.g. misplaced break in kfunc_call test, from Yipeng Zou. 8) Small cleanup to remove unused declaration bpf_link_new_file, from Yue Haibing. 9) Small typo fix to bpftool's perf help message, from Daniel T. Lee. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Add mptcpify test selftests/bpf: Fix error checks of mptcp open_and_load selftests/bpf: Add two mptcp netns helpers bpf: Add update_socket_protocol hook bpftool: Implement link show support for xdp bpftool: Implement link show support for tcx selftests/bpf: Add selftest for fill_link_info bpf: Fix uninitialized symbol in bpf_perf_link_fill_kprobe() net: Fix slab-out-of-bounds in inet[6]_steal_sock bpf: Document struct bpf_struct_ops fields bpf: Support default .validate() and .update() behavior for struct_ops links selftests/bpf: Add various more tcx test cases selftests/bpf: Clean up fmod_ret in bench_rename test script selftests/bpf: Fix repeat option when kfunc_call verification fails libbpf: Set close-on-exec flag on gzopen bpftool: fix perf help message bpf: Remove unused declaration bpf_link_new_file() ==================== Link: https://lore.kernel.org/r/20230816212840.1539-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-16 20:09:43 -07:00
Jakub Kicinski	42b118c9f9	Revert "net: ethernet: ti: am65-cpsw: add mqprio qdisc offload in channel mode" This reverts commit `90bc21aaef`. Patch was merged too hastily, Vladimir requested changes in: https://lore.kernel.org/all/20230816121305.5dio5tk3chge2ndh@skbuf/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-08-16 19:33:54 -07:00
Martin KaFai Lau	de40537364	Merge branch 'bpf: Force to MPTCP' Geliang Tang says: ==================== As is described in the "How to use MPTCP?" section in MPTCP wiki [1]: "Your app should create sockets with IPPROTO_MPTCP as the proto: ( socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); ). Legacy apps can be forced to create and use MPTCP sockets instead of TCP ones via the mptcpize command bundled with the mptcpd daemon." But the mptcpize (LD_PRELOAD technique) command has some limitations [2]: - it doesn't work if the application is not using libc (e.g. GoLang apps) - in some envs, it might not be easy to set env vars / change the way apps are launched, e.g. on Android - mptcpize needs to be launched with all apps that want MPTCP: we could have more control from BPF to enable MPTCP only for some apps or all the ones of a netns or a cgroup, etc. - it is not in BPF, we cannot talk about it at netdev conf. So this patchset attempts to use BPF to implement functions similer to mptcpize. The main idea is to add a hook in sys_socket() to change the protocol id from IPPROTO_TCP (or 0) to IPPROTO_MPTCP. [1] https://github.com/multipath-tcp/mptcp_net-next/wiki [2] https://github.com/multipath-tcp/mptcp_net-next/issues/79 v14: - Use getsockopt(MPTCP_INFO) to verify mptcp protocol intead of using nstat command. v13: - drop "Use random netns name for mptcp" patch. v12: - update diag_* log of update_socket_protocol. - add 'ip netns show' after 'ip netns del' to check if there is a test did not clean up its netns. - return libbpf_get_error() instead of -EIO for the error from open_and_load(). - Use getsockopt(SOL_PROTOCOL) to verify mptcp protocol intead of using 'ss -tOni'. v11: - add comments about outputs of 'ss' and 'nstat'. - use "err = verify_mptcpify()" instead of using =+. v10: - drop "#ifdef CONFIG_BPF_JIT". - include vmlinux.h and bpf_tracing_net.h to avoid defining some macros. - drop unneeded checks for mptcp. v9: - update comment for 'update_socket_protocol'. v8: - drop the additional checks on the 'protocol' value after the 'update_socket_protocol()' call. v7: - add __weak and __diag_* for update_socket_protocol. v6: - add update_socket_protocol. v5: - add bpf_mptcpify helper. v4: - use lsm_cgroup/socket_create v3: - patch 8: char cmd[128]; -> char cmd[256]; v2: - Fix build selftests errors reported by CI Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79 ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-08-16 11:42:35 -07:00
Geliang Tang	ddba122428	selftests/bpf: Add mptcpify test Implement a new test program mptcpify: if the family is AF_INET or AF_INET6, the type is SOCK_STREAM, and the protocol ID is 0 or IPPROTO_TCP, set it to IPPROTO_MPTCP. It will be hooked in update_socket_protocol(). Extend the MPTCP test base, add a selftest test_mptcpify() for the mptcpify case. Open and load the mptcpify test prog to mptcpify the TCP sockets dynamically, then use start_server() and connect_to_fd() to create a TCP socket, but actually what's created is an MPTCP socket, which can be verified through 'getsockopt(SOL_PROTOCOL)' and 'getsockopt(MPTCP_INFO)'. Acked-by: Yonghong Song <yonghong.song@linux.dev> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Link: https://lore.kernel.org/r/364e72f307e7bb38382ec7442c182d76298a9c41.1692147782.git.geliang.tang@suse.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-08-16 11:42:33 -07:00
Geliang Tang	2077465502	selftests/bpf: Fix error checks of mptcp open_and_load Return libbpf_get_error(), instead of -EIO, for the error from mptcp_sock__open_and_load(). Load success means prog_fd and map_fd are always valid. So drop these unneeded ASSERT_GE checks for them in mptcp run_test(). Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Link: https://lore.kernel.org/r/db5fcb93293df9ab173edcbaf8252465b80da6f2.1692147782.git.geliang.tang@suse.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-08-16 10:22:16 -07:00
Geliang Tang	97c9c65208	selftests/bpf: Add two mptcp netns helpers Add two netns helpers for mptcp tests: create_netns() and cleanup_netns(). Use them in test_base(). These new helpers will be re-used in the following commits introducing new tests. Acked-by: Yonghong Song <yonghong.song@linux.dev> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Link: https://lore.kernel.org/r/7506371fb6c417b401cc9d7365fe455754f4ba3f.1692147782.git.geliang.tang@suse.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-08-16 10:22:16 -07:00
Geliang Tang	0dd061a6a1	bpf: Add update_socket_protocol hook Add a hook named update_socket_protocol in __sys_socket(), for bpf progs to attach to and update socket protocol. One user case is to force legacy TCP apps to create and use MPTCP sockets instead of TCP ones. Define a fmod_ret set named bpf_mptcp_fmodret_ids, add the hook update_socket_protocol into this set, and register it in bpf_mptcp_kfunc_init(). Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79 Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Link: https://lore.kernel.org/r/ac84be00f97072a46f8a72b4e2be46cbb7fa5053.1692147782.git.geliang.tang@suse.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-08-16 10:22:16 -07:00
Daniel Borkmann	053bbf9bff	bpftool: Implement link show support for xdp Add support to dump XDP link information to bpftool. This reuses the recently added show_link_ifindex_{plain,json}(). The XDP link info only exposes the ifindex. Below shows an example link dump output, and a cgroup link is included for comparison, too: # bpftool link [...] 10: cgroup prog 2466 cgroup_id 1 attach_type cgroup_inet6_post_bind [...] 16: xdp prog 2477 ifindex enp5s0(3) [...] Equivalent json output: # bpftool link --json [...] { "id": 10, "type": "cgroup", "prog_id": 2466, "cgroup_id": 1, "attach_type": "cgroup_inet6_post_bind" }, [...] { "id": 16, "type": "xdp", "prog_id": 2477, "devname": "enp5s0", "ifindex": 3 } [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/20230816095651.10014-2-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-08-16 10:14:17 -07:00
Daniel Borkmann	e16e6c6df4	bpftool: Implement link show support for tcx Add support to dump tcx link information to bpftool. This adds a common helper show_link_ifindex_{plain,json}() which can be reused also for other link types. The plain text and json device output is the same format as in bpftool net dump. Below shows an example link dump output along with a cgroup link for comparison: # bpftool link [...] 10: cgroup prog 1977 cgroup_id 1 attach_type cgroup_inet6_post_bind [...] 13: tcx prog 2053 ifindex enp5s0(3) attach_type tcx_ingress 14: tcx prog 2080 ifindex enp5s0(3) attach_type tcx_egress [...] Equivalent json output: # bpftool link --json [...] { "id": 10, "type": "cgroup", "prog_id": 1977, "cgroup_id": 1, "attach_type": "cgroup_inet6_post_bind" }, [...] { "id": 13, "type": "tcx", "prog_id": 2053, "devname": "enp5s0", "ifindex": 3, "attach_type": "tcx_ingress" }, { "id": 14, "type": "tcx", "prog_id": 2080, "devname": "enp5s0", "ifindex": 3, "attach_type": "tcx_egress" } [...] Suggested-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Acked-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230816095651.10014-1-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-08-16 10:14:17 -07:00
Yafang Shao	23cf7aa539	selftests/bpf: Add selftest for fill_link_info Add selftest for the fill_link_info of uprobe, kprobe and tracepoint. The result: $ tools/testing/selftests/bpf/test_progs --name=fill_link_info #79/1 fill_link_info/kprobe_link_info:OK #79/2 fill_link_info/kretprobe_link_info:OK #79/3 fill_link_info/kprobe_invalid_ubuff:OK #79/4 fill_link_info/tracepoint_link_info:OK #79/5 fill_link_info/uprobe_link_info:OK #79/6 fill_link_info/uretprobe_link_info:OK #79/7 fill_link_info/kprobe_multi_link_info:OK #79/8 fill_link_info/kretprobe_multi_link_info:OK #79/9 fill_link_info/kprobe_multi_invalid_ubuff:OK #79 fill_link_info:OK Summary: 1/9 PASSED, 0 SKIPPED, 0 FAILED The test case for kprobe_multi won't be run on aarch64, as it is not supported. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20230813141900.1268-3-laoar.shao@gmail.com	2023-08-16 16:44:28 +02:00
Yafang Shao	0aa35162d2	bpf: Fix uninitialized symbol in bpf_perf_link_fill_kprobe() The commit `1b715e1b0e` ("bpf: Support ->fill_link_info for perf_event") leads to the following Smatch static checker warning: kernel/bpf/syscall.c:3416 bpf_perf_link_fill_kprobe() error: uninitialized symbol 'type'. That can happens when uname is NULL. So fix it by verifying the uname when we really need to fill it. Fixes: `1b715e1b0e` ("bpf: Support ->fill_link_info for perf_event") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Jiri Olsa <jolsa@kernel.org> Closes: https://lore.kernel.org/bpf/85697a7e-f897-4f74-8b43-82721bebc462@kili.mountain Link: https://lore.kernel.org/bpf/20230813141900.1268-2-laoar.shao@gmail.com	2023-08-16 16:44:23 +02:00
David S. Miller	950fe35831	Merge branch 'ipv6-expired-routes' Kui-Feng Lee says: ==================== Remove expired routes with a separated list of routes. FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree can be expensive if the number of routes in a table is big, even if most of them are permanent. Checking routes in a separated list of routes having expiration will avoid this potential issue. Background ========== The size of a Linux IPv6 routing table can become a big problem if not managed appropriately. Now, Linux has a garbage collector to remove expired routes periodically. However, this may lead to a situation in which the routing path is blocked for a long period due to an excessive number of routes. For example, years ago, there is a commit `c7bb4b8903` ("ipv6: tcp: drop silly ICMPv6 packet too big messages"). The root cause is that malicious ICMPv6 packets were sent back for every small packet sent to them. These packets add routes with an expiration time that prompts the GC to periodically check all routes in the tables, including permanent ones. Why Route Expires ================= Users can add IPv6 routes with an expiration time manually. However, the Neighbor Discovery protocol may also generate routes that can expire. For example, Router Advertisement (RA) messages may create a default route with an expiration time. [RFC 4861] For IPv4, it is not possible to set an expiration time for a route, and there is no RA, so there is no need to worry about such issues. Create Routes with Expires ========================== You can create routes with expires with the command. For example, ip -6 route add 2001:b000:591::3 via fe80::5054:ff:fe12:3457 \ dev enp0s3 expires 30 The route that has been generated will be deleted automatically in 30 seconds. GC of FIB6 ========== The function called fib6_run_gc() is responsible for performing garbage collection (GC) for the Linux IPv6 stack. It checks for the expiration of every route by traversing the trees of routing tables. The time taken to traverse a routing table increases with its size. Holding the routing table lock during traversal is particularly undesirable. Therefore, it is preferable to keep the lock for the shortest possible duration. Solution ======== The cause of the issue is keeping the routing table locked during the traversal of large trees. To solve this problem, we can create a separate list of routes that have expiration. This will prevent GC from checking permanent routes. Result ====== We conducted a test to measure the execution times of fib6_gc_timer_cb() and observed that it enhances the GC of FIB6. During the test, we added permanent routes with the following numbers: 1000, 3000, 6000, and 9000. Additionally, we added a route with an expiration time. Here are the average execution times for the kernel without the patch. - 120020 ns with 1000 permanent routes - 308920 ns with 3000 ... - 581470 ns with 6000 ... - 855310 ns with 9000 ... The kernel with the patch consistently takes around 14000 ns to execute, regardless of the number of permanent routes that are installed. Major changes from v7: - Fix warings raised by the patchwork. Major changes from v6: - Remove unnecessary check of tb6 in fib6_clean_expires_locked(). - Use ib6_clean_expires_locked() instead in fib6_purge_rt(). Major changes from v5: - Change the order of adding new routes to the GC list and starting GC timer. - Remove time measurements from the test case. - Stop forcing GC flush. Major changes from v4: - Detect existence of 'strace' in the test case. Major changes from v3: - Fix the type of arg according to feedback. - Add 1k temporary routes and 5K permanent routes in the test case. Measure time spending on GC with strace. Major changes from v2: - Remove unnecessary and incorrect sysctl restoring in the test case. Major changes from v1: - Moved gc_link to avoid creating a hole in fib6_info. - Moved fib6_set_expires() and fib6_clean_expires() to the header file and inlined. And removed duplicated lines. - Added a test case. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-16 12:26:44 +01:00
Kui-Feng Lee	a63e10da42	selftests: fib_tests: Add a test case for IPv6 garbage collection Add 1000 IPv6 routes with expiration time (w/ and w/o additional 5000 permanet routes in the background.) Wait for a few seconds to make sure they are removed correctly. The expected output of the test looks like the following example. > Fib6 garbage collection test > TEST: ipv6 route garbage collection [ OK ] Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-16 12:26:43 +01:00
Kui-Feng Lee	3dec89b14d	net/ipv6: Remove expired routes with a separated list of routes. FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree can be expensive if the number of routes in a table is big, even if most of them are permanent. Checking routes in a separated list of routes having expiration will avoid this potential issue. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-16 12:26:43 +01:00
Kai-Heng Feng	d147085183	e1000e: Use PME poll to circumvent unreliable ACPI wake On some I219 devices, ethernet cable plugging detection only works once from PCI D3 state. Subsequent cable plugging does set PME bit correctly, but device still doesn't get woken up. Since I219 connects to the root complex directly, it relies on platform firmware (ACPI) to wake it up. In this case, the GPE from _PRW only works for first cable plugging but fails to notify the driver for subsequent plugging events. The issue was originally found on CNP, but the same issue can be found on ADL too. So workaround the issue by continuing use PME poll after first ACPI wake. As PME poll is always used, the runtime suspend restriction for CNP can also be removed. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-16 12:22:22 +01:00
Abel Wu	ac8a529621	net-memcg: Fix scope of sockmem pressure indicators Now there are two indicators of socket memory pressure sit inside struct mem_cgroup, socket_pressure and tcpmem_pressure, indicating memory reclaim pressure in memcg->memory and ->tcpmem respectively. When in legacy mode (cgroupv1), the socket memory is charged into ->tcpmem which is independent of ->memory, so socket_pressure has nothing to do with socket's pressure at all. Things could be worse by taking socket_pressure into consideration in legacy mode, as a pressure in ->memory can lead to premature reclamation/throttling in socket. While for the default mode (cgroupv2), the socket memory is charged into ->memory, and ->tcpmem/->tcpmem_pressure are simply not used. So {socket,tcpmem}_pressure are only used in default/legacy mode respectively for indicating socket memory pressure. This patch fixes the pieces of code that make mixed use of both. Fixes: `8e8ae64524` ("mm: memcontrol: hook up vmpressure to socket pressure") Signed-off-by: Abel Wu <wuyun.abel@bytedance.com> Acked-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-16 12:21:32 +01:00
Louis Peens	7fd034bce6	nfp: update maintainer Take over maintainership of the nfp driver from Simon as he is moving away from Corigine. Signed-off-by: Louis Peens <louis.peens@corigine.com> Acked-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-16 12:18:54 +01:00
Grygorii Strashko	90bc21aaef	net: ethernet: ti: am65-cpsw: add mqprio qdisc offload in channel mode This patch adds MQPRIO Qdisc offload in full 'channel' mode which allows not only setting up pri:tc mapping, but also configuring TX shapers on external port FIFOs. The K3 CPSW MQPRIO Qdisc offload is expected to work with VLAN/priority tagged packets. Non-tagged packets have to be mapped only to TC0. - TX traffic classes must be rated starting from TC that has highest priority and with no gaps - Traffic classes are used starting from 0, that has highest priority - min_rate defines Committed Information Rate (guaranteed) - max_rate defines Excess Information Rate (non guaranteed) and offloaded as (max_rate[i] - tcX_min_rate[i]) - VLAN/priority tagged packets mapped to TC0 will exit switch with VLAN tag priority 0 The configuration example: ethtool -L eth1 tx 5 ethtool --set-priv-flags eth1 p0-rx-ptype-rrobin off tc qdisc add dev eth1 parent root handle 100: mqprio num_tc 3 \ map 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 1@1 1@2 hw 1 mode channel \ shaper bw_rlimit min_rate 0 100mbit 200mbit max_rate 0 101mbit 202mbit tc qdisc replace dev eth2 handle 100: parent root mqprio num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 queues 1@0 hw 1 ip link add link eth1 name eth1.100 type vlan id 100 ip link set eth1.100 type vlan egress 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7 In the above example two ports share the same TX CPPI queue 0 for low priority traffic. 3 traffic classes are defined for eth1 and mapped to: TC0 - low priority, TX CPPI queue 0 -> ext Port 1 fifo0, no rate limit TC1 - prio 2, TX CPPI queue 1 -> ext Port 1 fifo1, CIR=100Mbit/s, EIR=1Mbit/s TC2 - prio 3, TX CPPI queue 2 -> ext Port 1 fifo2, CIR=200Mbit/s, EIR=2Mbit/s Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-16 12:17:49 +01:00
David S. Miller	569dce3f8e	Merge branch 'inet-data-races' Eric Dumazet says: ==================== inet: socket lock and data-races avoidance In this series, I converted 20 bits in "struct inet_sock" and made them truly atomic. This allows to implement many IP_ socket options in a lockless fashion (no need to acquire socket lock), and fixes data-races that were showing up in various KCSAN reports. I also took care of IP_TTL/IP_MINTTL, but left few other options for another series. v4: Rebased after recent mptcp changes. Added Reviewed-by: tags from Simon (thanks !) v3: fixed patch 7, feedback from build bot about ipvs set_mcast_loop() v2: addressed a feedback from a build bot in patch 9 by removing unused issk variable in mptcp_setsockopt_sol_ip_set_transparent() Added Acked-by: tags from Soheil (thanks !) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-16 11:09:18 +01:00
Eric Dumazet	12af73269f	inet: implement lockless IP_MINTTL inet->min_ttl is already read with READ_ONCE(). Implementing IP_MINTTL socket option set/read without holding the socket lock is easy. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-16 11:09:18 +01:00

1 2 3 4 5 ...

1202762 Commits All Branches Search

1202762 Commits

All Branches