linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-10-05 16:37:50 +00:00

Author	SHA1	Message	Date
Andrey Ignatov	07f9196241	selftests/bpf: Test unbounded var_off stack access Test the case when reg->smax_value is too small/big and can overflow, and separately min and max values outside of stack bounds. Example of output: # ./test_verifier #856/p indirect variable-offset stack access, unbounded OK #857/p indirect variable-offset stack access, max out of bound OK #858/p indirect variable-offset stack access, min out of bound OK Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-05 16:50:08 +02:00
Andrey Ignatov	107c26a70c	bpf: Sanity check max value for var_off stack access As discussed in [1] max value of variable offset has to be checked for overflow on stack access otherwise verifier would accept code like this: 0: (b7) r2 = 6 1: (b7) r3 = 28 2: (7a) (u64 )(r10 -16) = 0 3: (7a) (u64 )(r10 -8) = 0 4: (79) r4 = (u64 )(r1 +168) 5: (c5) if r4 s< 0x0 goto pc+4 R1=ctx(id=0,off=0,imm=0) R2=inv6 R3=inv28 R4=inv(id=0,umax_value=9223372036854775807,var_off=(0x0; 0x7fffffffffffffff)) R10=fp0,call_-1 fp-8=mmmmmmmm fp-16=mmmmmmmm 6: (17) r4 -= 16 7: (0f) r4 += r10 8: (b7) r5 = 8 9: (85) call bpf_getsockopt#57 10: (b7) r0 = 0 11: (95) exit , where R4 obviosly has unbounded max value. Fix it by checking that reg->smax_value is inside (-BPF_MAX_VAR_OFF; BPF_MAX_VAR_OFF) range. reg->smax_value is used instead of reg->umax_value because stack pointers are calculated using negative offset from fp. This is opposite to e.g. map access where offset must be non-negative and where umax_value is used. Also dedicated verbose logs are added for both min and max bound check failures to have diagnostics consistent with variable offset handling in check_map_access(). [1] https://marc.info/?l=linux-netdev&m=155433357510597&w=2 Fixes: `2011fccfb6` ("bpf: Support variable offset stack access from helpers") Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-05 16:50:08 +02:00
Andrey Ignatov	2c6927dbdc	selftests/bpf: Test indirect var_off stack access in unpriv mode Test that verifier rejects indirect stack access with variable offset in unprivileged mode and accepts same code in privileged mode. Since pointer arithmetics is prohibited in unprivileged mode verifier should reject the program even before it gets to helper call that uses variable offset, at the time when that variable offset is trying to be constructed. Example of output: # ./test_verifier ... #859/u indirect variable-offset stack access, priv vs unpriv OK #859/p indirect variable-offset stack access, priv vs unpriv OK Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-05 16:50:08 +02:00
Andrey Ignatov	088ec26d9c	bpf: Reject indirect var_off stack access in unpriv mode Proper support of indirect stack access with variable offset in unprivileged mode (!root) requires corresponding support in Spectre masking for stack ALU in retrieve_ptr_limit(). There are no use-case for variable offset in unprivileged mode though so make verifier reject such accesses for simplicity. Pointer arithmetics is one (and only?) way to cause variable offset and it's already rejected in unpriv mode so that verifier won't even get to helper function whose argument contains variable offset, e.g.: 0: (7a) (u64 )(r10 -16) = 0 1: (7a) (u64 )(r10 -8) = 0 2: (61) r2 = (u32 )(r1 +0) 3: (57) r2 &= 4 4: (17) r2 -= 16 5: (0f) r2 += r10 variable stack access var_off=(0xfffffffffffffff0; 0x4) off=-16 size=1R2 stack pointer arithmetic goes out of range, prohibited for !root Still it looks like a good idea to reject variable offset indirect stack access for unprivileged mode in check_stack_boundary() explicitly. Fixes: `2011fccfb6` ("bpf: Support variable offset stack access from helpers") Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-05 16:50:07 +02:00
Andrey Ignatov	f68a5b4464	selftests/bpf: Test indirect var_off stack access in raw mode Test that verifier rejects indirect access to uninitialized stack with variable offset. Example of output: # ./test_verifier ... #859/p indirect variable-offset stack access, uninitialized OK Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-05 16:50:07 +02:00
Andrey Ignatov	f2bcd05ec7	bpf: Reject indirect var_off stack access in raw mode It's hard to guarantee that whole memory is marked as initialized on helper return if uninitialized stack is accessed with variable offset since specific bounds are unknown to verifier. This may cause uninitialized stack leaking. Reject such an access in check_stack_boundary to prevent possible leaking. There are no known use-cases for indirect uninitialized stack access with variable offset so it shouldn't break anything. Fixes: `2011fccfb6` ("bpf: Support variable offset stack access from helpers") Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-05 16:50:07 +02:00
Alexei Starovoitov	636e78b1cd	samples/bpf: fix build with new clang clang started to error on invalid asm clobber usage in x86 headers and many bpf program samples failed to build with the message: CLANG-bpf /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.o In file included from /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.c:14: In file included from ../include/linux/in.h:23: In file included from ../include/uapi/linux/in.h:24: In file included from ../include/linux/socket.h:8: In file included from ../include/linux/uio.h:14: In file included from ../include/crypto/hash.h:16: In file included from ../include/linux/crypto.h:26: In file included from ../include/linux/uaccess.h:5: In file included from ../include/linux/sched.h:15: In file included from ../include/linux/sem.h:5: In file included from ../include/uapi/linux/sem.h:5: In file included from ../include/linux/ipc.h:9: In file included from ../include/linux/refcount.h:72: ../arch/x86/include/asm/refcount.h:72:36: error: asm-specifier for input or output variable conflicts with asm clobber list r->refs.counter, e, "er", i, "cx"); ^ ../arch/x86/include/asm/refcount.h:86:27: error: asm-specifier for input or output variable conflicts with asm clobber list r->refs.counter, e, "cx"); ^ 2 errors generated. Override volatile() to workaround the problem. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-05 16:28:36 +02:00
Daniel T. Lee	e67b2c7154	samples, selftests/bpf: add NULL check for ksym_search Since, ksym_search added with verification logic for symbols existence, it could return NULL when the kernel symbols are not loaded. This commit will add NULL check logic after ksym_search. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 16:43:47 +02:00
Daniel T. Lee	0979ff7992	selftests/bpf: ksym_search won't check symbols exists Currently, ksym_search located at trace_helpers won't check symbols are existing or not. In ksym_search, when symbol is not found, it will return &syms[0](_stext). But when the kernel symbols are not loaded, it will return NULL, which is not a desired action. This commit will add verification logic whether symbols are loaded prior to the symbol search. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 16:43:46 +02:00
Daniel Borkmann	cc441a6948	Merge branch 'bpf-verifier-scalability' Alexei Starovoitov says: ==================== v1->v2: - fixed typo in patch 1 - added a patch to convert kcalloc to kvcalloc - added a patch to verbose 16-bit jump offset check - added a test with 1m insns This patch set is the first step to be able to accept large programs. The verifier still suffers from its brute force algorithm and large programs can easily hit 1M insn_processed limit. A lot more work is necessary to be able to verify large programs. v1: Realize two key ideas to speed up verification speed by ~20 times 1. every 'branching' instructions records all verifier states. not all of them are useful for search pruning. add a simple heuristic to keep states that were successful in search pruning and remove those that were not 2. mark_reg_read walks parentage chain of registers to mark parents as LIVE_READ. Once the register is marked there is no need to remark it again in the future. Hence stop walking the chain once first LIVE_READ is seen. 1st optimization gives 10x speed up on large programs and 2nd optimization reduces the cost of mark_reg_read from ~40% of cpu to <1%. Combined the deliver ~20x speedup on large programs. Faster and bounded verification time allows to increase insn_processed limit to 1 million from 130k. Worst case it takes 1/10 of a second to process that many instructions and peak memory consumption is peak_states * sizeof(struct bpf_verifier_state) which is around ~5Mbyte. Increase insn_per_program limit for root to insn_processed limit. Add verification stats and stress tests for verifier scalability. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 01:27:39 +02:00
Alexei Starovoitov	8aa2d4b4b9	selftests/bpf: synthetic tests to push verifier limits Add a test to generate 1m ld_imm64 insns to stress the verifier. Bump the size of fill_ld_abs_vlan_push_pop test from 4k to 29k and jump_around_ld_abs from 4k to 5.5k. Larger sizes are not possible due to 16-bit offset encoding in jump instructions. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 01:27:38 +02:00
Alexei Starovoitov	e5e7a8f2d8	selftests/bpf: add few verifier scale tests Add 3 basic tests that stress verifier scalability. test_verif_scale1.c calls non-inlined jhash() function 90 times on different position in the packet. This test simulates network packet parsing. jhash function is ~140 instructions and main program is ~1200 insns. test_verif_scale2.c force inlines jhash() function 90 times. This program is ~15k instructions long. test_verif_scale3.c calls non-inlined jhash() function 90 times on But this time jhash has to process 32-bytes from the packet instead of 14-bytes in tests 1 and 2. jhash function is ~230 insns and main program is ~1200 insns. $ test_progs -s can be used to see verifier stats. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 01:27:38 +02:00
Alexei Starovoitov	da11b41758	libbpf: teach libbpf about log_level bit 2 Allow bpf_prog_load_xattr() to specify log_level for program loading. Teach libbpf to accept log_level with bit 2 set. Increase default BPF_LOG_BUF_SIZE from 256k to 16M. There is no downside to increase it to a maximum allowed by old kernels. Existing 256k limit caused ENOSPC errors and users were not able to see verifier error which is printed at the end of the verifier log. If ENOSPC is hit, double the verifier log and try again to capture the verifier error. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 01:27:38 +02:00
Alexei Starovoitov	7a9f5c65ab	bpf: increase verifier log limit The existing 16Mbyte verifier log limit is not enough for log_level=2 even for small programs. Increase it to 1Gbyte. Note it's not a kernel memory limit. It's an amount of memory user space provides to store the verifier log. The kernel populates it 1k at a time. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 01:27:38 +02:00
Alexei Starovoitov	c04c0d2b96	bpf: increase complexity limit and maximum program size Large verifier speed improvements allow to increase verifier complexity limit. Now regardless of the program composition and its size it takes little time for the verifier to hit insn_processed limit. On typical x86 machine non-debug kernel processes 1M instructions in 1/10 of a second. (before these speed improvements specially crafted programs could be hitting multi-second verification times) Full kasan kernel with debug takes ~1 second for the same 1M insns. Hence bump the BPF_COMPLEXITY_LIMIT_INSNS limit to 1M. Also increase the number of instructions per program from 4k to internal BPF_COMPLEXITY_LIMIT_INSNS limit. 4k limit was confusing to users, since small programs with hundreds of insns could be hitting BPF_COMPLEXITY_LIMIT_INSNS limit. Sometimes adding more insns and bpf_trace_printk debug statements would make the verifier accept the program while removing code would make the verifier reject it. Some user space application started to add #define MAX_FOO to their programs and do: MAX_FOO=100; again: compile with MAX_FOO; try to load; if (fails_to_load) { reduce MAX_FOO; goto again; } to be able to fit maximum amount of processing into single program. Other users artificially split their single program into a set of programs and use all 32 iterations of tail_calls to increase compute limits. And the most advanced folks used unlimited tc-bpf filter list to execute many bpf programs. Essentially the users managed to workaround 4k insn limit. This patch removes the limit for root programs from uapi. BPF_COMPLEXITY_LIMIT_INSNS is the kernel internal limit and success to load the program no longer depends on program size, but on 'smartness' of the verifier only. The verifier will continue to get smarter with every kernel release. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 01:27:38 +02:00
Alexei Starovoitov	4f73379ec5	bpf: verbose jump offset overflow check Larger programs may trigger 16-bit jump offset overflow check during instruction patching. Make this error verbose otherwise users cannot decipher error code without printks in the verifier. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 01:27:38 +02:00
Alexei Starovoitov	71dde681a8	bpf: convert temp arrays to kvcalloc Temporary arrays used during program verification need to be vmalloc-ed to support large bpf programs. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 01:27:38 +02:00
Alexei Starovoitov	25af32dad8	bpf: improve verification speed by not remarking live_read With large verifier speed improvement brought by the previous patch mark_reg_read() becomes the hottest function during verification. On a typical program it consumes 40% of cpu. mark_reg_read() walks parentage chain of registers to mark parents as LIVE_READ. Once the register is marked there is no need to remark it again in the future. Hence stop walking the chain once first LIVE_READ is seen. This optimization drops mark_reg_read() time from 40% of cpu to <1% and overall 2x improvement of verification speed. For some programs the longest_mark_read_walk counter improves from ~500 to ~5 Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 01:27:37 +02:00
Alexei Starovoitov	9f4686c41b	bpf: improve verification speed by droping states Branch instructions, branch targets and calls in a bpf program are the places where the verifier remembers states that led to successful verification of the program. These states are used to prune brute force program analysis. For unprivileged programs there is a limit of 64 states per such 'branching' instructions (maximum length is tracked by max_states_per_insn counter introduced in the previous patch). Simply reducing this threshold to 32 or lower increases insn_processed metric to the point that small valid programs get rejected. For root programs there is no limit and cilium programs can have max_states_per_insn to be 100 or higher. Walking 100+ states multiplied by number of 'branching' insns during verification consumes significant amount of cpu time. Turned out simple LRU-like mechanism can be used to remove states that unlikely will be helpful in future search pruning. This patch introduces hit_cnt and miss_cnt counters: hit_cnt - this many times this state successfully pruned the search miss_cnt - this many times this state was not equivalent to other states (and that other states were added to state list) The heuristic introduced in this patch is: if (sl->miss_cnt > sl->hit_cnt * 3 + 3) /* drop this state from future considerations / Higher numbers increase max_states_per_insn (allow more states to be considered for pruning) and slow verification speed, but do not meaningfully reduce insn_processed metric. Lower numbers drop too many states and insn_processed increases too much. Many different formulas were considered. This one is simple and works well enough in practice. (the analysis was done on selftests/progs/ and on cilium programs) The end result is this heuristic improves verification speed by 10 times. Large synthetic programs that used to take a second more now take 1/10 of a second. In cases where max_states_per_insn used to be 100 or more, now it's ~10. There is a slight increase in insn_processed for cilium progs: before after bpf_lb-DLB_L3.o 1831 1838 bpf_lb-DLB_L4.o 3029 3218 bpf_lb-DUNKNOWN.o 1064 1064 bpf_lxc-DDROP_ALL.o 26309 26935 bpf_lxc-DUNKNOWN.o 33517 34439 bpf_netdev.o 9713 9721 bpf_overlay.o 6184 6184 bpf_lcx_jit.o 37335 39389 And 2-3 times improvement in the verification speed. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 01:27:37 +02:00
Alexei Starovoitov	06ee7115b0	bpf: add verifier stats and log_level bit 2 In order to understand the verifier bottlenecks add various stats and extend log_level: log_level 1 and 2 are kept as-is: bit 0 - level=1 - print every insn and verifier state at branch points bit 1 - level=2 - print every insn and verifier state at every insn bit 2 - level=4 - print verifier error and stats at the end of verification When verifier rejects the program the libbpf is trying to load the program twice. Once with log_level=0 (no messages, only error code is reported to user space) and second time with log_level=1 to tell the user why the verifier rejected it. With introduction of bit 2 - level=4 the libbpf can choose to always use that level and load programs once, since the verification speed is not affected and in case of error the verbose message will be available. Note that the verifier stats are not part of uapi just like all other verbose messages. They're expected to change in the future. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-04 01:27:37 +02:00
Andrii Nakryiko	e83b9f5544	kbuild: add ability to generate BTF type info for vmlinux This patch adds new config option to trigger generation of BTF type information from DWARF debuginfo for vmlinux and kernel modules through pahole, which in turn relies on libbpf for btf_dedup() algorithm. The intent is to record compact type information of all types used inside kernel, including all the structs/unions/typedefs/etc. This enables BPF's compile-once-run-everywhere ([0]) approach, in which tracing programs that are inspecting kernel's internal data (e.g., struct task_struct) can be compiled on a system running some kernel version, but would be possible to run on other kernel versions (and configurations) without recompilation, even if the layout of structs changed and/or some of the fields were added, removed, or renamed. This is only possible if BPF loader can get kernel type info to adjust all the offsets correctly. This patch is a first time in this direction, making sure that BTF type info is part of Linux kernel image in non-loadable ELF section. BTF deduplication ([1]) algorithm typically provides 100x savings compared to DWARF data, so resulting .BTF section is not big as is typically about 2MB in size. [0] http://vger.kernel.org/lpc-bpf2018.html#session-2 [1] https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-03 00:53:07 +02:00
Daniel Borkmann	99182beed8	Merge branch 'bpf-selftest-clang-fixes' Stanislav Fomichev says: ==================== This series contains small fixes to make bpf selftests compile cleanly with clangs. All of them are not real problems, but it's nice to have an option to use clang for the tests themselves. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-02 23:17:19 +02:00
Stanislav Fomichev	7596aa3ea8	selftests: bpf: remove duplicate .flags initialization in ctx_skb.c verifier/ctx_skb.c:708:11: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides] .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-02 23:17:18 +02:00
Stanislav Fomichev	a918b03e8c	selftests: bpf: fix -Wformat-invalid-specifier for bpf_obj_id.c Use standard C99 %zu for sizeof, not GCC's custom %Zu: bpf_obj_id.c:76:48: warning: invalid conversion specifier 'Z' Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-02 23:17:18 +02:00
Stanislav Fomichev	94e8f3c712	selftests: bpf: fix -Wformat-security warning for flow_dissector_load.c flow_dissector_load.c:55:19: warning: format string is not a string literal (potentially insecure) [-Wformat-security] error(1, errno, command); ^~~~~~~ flow_dissector_load.c:55:19: note: treat the string as an argument to avoid this error(1, errno, command); ^ "%s", 1 warning generated. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-02 23:17:18 +02:00
Stanislav Fomichev	6b7b6995c4	selftests: bpf: tests.h should depend on .c files, not the output This makes sure we don't put headers as input files when doing compilation, because clang complains about the following: clang-9: error: cannot specify -o when generating multiple output files ../lib.mk:152: recipe for target 'xxx/tools/testing/selftests/bpf/test_verifier' failed make: * [xxx/tools/testing/selftests/bpf/test_verifier] Error 1 make: * Waiting for unfinished jobs.... clang-9: error: cannot specify -o when generating multiple output files ../lib.mk:152: recipe for target 'xxx/tools/testing/selftests/bpf/test_progs' failed Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-02 23:17:18 +02:00
Yonghong Song	9de2640b06	bpf: add bpffs multi-dimensional array tests in test_btf For multiple dimensional arrays like below, int a[2][3] both llvm and pahole generated one BTF_KIND_ARRAY type like . element_type: int . index_type: unsigned int . number of elements: 6 Such a collapsed BTF_KIND_ARRAY type will cause the divergence in BTF vs. the user code. In the compile-once-run-everywhere project, the header file is generated from BTF and used for bpf program, and the definition in the header file will be different from what user expects. But the kernel actually supports chained multi-dimensional array types properly. The above "int a[2][3]" can be represented as Type #n: . element_type: int . index_type: unsigned int . number of elements: 3 Type #(n+1): . element_type: type #n . index_type: unsigned int . number of elements: 2 The following llvm commit https://reviews.llvm.org/rL357215 also enables llvm to generated proper chained multi-dimensional arrays. The test_btf already has a raw test ("struct test #1") for chained multi-dimensional arrays. This patch added amended bpffs test for chained multi-dimensional arrays. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-04-01 15:41:05 +02:00
Alexei Starovoitov	c3969de8ac	Merge branch 'variable-stack-access' Andrey Ignatov says: ==================== The patch set adds support for stack access with variable offset from helpers. Patch 1 is the main patch in the set and provides more details. Patch 2 adds selftests for new functionality. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-03-29 12:05:36 -07:00
Andrey Ignatov	8ff80e96e3	selftests/bpf: Test variable offset stack access Test different scenarios of indirect variable-offset stack access: out of bound access (>0), min_off below initialized part of the stack, max_off+size above initialized part of the stack, initialized stack. Example of output: ... #856/p indirect variable-offset stack access, out of bound OK #857/p indirect variable-offset stack access, max_off+size > max_initialized OK #858/p indirect variable-offset stack access, min_off < min_initialized OK #859/p indirect variable-offset stack access, ok OK ... Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-03-29 12:05:35 -07:00
Andrey Ignatov	2011fccfb6	bpf: Support variable offset stack access from helpers Currently there is a difference in how verifier checks memory access for helper arguments for PTR_TO_MAP_VALUE and PTR_TO_STACK with regard to variable part of offset. check_map_access, that is used for PTR_TO_MAP_VALUE, can handle variable offsets just fine, so that BPF program can call a helper like this: some_helper(map_value_ptr + off, size); , where offset is unknown at load time, but is checked by program to be in a safe rage (off >= 0 && off + size < map_value_size). But it's not the case for check_stack_boundary, that is used for PTR_TO_STACK, and same code with pointer to stack is rejected by verifier: some_helper(stack_value_ptr + off, size); For example: 0: (7a) (u64 )(r10 -16) = 0 1: (7a) (u64 )(r10 -8) = 0 2: (61) r2 = (u32 )(r1 +0) 3: (57) r2 &= 4 4: (17) r2 -= 16 5: (0f) r2 += r10 6: (18) r1 = 0xffff888111343a80 8: (85) call bpf_map_lookup_elem#1 invalid variable stack read R2 var_off=(0xfffffffffffffff0; 0x4) Add support for variable offset access to check_stack_boundary so that if offset is checked by program to be in a safe range it's accepted by verifier. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-03-29 12:05:35 -07:00
Luca Boccassi	dd399ac9e3	tools/bpf: generate pkg-config file for libbpf Generate a libbpf.pc file at build time so that users can rely on pkg-config to find the library, its CFLAGS and LDFLAGS. Signed-off-by: Luca Boccassi <bluca@debian.org> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-03-28 17:06:03 +01:00
David S. Miller	356d71e00d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2019-03-27 17:37:58 -07:00
Eric Dumazet	df453700e8	inet: switch IP ID generator to siphash According to Amit Klein and Benny Pinkas, IP ID generation is too weak and might be used by attackers. Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix()) having 64bit key and Jenkins hash is risky. It is time to switch to siphash and its 128bit keys. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Amit Klein <aksecurity@gmail.com> Reported-by: Benny Pinkas <benny@pinkas.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 14:29:26 -07:00
Colin Ian King	180a8c3d5d	net: phy: mdio-bcm-unimac: remove redundant !timeout check The check for zero timeout is always true at the end of the proceeding while loop; the only other exit path in the loop is if the unimac MDIO is not busy. Remove the redundant zero timeout check and always return -ETIMEDOUT on this timeout return path. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 14:27:30 -07:00
Eric Dumazet	4f661542a4	tcp: fix zerocopy and notsent_lowat issues My recent patch had at least three problems : 1) TX zerocopy wants notification when skb is acknowledged, thus we need to call skb_zcopy_clear() if the skb is cached into sk->sk_tx_skb_cache 2) Some applications might expect precise EPOLLOUT notifications, so we need to update sk->sk_wmem_queued and call sk_mem_uncharge() from sk_wmem_free_skb() in all cases. The SOCK_QUEUE_SHRUNK flag must also be set. 3) Reuse of saved skb should have used skb_cloned() instead of simply checking if the fast clone has been freed. Fixes: `472c2e07ee` ("tcp: add one skb cache for tx") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:59:02 -07:00
Numan Siddique	4d5ec89fc8	net: openvswitch: Add a new action check_pkt_len This patch adds a new action - 'check_pkt_len' which checks the packet length and executes a set of actions if the packet length is greater than the specified length or executes another set of actions if the packet length is lesser or equal to. This action takes below nlattrs * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER - Nested actions to apply if the packet length is greater than the specified 'pkt_len' * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL - Nested actions to apply if the packet length is lesser or equal to the specified 'pkt_len'. The main use case for adding this action is to solve the packet drops because of MTU mismatch in OVN virtual networking solution. When a VM (which belongs to a logical switch of OVN) sends a packet destined to go via the gateway router and if the nic which provides external connectivity, has a lesser MTU, OVS drops the packet if the packet length is greater than this MTU. With the help of this action, OVN will check the packet length and if it is greater than the MTU size, it will generate an ICMP packet (type 3, code 4) and includes the next hop mtu in it so that the sender can fragment the packets. Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html Suggested-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> CC: Gregory Rose <gvrose8192@gmail.com> CC: Pravin B Shelar <pshelar@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:53:23 -07:00
David S. Miller	d7aa033831	Merge branch 'ethtool-add-support-for-Fast-Link-Down-as-new-PHY-tunable' Heiner Kallweit says: ==================== ethtool: add support for Fast Link Down as new PHY tunable This adds support for Fast Link Down as new PHY tunable. Fast Link Down reduces the time until a link down event is reported for 1000BaseT. According to the standard it's 750ms what is too long for several use cases. This is the kernel-related series, the ethtool userspace extension I'd submit once the kernel part has been applied. v2: - add describing comment in patch 1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:51:49 -07:00
Heiner Kallweit	69f42be8af	net: phy: marvell: add PHY tunable fast link down support for 88E1540 1000BaseT standard requires that a link is reported as down earliest after 750ms. Several use case however require a much faster detecion of a broken link. Fast Link Down supports this by intentionally violating a the standard. This patch exposes the Fast Link Down feature of 88E1540 and 88E6390. These PHY's can be found as internal PHY's in several switches: 88E6352, 88E6240, 88E6176, 88E6172, and 88E6390(X). Fast Link Down and EEE are mutually exclusive. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:51:49 -07:00
Heiner Kallweit	3aeb0803f7	ethtool: add PHY Fast Link Down support This adds support for Fast Link Down as new PHY tunable. Fast Link Down reduces the time until a link down event is reported for 1000BaseT. According to the standard it's 750ms what is too long for several use cases. v2: - add comment describing the constants Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:51:49 -07:00
Bart Van Assche	7b7ed885af	net/core: Allow the compiler to verify declaration and definition consistency Instead of declaring a function in a .c file, declare it in a header file and include that header file from the source files that define and that use the function. That allows the compiler to verify consistency of declaration and definition. See also commit `52267790ef` ("sock: add MSG_ZEROCOPY") # v4.14. Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:49:44 -07:00
Bart Van Assche	a986967eb8	net/core: Fix rtnetlink kernel-doc headers This patch avoids that the following warnings are reported when building with W=1: net/core/rtnetlink.c:3580: warning: Function parameter or member 'ndm' not described in 'ndo_dflt_fdb_add' net/core/rtnetlink.c:3580: warning: Function parameter or member 'tb' not described in 'ndo_dflt_fdb_add' net/core/rtnetlink.c:3580: warning: Function parameter or member 'dev' not described in 'ndo_dflt_fdb_add' net/core/rtnetlink.c:3580: warning: Function parameter or member 'addr' not described in 'ndo_dflt_fdb_add' net/core/rtnetlink.c:3580: warning: Function parameter or member 'vid' not described in 'ndo_dflt_fdb_add' net/core/rtnetlink.c:3580: warning: Function parameter or member 'flags' not described in 'ndo_dflt_fdb_add' net/core/rtnetlink.c:3718: warning: Function parameter or member 'ndm' not described in 'ndo_dflt_fdb_del' net/core/rtnetlink.c:3718: warning: Function parameter or member 'tb' not described in 'ndo_dflt_fdb_del' net/core/rtnetlink.c:3718: warning: Function parameter or member 'dev' not described in 'ndo_dflt_fdb_del' net/core/rtnetlink.c:3718: warning: Function parameter or member 'addr' not described in 'ndo_dflt_fdb_del' net/core/rtnetlink.c:3718: warning: Function parameter or member 'vid' not described in 'ndo_dflt_fdb_del' net/core/rtnetlink.c:3861: warning: Function parameter or member 'skb' not described in 'ndo_dflt_fdb_dump' net/core/rtnetlink.c:3861: warning: Function parameter or member 'cb' not described in 'ndo_dflt_fdb_dump' net/core/rtnetlink.c:3861: warning: Function parameter or member 'filter_dev' not described in 'ndo_dflt_fdb_dump' net/core/rtnetlink.c:3861: warning: Function parameter or member 'idx' not described in 'ndo_dflt_fdb_dump' net/core/rtnetlink.c:3861: warning: Excess function parameter 'nlh' description in 'ndo_dflt_fdb_dump' Cc: Hubert Sokolowski <hubert.sokolowski@intel.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:49:44 -07:00
Bart Van Assche	d79b3bafab	net/core: Document __skb_flow_dissect() flags argument This patch avoids that the following warning is reported when building with W=1: warning: Function parameter or member 'flags' not described in '__skb_flow_dissect' Cc: Tom Herbert <tom@herbertland.com> Fixes: `cd79a2382a` ("flow_dissector: Add flags argument to skb_flow_dissector functions") # v4.3. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:49:43 -07:00
Bart Van Assche	b3c0fd61e6	net/core: Document all dev_ioctl() arguments This patch avoids that the following warnings are reported when building with W=1: net/core/dev_ioctl.c:378: warning: Function parameter or member 'ifr' not described in 'dev_ioctl' net/core/dev_ioctl.c:378: warning: Function parameter or member 'need_copyout' not described in 'dev_ioctl' net/core/dev_ioctl.c:378: warning: Excess function parameter 'arg' description in 'dev_ioctl' Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: `44c02a2c3d` ("dev_ioctl(): move copyin/copyout to callers") # v4.16. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:49:43 -07:00
Bart Van Assche	37f3c421e8	net/core: Document reuseport_add_sock() bind_inany argument This patch avoids that the following warning is reported when building with W=1: warning: Function parameter or member 'bind_inany' not described in 'reuseport_add_sock' Cc: Martin KaFai Lau <kafai@fb.com> Fixes: `2dbb9b9e6d` ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT") # v4.19. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:49:43 -07:00
Heiner Kallweit	863d1a8d55	net: dsa: mv88e6xxx: remove unneeded cmode initialization This partially reverts `ed8fe20205` ("net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode"). I missed that chip->ports[].cmode is overwritten anyway by the cmode caching in mv88e6xxx_setup(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:47:23 -07:00
Sudarsana Reddy Kalluru	32705592f9	bnx2x: Utilize FW 7.13.11.0. Commit 8fcf0ec44c11f "bnx2x: Add FW 7.13.11.0" added said .bin FW to linux-firmware; This patch incorporates the FW in the bnx2x driver. This introduces few FW fixes and the support for Tx VLAN filtering. Please consider applying it to 'net-next' tree. Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:43:14 -07:00
Kristian Evensen	1713cb37bf	fou: Support binding FoU socket An FoU socket is currently bound to the wildcard-address. While this works fine, there are several use-cases where the use of the wildcard-address is not desirable. For example, I use FoU on some multi-homed servers and would like to use FoU on only one of the interfaces. This commit adds support for binding FoU sockets to a given source address/interface, as well as connecting the socket to a given destination address/port. udp_tunnel already provides the required infrastructure, so most of the code added is for exposing and setting the different attributes (local address, peer address, etc.). The lookups performed when we add, delete or get an FoU-socket has also been updated to compare all the attributes a user can set. Since the comparison now involves several elements, I have added a separate comparison-function instead of open-coding. In order to test the code and ensure that the new comparison code works correctly, I started by creating a wildcard socket bound to port 1234 on my machine. I then tried to create a non-wildcarded socket bound to the same port, as well as fetching and deleting the socket (including source address, peer address or interface index in the netlink request). Both the create, fetch and delete request failed. Deleting/fetching the socket was only successful when my netlink request attributes matched those used to create the socket. I then repeated the tests, but with a socket bound to a local ip address, a socket bound to a local address + interface, and a bound socket that was also «connected» to a peer. Add only worked when no socket with the matching source address/interface (or wildcard) existed, while fetch/delete was only successful when all attributes matched. In addition to testing that the new code work, I also checked that the current behavior is kept. If none of the new attributes are provided, then an FoU-socket is configured as before (i.e., wildcarded). If any of the new attributes are provided, the FoU-socket is configured as expected. v1->v2: * Fixed building with IPv6 disabled (kbuild). * Fixed a return type warning and make the ugly comparison function more readable (kbuild). * Describe more in detail what has been tested (thanks David Miller). * Make peer port required if peer address is specified. Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 13:30:07 -07:00
Linus Torvalds	1a9df9e29c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "Fixes here and there, a couple new device IDs, as usual: 1) Fix BQL race in dpaa2-eth driver, from Ioana Ciornei. 2) Fix 64-bit division in iwlwifi, from Arnd Bergmann. 3) Fix documentation for some eBPF helpers, from Quentin Monnet. 4) Some UAPI bpf header sync with tools, also from Quentin Monnet. 5) Set descriptor ownership bit at the right time for jumbo frames in stmmac driver, from Aaro Koskinen. 6) Set IFF_UP properly in tun driver, from Eric Dumazet. 7) Fix load/store doubleword instruction generation in powerpc eBPF JIT, from Naveen N. Rao. 8) nla_nest_start() return value checks all over, from Kangjie Lu. 9) Fix asoc_id handling in SCTP after the SCTP__ASSOC changes this merge window. From Marcelo Ricardo Leitner and Xin Long. 10) Fix memory corruption with large MTUs in stmmac, from Aaro Koskinen. 11) Do not use ipv4 header for ipv6 flows in TCP and DCCP, from Eric Dumazet. 12) Fix topology subscription cancellation in tipc, from Erik Hugne. 13) Memory leak in genetlink error path, from Yue Haibing. 14) Valid control actions properly in packet scheduler, from Davide Caratti. 15) Even if we get EEXIST, we still need to rehash if a shrink was delayed. From Herbert Xu. 16) Fix interrupt mask handling in interrupt handler of r8169, from Heiner Kallweit. 17) Fix leak in ehea driver, from Wen Yang" git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (168 commits) dpaa2-eth: fix race condition with bql frame accounting chelsio: use BUG() instead of BUG_ON(1) net: devlink: skip info_get op call if it is not defined in dumpit net: phy: bcm54xx: Encode link speed and activity into LEDs tipc: change to check tipc_own_id to return in tipc_net_stop net: usb: aqc111: Extend HWID table by QNAP device net: sched: Kconfig: update reference link for PIE net: dsa: qca8k: extend slave-bus implementations net: dsa: qca8k: remove leftover phy accessors dt-bindings: net: dsa: qca8k: support internal mdio-bus dt-bindings: net: dsa: qca8k: fix example net: phy: don't clear BMCR in genphy_soft_reset bpf, libbpf: clarify bump in libbpf version info bpf, libbpf: fix version info and add it to shared object rxrpc: avoid clang -Wuninitialized warning tipc: tipc clang warning net: sched: fix cleanup NULL pointer exception in act_mirr r8169: fix cable re-plugging issue net: ethernet: ti: fix possible object reference leak net: ibm: fix possible object reference leak ...	2019-03-27 12:22:57 -07:00
David S. Miller	eec7e2954d	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2019-03-26 This series contains more updates to the ice driver only. Jeremiah provides his first patch to the Linux kernel to clean up un-necessary newlines in driver log messages. Mitch updates the ice driver to use existing status codes in the iavf driver so that when errors occur, it will not report nonsensical results. Adds support for VF admin queue interrupts by programming the VPINT_MBX_CTL register array. Brett adds a check for a bit that we set while preparing for a reset, to ensure we are prepared to do a proper reset. Also implemented PCI error handling operations. Went through and audited the hot path with pahole and made modifications based on the results since 2 structures were taking up more space than necessary due to cache alignment issues. Fixed an issue where when flow control was disabled, the state of flow control was being displayed as "Unknown". Anirudh fixes adaptive interrupt moderation changes by adding code that was missed, that should have been added in the initial patch to add that support. Cleaned up a function prototype that was never implemented. Did additional code cleanup by removing unneeded braces and redundant code comments. Akeem fixes an issue that occurs when the VF is attempting to remove the default LAN/MAC address, which is programmed by the administrator by updating the error message to explicitly say that the VF cannot change the MAC programmed by the PF. Preethi fixes the driver to not fall into the error path when a added filter already exists, but instead continue to process the rest of the function and add appropriate checks after adding MAC filters. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:19:13 -07:00
David S. Miller	b0be25c575	Merge branch 'net-mvpp2-Classifier-updates-and-cleanups' Maxime Chevallier says: ==================== net: mvpp2: Classifier updates and cleanups In preparation for the future addition of classification offload support, this series cleans-up the current code dealing with the classification engines. As of today, the classification code only deals with RSS, which is a very narrow use-case when considering all of the classifier's features. This lead to design and naming decisions that don't really stand when considering using more classification features. This series therefore includes quite a lot a renaming of functions and macros, and tries to make the naming schemes more consistent and the code more readable. The debugfs interface that allows reading the various Parsing and Classification engines has been cleaned-up and made more generic, allowing to read the Flow Table and C2 engine hit counters on a per-entry basis. The Classifier itself has been made more robust by introducing the lu_type field in classification lookups, that prevents false-positive matches in the future. We also initialise the various engine in a more extensive and less error-prone way by assining default values to all entries in the C2 and Flow table. This is a pretty big series considering it's mostly cleanup, but my goal is to make the future series that will contain new big features easier to review, focusing on the real logic. Besides the debugfs interface, this series doesn't intend to introduce any new features or change in behaviours. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-27 11:10:58 -07:00

1 2 3 4 5 ...

825493 commits