linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-09-29 05:44:11 +00:00

Author	SHA1	Message	Date
Yonghong Song	6b9e333134	selftests/bpf: Add arraymap test for bpf_for_each_map_elem() helper A test is added for arraymap and percpu arraymap. The test also exercises the early return for the helper which does not traverse all elements. $ ./test_progs -n 45 #45/1 hash_map:OK #45/2 array_map:OK #45 for_each:OK Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210226204934.3885756-1-yhs@fb.com	2021-02-26 13:23:53 -08:00
Yonghong Song	9de7f0fdab	selftests/bpf: Add hashmap test for bpf_for_each_map_elem() helper A test case is added for hashmap and percpu hashmap. The test also exercises nested bpf_for_each_map_elem() calls like bpf_prog: bpf_for_each_map_elem(func1) func1: bpf_for_each_map_elem(func2) func2: $ ./test_progs -n 45 #45/1 hash_map:OK #45 for_each:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210226204933.3885657-1-yhs@fb.com	2021-02-26 13:23:53 -08:00
Yonghong Song	f1f9f0d8d7	bpftool: Print subprog address properly With later hashmap example, using bpftool xlated output may look like: int dump_task(struct bpf_iter__task * ctx): ; struct task_struct task = ctx->task; 0: (79) r2 = (u64 )(r1 +8) ; if (task == (void )0 \|\| called > 0) ... 19: (18) r2 = subprog[+17] 30: (18) r2 = subprog[+25] ... 36: (95) exit __u64 check_hash_elem(struct bpf_map * map, __u32 * key, __u64 * val, struct callback_ctx * data): ; struct bpf_iter__task ctx = data->ctx; 37: (79) r5 = (u64 )(r4 +0) ... 55: (95) exit __u64 check_percpu_elem(struct bpf_map map, __u32 * key, __u64 * val, void * unused): ; check_percpu_elem(struct bpf_map map, __u32 key, __u64 val, void unused) 56: (bf) r6 = r3 ... 83: (18) r2 = subprog[-47] Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210226204931.3885458-1-yhs@fb.com	2021-02-26 13:23:53 -08:00
Yonghong Song	53eddb5e04	libbpf: Support subprog address relocation A new relocation RELO_SUBPROG_ADDR is added to capture subprog addresses loaded with ld_imm64 insns. Such ld_imm64 insns are marked with BPF_PSEUDO_FUNC and will be passed to kernel. For bpf_for_each_map_elem() case, kernel will check that the to-be-used subprog address must be a static function and replace it with proper actual jited func address. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210226204930.3885367-1-yhs@fb.com	2021-02-26 13:23:52 -08:00
Yonghong Song	b8f871fa32	libbpf: Move function is_ldimm64() earlier in libbpf.c Move function is_ldimm64() close to the beginning of libbpf.c, so it can be reused by later code and the next patch as well. There is no functionality change. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210226204929.3885295-1-yhs@fb.com	2021-02-26 13:23:52 -08:00
Yonghong Song	06dcdcd4b9	bpf: Add arraymap support for bpf_for_each_map_elem() helper This patch added support for arraymap and percpu arraymap. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210226204928.3885192-1-yhs@fb.com	2021-02-26 13:23:52 -08:00
Yonghong Song	314ee05e2f	bpf: Add hashtab support for bpf_for_each_map_elem() helper This patch added support for hashmap, percpu hashmap, lru hashmap and percpu lru hashmap. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210226204927.3885020-1-yhs@fb.com	2021-02-26 13:23:52 -08:00
Yonghong Song	69c087ba62	bpf: Add bpf_for_each_map_elem() helper The bpf_for_each_map_elem() helper is introduced which iterates all map elements with a callback function. The helper signature looks like long bpf_for_each_map_elem(map, callback_fn, callback_ctx, flags) and for each map element, the callback_fn will be called. For example, like hashmap, the callback signature may look like long callback_fn(map, key, val, callback_ctx) There are two known use cases for this. One is from upstream ([1]) where a for_each_map_elem helper may help implement a timeout mechanism in a more generic way. Another is from our internal discussion for a firewall use case where a map contains all the rules. The packet data can be compared to all these rules to decide allow or deny the packet. For array maps, users can already use a bounded loop to traverse elements. Using this helper can avoid using bounded loop. For other type of maps (e.g., hash maps) where bounded loop is hard or impossible to use, this helper provides a convenient way to operate on all elements. For callback_fn, besides map and map element, a callback_ctx, allocated on caller stack, is also passed to the callback function. This callback_ctx argument can provide additional input and allow to write to caller stack for output. If the callback_fn returns 0, the helper will iterate through next element if available. If the callback_fn returns 1, the helper will stop iterating and returns to the bpf program. Other return values are not used for now. Currently, this helper is only available with jit. It is possible to make it work with interpreter with so effort but I leave it as the future work. [1]: https://lore.kernel.org/bpf/20210122205415.113822-1-xiyou.wangcong@gmail.com/ Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210226204925.3884923-1-yhs@fb.com	2021-02-26 13:23:52 -08:00
Yonghong Song	282a0f46d6	bpf: Change return value of verifier function add_subprog() Currently, verifier function add_subprog() returns 0 for success and negative value for failure. Change the return value to be the subprog number for success. This functionality will be used in the next patch to save a call to find_subprog(). Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210226204924.3884848-1-yhs@fb.com	2021-02-26 13:23:52 -08:00
Yonghong Song	1435137573	bpf: Refactor check_func_call() to allow callback function Later proposed bpf_for_each_map_elem() helper has callback function as one of its arguments. This patch refactored check_func_call() to permit callback function which sets callee state. Different callback functions may have different callee states. There is no functionality change for this patch. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210226204923.3884627-1-yhs@fb.com	2021-02-26 13:23:52 -08:00
Yonghong Song	bc2591d63f	bpf: Factor out verbose_invalid_scalar() Factor out the function verbose_invalid_scalar() to verbose print if a scalar is not in a tnum range. There is no functionality change and the function will be used by later patch which introduced bpf_for_each_map_elem(). Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210226204922.3884375-1-yhs@fb.com	2021-02-26 13:23:52 -08:00
Yonghong Song	efdb22de7d	bpf: Factor out visit_func_call_insn() in check_cfg() During verifier check_cfg(), all instructions are visited to ensure verifier can handle program control flows. This patch factored out function visit_func_call_insn() so it can be reused in later patch to visit callback function calls. There is no functionality change for this patch. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210226204920.3884136-1-yhs@fb.com	2021-02-26 13:23:52 -08:00
Ilya Leoshkevich	86fd166575	selftests/bpf: Copy extras in out-of-srctree builds Building selftests in a separate directory like this: make O="$BUILD" -C tools/testing/selftests/bpf and then running: cd "$BUILD" && ./test_progs -t btf causes all the non-flavored btf_dump_test_case_*.c tests to fail, because these files are not copied to where test_progs expects to find them. Fix by not skipping EXT-COPY when the original $(OUTPUT) is not empty (lib.mk sets it to $(shell pwd) in that case) and using rsync instead of cp: cp fails because e.g. urandom_read is being copied into itself, and rsync simply skips such cases. rsync is already used by kselftests and therefore is not a new dependency. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210224111445.102342-1-iii@linux.ibm.com	2021-02-26 13:18:44 -08:00
KP Singh	2854436612	selftests/bpf: Propagate error code of the command to vmtest.sh When vmtest.sh ran a command in a VM, it did not record or propagate the error code of the command. This made the script less "script-able". The script now saves the error code of the said command in a file in the VM, copies the file back to the host and (when available) uses this error code instead of its own. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210225161947.1778590-1-kpsingh@kernel.org	2021-02-26 13:12:52 -08:00
Alexei Starovoitov	1e0ab70778	Merge branch 'sock_map: clean up and refactor code for BPF_SK_SKB_VERDICT' Cong Wang says: ==================== From: Cong Wang <cong.wang@bytedance.com> This patchset is the first series of patches separated out from the original large patchset, to make reviews easier. This patchset does not add any new feature or change any functionality but merely cleans up the existing sockmap and skmsg code and refactors it, to prepare for the patches followed up. This passed all BPF selftests. To see the big picture, the original whole patchset is available on github: https://github.com/congwang/linux/tree/sockmap and this patchset is also available on github: https://github.com/congwang/linux/tree/sockmap1 --- v7: add 1 trivial cleanup patch define a mask for sk_redir fix CONFIG_BPF_SYSCALL in include/net/udp.h make sk_psock_done_strp() static move skb_bpf_redirect_clear() to sk_psock_backlog() v6: fix !CONFIG_INET case v5: improve CONFIG_BPF_SYSCALL dependency add 3 trivial cleanup patches v4: reuse skb dst instead of skb ext fix another Kconfig error v3: fix a few Kconfig compile errors remove an unused variable add a comment for bpf_convert_data_end_access() v2: split the original patchset compute data_end with bpf_convert_data_end_access() get rid of psock->bpf_running reduce the scope of CONFIG_BPF_STREAM_PARSER do not add CONFIG_BPF_SOCK_MAP ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2021-02-26 12:28:23 -08:00
Cong Wang	ff9614b81b	skmsg: Remove unused sk_psock_stop() declaration It is not defined or used anywhere. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210223184934.6054-10-xiyou.wangcong@gmail.com	2021-02-26 12:28:04 -08:00
Cong Wang	5333423222	skmsg: Get rid of sk_psock_bpf_run() It is now nearly identical to bpf_prog_run_pin_on_cpu() and it has an unused parameter 'psock', so we can just get rid of it and call bpf_prog_run_pin_on_cpu() directly. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210223184934.6054-9-xiyou.wangcong@gmail.com	2021-02-26 12:28:04 -08:00
Cong Wang	cd81cefb1a	skmsg: Make __sk_psock_purge_ingress_msg() static It is only used within skmsg.c so can become static. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210223184934.6054-8-xiyou.wangcong@gmail.com	2021-02-26 12:28:04 -08:00
Cong Wang	4675e234b9	sock_map: Make sock_map_prog_update() static It is only used within sock_map.c so can become static. Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210223184934.6054-7-xiyou.wangcong@gmail.com	2021-02-26 12:28:04 -08:00
Cong Wang	ae8b8332fb	sock_map: Rename skb_parser and skb_verdict These two eBPF programs are tied to BPF_SK_SKB_STREAM_PARSER and BPF_SK_SKB_STREAM_VERDICT, rename them to reflect the fact they are only used for TCP. And save the name 'skb_verdict' for general use later. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210223184934.6054-6-xiyou.wangcong@gmail.com	2021-02-26 12:28:04 -08:00
Cong Wang	e3526bb92a	skmsg: Move sk_redir from TCP_SKB_CB to skb Currently TCP_SKB_CB() is hard-coded in skmsg code, it certainly does not work for any other non-TCP protocols. We can move them to skb ext, but it introduces a memory allocation on fast path. Fortunately, we only need to a word-size to store all the information, because the flags actually only contains 1 bit so can be just packed into the lowest bit of the "pointer", which is stored as unsigned long. Inside struct sk_buff, '_skb_refdst' can be reused because skb dst is no longer needed after ->sk_data_ready() so we can just drop it. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210223184934.6054-5-xiyou.wangcong@gmail.com	2021-02-26 12:28:03 -08:00
Cong Wang	16137b09a6	bpf: Compute data_end dynamically with JIT code Currently, we compute ->data_end with a compile-time constant offset of skb. But as Jakub pointed out, we can actually compute it in eBPF JIT code at run-time, so that we can competely get rid of ->data_end. This is similar to skb_shinfo(skb) computation in bpf_convert_shinfo_access(). Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210223184934.6054-4-xiyou.wangcong@gmail.com	2021-02-26 12:28:03 -08:00
Cong Wang	5a685cd94b	skmsg: Get rid of struct sk_psock_parser struct sk_psock_parser is embedded in sk_psock, it is unnecessary as skb verdict also uses ->saved_data_ready. We can simply fold these fields into sk_psock, and get rid of ->enabled. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210223184934.6054-3-xiyou.wangcong@gmail.com	2021-02-26 12:28:03 -08:00
Cong Wang	887596095e	bpf: Clean up sockmap related Kconfigs As suggested by John, clean up sockmap related Kconfigs: Reduce the scope of CONFIG_BPF_STREAM_PARSER down to TCP stream parser, to reflect its name. Make the rest sockmap code simply depend on CONFIG_BPF_SYSCALL and CONFIG_INET, the latter is still needed at this point because of TCP/UDP proto update. And leave CONFIG_NET_SOCK_MSG untouched, as it is used by non-sockmap cases. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210223184934.6054-2-xiyou.wangcong@gmail.com	2021-02-26 12:28:03 -08:00
Hangbin Liu	a83586a7dd	bpf: Remove blank line in bpf helper description comment Commit `34b2021cc6` ("bpf: Add BPF-helper for MTU checking") added an extra blank line in bpf helper description. This will make bpf_helpers_doc.py stop building bpf_helper_defs.h immediately after bpf_check_mtu(), which will affect future added functions. Fixes: `34b2021cc6` ("bpf: Add BPF-helper for MTU checking") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20210223131457.1378978-1-liuhangbin@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2021-02-26 12:18:12 -08:00
Alexei Starovoitov	43c5026be7	Merge branch 'selftests/bpf: xsk improvements and new stats' Ciara Loftus says: ==================== This series attempts to improve the xsk selftest framework by: 1. making the default output less verbose 2. adding an optional verbose flag to both the test_xsk.sh script and xdpxceiver app. 3. renaming the debug option in the app to to 'dump-pkts' and add a flag to the test_xsk.sh script which enables the flag in the app. 4. changing how tests are launched - now they are launched from the xdpxceiver app instead of the script. Once the improvements are made, a new set of tests are added which test the xsk statistics. The output of the test script now looks like: ./test_xsk.sh PREREQUISITES: [ PASS ] 1..10 ok 1 PASS: SKB NOPOLL ok 2 PASS: SKB POLL ok 3 PASS: SKB NOPOLL Socket Teardown ok 4 PASS: SKB NOPOLL Bi-directional Sockets ok 5 PASS: SKB NOPOLL Stats ok 6 PASS: DRV NOPOLL ok 7 PASS: DRV POLL ok 8 PASS: DRV NOPOLL Socket Teardown ok 9 PASS: DRV NOPOLL Bi-directional Sockets ok 10 PASS: DRV NOPOLL Stats # Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0 XSK KSELFTESTS: [ PASS ] v2->v3: * Rename dump-pkts to dump_pkts in test_xsk.sh * Add examples of flag usage to test_xsk.sh v1->v2: * Changed '-d' flag in the shell script to '-D' to be consistent with the xdpxceiver app. * Renamed debug mode to 'dump-pkts' which better reflects the behaviour. * Use libpf APIs instead of calls to ss for configuring xdp on the links * Remove mutex init & destroy for each stats test * Added a description for each of the new statistics tests * Distinguish between exiting due to initialisation failure vs test failure This series applies on commit `d310ec03a3` Ciara Loftus (3): selftests/bpf: expose and rename debug argument selftests/bpf: restructure xsk selftests selftests/bpf: introduce xsk statistics tests ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2021-02-26 12:08:49 -08:00
Ciara Loftus	b267e5a458	selftests/bpf: Introduce xsk statistics tests This commit introduces a range of tests to the xsk testsuite for validating xsk statistics. A new test type called 'stats' is added. Within it there are four sub-tests. Each test configures a scenario which should trigger the given error statistic. The test passes if the statistic is successfully incremented. The four statistics for which tests have been created are: 1. rx dropped Increase the UMEM frame headroom to a value which results in insufficient space in the rx buffer for both the packet and the headroom. 2. tx invalid Set the 'len' field of tx descriptors to an invalid value (umem frame size + 1). 3. rx ring full Reduce the size of the RX ring to a fraction of the fill ring size. 4. fill queue empty Do not populate the fill queue and then try to receive pkts. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20210223162304.7450-5-ciara.loftus@intel.com	2021-02-26 12:08:49 -08:00
Ciara Loftus	d3e3bf5b4c	selftests/bpf: Restructure xsk selftests Prior to this commit individual xsk tests were launched from the shell script 'test_xsk.sh'. When adding a new test type, two new test configurations had to be added to this file - one for each of the supported XDP 'modes' (skb or drv). Should zero copy support be added to the xsk selftest framework in the future, three new test configurations would need to be added for each new test type. Each new test type also typically requires new CLI arguments for the xdpxceiver program. This commit aims to reduce the overhead of adding new tests, by launching the test configurations from within the xdpxceiver program itself, using simple loops. Every test is run every time the C program is executed. Many of the CLI arguments can be removed as a result. Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20210223162304.7450-4-ciara.loftus@intel.com	2021-02-26 12:08:48 -08:00
Ciara Loftus	d2b0dfd5d1	selftests/bpf: Expose and rename debug argument Launching xdpxceiver with -D enables what was formerly know as 'debug' mode. Rename this mode to 'dump-pkts' as it better describes the behavior enabled by the option. New usage: ./xdpxceiver .. -D or ./xdpxceiver .. --dump-pkts Also make it possible to pass this flag to the app via the test_xsk.sh shell script like so: ./test_xsk.sh -D Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210223162304.7450-3-ciara.loftus@intel.com	2021-02-26 12:08:48 -08:00
Magnus Karlsson	ecde60614d	selftest/bpf: Make xsk tests less verbose Make the xsk tests less verbose by only printing the essentials. Currently, it is hard to see if the tests passed or not due to all the printouts. Move the extra printouts to a verbose option, if further debugging is needed when a problem arises. To run the xsk tests with verbose output: ./test_xsk.sh -v Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20210223162304.7450-2-ciara.loftus@intel.com	2021-02-26 12:08:48 -08:00
Brendan Jackman	e6ac593372	bpf: Rename fixup_bpf_calls and add some comments This function has become overloaded, it actually does lots of diverse things in a single pass. Rename it to avoid confusion, and add some concise commentary. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210217104509.2423183-1-jackmanb@google.com	2021-02-26 12:05:07 -08:00
Dmitrii Banshchikov	523a4cf491	bpf: Use MAX_BPF_FUNC_REG_ARGS macro Instead of using integer literal here and there use macro name for better context. Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210225202629.585485-1-me@ubique.spb.ru	2021-02-26 11:59:53 -08:00
Alexei Starovoitov	a7d24d9582	Merge branch 'bpf: enable task local storage for tracing' Song Liu says: ==================== This set enables task local storage for non-BPF_LSM programs. It is common for tracing BPF program to access per-task data. Currently, these data are stored in hash tables with pid as the key. In bcc/libbpftools [1], 9 out of 23 tools use such hash tables. However, hash table is not ideal for many use case. Task local storage provides better usability and performance for BPF programs. Please refer to 6/6 for some performance comparison of task local storage vs. hash table. Changes v5 => v6: 1. Add inc/dec bpf_task_storage_busy in bpf_local_storage_map_free(). Changes v4 => v5: 1. Fix build w/o CONFIG_NET. (kernel test robot) 2. Remove unnecessary check for !task_storage_ptr(). (Martin) 3. Small changes in commit logs. Changes v3 => v4: 1. Prevent deadlock from recursive calls of bpf_task_storage_[get\|delete]. (2/6 checks potential deadlock and fails over, 4/6 adds a selftest). Changes v2 => v3: 1. Make the selftest more robust. (Andrii) 2. Small changes with runqslower. (Andrii) 3. Shortern CC list to make it easy for vger. Changes v1 => v2: 1. Do not allocate task local storage when the task is being freed. 2. Revise the selftest and added a new test for a task being freed. 3. Minor changes in runqslower. ==================== Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2021-02-26 11:51:55 -08:00
Song Liu	ced47e30ab	bpf: runqslower: Use task local storage Replace hashtab with task local storage in runqslower. This improves the performance of these BPF programs. The following table summarizes average runtime of these programs, in nanoseconds: task-local hash-prealloc hash-no-prealloc handle__sched_wakeup 125 340 3124 handle__sched_wakeup_new 2812 1510 2998 handle__sched_switch 151 208 991 Note that, task local storage gives better performance than hashtab for handle__sched_wakeup and handle__sched_switch. On the other hand, for handle__sched_wakeup_new, task local storage is slower than hashtab with prealloc. This is because handle__sched_wakeup_new accesses the data for the first time, so it has to allocate the data for task local storage. Once the initial allocation is done, subsequent accesses, as those in handle__sched_wakeup, are much faster with task local storage. If we disable hashtab prealloc, task local storage is much faster for all 3 functions. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210225234319.336131-7-songliubraving@fb.com	2021-02-26 11:51:48 -08:00
Song Liu	4b0d2d4156	bpf: runqslower: Prefer using local vmlimux to generate vmlinux.h Update the Makefile to prefer using $(O)/vmlinux, $(KBUILD_OUTPUT)/vmlinux (for selftests) or ../../../vmlinux. These two files should have latest definitions for vmlinux.h. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210225234319.336131-6-songliubraving@fb.com	2021-02-26 11:51:48 -08:00
Song Liu	c540957a4d	selftests/bpf: Test deadlock from recursive bpf_task_storage_[get\|delete] Add a test with recursive bpf_task_storage_[get\|delete] from fentry programs on bpf_local_storage_lookup and bpf_local_storage_update. Without proper deadlock prevent mechanism, this test would cause deadlock. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210225234319.336131-5-songliubraving@fb.com	2021-02-26 11:51:48 -08:00
Song Liu	1f87dcf116	selftests/bpf: Add non-BPF_LSM test for task local storage Task local storage is enabled for tracing programs. Add two tests for task local storage without CONFIG_BPF_LSM. The first test stores a value in sys_enter and read it back in sys_exit. The second test checks whether the kernel allows allocating task local storage in exit_creds() (which it should not). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210225234319.336131-4-songliubraving@fb.com	2021-02-26 11:51:48 -08:00
Song Liu	bc235cdb42	bpf: Prevent deadlock from recursive bpf_task_storage_[get\|delete] BPF helpers bpf_task_storage_[get\|delete] could hold two locks: bpf_local_storage_map_bucket->lock and bpf_local_storage->lock. Calling these helpers from fentry/fexit programs on functions in bpf_*_storage.c may cause deadlock on either locks. Prevent such deadlock with a per cpu counter, bpf_task_storage_busy. We need this counter to be global, because the two locks here belong to two different objects: bpf_local_storage_map and bpf_local_storage. If we pick one of them as the owner of the counter, it is still possible to trigger deadlock on the other lock. For example, if bpf_local_storage_map owns the counters, it cannot prevent deadlock on bpf_local_storage->lock when two maps are used. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210225234319.336131-3-songliubraving@fb.com	2021-02-26 11:51:48 -08:00
Song Liu	a10787e6d5	bpf: Enable task local storage for tracing programs To access per-task data, BPF programs usually creates a hash table with pid as the key. This is not ideal because: 1. The user need to estimate the proper size of the hash table, which may be inaccurate; 2. Big hash tables are slow; 3. To clean up the data properly during task terminations, the user need to write extra logic. Task local storage overcomes these issues and offers a better option for these per-task data. Task local storage is only available to BPF_LSM. Now enable it for tracing programs. Unlike LSM programs, tracing programs can be called in IRQ contexts. Helpers that access task local storage are updated to use raw_spin_lock_irqsave() instead of raw_spin_lock_bh(). Tracing programs can attach to functions on the task free path, e.g. exit_creds(). To avoid allocating task local storage after bpf_task_storage_free(). bpf_task_storage_get() is updated to not allocate new storage when the task is not refcounted (task->usage == 0). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: KP Singh <kpsingh@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210225234319.336131-2-songliubraving@fb.com	2021-02-26 11:51:47 -08:00
Xuan Zhuo	9c8f21e6f8	xsk: Build skb by page (aka generic zerocopy xmit) This patch is used to construct skb based on page to save memory copy overhead. This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to directly construct skb. If this feature is not supported, it is still necessary to copy data to construct skb. ---------------- Performance Testing ------------ The test environment is Aliyun ECS server. Test cmd: ``` xdpsock -i eth0 -t -S -s <msg size> ``` Test result data: size 64 512 1024 1500 copy 1916747 1775988 1600203 1440054 page 1974058 1953655 1945463 1904478 percent 3.0% 10.0% 21.58% 32.3% Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210218204908.5455-6-alobakin@pm.me	2021-02-25 00:56:43 +01:00
Alexander Lobakin	3914d88f76	xsk: Respect device's headroom and tailroom on generic xmit path xsk_generic_xmit() allocates a new skb and then queues it for xmitting. The size of new skb's headroom is desc->len, so it comes to the driver/device with no reserved headroom and/or tailroom. Lots of drivers need some headroom (and sometimes tailroom) to prepend (and/or append) some headers or data, e.g. CPU tags, device-specific headers/descriptors (LSO, TLS etc.), and if case of no available space skb_cow_head() will reallocate the skb. Reallocations are unwanted on fast-path, especially when it comes to XDP, so generic XSK xmit should reserve the spaces declared in dev->needed_headroom and dev->needed tailroom to avoid them. Note on max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom)): Usually, output functions reserve LL_RESERVED_SPACE(dev), which consists of dev->hard_header_len + dev->needed_headroom, aligned by 16. However, on XSK xmit hard header is already here in the chunk, so hard_header_len is not needed. But it'd still be better to align data up to cacheline, while reserving no less than driver requests for headroom. NET_SKB_PAD here is to double-insure there will be no reallocations even when the driver advertises no needed_headroom, but in fact need it (not so rare case). Fixes: `35fcde7f8d` ("xsk: support for Tx") Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210218204908.5455-5-alobakin@pm.me	2021-02-25 00:56:34 +01:00
Xuan Zhuo	ab5bd583b9	virtio-net: Support IFF_TX_SKB_NO_LINEAR flag Virtio net supports the case where the skb linear space is empty, so add priv_flags. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210218204908.5455-4-alobakin@pm.me	2021-02-25 00:56:11 +01:00
Xuan Zhuo	c2ff53d804	net: Add priv_flags for allow tx skb without linear In some cases, we hope to construct skb directly based on the existing memory without copying data. In this case, the page will be placed directly in the skb, and the linear space of skb is empty. But unfortunately, many the network card does not support this operation. For example Mellanox Technologies MT27710 Family [ConnectX-4 Lx] will get the following error message: mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2 WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64 00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb So a priv_flag is added here to indicate whether the network card supports this feature. Suggested-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210218204908.5455-3-alobakin@pm.me	2021-02-25 00:56:02 +01:00
Alexander Lobakin	2463e07349	netdevice: Add missing IFF_PHONY_HEADROOM self-definition This is harmless for now, but can be fatal for future refactors. Fixes: `871b642ade` ("netdev: introduce ndo_set_rx_headroom") Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210218204908.5455-2-alobakin@pm.me	2021-02-25 00:55:54 +01:00
Grant Seltzer	b9fc8b4a59	bpf: Add kernel/modules BTF presence checks to bpftool feature command This adds both the CONFIG_DEBUG_INFO_BTF and CONFIG_DEBUG_INFO_BTF_MODULES kernel compile option to output of the bpftool feature command. This is relevant for developers that want to account for data structure definition differences between kernels. Signed-off-by: Grant Seltzer <grantseltzer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210222195846.155483-1-grantseltzer@gmail.com	2021-02-24 17:24:37 +01:00
Linus Torvalds	d310ec03a3	The performance event updates for v5.12 are: - Add CPU-PMU support for Intel Sapphire Rapids CPUs - Extend the perf ABI with PERF_SAMPLE_WEIGHT_STRUCT, to offer two-parameter sampling event feedback. Not used yet, but is intended for Golden Cove CPU-PMU, which can provide both the instruction latency and the cache latency information for memory profiling events. - Remove experimental, default-disabled perfmon-v4 counter_freezing support that could only be enabled via a boot option. The hardware is hopelessly broken, we'd like to make sure nobody starts relying on this, as it would only end in tears. - Fix energy/power events on Intel SPR platforms - Simplify the uprobes resume_execution() logic - Misc smaller fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmAtf7kRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1iJ2xAAvygKF8hm/UAGyT2R3iEruO49wRrmUfgt 13iBBA1DotKw2b8F5UN5MqjfwS8UgGKuAd8agvQ6XXANpnJ5mpy0nrzgjEXUx4j+ sQUqL7vxSdZ5J3kKblSZ4QoMzLVYSUkEDmw818vsa4eFWN8z58FJsv+ySegIFbXx +I3hF1O9a8MERZBUz4T5xHlgcbSDGEX6EvYRcO+zZ0rXfARfo9StfHYv1V53j6iY EOotFEKEn/5naczAd/sQo1SE1IgHtX2cbjOaKF7LulgEwZQWHpdKq0gww6nFK5yz XMSE9oXAFXRkRCJbrSqC0Dvrrf8hdlxWbKYbj9L7XILoxw199AdOBDbliJm6P/UH 6+JSEu/N4R0TFYc7TX6yef7ncw12e+64USjKOlWWwww97rVWWH1/tFTdlXhS6s+d jVI3yEECKyZlddrDdsetRdUj+QKyZQfDqbMXPXiDTv9P6AFqBvNLZYT0UPU3akk5 jXueHJQYSSgqnN+eRaIwvm4ZYWa031YHJXxiq2E89RnzL4JJArBYaddpukgxTYka c6Tn8L7f4zP5Bghu7hHv5Vy69i1N/3YvzUoYc6ljjmapgAJzxzq/yoEKrBlKnjtA MrstHhnwnPJl+PKjlbLpjl74rtcCiKJxjVhm+a5UbEcYoVuzJ86lmQK2WrLaoCTU B/zFplUF8C4= =BCcg -----END PGP SIGNATURE----- Merge tag 'perf-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance event updates from Ingo Molnar: - Add CPU-PMU support for Intel Sapphire Rapids CPUs - Extend the perf ABI with PERF_SAMPLE_WEIGHT_STRUCT, to offer two-parameter sampling event feedback. Not used yet, but is intended for Golden Cove CPU-PMU, which can provide both the instruction latency and the cache latency information for memory profiling events. - Remove experimental, default-disabled perfmon-v4 counter_freezing support that could only be enabled via a boot option. The hardware is hopelessly broken, we'd like to make sure nobody starts relying on this, as it would only end in tears. - Fix energy/power events on Intel SPR platforms - Simplify the uprobes resume_execution() logic - Misc smaller fixes. * tag 'perf-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/rapl: Fix psys-energy event on Intel SPR platform perf/x86/rapl: Only check lower 32bits for RAPL energy counters perf/x86/rapl: Add msr mask support perf/x86/kvm: Add Cascade Lake Xeon steppings to isolation_ucodes[] perf/x86/intel: Support CPUID 10.ECX to disable fixed counters perf/x86/intel: Add perf core PMU support for Sapphire Rapids perf/x86/intel: Filter unsupported Topdown metrics event perf/x86/intel: Factor out intel_update_topdown_event() perf/core: Add PERF_SAMPLE_WEIGHT_STRUCT perf/intel: Remove Perfmon-v4 counter_freezing support x86/perf: Use static_call for x86_pmu.guest_get_msrs perf/x86/intel/uncore: With > 8 nodes, get pci bus die id from NUMA info perf/x86/intel/uncore: Store the logical die id instead of the physical die id. x86/kprobes: Do not decode opcode in resume_execution()	2021-02-21 12:49:32 -08:00
Linus Torvalds	657bd90c93	Scheduler updates for v5.12: [ NOTE: unfortunately this tree had to be freshly rebased today, it's a same-content tree of 82891be90f3c (-next published) merged with v5.11. The main reason for the rebase was an authorship misattribution problem with a new commit, which we noticed in the last minute, and which we didn't want to be merged upstream. The offending commit was deep in the tree, and dependent commits had to be rebased as well. ] - Core scheduler updates: - Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the preempt=none/voluntary/full boot options (default: full), to allow distros to build a PREEMPT kernel but fall back to close to PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling behavior via a boot time selection. There's also the /debug/sched_debug switch to do this runtime. This feature is implemented via runtime patching (a new variant of static calls). The scope of the runtime patching can be best reviewed by looking at the sched_dynamic_update() function in kernel/sched/core.c. ( Note that the dynamic none/voluntary mode isn't 100% identical, for example preempt-RCU is available in all cases, plus the preempt count is maintained in all models, which has runtime overhead even with the code patching. ) The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast majority of distributions, are supposed to be unaffected. - Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that was found via rcutorture triggering a hang. The bug is that rcu_idle_enter() may wake up a NOCB kthread, but this happens after the last generic need_resched() check. Some cpuidle drivers fix it by chance but many others don't. In true 2020 fashion the original bug fix has grown into a 5-patch scheduler/RCU fix series plus another 16 RCU patches to address the underlying issue of missed preemption events. These are the initial fixes that should fix current incarnations of the bug. - Clean up rbtree usage in the scheduler, by providing & using the following consistent set of rbtree APIs: partial-order; less() based: - rb_add(): add a new entry to the rbtree - rb_add_cached(): like rb_add(), but for a rb_root_cached total-order; cmp() based: - rb_find(): find an entry in an rbtree - rb_find_add(): find an entry, and add if not found - rb_find_first(): find the first (leftmost) matching entry - rb_next_match(): continue from rb_find_first() - rb_for_each(): iterate a sub-tree using the previous two - Improve the SMP/NUMA load-balancer: scan for an idle sibling in a single pass. This is a 4-commit series where each commit improves one aspect of the idle sibling scan logic. - Improve the cpufreq cooling driver by getting the effective CPU utilization metrics from the scheduler - Improve the fair scheduler's active load-balancing logic by reducing the number of active LB attempts & lengthen the load-balancing interval. This improves stress-ng mmapfork performance. - Fix CFS's estimated utilization (util_est) calculation bug that can result in too high utilization values - Misc updates & fixes: - Fix the HRTICK reprogramming & optimization feature - Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code - Reduce dl_add_task_root_domain() overhead - Fix uprobes refcount bug - Process pending softirqs in flush_smp_call_function_from_idle() - Clean up task priority related defines, remove USER_PRIO and USER_PRIO() - Simplify the sched_init_numa() deduplication sort - Documentation updates - Fix EAS bug in update_misfit_status(), which degraded the quality of energy-balancing - Smaller cleanups Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmAtHBsRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1itgg/+NGed12pgPjYBzesdou60Lvx7LZLGjfOt M1F1EnmQGn/hEH2fCY6ZoqIZQTVltm7GIcBNabzYTzlaHZsdtyuDUJBZyj19vTlk zekcj7WVt+qvfjChaNwEJhQ9nnOM/eohMgEOHMAAJd9zlnQvve7NOLQ56UDM+kn/ 9taFJ5ZPvb4avP6C5p3KivvKex6Bjof/Tl0m3utpNyPpI/qK3FyGxwdgCxU0yepT ABWQX5ZQCufFvo1bgnBPfqyzab4MqhoM3bNKBsLQfuAlssG1xRv4KQOev4dRwrt9 pXJikV5C9yez5d2lGe5p0ltH5IZS/l9x2yI/ZQj3OUDTFyV1ic6WfFAqJgDzVF8E i/vvA4NPQiI241Bkps+ErcCw4aVOgiY6TWli74cHjLUIX0+As6aHrFWXGSxUmiHB WR+B8KmdfzRTTlhOxMA+cvlpZcKCfxWkJJmXzr/lDZzIuKPqM3QCE2wD9sixkfVo JNICT0IvZghWOdbMEfZba8Psh/e2LVI9RzdpEiuYJz1ZrVlt1hO0M6jBxY0hMz9n k54z81xODw0a8P2FHMtpmB1vhAeqCmvwA6DO8z0Oxs0DFi+KM2bLf2efHsCKafI+ Bm5v9YFaOk/55R76hJVh+aYLlyFgFkKd+P/niJTPDnxOk3SqJuXvTrql1HeGHkNr kYgQa23dsZk= =pyaG -----END PGP SIGNATURE----- Merge tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core scheduler updates: - Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the preempt=none/voluntary/full boot options (default: full), to allow distros to build a PREEMPT kernel but fall back to close to PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling behavior via a boot time selection. There's also the /debug/sched_debug switch to do this runtime. This feature is implemented via runtime patching (a new variant of static calls). The scope of the runtime patching can be best reviewed by looking at the sched_dynamic_update() function in kernel/sched/core.c. ( Note that the dynamic none/voluntary mode isn't 100% identical, for example preempt-RCU is available in all cases, plus the preempt count is maintained in all models, which has runtime overhead even with the code patching. ) The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast majority of distributions, are supposed to be unaffected. - Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that was found via rcutorture triggering a hang. The bug is that rcu_idle_enter() may wake up a NOCB kthread, but this happens after the last generic need_resched() check. Some cpuidle drivers fix it by chance but many others don't. In true 2020 fashion the original bug fix has grown into a 5-patch scheduler/RCU fix series plus another 16 RCU patches to address the underlying issue of missed preemption events. These are the initial fixes that should fix current incarnations of the bug. - Clean up rbtree usage in the scheduler, by providing & using the following consistent set of rbtree APIs: partial-order; less() based: - rb_add(): add a new entry to the rbtree - rb_add_cached(): like rb_add(), but for a rb_root_cached total-order; cmp() based: - rb_find(): find an entry in an rbtree - rb_find_add(): find an entry, and add if not found - rb_find_first(): find the first (leftmost) matching entry - rb_next_match(): continue from rb_find_first() - rb_for_each(): iterate a sub-tree using the previous two - Improve the SMP/NUMA load-balancer: scan for an idle sibling in a single pass. This is a 4-commit series where each commit improves one aspect of the idle sibling scan logic. - Improve the cpufreq cooling driver by getting the effective CPU utilization metrics from the scheduler - Improve the fair scheduler's active load-balancing logic by reducing the number of active LB attempts & lengthen the load-balancing interval. This improves stress-ng mmapfork performance. - Fix CFS's estimated utilization (util_est) calculation bug that can result in too high utilization values Misc updates & fixes: - Fix the HRTICK reprogramming & optimization feature - Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code - Reduce dl_add_task_root_domain() overhead - Fix uprobes refcount bug - Process pending softirqs in flush_smp_call_function_from_idle() - Clean up task priority related defines, remove USER_PRIO and USER_PRIO() - Simplify the sched_init_numa() deduplication sort - Documentation updates - Fix EAS bug in update_misfit_status(), which degraded the quality of energy-balancing - Smaller cleanups" * tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) sched,x86: Allow !PREEMPT_DYNAMIC entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point entry: Explicitly flush pending rcuog wakeup before last rescheduling point rcu/nocb: Trigger self-IPI on late deferred wake up before user resume rcu/nocb: Perform deferred wake up before last idle's need_resched() check rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers sched/features: Distinguish between NORMAL and DEADLINE hrtick sched/features: Fix hrtick reprogramming sched/deadline: Reduce rq lock contention in dl_add_task_root_domain() uprobes: (Re)add missing get_uprobe() in __find_uprobe() smp: Process pending softirqs in flush_smp_call_function_from_idle() sched: Harden PREEMPT_DYNAMIC static_call: Allow module use without exposing static_call_key sched: Add /debug/sched_preempt preempt/dynamic: Support dynamic preempt with preempt= boot option preempt/dynamic: Provide irqentry_exit_cond_resched() static call preempt/dynamic: Provide preempt_schedule[_notrace]() static calls preempt/dynamic: Provide cond_resched() and might_resched() static calls preempt: Introduce CONFIG_PREEMPT_DYNAMIC static_call: Provide DEFINE_STATIC_CALL_RET0() ...	2021-02-21 12:35:04 -08:00
Linus Torvalds	7b15c27e2f	These changes fix MM (soft-)dirty bit management in the procfs code & clean up the API. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmAtAgsRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gOnA/9GKJblgi88Qb23YwGKp0OfCMdLfx8FJa+ dq0AB0jgzc8v2J8IITSs/qo/8o25IE9IPTjTfItn0E0jxz7Y8J16urb/vyWX6O2s jb4riT5fIRCXvhv9DooxSQerZePaOJXbHYa2BBk8yqNJPGbd5kr0SUGn3BQnBQhR 0yAfqjrzBLMGzzSO+kK0nhGQH8BJZgYu94CHNnUZJtWcIb2ZC6lzZ7Lz0zi6ueRJ 81JblV4NCOC9uy9I9odOwESu2TIxT9afq1C/6COyrKYx3sWY6xPOGQTxYZe1LITE lb57/95qc7SOIj7Y3aL4YRSVRYRihEU31qlAltwP4fEnz49qdHJOR1HQmjKVG8xs Uaa6kCYFeTKmh4SRRr8ZR/hUkebrFUT+9+db6LmBs/i4Kt09T+ZurXC4jqmUHMFn 2nYCDH6RX153V1YwcHGkr4OWaUVWZwAZl+t0zIo7o7wQdkoAD75ydecW2R3nLMN7 p1ofGPXmT8Wh4en8LngBawO/4bBuunezh4L3vpz0/EU3viK5+DRsyNKf+d+Tti28 XCe7ID0GDGq7nIzSZxuyIxmAbWJxjI+7gWT2WUudrJxJ2rUUxPQms8GsQD54IMh5 UILv9GMBNuV8iA/2c3B52ff5iFl7kp+SxVS3MRC6zTudIV9VV6bb7WpFb8FLOhsH 3sEo0qDFab4= =qcXO -----END PGP SIGNATURE----- Merge tag 'core-mm-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull tlb gather updates from Ingo Molnar: "Theses fix MM (soft-)dirty bit management in the procfs code & clean up the TLB gather API" * tag 'core-mm-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ldt: Use tlb_gather_mmu_fullmm() when freeing LDT page-tables tlb: arch: Remove empty __tlb_remove_tlb_entry() stubs tlb: mmu_gather: Remove start/end arguments from tlb_gather_mmu() tlb: mmu_gather: Introduce tlb_gather_mmu_fullmm() tlb: mmu_gather: Remove unused start/end arguments from tlb_finish_mmu() mm: proc: Invalidate TLB after clearing soft-dirty page state	2021-02-21 12:19:56 -08:00
Linus Torvalds	9eef023345	These are the v5.12 updates for the locking subsystem: - Core locking primitives updates: - Remove mutex_trylock_recursive() from the API - no users left - Simplify + constify the futex code a bit - Lockdep updates: - Teach lockdep about local_lock_t - Add CONFIG_DEBUG_IRQFLAGS=y debug config option to check for potentially unsafe IRQ mask restoration patterns. (I.e. calling raw_local_irq_restore() with IRQs enabled.) - Add wait context self-tests - Fix graph lock corner case corrupting internal data structures - Fix noinstr annotations - LKMM updates: - Simplify the litmus tests - Documentation fixes - KCSAN updates: - Re-enable KCSAN instrumentation in lib/random32.c - Misc fixes: - Don't branch-trace static label APIs - DocBook fix - Remove stale leftover empty file Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmAs//sRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1im+g/9G8taVrfiBQ7hg4PoEo28w8fzu5pGBOWd rYzUNJO96dW262FbQE6txGDBeGEahnVTz1sGwqKcy1NfZgQBCWj4uZMOluyECrY3 SV8Iccz2+M6CV+pyjM6Agm7OrgEHxlB/oorZy3TD6s2YeuR6nVGfO3vAXbNNeAsk N8TR5mKY8ELbKXkjrc4KauOOiaqsQmVMuV/l/1DLoydDxATYq4Fczh0lcIdwMtYB pqzWAKa0Qy2mKcHXe2YMYjddn2JEcDWNGJCsmZTa6m45aaAW1XyICLLxcQ2X8aL+ aj9rxYTBkZl9vAjrICfbJTtYku6fN48JiDoNRQxUShGVmVKAlHxYQ4vZ7dJz0NHz EdRrd9JIr25ImXNHlX2KCKGc/aUm4TvDtNVXCdxVlZGwnEEF8J5VocWKRKmXmA1W MkAvPnXnynqRfcMkFaTtTfdMTan41uEixwEnUy++JTuNSMx2ie3VGMC0MgxvTBiH iKN5iVtZVa1mUN2593Jd1qdZvGQMeIydMj+WaT4xh5hptjLCGLg4yPgYuoO7vNMT uEfv8oODvTN8BqEixNP1Ef9pzxujuSiPoO4ZO4DNnbJJZVw1TwAZIK5Zz1wR1Zso Wf1LKPaEOyqz5cFAJ/OxcnxvxMv3fat0vhLNzJlBEFEgKmfRhbsQVUNNL1AcdMJA +Npbj/v5seo= =BYju -----END PGP SIGNATURE----- Merge tag 'locking-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "Core locking primitives updates: - Remove mutex_trylock_recursive() from the API - no users left - Simplify + constify the futex code a bit Lockdep updates: - Teach lockdep about local_lock_t - Add CONFIG_DEBUG_IRQFLAGS=y debug config option to check for potentially unsafe IRQ mask restoration patterns. (I.e. calling raw_local_irq_restore() with IRQs enabled.) - Add wait context self-tests - Fix graph lock corner case corrupting internal data structures - Fix noinstr annotations LKMM updates: - Simplify the litmus tests - Documentation fixes KCSAN updates: - Re-enable KCSAN instrumentation in lib/random32.c Misc fixes: - Don't branch-trace static label APIs - DocBook fix - Remove stale leftover empty file" * tag 'locking-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) checkpatch: Don't check for mutex_trylock_recursive() locking/mutex: Kill mutex_trylock_recursive() s390: Use arch_local_irq_{save,restore}() in early boot code lockdep: Noinstr annotate warn_bogus_irq_restore() locking/lockdep: Avoid unmatched unlock locking/rwsem: Remove empty rwsem.h locking/rtmutex: Add missing kernel-doc markup futex: Remove unneeded gotos futex: Change utime parameter to be 'const ... *' lockdep: report broken irq restoration jump_label: Do not profile branch annotations locking: Add Reviewers locking/selftests: Add local_lock inversion tests locking/lockdep: Exclude local_lock_t from IRQ inversions locking/lockdep: Clean up check_redundant() a bit locking/lockdep: Add a skip() function to __bfs() locking/lockdep: Mark local_lock_t locking/selftests: More granular debug_locks_verbose lockdep/selftest: Add wait context selftests tools/memory-model: Fix typo in klitmus7 compatibility table ...	2021-02-21 12:12:01 -08:00
Linus Torvalds	d089f48fba	These are the latest RCU updates for v5.12: - Documentation updates. - Miscellaneous fixes. - kfree_rcu() updates: Addition of mem_dump_obj() to provide allocator return addresses to more easily locate bugs. This has a couple of RCU-related commits, but is mostly MM. Was pulled in with akpm's agreement. - Per-callback-batch tracking of numbers of callbacks, which enables better debugging information and smarter reactions to large numbers of callbacks. - The first round of changes to allow CPUs to be runtime switched from and to callback-offloaded state. - CONFIG_PREEMPT_RT-related changes. - RCU CPU stall warning updates. - Addition of polling grace-period APIs for SRCU. - Torture-test and torture-test scripting updates, including a "torture everything" script that runs rcutorture, locktorture, scftorture, rcuscale, and refscale. Plus does an allmodconfig build. - nolibc fixes for the torture tests Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmAs9lgRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1j/axAAsqIvarDD6OLmgcOPCyWSvfG6LsIFgqI9 CY0JdQBtvFBTvE8Q2No5ktbLVmuYsBh0dGeFkv4HQZJyRlr7mjstVMNN4SeBDVIS /+zZO1wlwzaXfKQopLctTK1O/UzFqIN2sqyzA3nLzGGj8DqgxXJyreJ10feK5XM+ 6ttZPd1qm4hqtpA22ZEODbct5OFqZuvnK8VNqBb2YHabA1rasUXbIEJPBpsuv/W2 l9W5AGP4erdOFm3nHJxiCpvLJtgHy4njvw0HJp5f99Abj6OVeAzw5kFjvRB3n1Qd ayKyTw8T/1mfmkjvYkGsMAqhEmqwXcryFX0dR/14/XPdXyjPhZlbkz+MfRKrn4NT LBJPX+MlX9lVFWBNR9HMe2o/083+gorlwZt9wtyt0OBBGGgudYo4uKNdbyy6tB3Y Gb98P2vtVSO24EsQce6M+ppHN4TgVBd6id82MQxNuFw+PQJdBiCY0JJfNQApbAry cIKOchSSR2SkJHlAevNVaKAeiTnkAXd1jDBKtCnvCqOUyvtnhE3rQCqwS5xT2Cno oQydpudwBKT7uO/GUyS0ESErjHuy9zhExNSYD0ydxlBCrGbzrrgPg57ntXHA1die mtFyvc2tfT/AshWRNYiuCG+eaUG3qK7n7jN7Vc6/K5DR4GMb5tOhL9wPx2ljCRGu Z8WDg0pJGz4= =31yj -----END PGP SIGNATURE----- Merge tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "These are the latest RCU updates for v5.12: - Documentation updates. - Miscellaneous fixes. - kfree_rcu() updates: Addition of mem_dump_obj() to provide allocator return addresses to more easily locate bugs. This has a couple of RCU-related commits, but is mostly MM. Was pulled in with akpm's agreement. - Per-callback-batch tracking of numbers of callbacks, which enables better debugging information and smarter reactions to large numbers of callbacks. - The first round of changes to allow CPUs to be runtime switched from and to callback-offloaded state. - CONFIG_PREEMPT_RT-related changes. - RCU CPU stall warning updates. - Addition of polling grace-period APIs for SRCU. - Torture-test and torture-test scripting updates, including a "torture everything" script that runs rcutorture, locktorture, scftorture, rcuscale, and refscale. Plus does an allmodconfig build. - nolibc fixes for the torture tests" * tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits) percpu_ref: Dump mem_dump_obj() info upon reference-count underflow rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback mm: Make mem_obj_dump() vmalloc() dumps include start and length mm: Make mem_dump_obj() handle vmalloc() memory mm: Make mem_dump_obj() handle NULL and zero-sized pointers mm: Add mem_dump_obj() to print source of memory block tools/rcutorture: Fix position of -lgcc in mkinitrd.sh tools/nolibc: Fix position of -lgcc in the documented example tools/nolibc: Emit detailed error for missing alternate syscall number definitions tools/nolibc: Remove incorrect definitions of __ARCH_WANT_* tools/nolibc: Get timeval, timespec and timezone from linux/time.h tools/nolibc: Implement poll() based on ppoll() tools/nolibc: Implement fork() based on clone() tools/nolibc: Make getpgrp() fall back to getpgid(0) tools/nolibc: Make dup2() rely on dup3() when available tools/nolibc: Add the definition for dup() rcutorture: Add rcutree.use_softirq=0 to RUDE01 and TASKS01 torture: Maintain torture-specific set of CPUs-online books torture: Clean up after torture-test CPU hotplugging rcutorture: Make object_debug also double call_rcu() heap object ...	2021-02-21 12:04:41 -08:00

1 2 3 4 5 ...

988834 commits