linux-stable/kernel/bpf
Daniel Borkmann 42049e65d3 bpf: Adjust insufficient default bpf_jit_limit
[ Upstream commit 10ec8ca8ec ]

We've seen recent AWS EKS (Kubernetes) user reports like the following:

  After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS
  clusters after a few days a number of the nodes have containers stuck
  in ContainerCreating state or liveness/readiness probes reporting the
  following error:

    Readiness probe errored: rpc error: code = Unknown desc = failed to
    exec in container: failed to start exec "4a11039f730203ffc003b7[...]":
    OCI runtime exec failed: exec failed: unable to start container process:
    unable to init seccomp: error loading seccomp filter into kernel:
    error loading seccomp filter: errno 524: unknown

  However, we had not been seeing this issue on previous AMIs and it only
  started to occur on v20230217 (following the upgrade from kernel 5.4 to
  5.10) with no other changes to the underlying cluster or workloads.

  We tried the suggestions from that issue (sysctl net.core.bpf_jit_limit=452534528)
  which helped to immediately allow containers to be created and probes to
  execute but after approximately a day the issue returned and the value
  returned by cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}'
  was steadily increasing.

I tested bpf tree to observe bpf_jit_charge_modmem, bpf_jit_uncharge_modmem
their sizes passed in as well as bpf_jit_current under tcpdump BPF filter,
seccomp BPF and native (e)BPF programs, and the behavior all looks sane
and expected, that is nothing "leaking" from an upstream perspective.

The bpf_jit_limit knob was originally added in order to avoid a situation
where unprivileged applications loading BPF programs (e.g. seccomp BPF
policies) consuming all the module memory space via BPF JIT such that loading
of kernel modules would be prevented. The default limit was defined back in
2018 and while good enough back then, we are generally seeing far more BPF
consumers today.

Adjust the limit for the BPF JIT pool from originally 1/4 to now 1/2 of the
module memory space to better reflect today's needs and avoid more users
running into potentially hard to debug issues.

Fixes: fdadd04931 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
Reported-by: Stephen Haynes <sh@synk.net>
Reported-by: Lefteris Alexakis <lefteris.alexakis@kpn.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://github.com/awslabs/amazon-eks-ami/issues/1179
Link: https://github.com/awslabs/amazon-eks-ami/issues/1219
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230320143725.8394-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-05 11:15:34 +02:00
..
arraymap.c bpf: decouple btf from seq bpf fs dump and enable more maps 2018-08-13 00:52:45 +02:00
bpf_lru_list.c bpf_lru_list: Read double-checked variable once without lock 2021-03-04 09:39:35 +01:00
bpf_lru_list.h
btf.c bpf: btf: fix truncated last_member_type_id in btf_struct_resolve 2022-10-26 13:19:25 +02:00
cgroup.c bpf: introduce update_effective_progs() 2018-08-07 14:29:55 +02:00
core.c bpf: Adjust insufficient default bpf_jit_limit 2023-04-05 11:15:34 +02:00
cpumap.c cpumap: Avoid warning when CONFIG_DEBUG_PER_CPU_MAPS is enabled 2020-05-02 17:25:53 +02:00
devmap.c bpf: devmap: fix wrong interface selection in notifier_call 2019-12-01 09:17:01 +01:00
disasm.c bpf: Introduce BPF nospec instruction for mitigating Spectre v4 2021-09-22 11:47:58 +02:00
disasm.h bpf: Remove struct bpf_verifier_env argument from print_bpf_insn 2018-03-23 17:38:57 +01:00
hashtab.c bpf: Remove recursion prevention from rcu free callback 2020-10-01 13:14:34 +02:00
helpers.c bpf: introduce the bpf_get_local_storage() helper function 2018-08-03 00:47:32 +02:00
inode.c bpf: Fix a rcu warning for bpffs map pretty-print 2020-10-01 13:14:52 +02:00
local_storage.c bpf: allocate local storage buffers using GFP_ATOMIC 2018-12-17 09:24:33 +01:00
lpm_trie.c bpf: lpm_trie: check left child of last leftmost node for NULL 2019-07-03 13:14:48 +02:00
Makefile bpf: silence warning messages in core 2019-07-26 09:14:06 +02:00
map_in_map.c bpf: fix inner map masking to prevent oob under speculation 2019-01-31 08:14:41 +01:00
map_in_map.h
offload.c bpf, offload: Replace bitwise AND by logical AND in bpf_prog_offload_info_fill 2020-02-28 16:38:59 +01:00
percpu_freelist.c bpf: fix lockdep false positive in percpu_freelist 2019-03-13 14:02:36 -07:00
percpu_freelist.h bpf: fix lockdep false positive in percpu_freelist 2019-03-13 14:02:36 -07:00
reuseport_array.c bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY 2018-08-11 01:58:46 +02:00
sockmap.c bpf: sockmap, fix transition through disconnect without close 2018-09-22 02:46:41 +02:00
stackmap.c bpf: Fix integer overflow in prealloc_elems_and_freelist() 2021-10-13 10:10:51 +02:00
syscall.c bpf: Ensure correct locking around vulnerable function find_vpid() 2022-10-26 13:19:25 +02:00
tnum.c bpf: Fix incorrect verifier simulation of ARSH under ALU32 2020-01-23 08:21:32 +01:00
verifier.c bpf: Fix pointer-leak due to insufficient speculative store bypass mitigation 2023-02-06 07:49:38 +01:00
xskmap.c xsk: do not call synchronize_net() under RCU read lock 2018-10-11 10:19:01 +02:00