linux-stable/kernel
Daniel Borkmann 374ed03630 bpf: Adjust insufficient default bpf_jit_limit
[ Upstream commit 10ec8ca8ec ]

We've seen recent AWS EKS (Kubernetes) user reports like the following:

  After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS
  clusters after a few days a number of the nodes have containers stuck
  in ContainerCreating state or liveness/readiness probes reporting the
  following error:

    Readiness probe errored: rpc error: code = Unknown desc = failed to
    exec in container: failed to start exec "4a11039f730203ffc003b7[...]":
    OCI runtime exec failed: exec failed: unable to start container process:
    unable to init seccomp: error loading seccomp filter into kernel:
    error loading seccomp filter: errno 524: unknown

  However, we had not been seeing this issue on previous AMIs and it only
  started to occur on v20230217 (following the upgrade from kernel 5.4 to
  5.10) with no other changes to the underlying cluster or workloads.

  We tried the suggestions from that issue (sysctl net.core.bpf_jit_limit=452534528)
  which helped to immediately allow containers to be created and probes to
  execute but after approximately a day the issue returned and the value
  returned by cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}'
  was steadily increasing.
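
The awk pipeline quoted above sums the size column of every bpf_jit entry in /proc/vmallocinfo. A minimal Python sketch of the same measurement follows; the function names and the sample lines are hypothetical and only illustrate the field layout the pipeline relies on (size in the second whitespace-separated field).

```python
# Hypothetical sample in the general shape of /proc/vmallocinfo entries:
# "<start>-<end> <size-in-bytes> <caller> ...". Values are made up.
SAMPLE = [
    "0xffffffffc0000000-0xffffffffc0003000   12288 bpf_jit_binary_alloc+0x2e/0x70 pages=2 vmalloc N0=2",
    "0xffffffffc0003000-0xffffffffc0005000    8192 bpf_jit_binary_alloc+0x2e/0x70 pages=1 vmalloc N0=1",
    "0xffffffffc0005000-0xffffffffc0009000   16384 load_module+0x1a2/0x2b0 pages=3 vmalloc N0=3",
]


def sum_bpf_jit_bytes(lines):
    """Sum the second field of every line that mentions bpf_jit."""
    total = 0
    for line in lines:
        if "bpf_jit" not in line:
            continue  # same filter as `grep bpf_jit`
        fields = line.split()
        # awk treats a non-numeric $2 as 0; here such lines are skipped.
        if len(fields) >= 2 and fields[1].isdigit():
            total += int(fields[1])
    return total


def read_bpf_jit_usage(path="/proc/vmallocinfo"):
    """Read the live file; this typically requires root."""
    with open(path) as f:
        return sum_bpf_jit_bytes(f)
```

Calling read_bpf_jit_usage() periodically and logging the result is one way to watch whether the total keeps growing, as the report describes.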

I tested the bpf tree, observing bpf_jit_charge_modmem and
bpf_jit_uncharge_modmem, the sizes passed to them, and bpf_jit_current under
tcpdump BPF filters, seccomp BPF and native (e)BPF programs. The behavior all
looks sane and expected; that is, nothing is "leaking" from an upstream
perspective.

The bpf_jit_limit knob was originally added to avoid a situation where
unprivileged applications loading BPF programs (e.g. seccomp BPF policies)
consume all the module memory space via BPF JIT, preventing kernel modules
from being loaded. The default limit was defined back in 2018, and while it
was good enough then, we are generally seeing far more BPF consumers today.
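
The accounting behind the knob can be pictured with a toy user-space model. This is a sketch under stated assumptions, not the kernel code: the class and method names are hypothetical, sizes are tracked in bytes, and the rule that a privileged loader may exceed the limit reflects my understanding of the kernel's behavior rather than anything stated in this commit.

```python
class JitPool:
    """Toy model of a charge/uncharge counter guarded by a limit."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes  # stands in for bpf_jit_limit
        self.current = 0          # stands in for bpf_jit_current

    def charge(self, size, privileged=False):
        """Account for a JIT allocation; refuse it if over the limit."""
        self.current += size
        if self.current > self.limit and not privileged:
            # Roll back the charge so a refused request leaves no trace.
            self.current -= size
            return False
        return True

    def uncharge(self, size):
        """Release the accounting when a JIT image is freed."""
        self.current -= size
```

In this model a counter that only ever grows points at programs that are loaded and never freed, which is distinct from the accounting itself leaking.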

Adjust the limit for the BPF JIT pool from the original 1/4 to 1/2 of the
module memory space to better reflect today's needs and to keep more users
from running into potentially hard-to-debug issues.
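
As a rough illustration of what the change means in bytes, the sketch below computes both fractions for an assumed module memory region. The 1008 MiB region size, the 4 KiB page size, and the rounding up to a page multiple are assumptions for illustration; the real values depend on architecture and kernel configuration.

```python
PAGE_SIZE = 4096                         # assumption: 4 KiB pages
MODULE_SPACE_BYTES = 1008 * 1024 * 1024  # assumption: illustrative region size


def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def default_jit_limit(module_space, shift, page_size=PAGE_SIZE):
    """Return module_space / 2**shift, rounded up to a whole page."""
    return round_up(module_space >> shift, page_size)


old_limit = default_jit_limit(MODULE_SPACE_BYTES, 2)  # 1/4 of the region
new_limit = default_jit_limit(MODULE_SPACE_BYTES, 1)  # 1/2 of the region
print(old_limit, new_limit)  # prints: 264241152 528482304
```

On a running system the effective value can be read from /proc/sys/net/core/bpf_jit_limit, the file behind the net.core.bpf_jit_limit sysctl used in the report above.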

Fixes: fdadd04931 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
Reported-by: Stephen Haynes <sh@synk.net>
Reported-by: Lefteris Alexakis <lefteris.alexakis@kpn.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://github.com/awslabs/amazon-eks-ami/issues/1179
Link: https://github.com/awslabs/amazon-eks-ami/issues/1219
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230320143725.8394-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-05 11:14:15 +02:00
..
bpf bpf: Adjust insufficient default bpf_jit_limit 2023-04-05 11:14:15 +02:00
cgroup memcg: fix possible use-after-free in memcg_write_event_control() 2022-12-14 11:26:13 +01:00
configs
debug kdb: Make memory allocations more robust 2021-03-03 18:22:36 +01:00
events perf: Fix possible memleak in pmu_dev_alloc() 2023-01-18 09:26:09 +01:00
gcov gcov: add support for checksum field 2023-01-18 09:26:35 +01:00
irq irqdomain: Drop bogus fwspec-mapping error handling 2023-03-11 16:26:47 +01:00
livepatch livepatch: fix race between fork and KLP transition 2022-10-26 13:16:57 +02:00
locking locking/lockdep: Avoid RCU-induced noinstr fail 2021-11-26 11:40:26 +01:00
power PM: hibernate: Fix mistake in kerneldoc comment 2023-01-18 09:26:09 +01:00
printk printk: fix return value of printk.devkmsg __setup handler 2022-04-20 09:08:15 +02:00
rcu rcu: Suppress smp_processor_id() complaint in synchronize_rcu_expedited_wait() 2023-03-11 16:26:41 +01:00
sched panic: Consolidate open-coded panic_on_warn checks 2023-02-06 07:46:34 +01:00
time timers: Prevent union confusion from unexpected restart_syscall() 2023-03-11 16:26:41 +01:00
trace ftrace: Fix invalid address access in lookup_rec() when index is 0 2023-03-22 13:26:16 +01:00
.gitignore
acct.c acct: fix potential integer overflow in encode_comp_t() 2023-01-18 09:26:30 +01:00
async.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2022-02-23 11:57:33 +01:00
audit.c audit: improve audit queue handling when "audit=1" on cmdline 2022-02-08 18:16:28 +01:00
audit.h audit: fix a net reference leak in audit_list_rules_send() 2020-06-20 10:25:10 +02:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-09-05 10:25:02 +02:00
audit_tree.c
audit_watch.c audit: CONFIG_CHANGE don't log internal bookkeeping as an event 2020-10-01 13:12:33 +02:00
auditfilter.c audit: fix a net reference leak in audit_list_rules_send() 2020-06-20 10:25:10 +02:00
auditsc.c audit: print empty EXECVE args 2019-12-01 09:14:03 +01:00
backtracetest.c
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-11-13 11:15:08 -08:00
capability.c
compat.c make 'user_access_begin()' do 'access_ok()' 2020-06-20 10:24:58 +02:00
configs.c
context_tracking.c
cpu.c random: clear fast pool, crng, and batches in cpuhp bring up 2022-06-25 11:46:35 +02:00
cpu_pm.c kernel/cpu_pm: Fix uninitted local in cpu_pm 2020-06-20 10:25:19 +02:00
crash_core.c
crash_dump.c
cred.c memcg: account security cred as well to kmemcg 2020-01-09 10:17:54 +01:00
delayacct.c delayacct: Use raw_spinlocks 2018-08-03 07:50:38 +02:00
dma.c
exec_domain.c
exit.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-02-06 07:46:35 +01:00
extable.c
fork.c mm/hugetlb: initialize hugetlb_usage in mm_init 2021-09-22 11:45:32 +02:00
freezer.c
futex.c mm, futex: fix shared futex pgoff on shmem huge page 2021-07-11 12:48:12 +02:00
groups.c
hung_task.c kernel: hung_task.c: disable on suspend 2019-04-20 09:15:05 +02:00
irq_work.c
jump_label.c sched/core: Fix cpu.max vs. cpuhotplug deadlock 2018-12-05 19:41:17 +01:00
kallsyms.c kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol 2019-09-21 07:15:38 +02:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: ensure irq code sees a valid area 2018-08-03 07:50:22 +02:00
kexec.c
kexec_core.c kexec: Allocate decrypted control pages for kdump if SME is enabled 2019-11-24 08:23:15 +01:00
kexec_file.c kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add] 2022-07-02 16:18:11 +02:00
kexec_internal.h
kmod.c kmod: make request_module() return an error when autoloading is disabled 2020-04-24 08:00:44 +02:00
kprobes.c x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range 2023-03-11 16:26:46 +01:00
ksysfs.c
kthread.c kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() 2021-07-11 12:48:13 +02:00
latencytop.c
Makefile elfcore: fix building with clang 2021-02-10 09:12:08 +01:00
memremap.c mm, devm_memremap_pages: kill mapping "System RAM" support 2019-01-13 10:01:02 +01:00
module-internal.h
module.c module: Don't wait for GOING modules 2023-02-06 07:46:31 +01:00
module_signing.c
notifier.c x86/mm: split vmalloc_sync_all() 2020-04-02 16:34:20 +02:00
nsproxy.c
padata.c padata: purge get_cpu and reorder_via_wq from padata_do_serial 2020-05-27 16:43:05 +02:00
panic.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-02-06 07:46:35 +01:00
params.c
pid.c
pid_namespace.c memcg: enable accounting for pids in nested pid namespaces 2021-09-22 11:45:32 +02:00
profile.c profiling: fix shift too large makes kernel panic 2022-08-25 11:11:24 +02:00
ptrace.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2022-06-14 16:53:43 +02:00
range.c
reboot.c reboot: fix overflow parsing reboot cpu number 2020-11-18 18:28:02 +01:00
relay.c kernel/relay.c: fix memleak on destroy relay channel 2020-08-26 10:29:54 +02:00
resource.c resource: fix integer overflow at reallocation 2018-04-24 09:36:22 +02:00
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2022-02-16 12:44:52 +01:00
signal.c signal handling: don't use BUG_ON() for debugging 2022-07-21 20:42:47 +02:00
smp.c smp: Fix offline cpu check in flush_smp_call_function_queue() 2022-04-20 09:08:33 +02:00
smpboot.c kthread: Extract KTHREAD_IS_PER_CPU 2021-02-07 14:47:41 +01:00
smpboot.h
softirq.c Mark HI and TASKLET softirq synchronous 2018-08-15 18:12:47 +02:00
stacktrace.c
stop_machine.c stop_machine: Atomically queue and wake stopper threads 2018-09-05 09:26:36 +02:00
sys.c prlimit: do_prlimit needs to have a speculation check 2023-01-24 07:05:18 +01:00
sys_ni.c
sysctl.c proc: proc_skip_spaces() shouldn't think it is working on C strings 2022-12-08 11:16:33 +01:00
sysctl_binary.c
task_work.c
taskstats.c taskstats: fix data-race 2020-01-09 10:17:53 +01:00
test_kprobes.c
torture.c
tracepoint.c tracepoint: Do not fail unregistering a probe due to memory failure 2021-03-03 18:22:47 +01:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-02-23 11:57:34 +01:00
ucount.c
uid16.c
umh.c usermodehelper: reset umask to default before executing user process 2020-10-14 09:51:10 +02:00
up.c smp: Fix smp_call_function_single_async prototype 2021-05-22 10:57:35 +02:00
user-return-notifier.c
user.c
user_namespace.c userns: move user access out of the mutex 2018-09-09 19:56:00 +02:00
utsname.c
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-09-09 19:56:00 +02:00
watchdog.c watchdog/softlockup: Enforce that timestamp is valid on boot 2020-02-28 16:36:05 +01:00
watchdog_hld.c watchdog: Mark watchdog touch functions as notrace 2018-09-05 09:26:42 +02:00
workqueue.c workqueue: fix UAF in pwq_unbound_release_workfn() 2021-08-04 12:22:14 +02:00
workqueue_internal.h