linux-stable/kernel
Daniel Borkmann 6c1dc8f96b bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K
[ Upstream commit fdadd04931 ]

Michael and Sandipan report:

  Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
  JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
  and is adjusted again at init time if MODULES_VADDR is defined.

  For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
  the compile-time default at boot-time, which is 0x9c400000 when
  using 64K page size. This overflows the signed 32-bit bpf_jit_limit
  value:

  root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
  -1673527296

  and can cause various unexpected failures throughout the network
  stack. In one case `strace dhclient eth0` reported:

  setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8},
             16) = -1 ENOTSUPP (Unknown error 524)

  and similar failures can be seen with tools like tcpdump. This doesn't
  always reproduce however, and I'm not sure why. The more consistent
  failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9
  host would time out on systemd/netplan configuring a virtio-net NIC
  with no noticeable errors in the logs.

Given this and also given that in near future some architectures like
arm64 will have a custom area for BPF JIT image allocations we should
get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For
4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec()
so therefore add another overridable bpf_jit_alloc_exec_limit() helper
function which returns the possible size of the memory area for deriving
the default heuristic in bpf_jit_charge_init().

Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new
bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default
JIT memory provider, and therefore in case archs implement their custom
module_alloc() we use MODULES_{END,_VADDR} for limits and otherwise for
vmalloc_exec() cases like on ppc64 we use VMALLOC_{END,_START}.

Additionally, for archs supporting large page sizes, we should change
the sysctl to be handled as long to not run into sysctl restrictions
in future.

Fixes: ede95a63b5 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations")
Reported-by: Sandipan Das <sandipan@linux.ibm.com>
Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-25 10:51:50 +02:00
..
bpf bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K 2019-08-25 10:51:50 +02:00
configs config: android: enable CONFIG_SECCOMP 2016-10-11 15:06:32 -07:00
debug kdb: use memmove instead of overlapping memcpy 2018-12-08 13:05:05 +01:00
events perf/core: Fix creating kernel counters for PMUs that override event->cpu 2019-08-25 10:51:35 +02:00
gcov gcov: support GCC 7.1 2017-09-02 07:07:53 +02:00
irq genirq: Prevent use-after-free and work list corruption 2019-05-10 17:52:10 +02:00
livepatch livepatch/module: make TAINT_LIVEPATCH module-specific 2016-08-26 14:42:08 +02:00
locking locking/lockdep: Hide unused 'class' variable 2019-08-04 09:33:42 +02:00
power x86/power: Fix 'nosmt' vs hibernation triple fault during resume 2019-06-11 12:22:48 +02:00
printk printk: Fix panic caused by passing log_buf_len to command line 2018-11-13 11:17:01 -08:00
rcu rcuperf: Fix cleanup path for invalid perf_type strings 2019-05-31 06:48:30 -07:00
sched sched/fair: Don't free p->numa_faults with concurrent readers 2019-08-04 09:33:45 +02:00
time timer_list: Guard procfs specific code 2019-08-04 09:33:21 +02:00
trace ftrace: Enable trampoline when rec count returns back to one 2019-08-06 18:29:35 +02:00
.gitignore
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-10 09:29:51 +01:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-17 13:21:18 +01:00
audit.c audit: return on memory error to avoid null pointer dereference 2018-05-30 07:50:49 +02:00
audit.h Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit 2016-07-29 17:54:17 -07:00
audit_fsnotify.c
audit_tree.c
audit_watch.c audit: fix use-after-free in audit_add_watch 2018-09-26 08:36:37 +02:00
auditfilter.c audit: fix a memory leak bug 2019-05-31 06:48:20 -07:00
auditsc.c audit: allow not equal op for audit by executable 2018-08-03 07:55:25 +02:00
backtracetest.c
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-11-13 11:16:57 -08:00
capability.c ptrace: Capture the ptracer's creds not PT_PTRACE_CAP 2017-01-06 10:40:13 +01:00
cgroup.c cgroup: Fix deadlock in cpu hotplug path 2018-10-13 09:18:56 +02:00
cgroup_freezer.c
cgroup_pids.c cgroup/pids: remove spurious suspicious RCU usage warning 2017-03-26 13:05:58 +02:00
compat.c
configs.c
context_tracking.c
cpu.c cpu/speculation: Warn on unsupported mitigations= parameter 2019-07-10 09:55:39 +02:00
cpu_pm.c
cpuset.c sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs 2017-10-12 11:51:25 +02:00
crash_dump.c
cred.c access: avoid the RCU grace period for the temporary subjective credentials 2019-08-04 09:33:43 +02:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c kernel/exit.c: release ptraced tasks before zap_pid_ns_processes 2019-02-06 17:33:29 +01:00
extable.c kernel/extable.c: mark core_kernel_text notrace 2017-07-21 07:42:21 +02:00
fork.c sched/fair: Don't free p->numa_faults with concurrent readers 2019-08-04 09:33:45 +02:00
freezer.c freezer, oom: check TIF_MEMDIE on the correct task 2016-07-28 16:07:41 -07:00
futex.c futex: Ensure that futex address is aligned in handle_futex_death() 2019-03-27 14:13:03 +09:00
futex_compat.c
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2018-01-10 09:29:52 +01:00
hung_task.c kernel: hung_task.c: disable on suspend 2019-04-20 09:07:52 +02:00
irq_work.c
jump_label.c jump_label: Invoke jump_label_test() via early_initcall() 2017-12-14 09:28:24 +01:00
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: ensure irq code sees a valid area 2018-08-03 07:55:12 +02:00
kexec.c kexec: allow architectures to override boot mapping 2016-08-02 19:35:27 -04:00
kexec_core.c objtool, x86: Add several functions and files to the objtool whitelist 2018-06-05 10:28:57 +02:00
kexec_file.c kexec: fix double-free when failing to relocate the purgatory 2016-09-01 17:52:01 -07:00
kexec_internal.h
kmod.c
kprobes.c kprobes: Fix error check when reusing optimized probes 2019-04-27 09:34:45 +02:00
ksysfs.c kexec: add a kexec_crash_loaded() function 2016-08-02 19:35:30 -04:00
kthread.c kthread, tracing: Don't expose half-written comm when creating kthreads 2018-08-03 07:55:12 +02:00
latencytop.c
Makefile x86/uaccess, kcov: Disable stack protector 2019-06-22 08:17:19 +02:00
membarrier.c Fix: Disable sys_membarrier when nohz_full is enabled 2017-03-12 06:41:45 +01:00
memremap.c mm, devm_memremap_pages: kill mapping "System RAM" support 2019-01-13 10:03:51 +01:00
module-internal.h
module.c kernel/module.c: Only return -EEXIST for modules that have finished loading 2019-08-06 18:29:35 +02:00
module_signing.c
notifier.c
nsproxy.c
padata.c padata: use smp_mb in padata_reorder to avoid orphaned padata jobs 2019-08-04 09:33:28 +02:00
panic.c panic: avoid deadlocks in re-entrant console drivers 2018-12-29 13:40:16 +01:00
params.c
pid.c pidns: disable pid allocation if pid_ns_prepare_proc() is failed in alloc_pid() 2018-04-13 19:47:53 +02:00
pid_namespace.c signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig 2019-08-04 09:33:16 +02:00
profile.c profile: Convert to hotplug state machine 2016-07-15 10:41:42 +02:00
ptrace.c ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME 2019-07-10 09:55:45 +02:00
range.c
reboot.c
relay.c kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE 2018-05-30 07:50:29 +02:00
resource.c resource: fix integer overflow at reallocation 2018-04-24 09:34:09 +02:00
seccomp.c seccomp: Move speculation migitation control to arch code 2018-05-22 16:58:02 +02:00
signal.c kernel/signal.c: trace_signal_deliver when signal_group_exit 2019-06-11 12:22:43 +02:00
smp.c cpu/hotplug: Fix SMT supported evaluation 2018-08-15 18:14:53 +02:00
smpboot.c kthread/smpboot: do not park in kthread_create_on_cpu() 2016-10-11 15:06:33 -07:00
smpboot.h
softirq.c Mark HI and TASKLET softirq synchronous 2018-08-15 18:14:42 +02:00
stacktrace.c stacktrace, lockdep: Fix address, newline ugliness 2017-02-14 15:25:42 -08:00
stop_machine.c stop_machine: Use raw spinlocks 2018-08-03 07:55:24 +02:00
sys.c kernel/sys.c: prctl: fix false positive in validate_prctl_map() 2019-06-22 08:17:13 +02:00
sys_ni.c x86/pkeys: Fix pkeys build breakage for some non-x86 arches 2016-09-13 14:41:36 +02:00
sysctl.c sysctl: return -EINVAL if val violates minmax 2019-06-22 08:17:11 +02:00
sysctl_binary.c kernel/sysctl_binary.c: use generic UUID library 2016-05-20 17:58:30 -07:00
task_work.c task_work: use READ_ONCE/lockless_dereference, avoid pi_lock if !task_works 2016-08-02 19:35:02 -04:00
taskstats.c taskstats: fix the length of cgroupstats_cmd_get_policy 2016-11-03 16:55:58 -04:00
test_kprobes.c
torture.c torture: Convert torture_shutdown() to hrtimer 2016-08-22 10:01:49 -07:00
tracepoint.c tracepoint: Do not warn on ENOMEM 2018-05-09 09:50:20 +02:00
tsacct.c
ucount.c kernel/ucount.c: mark user_header with kmemleak_ignore() 2017-06-17 06:41:51 +02:00
uid16.c kernel: make groups_sort calling a responsibility group_info allocators 2018-01-10 09:29:52 +01:00
up.c smp: Add function to execute a function synchronously on a CPU 2016-09-05 13:52:39 +02:00
user-return-notifier.c
user.c
user_namespace.c userns: move user access out of the mutex 2018-09-09 20:01:24 +02:00
utsname.c Merge branch 'nsfs-ioctls' into HEAD 2016-09-22 20:00:36 -05:00
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-09-09 20:01:24 +02:00
watchdog.c kernel/watchdog: prevent false hardlockup on overloaded system 2017-06-17 06:41:57 +02:00
watchdog_hld.c kernel/watchdog: prevent false hardlockup on overloaded system 2017-06-17 06:41:57 +02:00
workqueue.c workqueue: use put_device() instead of kfree() 2018-05-30 07:50:36 +02:00
workqueue_internal.h workqueue: Fix NULL pointer dereference 2017-11-15 15:53:17 +01:00