linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2024-11-01 08:58:07 +00:00


Daniel Borkmann ff40e51043 bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks

Commit `59438b4647` ("security,lockdown,selinux: implement SELinux lockdown") added an implementation of the locked_down LSM hook to SELinux, with the aim to restrict which domains are allowed to perform operations that would breach lockdown. This is indirectly also getting audit subsystem involved to report events. The latter is problematic, as reported by Ondrej and Serhei, since it can bring down the whole system via audit:

1) The audit events that are triggered due to calls to security_locked_down() can OOM kill a machine, see below details [0].

2) It also seems to be causing a deadlock via avc_has_perm()/slow_avc_audit() when trying to wake up kauditd, for example, when using trace_sched_switch() tracepoint, see details in [1]. Triggering this was not via some hypothetical corner case, but with existing tools like runqlat & runqslower from bcc, for example, which make use of this tracepoint. Rough call sequence goes like:

  rq_lock(rq) -> -------------------------+
    trace_sched_switch() ->               |
      bpf_prog_xyz() ->                   +-> deadlock
        selinux_lockdown() ->             |
          audit_log_end() ->              |
            wake_up_interruptible() ->    |
              try_to_wake_up() ->         |
                rq_lock(rq) --------------+

What's worse is that the intention of `59438b4647` to further restrict lockdown settings for specific applications in respect to the global lockdown policy is completely broken for BPF. The SELinux policy rule for the current lockdown check looks something like this:

  allow <who> <who> : lockdown { <reason> };

However, this doesn't match with the 'current' task where the security_locked_down() is executed, example: httpd does a syscall. There is a tracing program attached to the syscall which triggers a BPF program to run, which ends up doing a bpf_probe_read_kernel{,_str}() helper call. The selinux_lockdown() hook does the permission check against 'current', that is, httpd in this example. httpd has literally zero relation to this tracing program, and it would be nonsensical having to write an SELinux policy rule against httpd to let the tracing helper pass. The policy in this case needs to be against the entity that is installing the BPF program. For example, if bpftrace would generate a histogram of syscall counts by user space application:

  bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

bpftrace would then go and generate a BPF program from this internally. One way of doing it [for the sake of the example] could be to call bpf_get_current_task() helper and then access current->comm via one of bpf_probe_read_kernel{,_str}() helpers. So the program itself has nothing to do with httpd or any other random app doing a syscall here. The BPF program _explicitly initiated_ the lockdown check. The allow/deny policy belongs in the context of bpftrace: meaning, you want to grant bpftrace access to use these helpers, but other tracers on the system like my_random_tracer _not_.

Therefore fix all three issues at the same time by taking a completely different approach for the security_locked_down() hook, that is, move the check into the program verification phase where we actually retrieve the BPF func proto. This also reliably gets the task (current) that is trying to install the BPF tracing program, e.g. bpftrace/bcc/perf/systemtap/etc, and it also fixes the OOM since we're moving this out of the BPF helper's fast-path which can be called several millions of times per second.

The check is then also in line with other security_locked_down() hooks in the system where the enforcement is performed at open/load time, for example, open_kcore() for /proc/kcore access or module_sig_check() for module signatures just to pick few random ones. What's out of scope in the fix as well as in other security_locked_down() hook locations /outside/ of BPF subsystem is that if the lockdown policy changes on the fly there is no retrospective action. This requires a different discussion, potentially complex infrastructure, and it's also not clear whether this can be solved generically. Either way, it is out of scope for a suitable stable fix which this one is targeting. Note that the breakage is specifically on `59438b4647` where it started to rely on 'current' as UAPI behavior, and _not_ earlier infrastructure such as `9d1f8be5cf` ("bpf: Restrict bpf when kernel lockdown is in confidentiality mode").

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1955585, Jakub Hrozek says:

  I starting seeing this with F-34. When I run a container that is traced with BPF to record the syscalls it is doing, auditd is flooded with messages like:

  type=AVC msg=audit(1619784520.593:282387): avc: denied { confidentiality } for pid=476 comm="auditd" lockdown_reason="use of bpf to read kernel RAM" scontext=system_u:system_r:auditd_t:s0 tcontext=system_u:system_r:auditd_t:s0 tclass=lockdown permissive=0

  This seems to be leading to auditd running out of space in the backlog buffer and eventually OOMs the machine.

  [...]
  auditd running at 99% CPU presumably processing all the messages, eventually I get:
  Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded
  Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded
  Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152579 > audit_backlog_limit=64
  Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152626 > audit_backlog_limit=64
  Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152694 > audit_backlog_limit=64
  Apr 30 12:20:42 fedora kernel: audit: audit_lost=6878426 audit_rate_limit=0 audit_backlog_limit=64
  Apr 30 12:20:45 fedora kernel: oci-seccomp-bpf invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000
  Apr 30 12:20:45 fedora kernel: CPU: 0 PID: 13284 Comm: oci-seccomp-bpf Not tainted 5.11.12-300.fc34.x86_64 #1
  Apr 30 12:20:45 fedora kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
  [...]

[1] https://lore.kernel.org/linux-audit/CANYvDQN7H5tVp47fbYcRasv4XF07eUbsDwT_eDCHXJUj43J7jQ@mail.gmail.com/, Serhei Makarov says:

  Upstream kernel 5.11.0-rc7 and later was found to deadlock during a bpf_probe_read_compat() call within a sched_switch tracepoint. The problem is reproducible with the reg_alloc3 testcase from SystemTap's BPF backend testsuite on x86_64 as well as the runqlat, runqslower tools from bcc on ppc64le. Example stack trace:

  [...]
  [ 730.868702] stack backtrace:
  [ 730.869590] CPU: 1 PID: 701 Comm: in:imjournal Not tainted, 5.12.0-0.rc2.20210309git144c79ef3353.166.fc35.x86_64 #1
  [ 730.871605] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  [ 730.873278] Call Trace:
  [ 730.873770] dump_stack+0x7f/0xa1
  [ 730.874433] check_noncircular+0xdf/0x100
  [ 730.875232] __lock_acquire+0x1202/0x1e10
  [ 730.876031] ? __lock_acquire+0xfc0/0x1e10
  [ 730.876844] lock_acquire+0xc2/0x3a0
  [ 730.877551] ? __wake_up_common_lock+0x52/0x90
  [ 730.878434] ? lock_acquire+0xc2/0x3a0
  [ 730.879186] ? lock_is_held_type+0xa7/0x120
  [ 730.880044] ? skb_queue_tail+0x1b/0x50
  [ 730.880800] _raw_spin_lock_irqsave+0x4d/0x90
  [ 730.881656] ? __wake_up_common_lock+0x52/0x90
  [ 730.882532] __wake_up_common_lock+0x52/0x90
  [ 730.883375] audit_log_end+0x5b/0x100
  [ 730.884104] slow_avc_audit+0x69/0x90
  [ 730.884836] avc_has_perm+0x8b/0xb0
  [ 730.885532] selinux_lockdown+0xa5/0xd0
  [ 730.886297] security_locked_down+0x20/0x40
  [ 730.887133] bpf_probe_read_compat+0x66/0xd0
  [ 730.887983] bpf_prog_250599c5469ac7b5+0x10f/0x820
  [ 730.888917] trace_call_bpf+0xe9/0x240
  [ 730.889672] perf_trace_run_bpf_submit+0x4d/0xc0
  [ 730.890579] perf_trace_sched_switch+0x142/0x180
  [ 730.891485] ? __schedule+0x6d8/0xb20
  [ 730.892209] __schedule+0x6d8/0xb20
  [ 730.892899] schedule+0x5b/0xc0
  [ 730.893522] exit_to_user_mode_prepare+0x11d/0x240
  [ 730.894457] syscall_exit_to_user_mode+0x27/0x70
  [ 730.895361] entry_SYSCALL_64_after_hwframe+0x44/0xae
  [...]

Fixes: `59438b4647` ("security,lockdown,selinux: implement SELinux lockdown")
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Reported-by: Jakub Hrozek <jhrozek@redhat.com>
Reported-by: Serhei Makarov <smakarov@redhat.com>
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: James Morris <jamorris@linux.microsoft.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Frank Eigler <fche@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/bpf/01135120-8bf7-df2e-cff0-1d73f1f841c3@iogearbox.net

2021-06-02 21:59:22 +02:00
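The difference between checking inside the helper and checking when the program is loaded can be illustrated with a small user-space model. This is only a sketch and not kernel code: `ALLOWED_LOADERS`, `locked_down()`, `helper_checked_at_run_time()` and `load_program()` are hypothetical names invented for the illustration, and the "policy" is reduced to a set of task names.

```python
# User-space model only: none of these names are kernel APIs.
ALLOWED_LOADERS = {"bpftrace"}   # hypothetical policy: who may use the helper
audit_log = []                   # stands in for audit records


def locked_down(task):
    """Return True if the model policy denies `task`; a denial logs one record."""
    if task in ALLOWED_LOADERS:
        return False
    audit_log.append("denied { confidentiality } comm=" + task)
    return True


def helper_checked_at_run_time(current):
    """Pre-fix model: every helper call checks whichever task is 'current'."""
    if locked_down(current):
        return None              # the helper call fails
    return "kernel data"


def load_program(loader):
    """Post-fix model: check the loading task once, then hand out the helper."""
    if locked_down(loader):
        raise PermissionError(loader + " may not use this helper")

    def helper(current):
        return "kernel data"     # no policy check on the fast path
    return helper


if __name__ == "__main__":
    # Pre-fix: bpftrace installed the program, but httpd is 'current' when it fires.
    results = [helper_checked_at_run_time("httpd") for _ in range(1000)]
    print(results.count(None), "failed calls,", len(audit_log), "audit records")

    # Post-fix: one check against the loader, none per event.
    audit_log.clear()
    helper = load_program("bpftrace")
    results = [helper("httpd") for _ in range(1000)]
    print(results.count(None), "failed calls,", len(audit_log), "audit records")
```

In this model the run-time variant denies every call made while httpd is current and emits one record per event, while the load-time variant decides once against bpftrace and emits nothing afterwards; a loader outside the allowed set, such as my_random_tracer, is rejected by `load_program()` with a single record.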
preload	bpf: Fix umd memory leak in copy_process()	2021-03-19 22:23:19 +01:00
arraymap.c	bpf: Add batched ops support for percpu array	2021-04-28 01:17:45 +02:00
bpf_inode_storage.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-03-25 15:31:22 -07:00
bpf_iter.c	bpf: Add bpf_for_each_map_elem() helper	2021-02-26 13:23:52 -08:00
bpf_local_storage.c	bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]	2021-02-26 11:51:48 -08:00
bpf_lru_list.c	bpf_lru_list: Read double-checked variable once without lock	2021-02-10 15:54:26 -08:00
bpf_lru_list.h	bpf: Fix a typo "inacitve" -> "inactive"	2020-04-06 21:54:10 +02:00
bpf_lsm.c	bpf: Fix BPF_LSM kconfig symbol dependency	2021-05-25 21:16:23 +02:00
bpf_struct_ops.c	bpf: Fix fexit trampoline.	2021-03-18 00:22:51 +01:00
bpf_struct_ops_types.h	bpf: tcp: Support tcp_congestion_ops in bpf	2020-01-09 08:46:18 -08:00
bpf_task_storage.c	bpf: Make symbol 'bpf_task_storage_busy' static	2021-03-16 12:24:20 -07:00
btf.c	bpf: Forbid trampoline attach for functions with variable arguments	2021-05-07 01:28:28 +02:00
cgroup.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	2021-02-16 13:14:06 -08:00
core.c	bpf: Remove unused parameter from ___bpf_prog_run	2021-04-03 01:38:52 +02:00
cpumap.c	bpf, cpumap: Bulk skb using netif_receive_skb_list	2021-04-27 17:13:49 +02:00
devmap.c	bpf, devmap: Move drop error path to devmap for XDP_REDIRECT	2021-03-18 16:38:51 +01:00
disasm.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-04-09 20:48:35 -07:00
disasm.h
dispatcher.c	bpf: Remove bpf_image tree	2020-03-13 12:49:52 -07:00
hashtab.c	kernel/bpf/: Fix misspellings using codespell tool	2021-03-16 12:22:20 -07:00
helpers.c	bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks	2021-06-02 21:59:22 +02:00
inode.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	2021-04-25 18:02:32 -07:00
Kconfig	bpf: Fix BPF_JIT kconfig symbol dependency	2021-05-20 23:48:37 +02:00
local_storage.c	bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper	2021-03-25 18:31:36 -07:00
lpm_trie.c	bpf: Add support for batched ops in LPM trie maps	2021-03-25 18:51:08 -07:00
Makefile	bpf: Enable task local storage for tracing programs	2021-02-26 11:51:47 -08:00
map_in_map.c	bpf: Relax max_entries check for most of the inner map types	2020-08-28 15:41:30 +02:00
map_in_map.h	bpf: Add map_meta_equal map ops	2020-08-28 15:41:30 +02:00
map_iter.c	bpf: Implement link_query callbacks in map element iterators	2020-08-21 14:01:39 -07:00
net_namespace.c	bpf: Add support for forced LINK_DETACH command	2020-08-01 20:38:28 -07:00
offload.c	bpf, offload: Replace bitwise AND by logical AND in bpf_prog_offload_info_fill	2020-02-17 16:53:49 +01:00
percpu_freelist.c	bpf: Use raw_spin_trylock() for pcpu_freelist_push/pop in NMI	2020-10-06 00:04:11 +02:00
percpu_freelist.h	bpf: Use raw_spin_trylock() for pcpu_freelist_push/pop in NMI	2020-10-06 00:04:11 +02:00
prog_iter.c	bpf: Refactor bpf_iter_reg to have separate seq_info member	2020-07-25 20:16:32 -07:00
queue_stack_maps.c	bpf: Eliminate rlimit-based memory accounting for queue_stack_maps maps	2020-12-02 18:32:46 -08:00
reuseport_array.c	bpf: Eliminate rlimit-based memory accounting for reuseport_array maps	2020-12-02 18:32:47 -08:00
ringbuf.c	bpf: Prevent writable memory-mapping of read-only ringbuf pages	2021-05-11 13:31:10 +02:00
stackmap.c	bpf: Refcount task stack in bpf_get_task_stack	2021-04-01 13:58:07 -07:00
syscall.c	bpf: Add kconfig knob for disabling unpriv bpf by default	2021-05-11 13:56:16 -07:00
sysfs_btf.c	bpf: Load and verify kernel module BTFs	2020-11-10 15:25:53 -08:00
task_iter.c	bpf: Introduce task_vma bpf_iter	2021-02-12 12:56:53 -08:00
tnum.c	bpf: Verifier, do explicit ALU32 bounds tracking	2020-03-30 14:59:53 -07:00
trampoline.c	bpf: Allow trampoline re-attach for tracing and lsm programs	2021-04-25 21:09:01 -07:00
verifier.c	bpf: No need to simulate speculative domain for immediates	2021-05-25 22:08:53 +02:00