linux-stable/include
Daniel Borkmann 3b1efb196e bpf, maps: flush own entries on perf map release
The behavior of perf event arrays are quite different from all
others as they are tightly coupled to perf event fds, f.e. shown
recently by commit e03e7ee34f ("perf/bpf: Convert perf_event_array
to use struct file") to make refcounting on perf event more robust.
A remaining issue that the current code still has is that since
additions to the perf event array take a reference on the struct
file via perf_event_get() and are only released via fput() (that
cleans up the perf event eventually via perf_event_release_kernel())
when the element is either manually removed from the map from user
space or automatically when the last reference on the perf event
map is dropped. However, this leads us to dangling struct file's
when the map gets pinned after the application owning the perf
event descriptor exits, and since the struct file reference will
in such case only be manually dropped or via pinned file removal,
it leads to the perf event living longer than necessary, consuming
needlessly resources for that time.

Relations between perf event fds and bpf perf event map fds can be
rather complex. F.e. maps can act as demuxers among different perf
event fds that can possibly be owned by different threads and based
on the index selection from the program, events get dispatched to
one of the per-cpu fd endpoints. One perf event fd (or, rather a
per-cpu set of them) can also live in multiple perf event maps at
the same time, listening for events. Also, another requirement is
that perf event fds can get closed from application side after they
have been attached to the perf event map, so that on exit perf event
map will take care of dropping their references eventually. Likewise,
when such maps are pinned, the intended behavior is that a user
application does bpf_obj_get(), puts its fds in there and on exit
when fd is released, they are dropped from the map again, so the map
acts rather as connector endpoint. This also makes perf event maps
inherently different from program arrays as described in more detail
in commit c9da161c65 ("bpf: fix clearing on persistent program
array maps").

To tackle this, map entries are marked by the map struct file that
added the element to the map. And when the last reference to that map
struct file is released from user space, then the tracked entries
are purged from the map. This is okay, because new map struct files
instances resp. frontends to the anon inode are provided via
bpf_map_new_fd() that is called when we invoke bpf_obj_get_user()
for retrieving a pinned map, but also when an initial instance is
created via map_create(). The rest is resolved by the vfs layer
automatically for us by keeping reference count on the map's struct
file. Any concurrent updates on the map slot are fine as well, it
just means that perf_event_fd_array_release() needs to delete less
of its own entires.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 23:42:57 -07:00
..
acpi Merge branches 'acpica-fixes', 'acpi-video' and 'acpi-processor' 2016-06-03 22:35:05 +02:00
asm-generic Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-05-25 17:11:43 -07:00
clocksource
crypto Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-05-19 09:21:36 -07:00
drm Merge tag 'drm-intel-next-fixes-2016-05-25' of git://anongit.freedesktop.org/drm-intel into drm-next 2016-05-27 16:08:38 +10:00
dt-bindings Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux 2016-05-26 09:23:43 -07:00
keys
kvm KVM: arm/arm64: vgic-new: implement mapped IRQ handling 2016-05-20 15:40:09 +02:00
linux bpf, maps: flush own entries on perf map release 2016-06-15 23:42:57 -07:00
math-emu
media
memory
misc
net net: ipv4: Add ability to have GRE ignore DF bit in IPv4 payloads 2016-06-15 21:39:59 -07:00
pcmcia
ras
rdma IB/core: Make all casts in ib_device_cap_flags enum consistent 2016-06-07 09:50:55 -04:00
rxrpc
scsi
soc fsl/qe: Do not prefix header guard with CONFIG_ 2016-06-08 11:08:01 -07:00
sound
target Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2016-05-28 12:04:17 -07:00
trace - move kvm_stat tool from QEMU repo into tools/kvm/kvm_stat 2016-05-27 13:41:54 -07:00
uapi net: ipv4: Add ability to have GRE ignore DF bit in IPv4 payloads 2016-06-15 21:39:59 -07:00
video imx-drm probing fix 2016-05-25 12:36:20 +10:00
xen
Kbuild