diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst index 38b4db8be7a2..26f4bb3107fc 100644 --- a/Documentation/bpf/index.rst +++ b/Documentation/bpf/index.rst @@ -48,6 +48,15 @@ Program types bpf_lsm +Map types +========= + +.. toctree:: + :maxdepth: 1 + + map_cgroup_storage + + Testing and debugging BPF ========================= diff --git a/Documentation/bpf/map_cgroup_storage.rst b/Documentation/bpf/map_cgroup_storage.rst new file mode 100644 index 000000000000..cab9543017bf --- /dev/null +++ b/Documentation/bpf/map_cgroup_storage.rst @@ -0,0 +1,169 @@ +.. SPDX-License-Identifier: GPL-2.0-only +.. Copyright (C) 2020 Google LLC. + +=========================== +BPF_MAP_TYPE_CGROUP_STORAGE +=========================== + +The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fix-sized +storage. It is only available with ``CONFIG_CGROUP_BPF``, and to programs that +attach to cgroups; the programs are made available by the same Kconfig. The +storage is identified by the cgroup the program is attached to. + +The map provide a local storage at the cgroup that the BPF program is attached +to. It provides a faster and simpler access than the general purpose hash +table, which performs a hash table lookups, and requires user to track live +cgroups on their own. + +This document describes the usage and semantics of the +``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors was changed in +Linux 5.9 and this document will describe the differences. + +Usage +===== + +The map uses key of type of either ``__u64 cgroup_inode_id`` or +``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``:: + + struct bpf_cgroup_storage_key { + __u64 cgroup_inode_id; + __u32 attach_type; + }; + +``cgroup_inode_id`` is the inode id of the cgroup directory. +``attach_type`` is the the program's attach type. + +Linux 5.9 added support for type ``__u64 cgroup_inode_id`` as the key type. +When this key type is used, then all attach types of the particular cgroup and +map will share the same storage. Otherwise, if the type is +``struct bpf_cgroup_storage_key``, then programs of different attach types +be isolated and see different storages. + +To access the storage in a program, use ``bpf_get_local_storage``:: + + void *bpf_get_local_storage(void *map, u64 flags) + +``flags`` is reserved for future use and must be 0. + +There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE`` +can be accessed by multiple programs across different CPUs, and user should +take care of synchronization by themselves. The bpf infrastructure provides +``struct bpf_spin_lock`` to synchronize the storage. See +``tools/testing/selftests/bpf/progs/test_spin_lock.c``. + +Examples +======== + +Usage with key type as ``struct bpf_cgroup_storage_key``:: + + #include + + struct { + __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE); + __type(key, struct bpf_cgroup_storage_key); + __type(value, __u32); + } cgroup_storage SEC(".maps"); + + int program(struct __sk_buff *skb) + { + __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0); + __sync_fetch_and_add(ptr, 1); + + return 0; + } + +Userspace accessing map declared above:: + + #include + #include + + __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type) + { + struct bpf_cgroup_storage_key = { + .cgroup_inode_id = cgrp, + .attach_type = type, + }; + __u32 value; + bpf_map_lookup_elem(bpf_map__fd(map), &key, &value); + // error checking omitted + return value; + } + +Alternatively, using just ``__u64 cgroup_inode_id`` as key type:: + + #include + + struct { + __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE); + __type(key, __u64); + __type(value, __u32); + } cgroup_storage SEC(".maps"); + + int program(struct __sk_buff *skb) + { + __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0); + __sync_fetch_and_add(ptr, 1); + + return 0; + } + +And userspace:: + + #include + #include + + __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type) + { + __u32 value; + bpf_map_lookup_elem(bpf_map__fd(map), &cgrp, &value); + // error checking omitted + return value; + } + +Semantics +========= + +``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. This +per-CPU variant will have different memory regions for each CPU for each +storage. The non-per-CPU will have the same memory region for each storage. + +Prior to Linux 5.9, the lifetime of a storage is precisely per-attachment, and +for a single ``CGROUP_STORAGE`` map, there can be at most one program loaded +that uses the map. A program may be attached to multiple cgroups or have +multiple attach types, and each attach creates a fresh zeroed storage. The +storage is freed upon detach. + +There is a one-to-one association between the map of each type (per-CPU and +non-per-CPU) and the BPF program during load verification time. As a result, +each map can only be used by one BPF program and each BPF program can only use +one storage map of each type. Because of map can only be used by one BPF +program, sharing of this cgroup's storage with other BPF programs were +impossible. + +Since Linux 5.9, storage can be shared by multiple programs. When a program is +attached to a cgroup, the kernel would create a new storage only if the map +does not already contain an entry for the cgroup and attach type pair, or else +the old storage is reused for the new attachment. If the map is attach type +shared, then attach type is simply ignored during comparison. Storage is freed +only when either the map or the cgroup attached to is being freed. Detaching +will not directly free the storage, but it may cause the reference to the map +to reach zero and indirectly freeing all storage in the map. + +The map is not associated with any BPF program, thus making sharing possible. +However, the BPF program can still only associate with one map of each type +(per-CPU and non-per-CPU). A BPF program cannot use more than one +``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one +``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``. + +In all versions, userspace may use the the attach parameters of cgroup and +attach type pair in ``struct bpf_cgroup_storage_key`` as the key to the BPF map +APIs to read or update the storage for a given attachment. For Linux 5.9 +attach type shared storages, only the first value in the struct, cgroup inode +id, is used during comparison, so userspace may just specify a ``__u64`` +directly. + +The storage is bound at attach time. Even if the program is attached to parent +and triggers in child, the storage still belongs to the parent. + +Userspace cannot create a new entry in the map or delete an existing entry. +Program test runs always use a temporary storage.