linux-stable/fs
Christian Brauner cb12fd8e0d
pidfd: add pidfs
This moves pidfds from the anonymous inode infrastructure to a tiny
pseudo filesystem. This has been on my todo for quite a while as it will
unblock further work that we weren't able to do simply because of the
very justified limitations of anonymous inodes. Moving pidfds to a tiny
pseudo filesystem allows:

* statx() on pidfds becomes useful for the first time.
* pidfds can be compared simply by calling statx() on each and comparing
  the inode numbers.
* pidfds have unique inode numbers for the system lifetime.
* struct pid is now stashed in inode->i_private instead of
  file->private_data. This means it is now possible to introduce
  concepts that operate on a process once all file descriptors have been
  closed. A concrete example is kill-on-last-close.
* file->private_data is freed up for per-file options for pidfds.
* Each struct pid will refer to a different inode but the same struct
  pid will refer to the same inode if it's opened multiple times. This
  is in contrast to now, where each struct pid refers to the same inode.
  Even if we were to move to anon_inode_create_getfile(), which creates
  new inodes, we'd still be associating the same struct pid with
  multiple different inodes.
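
The inode comparison from the list above can be sketched from userspace
roughly as follows. This is a minimal sketch and not part of the patch:
pidfd_open_sketch(), same_inode() and check_self_pidfds() are
hypothetical names, and fstat() is used instead of statx() for brevity
since it reports the same device and inode numbers. The comparison only
distinguishes processes on a kernel with pidfs; with the anonymous inode
fallback every pidfd shares one inode.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434	/* assumed: the number used on most architectures */
#endif

/* Thin wrapper, since older C libraries do not provide pidfd_open(). */
static int pidfd_open_sketch(pid_t pid)
{
	return (int)syscall(SYS_pidfd_open, pid, 0U);
}

/*
 * Report whether two open file descriptors refer to the same inode on
 * the same filesystem. Returns 1 if they match, 0 if they differ and
 * -1 if fstat() fails on either descriptor.
 */
static int same_inode(int fd_a, int fd_b)
{
	struct stat a, b;

	if (fstat(fd_a, &a) < 0 || fstat(fd_b, &b) < 0)
		return -1;

	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Usage sketch: two pidfds opened for the calling process itself. */
static void check_self_pidfds(void)
{
	int a = pidfd_open_sketch(getpid());
	int b = pidfd_open_sketch(getpid());

	/* Same struct pid, so both pidfds are expected to share an inode. */
	if (a >= 0 && b >= 0)
		assert(same_inode(a, b) == 1);

	if (a >= 0)
		close(a);
	if (b >= 0)
		close(b);
}
```

On a pidfs kernel, same_inode() should return 0 for pidfds that refer to
two different processes, which is the property the anonymous inode
implementation could not offer.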

The tiny pseudo filesystem is not visible anywhere in userspace, exactly
like, e.g., pipefs and sockfs. There's no lookup, there are no complex
inode operations, nothing. Dentries and inodes are always deleted when
the last pidfd is closed.

We allocate a new inode for each struct pid and we reuse that inode for
all pidfds. We use iget_locked() to find that inode again based on the
inode number which isn't recycled. We allocate a new dentry for each
pidfd that uses the same inode. That is similar to anonymous inodes
which reuse the same inode for thousands of dentries. For pidfds we're
talking way less than that. There usually won't be a lot of concurrent
openers of the same struct pid. They can probably often be counted on
two hands. I know that systemd does use separate pidfds for the same
struct pid for various complex process tracking issues. So I think with
that things actually become way simpler. Especially because we don't
have to care about lookup. Dentries and inodes continue to be always
deleted.
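
The reuse scheme described above can be illustrated with a small
userspace model. This is only a sketch of the idea and not kernel code:
struct toy_inode, toy_iget() and toy_iput() are hypothetical names, and
a fixed linear table stands in for the inode hash that iget_locked()
consults.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_INODES 16

struct toy_inode {
	uint64_t ino;		/* unique per struct pid, never recycled */
	unsigned int refs;	/* number of open pidfds using this inode */
	int in_use;		/* slot currently holds a live inode */
};

static struct toy_inode toy_table[TOY_MAX_INODES];

/*
 * Find the live inode for @ino and take another reference, or set up a
 * fresh inode if none exists. Returns NULL if the table is full.
 */
static struct toy_inode *toy_iget(uint64_t ino)
{
	struct toy_inode *free_slot = NULL;

	for (size_t i = 0; i < TOY_MAX_INODES; i++) {
		struct toy_inode *inode = &toy_table[i];

		if (inode->in_use && inode->ino == ino) {
			inode->refs++;	/* another pidfd for the same struct pid */
			return inode;
		}
		if (!inode->in_use && !free_slot)
			free_slot = inode;
	}

	if (!free_slot)
		return NULL;

	free_slot->ino = ino;
	free_slot->refs = 1;
	free_slot->in_use = 1;
	return free_slot;
}

/* Drop one reference; the inode is deleted with the last pidfd. */
static void toy_iput(struct toy_inode *inode)
{
	assert(inode && inode->in_use && inode->refs > 0);

	if (--inode->refs == 0)
		inode->in_use = 0;
}
```

Calling toy_iget() twice with the same number hands back the same
object, while a different number gets its own, which mirrors the
one-inode-per-struct-pid behaviour the paragraph describes.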

The code is entirely optional and fairly small. If it's not selected we
fall back to anonymous inodes. Heavily inspired by nsfs which uses a
similar stashing mechanism just for namespaces.
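
Since the fallback means a pidfd may be backed by either implementation,
userspace might want to tell the two apart at runtime. The following is
a hedged sketch, not something the patch provides: the helper names are
hypothetical, and the magic numbers are assumed to match
include/uapi/linux/magic.h (they are defined locally because older
headers may lack PID_FS_MAGIC).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/vfs.h>

#ifndef PID_FS_MAGIC
#define PID_FS_MAGIC 0x50494446		/* "PIDF"; assumed value, see above */
#endif
#ifndef ANON_INODE_FS_MAGIC
#define ANON_INODE_FS_MAGIC 0x09041934
#endif

/* Name the backing filesystem for a given statfs f_type value. */
static const char *pidfd_backend_name(long f_type)
{
	switch (f_type) {
	case PID_FS_MAGIC:
		return "pidfs";
	case ANON_INODE_FS_MAGIC:
		return "anon_inode";
	default:
		return "unknown";
	}
}

/* Classify an open file descriptor; returns NULL if fstatfs() fails. */
static const char *pidfd_backend(int fd)
{
	struct statfs sfs;

	if (fstatfs(fd, &sfs) < 0)
		return NULL;

	return pidfd_backend_name((long)sfs.f_type);
}
```

Passing a pidfd to pidfd_backend() would be expected to yield "pidfs"
when the pseudo filesystem is selected and "anon_inode" otherwise.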

Link: https://lore.kernel.org/r/20240213-vfs-pidfd_fs-v1-2-f863f58cfce1@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-01 12:23:37 +01:00
..
9p 9p: Use length of data written to the server in preference to error 2024-01-04 13:15:31 +00:00
adfs
affs
afs vfs-6.8.netfs 2024-01-19 09:10:23 -08:00
autofs dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
bcachefs More bcachefs updates for 6.7-rc1 2024-01-21 14:01:12 -08:00
befs
bfs misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
btrfs for-6.8/block-2024-01-08 2024-01-11 13:58:04 -08:00
cachefiles vfs-6.8.netfs 2024-01-19 09:10:23 -08:00
ceph Assorted CephFS fixes and cleanups with nothing standing out. 2024-01-19 09:58:55 -08:00
coda dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
configfs
cramfs
crypto
debugfs Merge branches 'acpi-pm', 'acpi-video', 'acpi-apei' and 'acpi-extlog' 2024-01-04 13:19:40 +01:00
devpts
dlm
ecryptfs fix directory locking scheme on rename 2024-01-11 20:00:22 -08:00
efivarfs
efs
erofs vfs-6.8.netfs 2024-01-19 09:10:23 -08:00
exfat exfat: do not zero the extended part 2024-01-08 21:57:22 +09:00
exportfs
ext2 fix directory locking scheme on rename 2024-01-11 20:00:22 -08:00
ext4 misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
f2fs f2fs: fix double free of f2fs_sb_info 2024-01-12 18:55:09 -08:00
fat
freevxfs
fuse vfs-6.8.rw 2024-01-08 11:11:51 -08:00
gfs2 dlm for 6.8 2024-01-10 10:17:23 -08:00
hfs
hfsplus Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
hostfs
hpfs
hugetlbfs Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
iomap
isofs
jbd2 jbd2: abort journal when detecting metadata writeback error of fs dev 2024-01-04 23:42:21 -05:00
jffs2
jfs
kernfs Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock" 2024-01-11 11:51:27 +01:00
lockd sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
minix minixfs kmap_local_page() switchover and related fixes - very similar to sysv series. 2024-01-11 19:54:18 -08:00
netfs vfs-6.8.netfs 2024-01-19 09:10:23 -08:00
nfs vfs-6.8.netfs 2024-01-19 09:10:23 -08:00
nfs_common
nfsd misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
nilfs2 misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
nls
notify dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
ntfs sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
ntfs3
ocfs2 misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
omfs
openpromfs
orangefs
overlayfs dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
proc 17 hotfixes. 10 address post-6.7 issues and the other 7 are cc:stable. 2024-01-17 09:31:36 -08:00
pstore
qnx4
qnx6
quota sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
ramfs mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
reiserfs misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
romfs
smb Various smb client fixes, including multichannel and for SMB3.1.1 POSIX extensions 2024-01-20 16:48:07 -08:00
squashfs
sysfs
sysv
tracefs eventfs: Use kcalloc() instead of kzalloc() 2024-01-16 17:52:33 -05:00
ubifs ubifs: fix kernel-doc warnings 2024-01-06 23:49:50 +01:00
udf misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
ufs Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
unicode
vboxsf
verity Networking changes for 6.8. 2024-01-11 10:07:29 -08:00
xfs Bug fixes for 6.8: 2024-01-19 09:57:08 -08:00
zonefs misc cleanups (the part that hadn't been picked by individual fs trees) 2024-01-11 20:23:50 -08:00
Kconfig pidfd: add pidfs 2024-03-01 12:23:37 +01:00
Kconfig.binfmt
Makefile pidfd: move struct pidfd_fops 2024-02-28 17:17:07 +01:00
aio.c sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
anon_inodes.c
attr.c
backing-file.c
bad_inode.c
binfmt_elf.c
binfmt_elf_fdpic.c
binfmt_elf_test.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
buffer.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
char_dev.c
compat_binfmt_elf.c
coredump.c
d_path.c
dax.c
dcache.c dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
direct-io.c
drop_caches.c
eventfd.c
eventpoll.c
exec.c pidfd: kill the no longer needed do_notify_pidfd() in de_thread() 2024-02-02 14:57:53 +01:00
fcntl.c
fhandle.c
file.c
file_table.c dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
filesystems.c
fs-writeback.c
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c
inode.c fix directory locking scheme on rename 2024-01-11 20:00:22 -08:00
internal.h dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
ioctl.c
kernel_read_file.c
libfs.c dcache stuff for this cycle 2024-01-11 20:11:35 -08:00
locks.c
mbcache.c
mnt_idmapping.c
mount.h
mpage.c
namei.c fix buggered locking in bch2_ioctl_subvolume_destroy() 2024-01-12 18:04:01 -08:00
namespace.c fs: rework listmount() implementation 2024-01-13 13:06:25 +01:00
nsfs.c
open.c vfs-6.8.rw 2024-01-08 11:11:51 -08:00
pidfs.c pidfd: add pidfs 2024-03-01 12:23:37 +01:00
pipe.c sysctl-6.8-rc1 2024-01-10 17:44:36 -08:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c
readdir.c
remap_range.c
select.c
seq_file.c
signalfd.c
splice.c
stack.c
stat.c vfs-6.8.mount 2024-01-08 10:57:34 -08:00
statfs.c
super.c fscrypt updates for 6.8 2024-01-10 10:24:49 -08:00
sync.c
sysctls.c
timerfd.c
userfaultfd.c Generic: 2024-01-17 13:03:37 -08:00
utimes.c
xattr.c