linux-stable/fs
Benjamin Segall a16ceb1396 epoll: autoremove wakers even more aggressively
If a process is killed or otherwise exits while having active network
connections and many threads waiting on epoll_wait, the threads will all
be woken immediately, but not removed from ep->wq.  Then when network
traffic scans ep->wq in wake_up, every wakeup attempt will fail, and will
not remove the entries from the list.

This means that the cost of the wakeup attempt is far higher than usual,
does not decrease, and this also competes with the dying threads trying to
actually make progress and remove themselves from the wq.

Handle this by removing visited epoll wq entries unconditionally, rather
than only when the wakeup succeeds - the structure of ep_poll means that
the only potential loss is the timed_out->eavail heuristic, which now can
race and result in a redundant ep_send_events attempt.  (But only when
incoming data and a timeout actually race, not on every timeout)

Shakeel added:

: We are seeing this issue in production with real workloads and it has
: caused hard lockups.  Particularly network heavy workloads with a lot
: of threads in epoll_wait() can easily trigger this issue if they get
: killed (oom-killed in our case).

Link: https://lkml.kernel.org/r/xm26fsjotqda.fsf@google.com
Signed-off-by: Ben Segall <bsegall@google.com>
Tested-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Roman Penyaev <rpenyaev@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Heiher <r@hev.cc>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17 17:31:40 -07:00
9p 9p: fix EBADF errors in cached mode 2022-06-17 06:03:30 +09:00
adfs
affs
afs afs: Fix dynamic root getattr 2022-06-21 11:47:30 -05:00
autofs
befs
bfs
btrfs for-5.19-rc3-tag 2022-06-26 10:11:36 -07:00
cachefiles
ceph netfs: Rename the netfs_io_request cleanup op and give it an op pointer 2022-06-10 20:55:21 +01:00
cifs cifs: update cifs_ses::ip_addr after failover 2022-06-24 13:34:28 -05:00
coda
configfs
cramfs
crypto
debugfs
devpts
dlm
ecryptfs
efivarfs
efs
erofs
exfat exfat: use updated exfat_chain directly during renaming 2022-06-09 21:26:32 +09:00
exportfs
ext2 ext2: fix fs corruption when trying to remove a non-empty directory with IO error 2022-06-16 10:55:45 +02:00
ext4 ext4: fix a doubled word "need" in a comment 2022-06-18 19:36:20 -04:00
f2fs f2fs: do not count ENOENT for error case 2022-06-21 08:29:56 -07:00
fat fat: add renameat2 RENAME_EXCHANGE flag support 2022-06-16 19:58:22 -07:00
freevxfs
fscache
fuse
gfs2
hfs
hfsplus
hostfs
hpfs
hugetlbfs hugetlbfs: zero partial pages during fallocate hole punch 2022-06-16 19:11:32 -07:00
iomap
isofs
jbd2 fs: fix jbd2_journal_try_to_free_buffers() kernel-doc comment 2022-06-16 10:36:09 -04:00
jffs2
jfs
kernfs
ksmbd
lockd
minix
netfs netfs: Rename the netfs_io_request cleanup op and give it an op pointer 2022-06-10 20:55:21 +01:00
nfs NFSv4: Add FMODE_CAN_ODIRECT after successful open of a NFS4.x file 2022-06-15 15:03:12 -04:00
nfs_common
nfsd Notable changes: 2022-06-10 17:28:43 -07:00
nilfs2
nls
notify
ntfs
ntfs3
ocfs2 ocfs2: kill EBUSY from dlmfs_evict_inode 2022-06-16 19:58:20 -07:00
omfs
openpromfs
orangefs
overlayfs
proc proc: delete unused <linux/uaccess.h> includes 2022-07-17 17:31:39 -07:00
pstore
qnx4
qnx6
quota quota: Prevent memory allocation recursion while holding dq_lock 2022-06-06 10:08:10 +02:00
ramfs
reiserfs
romfs
smbfs_common
squashfs squashfs: don't use intermediate buffer if pages missing 2022-06-16 19:58:22 -07:00
sysfs
sysv
tracefs tracefs: Fix syntax errors in comments 2022-06-17 19:01:28 -04:00
ubifs
udf
ufs
unicode
vboxsf
verity
xfs xfs: preserve DIFLAG2_NREXT64 when setting other inode attributes 2022-06-15 23:13:33 -07:00
zonefs zonefs: fix zonefs_iomap_begin() for reads 2022-06-08 19:13:55 +09:00
Kconfig
Kconfig.binfmt
Makefile
aio.c
anon_inodes.c
attr.c fs: account for group membership 2022-06-14 12:18:47 +02:00
bad_inode.c
binfmt_aout.c
binfmt_elf.c
binfmt_elf_fdpic.c
binfmt_elf_test.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
buffer.c
char_dev.c
compat_binfmt_elf.c
coredump.c
d_path.c
dax.c
dcache.c
direct-io.c
drop_caches.c
eventfd.c
eventpoll.c epoll: autoremove wakers even more aggressively 2022-07-17 17:31:40 -07:00
exec.c
fcntl.c
fhandle.c
file.c fix the breakage in close_fd_get_file() calling conventions change 2022-06-05 15:03:03 -04:00
file_table.c
filesystems.c
fs-writeback.c writeback: Fix inode->i_io_list not be protected by inode->i_lock error 2022-06-06 09:54:30 +02:00
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c
inode.c writeback: Fix inode->i_io_list not be protected by inode->i_lock error 2022-06-06 09:54:30 +02:00
internal.h
io-wq.c
io-wq.h
io_uring.c io_uring: use original request task for inflight tracking 2022-06-23 11:06:43 -06:00
ioctl.c
kernel_read_file.c fs/kernel_read_file: allow to read files up-to ssize_t 2022-06-16 19:58:21 -07:00
libfs.c
locks.c
mbcache.c
mount.h
mpage.c
namei.c Several cleanups in fs/namei.c. 2022-06-04 19:07:15 -07:00
namespace.c
no-block.c
nsfs.c
open.c
pipe.c
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c
readdir.c
remap_range.c
select.c
seq_file.c
signalfd.c
splice.c
stack.c
stat.c
statfs.c
super.c
sync.c
sysctls.c
timerfd.c
userfaultfd.c
utimes.c
xattr.c