linux-stable/fs
Soheil Hassas Yeganeh c41a89af7b epoll: check for events when removing a timed out thread from the wait queue
commit 289caf5d8f upstream.

Patch series "simplify ep_poll".

This patch series is a followup based on the suggestions and feedback by
Linus:
https://lkml.kernel.org/r/CAHk-=wizk=OxUyQPbO8MS41w2Pag1kniUV5WdD5qWL-gq1kjDA@mail.gmail.com

The first patch in the series is a fix for the epoll race in presence of
timeouts, so that it can be cleanly backported to all affected stable
kernels.

The rest of the patch series simplifies the ep_poll() implementation.  Some
of these simplifications result in minor performance enhancements as well.
We have kept these changes under self tests and internal benchmarks for a
few days, and there are minor (1-2%) performance enhancements as a result.

This patch (of 8):

After abc610e01c ("fs/epoll: avoid barrier after an epoll_wait(2)
timeout"), we break out of the ep_poll loop upon timeout, without checking
whether there are any new events available.  Prior to that patch series we
always called ep_events_available() after exiting the loop.

This can cause races and missed wakeups.  For example, consider the
following scenario reported by Guantao Liu:

Suppose we have an eventfd added using EPOLLET to an epollfd.

Thread 1: Sleeps for just below 5ms and then writes to an eventfd.
Thread 2: Calls epoll_wait with a timeout of 5 ms. If it sees an
          event of the eventfd, it will write back on that fd.
Thread 3: Calls epoll_wait with a negative timeout.

Prior to abc610e01c, it is guaranteed that Thread 3 will wake up either
by Thread 1 or Thread 2.  After abc610e01c, Thread 3 can be blocked
indefinitely if Thread 2 sees a timeout right before the write to the
eventfd by Thread 1.  Thread 2 will be woken up from
schedule_hrtimeout_range() and, with eavail 0, it will not call
ep_send_events().
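
The scenario depends on edge-triggered behaviour: each write to the
eventfd produces one readiness edge, and one epoll_wait() call consumes it.
The following is a minimal userspace sketch of those building blocks, not
the reporter's actual reproducer. The helper names make_et_epoll(),
signal_event() and wait_once() are hypothetical and chosen only for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: create an eventfd and register it edge-triggered
 * (EPOLLET) on a fresh epoll instance.  Returns the epoll fd, or -1. */
static int make_et_epoll(int *efd_out)
{
	assert(efd_out != NULL);

	int efd = eventfd(0, EFD_NONBLOCK);
	if (efd < 0)
		return -1;

	int epfd = epoll_create1(0);
	if (epfd < 0) {
		close(efd);
		return -1;
	}

	struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = efd };
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) < 0) {
		close(epfd);
		close(efd);
		return -1;
	}

	*efd_out = efd;
	return epfd;
}

/* What Thread 1 does, and what Thread 2 does when it "writes back":
 * add 1 to the eventfd counter.  Returns 0 on success, -1 on error. */
static int signal_event(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* A single epoll_wait() call.  A negative timeout_ms blocks indefinitely,
 * as Thread 3 does; 5 would correspond to Thread 2. */
static int wait_once(int epfd, int timeout_ms)
{
	struct epoll_event ev;

	return epoll_wait(epfd, &ev, 1, timeout_ms);
}
```

With these helpers, signal_event(efd) followed by two wait_once(epfd, 0)
calls should report one event and then none, because the first call
consumes the edge; a further signal_event() produces a new one.  In the racy
case described above, the wakeup from Thread 1's write goes to Thread 2 just
as it times out.  Thread 2 returns without events and never writes back, so
nothing generates another wakeup for Thread 3.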

To fix this issue:
1) Simplify the timed_out case as suggested by Linus.
2) While holding the lock, recheck whether the thread was woken up
   after its timeout expired.

Note that (2) is different from Linus' original suggestion: It does not set
"eavail = ep_events_available(ep)" to avoid unnecessary contention (when
there are too many timed-out threads and a small number of events), as
well as races mentioned in the discussion thread.
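
The recheck in (2) can be illustrated with a small userspace model.  This is
a sketch under stated assumptions, not the kernel code: struct waiter,
struct ep_model, wake_waiter() and leave_wait_queue() are hypothetical
stand-ins, a pthread mutex plays the role of the epoll lock, and a boolean
replaces the wait-queue list membership test.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a thread's wait-queue entry. */
struct waiter {
	bool queued;		/* true while still on the wait queue */
};

/* Hypothetical stand-in for the epoll instance; only the lock is modelled. */
struct ep_model {
	pthread_mutex_t lock;
};

/* Waker side: dequeue the waiter under the lock, as an auto-removing
 * wakeup would.  Returns true if this call actually woke the waiter. */
static bool wake_waiter(struct ep_model *ep, struct waiter *w)
{
	assert(ep != NULL && w != NULL);

	pthread_mutex_lock(&ep->lock);
	bool woke = w->queued;
	w->queued = false;
	pthread_mutex_unlock(&ep->lock);
	return woke;
}

/* Waiter side, run after the sleep returns.  Returns the modelled eavail:
 * whether the thread should go and harvest events. */
static bool leave_wait_queue(struct ep_model *ep, struct waiter *w,
			     bool timed_out)
{
	assert(ep != NULL && w != NULL);

	/* A thread that was woken normally always tries to harvest events. */
	bool eavail = true;

	pthread_mutex_lock(&ep->lock);
	/*
	 * A timed-out thread that is no longer queued was woken after its
	 * timeout expired, so it must still harvest events.  If it is still
	 * queued, nobody woke it and it can report a plain timeout.
	 */
	if (timed_out)
		eavail = !w->queued;
	w->queued = false;	/* remove ourselves from the wait queue */
	pthread_mutex_unlock(&ep->lock);

	return eavail;
}
```

In this model the timed-out thread only consults its own queue membership,
so it does not rescan the ready list the way a call to
ep_events_available(ep) would.  That matches the contention argument made
above for many timed-out threads and few events.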

This is the first patch in the series so that the backport to stable
releases is straightforward.

Link: https://lkml.kernel.org/r/20201106231635.3528496-1-soheil.kdev@gmail.com
Link: https://lkml.kernel.org/r/CAHk-=wizk=OxUyQPbO8MS41w2Pag1kniUV5WdD5qWL-gq1kjDA@mail.gmail.com
Link: https://lkml.kernel.org/r/20201106231635.3528496-2-soheil.kdev@gmail.com
Fixes: abc610e01c ("fs/epoll: avoid barrier after an epoll_wait(2) timeout")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Guantao Liu <guantaol@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Guantao Liu <guantaol@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Khazhismel Kumykov <khazhy@google.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Rishabh Bhatnagar <risbhat@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-08 11:23:05 +01:00
..
9p 9p: missing chunk of "fs/9p: Don't update file type when updating file attributes" 2022-06-22 14:11:02 +02:00
adfs
affs fs/affs: release old buffer head on error path 2021-03-04 10:26:48 +01:00
afs afs: Fix fileserver probe RTT handling 2022-12-08 11:23:03 +01:00
autofs
befs
bfs bfs: don't use WARNING: string when it's just info. 2021-01-06 14:48:39 +01:00
btrfs btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit() 2022-12-08 11:23:02 +01:00
cachefiles cachefiles: Handle readpage error correctly 2020-11-05 11:43:36 +01:00
ceph ceph: avoid putting the realm twice when decoding snaps fails 2022-12-08 11:23:00 +01:00
cifs cifs: add check for returning value of SMB2_set_info_init 2022-11-25 17:42:16 +01:00
coda
configfs configfs: fix a race in configfs_{,un}register_subsystem() 2022-03-02 11:41:10 +01:00
cramfs
crypto fscrypt: add fscrypt_symlink_getattr() for computing st_size 2021-09-12 08:56:38 +02:00
debugfs debugfs: add debugfs_lookup_and_remove() 2022-09-15 12:04:54 +02:00
devpts fsnotify: fix fsnotify hooks in pseudo filesystems 2022-02-01 17:24:34 +01:00
dlm fs: dlm: handle -EBUSY first in lock arg validation 2022-10-26 13:22:14 +02:00
ecryptfs Revert "ecryptfs: replace BUG_ON with error handling code" 2021-05-26 12:05:19 +02:00
efivarfs efivarfs: revert "fix memory leak in efivarfs_create()" 2020-12-02 08:49:53 +01:00
efs
erofs erofs: avoid consecutive detection for Highmem memory 2022-08-25 11:17:36 +02:00
exportfs
ext2 ext2: Add more validity checks for inode counts 2022-08-25 11:17:28 +02:00
ext4 ext4: fix BUG_ON() when directory entry has invalid rec_len 2022-11-10 17:57:56 +01:00
f2fs f2fs: fix race condition on setting FI_NO_EXTENT flag 2022-10-26 13:22:46 +02:00
fat fat: add ratelimit to fat*_ent_bread() 2022-06-14 18:11:30 +02:00
freevxfs
fscache fscache: Fix cookie key hashing 2021-09-22 12:26:25 +02:00
fuse fuse: lock inode unconditionally in fuse_fallocate() 2022-12-08 11:23:01 +01:00
gfs2 gfs2: Switch from strlcpy to strscpy 2022-11-25 17:42:22 +01:00
hfs hfs: add lock nesting notation to hfs_find_init 2021-07-31 08:19:38 +02:00
hfsplus hfsplus: prevent corruption in shrinking truncate 2021-05-19 10:08:29 +02:00
hostfs hostfs: fix memory handling in follow_link() 2021-04-14 08:24:14 +02:00
hpfs
hugetlbfs mm, hugetlb: allow for "high" userspace addresses 2022-05-09 09:03:28 +02:00
iomap iomap: iomap_write_failed fix 2022-06-14 18:11:36 +02:00
isofs isofs: Fix out of bound access for corrupted isofs image 2021-11-12 14:43:03 +01:00
jbd2 jbd2: wake up journal waiters in FIFO order, not LIFO 2022-10-26 13:22:17 +02:00
jffs2 jffs2: fix memory leak in jffs2_do_fill_super 2022-06-14 18:11:55 +02:00
jfs fs: jfs: fix possible NULL pointer dereference in dbFree() 2022-06-14 18:11:29 +02:00
kernfs kernfs: fix use-after-free in __kernfs_remove 2022-11-03 23:56:54 +09:00
lockd lockd: lockd server-side shouldn't set fl_ops 2021-09-22 12:26:34 +02:00
minix minix: fix bug when opening a file with O_DIRECT 2022-04-15 14:18:35 +02:00
nfs NFSv4: Retry LOCK on OLD_STATEID during delegation return 2022-11-25 17:42:12 +01:00
nfs_common nfs_common: need lock during iterate through the list 2020-12-30 11:51:22 +01:00
nfsd NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data 2022-10-26 13:22:47 +02:00
nilfs2 nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry() 2022-12-08 11:23:04 +01:00
nls
notify fsnotify: fix wrong lockdep annotations 2022-06-14 18:11:34 +02:00
ntfs ntfs: check overflow when iterating ATTR_RECORDs 2022-11-25 17:42:22 +01:00
ocfs2 ocfs2: fix BUG when iput after ocfs2_mknod fails 2022-10-29 10:20:34 +02:00
omfs
openpromfs
orangefs orangefs: Fix the size of a memory allocation in orangefs_bufmap_alloc() 2022-01-20 09:19:17 +01:00
overlayfs ovl: drop WARN_ON() dentry is NULL in ovl_encode_fh() 2022-08-25 11:17:23 +02:00
proc mm: /proc/pid/smaps_rollup: fix no vma's null-deref 2022-10-29 10:20:36 +02:00
pstore pstore: Fix typo in compression option name 2021-03-04 10:26:45 +01:00
qnx4 qnx4: work around gcc false positive warning bug 2021-09-30 10:09:26 +02:00
qnx6
quota quota: Check next/prev free block number after reading from quota file 2022-10-26 13:22:14 +02:00
ramfs ramfs: fix nommu mmap with gaps in the page cache 2020-10-29 09:57:53 +01:00
reiserfs reiserfs: check directory items on read from disk 2021-08-12 13:21:05 +02:00
romfs romfs: fix uninitialized memory leak in romfs_dev_read() 2020-08-26 10:40:51 +02:00
squashfs squashfs: fix divide error in calculate_skip() 2021-05-19 10:08:29 +02:00
sysfs sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output 2021-03-07 12:20:48 +01:00
sysv
tracefs tracefs: Only clobber mode/uid/gid on remount if asked 2022-09-20 12:28:00 +02:00
ubifs ubifs: Rectify space amount budget for mkdir/tmpfile operations 2022-04-15 14:18:31 +02:00
udf udf: Fix a slab-out-of-bounds write bug in udf_find_entry() 2022-11-25 17:42:09 +01:00
ufs fs/ufs: avoid potential u32 multiplication overflow 2020-08-21 13:05:37 +02:00
unicode
verity fs-verity: fix signed integer overflow with i_size near S64_MAX 2021-10-06 15:42:30 +02:00
xfs xfs: drain the buf delwri queue before xfsaild idles 2022-11-25 17:42:03 +01:00
aio.c aio: fix use-after-free due to missing POLLFREE handling 2021-12-14 14:49:02 +01:00
anon_inodes.c
attr.c vfs: Check the truncate maximum size in inode_newsize_ok() 2022-08-25 11:17:21 +02:00
bad_inode.c
binfmt_aout.c
binfmt_elf.c elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings 2021-10-06 15:42:35 +02:00
binfmt_elf_fdpic.c
binfmt_em86.c
binfmt_flat.c binfmt_flat: do not stop relocating GOT entries prematurely on riscv 2022-06-14 18:11:23 +02:00
binfmt_misc.c binfmt_misc: fix possible deadlock in bm_register_write 2021-03-17 17:03:57 +01:00
binfmt_script.c
block_dev.c block: reexpand iov_iter after read/write 2021-05-22 11:38:29 +02:00
buffer.c mm: fs: initialize fsdata passed to write_begin/write_end interface 2022-11-25 17:42:22 +01:00
char_dev.c
compat.c
compat_binfmt_elf.c
compat_ioctl.c compat_ioctl: remove /dev/random commands 2022-06-22 14:11:03 +02:00
coredump.c coredump: fix core_pattern parse error 2020-12-11 13:23:30 +01:00
d_path.c fs: fix NULL dereference due to data race in prepend_path() 2020-10-29 09:57:45 +01:00
dax.c dax: fix cache flush on PMD-mapped pages 2022-06-14 18:11:41 +02:00
dcache.c fix dget_parent() fastpath race 2020-10-01 13:17:19 +02:00
dcookies.c
direct-io.c fs: direct-io: fix missing sdio->boundary 2021-04-14 08:24:11 +02:00
drop_caches.c
eventfd.c
eventpoll.c epoll: check for events when removing a timed out thread from the wait queue 2022-12-08 11:23:05 +01:00
exec.c exec: Force single empty string when argv is empty 2022-06-06 08:33:50 +02:00
fcntl.c fcntl: fix potential deadlock for &fasync_struct.fa_lock 2021-09-15 09:47:28 +02:00
fhandle.c
file.c fget: clarify and improve __fget_files() implementation 2022-03-02 11:41:18 +01:00
file_table.c SUNRPC: Ensure we flush any closed sockets before xs_xprt_free() 2022-05-25 09:14:34 +02:00
filesystems.c fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once() 2020-04-17 10:50:21 +02:00
fs-writeback.c fs-writeback: writeback_sb_inodes:Recalculate 'wrote' according skipped pages 2022-06-14 18:11:44 +02:00
fs_context.c memcg: charge fs_context and legacy_fs_context 2022-02-08 18:24:29 +01:00
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
inode.c fs: fix UAF/GPF bug in nilfs_mdt_destroy 2022-10-15 07:54:36 +02:00
internal.h cgroup1: fix leaked context root causing sporadic NULL deref in LTP 2021-07-31 08:19:37 +02:00
io_uring.c io_uring/af_unix: defer registered files gc to io_uring release 2022-10-26 13:22:59 +02:00
ioctl.c
Kconfig
Kconfig.binfmt
libfs.c libfs: fix error cast of negative value in simple_attr_write() 2020-11-24 13:29:19 +01:00
locks.c locks: reinstate locks_delete_block optimization 2020-03-25 08:25:41 +01:00
Makefile
mbcache.c
mount.h
mpage.c
namei.c mm: fs: initialize fsdata passed to write_begin/write_end interface 2022-11-25 17:42:22 +01:00
namespace.c fs: warn about impending deprecation of mandatory locks 2021-08-26 08:36:22 -04:00
no-block.c
nsfs.c
open.c cifs_atomic_open(): fix double-put on late allocation failure 2020-03-18 07:17:51 +01:00
pipe.c pipe: increase minimum default pipe size to 2 pages 2021-08-12 13:21:02 +02:00
pnode.c propagate_one(): mnt_set_mountpoint() needs mount_lock 2020-05-02 08:48:44 +02:00
pnode.h mount: fix mounting of detached mounts onto targets that reside on shared mounts 2021-03-17 17:03:33 +01:00
posix_acl.c
proc_namespace.c
read_write.c
readdir.c readdir: make sure to verify directory entry for legacy interfaces too 2021-04-21 12:56:16 +02:00
select.c select: Fix indefinitely sleeping task in poll_schedule_timeout() 2022-01-29 10:25:11 +01:00
seq_file.c seq_file: disallow extremely large seq buffer allocations 2021-07-20 16:10:54 +02:00
signalfd.c io_uring: disable polling pollfree files 2022-09-05 10:27:47 +02:00
splice.c Revert "fs: check FMODE_LSEEK to control internal pipe splicing" 2022-10-17 17:24:32 +02:00
stack.c
stat.c stat: fix inconsistency between struct stat and struct compat_stat 2022-04-27 13:50:48 +02:00
statfs.c
super.c vfs: make freeze_super abort when sync_filesystem returns error 2022-02-23 11:59:55 +01:00
sync.c
timerfd.c
userfaultfd.c userfaultfd: open userfaultfds with O_RDONLY 2022-10-26 13:22:21 +02:00
utimes.c
xattr.c xattr: break delegations in {set,remove}xattr 2020-08-11 15:33:39 +02:00