linux-stable/fs
Maxim Patlasov 2939e1a86f btrfs: limit async_work allocation and worker func duration
Problem statement: unprivileged user who has read-write access to more than
one btrfs subvolume may easily consume all kernel memory (eventually
triggering oom-killer).

Reproducer (./mkrmdir below essentially loops over mkdir/rmdir):

[root@kteam1 ~]# cat prep.sh

DEV=/dev/sdb
mkfs.btrfs -f $DEV
mount $DEV /mnt
for i in `seq 1 16`
do
	mkdir /mnt/$i
	btrfs subvolume create /mnt/SV_$i
	ID=`btrfs subvolume list /mnt |grep "SV_$i$" |cut -d ' ' -f 2`
	mount -t btrfs -o subvolid=$ID $DEV /mnt/$i
	chmod a+rwx /mnt/$i
done

[root@kteam1 ~]# sh prep.sh

[maxim@kteam1 ~]$ for i in `seq 1 16`; do ./mkrmdir /mnt/$i 2000 2000 & done

[root@kteam1 ~]# for i in `seq 1 4`; do grep "kmalloc-128" /proc/slabinfo | grep -v dma; sleep 60; done
kmalloc-128        10144  10144    128   32    1 : tunables    0    0    0 : slabdata    317    317      0
kmalloc-128       9992352 9992352    128   32    1 : tunables    0    0    0 : slabdata 312261 312261      0
kmalloc-128       24226752 24226752    128   32    1 : tunables    0    0    0 : slabdata 757086 757086      0
kmalloc-128       42754240 42754240    128   32    1 : tunables    0    0    0 : slabdata 1336070 1336070      0

The huge numbers above come from insane number of async_work-s allocated
and queued by btrfs_wq_run_delayed_node.

The problem is caused by btrfs_wq_run_delayed_node() queuing more and more
works if the number of delayed items is above BTRFS_DELAYED_BACKGROUND. The
worker func (btrfs_async_run_delayed_root) processes at least
BTRFS_DELAYED_BATCH items (if they are present in the list). So, the machinery
works as expected while the list is almost empty. As soon as it is getting
bigger, worker func starts to process more than one item at a time, it takes
longer, and the chances to have async_works queued more than needed is getting
higher.

The problem above is worsened by another flaw of delayed-inode implementation:
if async_work was queued in a throttling branch (number of items >=
BTRFS_DELAYED_WRITEBACK), corresponding worker func won't quit until
the number of items < BTRFS_DELAYED_BACKGROUND / 2. So, it is possible that
the func occupies CPU infinitely (up to 30sec in my experiments): while the
func is trying to drain the list, the user activity may add more and more
items to the list.

The patch fixes both problems in straightforward way: refuse queuing too
many works in btrfs_wq_run_delayed_node and bail out of worker func if
at least BTRFS_DELAYED_WRITEBACK items are processed.

Changed in v2: remove support of thresh == NO_THRESHOLD.

Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Chris Mason <clm@fb.com>
Cc: stable@vger.kernel.org # v3.15+
2016-12-13 11:01:30 -08:00
..
9p Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
adfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
affs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
afs afs: call->operation_ID sometimes used as __be32 sometimes as u32 2016-10-13 17:03:52 +01:00
autofs4 autofs: refactor ioctl fn vector in iookup_dev_ioctl() 2016-10-11 15:06:31 -07:00
befs befs fixes for 4.9-rc1 2016-10-15 12:09:13 -07:00
bfs Merge remote-tracking branch 'ovl/rename2' into for-linus 2016-10-10 23:02:51 -04:00
btrfs btrfs: limit async_work allocation and worker func duration 2016-12-13 11:01:30 -08:00
cachefiles Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
ceph ceph: use default file splice read callback 2016-11-10 20:13:04 +01:00
cifs CIFS: Retrieve uid and gid from special sid if enabled 2016-10-14 14:22:16 -05:00
coda Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
configfs Merge remote-tracking branch 'ovl/rename2' into for-linus 2016-10-10 23:02:51 -04:00
cramfs
crypto fscrypto: don't use on-stack buffer for key derivation 2016-11-19 20:56:13 -05:00
debugfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
devpts Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
dlm dlm: free workqueues after the connections 2016-10-10 09:54:00 -05:00
ecryptfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
efivarfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
efs
exofs fs: exofs: print a hex number after a 0x prefix 2016-10-27 18:43:43 -07:00
exportfs exportfs: be careful to only return expected errors. 2016-10-06 09:07:44 -04:00
ext2 ext2: avoid bogus -Wmaybe-uninitialized warning 2016-10-18 11:29:35 +02:00
ext4 ext4: sanity check the block and cluster size at mount time 2016-11-19 20:58:15 -05:00
f2fs This includes fixing a bug which references a wrong pointer, sum_page, in 2016-10-18 14:15:23 -07:00
fat Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
freevxfs freevxfs: update Kconfig information 2016-06-13 10:20:39 +02:00
fscache Merge branch 'd_real' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into work.misc 2016-06-30 23:34:49 -04:00
fuse fuse: fix fuse_write_end() if zero bytes were copied 2016-11-15 12:34:21 +01:00
gfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
hfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
hfsplus Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
hostfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
hpfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
hugetlbfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
isofs isofs: Do not return EACCES for unknown filesystems 2016-10-18 11:28:21 +02:00
jbd2 jbd2: fix incorrect unlock on j_list_lock 2016-10-12 23:19:18 -04:00
jffs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
jfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
kernfs kernfs: Add noop_fsync to supported kernfs_file_fops 2016-10-27 17:47:11 +02:00
lockd treewide: remove redundant #include <linux/kconfig.h> 2016-10-11 15:06:33 -07:00
logfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
minix Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
ncpfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
nfs NFS client bugfixes for Linux 4.9 part 4 2016-11-23 14:43:40 -08:00
nfs_common
nfsd nfsd: Fix general protection fault in release_lock_stateid() 2016-11-01 15:24:43 -04:00
nilfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
nls
notify fsnotify: clean up spinlock assertions 2016-10-07 18:46:26 -07:00
ntfs fs: remove the never implemented aio_fsync file operation 2016-10-30 13:09:42 -04:00
ocfs2 ocfs2: fix not enough credit panic 2016-11-11 08:12:37 -08:00
omfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
openpromfs fs: Replace CURRENT_TIME with current_time() for inode timestamps 2016-09-27 21:06:21 -04:00
orangefs orangefs: add .owner to debugfs file_operations 2016-11-16 11:52:19 -05:00
overlayfs ovl: fsync after copy-up 2016-10-31 14:42:14 +01:00
proc proc: fix NULL dereference when reading /proc/<pid>/auxv 2016-10-27 18:43:43 -07:00
pstore Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
qnx4
qnx6
quota quota: fill in Q_XGETQSTAT inode information for inactive quotas 2016-08-15 17:43:31 +02:00
ramfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
reiserfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
romfs
squashfs vfs: Remove {get,set,remove}xattr inode operations 2016-10-07 21:48:36 -04:00
sysfs Merge branch 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2016-10-14 12:18:50 -07:00
sysv Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
tracefs fs: Replace CURRENT_TIME with current_time() for inode timestamps 2016-09-27 21:06:21 -04:00
ubifs ubifs: Fix regression in ubifs_readdir() 2016-10-28 14:48:31 +02:00
udf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
ufs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
xfs xfs: defer should abort intent items if the trans roll fails 2016-10-24 14:21:18 +11:00
aio.c aio: fix freeze protection of aio writes 2016-10-30 13:09:42 -04:00
anon_inodes.c
attr.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
bad_inode.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
binfmt_aout.c fs: fix binfmt_aout.c build error 2016-05-28 16:34:59 -07:00
binfmt_elf.c x86/coredump: Use pr_reg size, rather that TIF_IA32 flag 2016-09-14 21:28:10 +02:00
binfmt_elf_fdpic.c elf_fdpic_transfer_args_to_stack(): make it generic 2016-07-25 16:51:49 +10:00
binfmt_em86.c fs/binfmt_em86.c: fix incompatible pointer type 2016-08-02 19:35:15 -04:00
binfmt_flat.c binfmt_flat: allow compressed flat binary format to work on MMU systems 2016-07-28 13:29:12 +10:00
binfmt_misc.c fs: Replace current_fs_time() with current_time() 2016-09-27 21:06:22 -04:00
binfmt_script.c
block_dev.c block: implement (some of) fallocate for block devices 2016-10-11 15:06:30 -07:00
buffer.c fs: use mapping_set_error instead of opencoded set_bit 2016-10-11 15:06:33 -07:00
char_dev.c dax: define a unified inode/address_space for device-dax mappings 2016-08-23 22:58:51 -07:00
compat.c compat: remove compat_printk() 2016-09-27 21:20:53 -04:00
compat_binfmt_elf.c
compat_ioctl.c fs: compat_ioctl: add pretimeout functions for watchdogs 2016-09-24 09:27:18 +02:00
coredump.c coredump: fix unfreezable coredumping task 2016-11-11 08:12:37 -08:00
dax.c thp: reduce usage of huge zero page's atomic counter 2016-10-07 18:46:28 -07:00
dcache.c Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-08-07 10:01:14 -04:00
dcookies.c
direct-io.c consistent treatment of EFAULT on O_DIRECT read/write 2016-10-03 20:38:55 -04:00
drop_caches.c
eventfd.c
eventpoll.c
exec.c mm: replace get_user_pages_remote() write/force parameters with gup_flags 2016-10-19 08:12:02 -07:00
fcntl.c
fhandle.c
file.c fs/file: more unsigned file descriptors 2016-09-27 18:47:38 -04:00
file_table.c
filesystems.c
fs-writeback.c mm, writeback: flush plugged IO in wakeup_flusher_threads() 2016-08-09 19:58:06 -06:00
fs_pin.c
fs_struct.c
inode.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
internal.h Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 13:04:49 -07:00
ioctl.c vfs: cap dedupe request structure size at PAGE_SIZE 2016-09-15 13:29:52 -07:00
iomap.c fs: Do to trim high file position bits in iomap_page_mkwrite_actor 2016-10-24 14:20:25 +11:00
Kconfig mm/hugetlb: introduce ARCH_HAS_GIGANTIC_PAGE 2016-10-07 18:46:29 -07:00
Kconfig.binfmt ARM: 8594/1: enable binfmt_flat on systems with an MMU 2016-08-12 16:47:05 +01:00
libfs.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
locks.c locking, fs/locks: Add missing file_sem locks 2016-10-18 12:21:28 +02:00
Makefile fs: introduce iomap infrastructure 2016-06-21 09:23:11 +10:00
mbcache.c mbcache: fix to detect failure of register_shrinker 2016-08-31 11:44:36 -04:00
mount.h mnt: Add a per mount namespace limit on the number of mounts 2016-09-30 12:46:48 -05:00
mpage.c block/mm: make bdev_ops->rw_page() take a bool for read/write 2016-08-07 14:41:02 -06:00
namei.c Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs 2016-10-14 17:23:33 -07:00
namespace.c This adds a new gcc plugin named "latent_entropy". It is designed to 2016-10-15 10:03:15 -07:00
no-block.c
nsfs.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
open.c xfs: reflink update for 4.9-rc1 2016-10-13 20:28:22 -07:00
pipe.c pipe: cap initial pipe capacity according to pipe-max-size limit 2016-10-11 15:06:32 -07:00
pnode.c mnt: Add a per mount namespace limit on the number of mounts 2016-09-30 12:46:48 -05:00
pnode.h mnt: Add a per mount namespace limit on the number of mounts 2016-09-30 12:46:48 -05:00
posix_acl.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
proc_namespace.c
read_write.c iov_iter: kernel-doc import_iovec() and rw_copy_check_uvector() 2016-10-14 20:00:34 -04:00
readdir.c
select.c fs/select: add vmalloc fallback for select(2) 2016-10-11 15:06:30 -07:00
seq_file.c seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char 2016-10-07 18:46:30 -07:00
signalfd.c
splice.c fix default_file_splice_read() 2016-11-26 20:05:42 -05:00
stack.c
stat.c
statfs.c
super.c fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths 2016-10-14 20:41:59 -04:00
sync.c
timerfd.c timerfd: Reject ALARM timerfds without CAP_WAKE_ALARM 2016-06-09 23:42:38 +02:00
userfaultfd.c mm: introduce fault_env 2016-07-26 16:19:19 -07:00
utimes.c Merge remote-tracking branch 'jk/vfs' into work.misc 2016-10-08 11:06:08 -04:00
xattr.c xattr: Fix setting security xattrs on sockfs 2016-11-17 00:00:23 -05:00