linux-stable/fs/btrfs
Maxim Patlasov 2939e1a86f btrfs: limit async_work allocation and worker func duration
Problem statement: unprivileged user who has read-write access to more than
one btrfs subvolume may easily consume all kernel memory (eventually
triggering oom-killer).

Reproducer (./mkrmdir below essentially loops over mkdir/rmdir):

[root@kteam1 ~]# cat prep.sh

DEV=/dev/sdb
mkfs.btrfs -f $DEV
mount $DEV /mnt
for i in `seq 1 16`
do
	mkdir /mnt/$i
	btrfs subvolume create /mnt/SV_$i
	ID=`btrfs subvolume list /mnt |grep "SV_$i$" |cut -d ' ' -f 2`
	mount -t btrfs -o subvolid=$ID $DEV /mnt/$i
	chmod a+rwx /mnt/$i
done

[root@kteam1 ~]# sh prep.sh

[maxim@kteam1 ~]$ for i in `seq 1 16`; do ./mkrmdir /mnt/$i 2000 2000 & done

[root@kteam1 ~]# for i in `seq 1 4`; do grep "kmalloc-128" /proc/slabinfo | grep -v dma; sleep 60; done
kmalloc-128        10144  10144    128   32    1 : tunables    0    0    0 : slabdata    317    317      0
kmalloc-128       9992352 9992352    128   32    1 : tunables    0    0    0 : slabdata 312261 312261      0
kmalloc-128       24226752 24226752    128   32    1 : tunables    0    0    0 : slabdata 757086 757086      0
kmalloc-128       42754240 42754240    128   32    1 : tunables    0    0    0 : slabdata 1336070 1336070      0

The huge numbers above come from insane number of async_work-s allocated
and queued by btrfs_wq_run_delayed_node.

The problem is caused by btrfs_wq_run_delayed_node() queuing more and more
works if the number of delayed items is above BTRFS_DELAYED_BACKGROUND. The
worker func (btrfs_async_run_delayed_root) processes at least
BTRFS_DELAYED_BATCH items (if they are present in the list). So, the machinery
works as expected while the list is almost empty. As soon as it is getting
bigger, worker func starts to process more than one item at a time, it takes
longer, and the chances to have async_works queued more than needed is getting
higher.

The problem above is worsened by another flaw of delayed-inode implementation:
if async_work was queued in a throttling branch (number of items >=
BTRFS_DELAYED_WRITEBACK), corresponding worker func won't quit until
the number of items < BTRFS_DELAYED_BACKGROUND / 2. So, it is possible that
the func occupies CPU infinitely (up to 30sec in my experiments): while the
func is trying to drain the list, the user activity may add more and more
items to the list.

The patch fixes both problems in straightforward way: refuse queuing too
many works in btrfs_wq_run_delayed_node and bail out of worker func if
at least BTRFS_DELAYED_WRITEBACK items are processed.

Changed in v2: remove support of thresh == NO_THRESHOLD.

Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Chris Mason <clm@fb.com>
Cc: stable@vger.kernel.org # v3.15+
2016-12-13 11:01:30 -08:00
..
tests btrfs: pull node/sector/stripe sizes out of root and into fs_info 2016-12-06 16:06:58 +01:00
acl.c posix_acl: Clear SGID bit when setting file permissions 2016-09-22 10:55:32 +02:00
async-thread.c btrfs: limit async_work allocation and worker func duration 2016-12-13 11:01:30 -08:00
async-thread.h btrfs: limit async_work allocation and worker func duration 2016-12-13 11:01:30 -08:00
backref.c btrfs: remove root parameter from transaction commit/end routines 2016-12-06 16:07:00 +01:00
backref.h
btrfs_inode.h Btrfs: add a flags field to btrfs_fs_info 2016-09-26 17:59:49 +02:00
check-integrity.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
check-integrity.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
compression.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
compression.h btrfs: use bio iterators for the decompression handlers 2016-11-30 13:45:19 +01:00
ctree.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
ctree.h Btrfs: don't WARN() in btrfs_transaction_abort() for IO errors 2016-12-09 06:00:28 -08:00
dedupe.h btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize 2016-07-26 13:52:25 +02:00
delayed-inode.c btrfs: limit async_work allocation and worker func duration 2016-12-13 11:01:30 -08:00
delayed-inode.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
delayed-ref.c btrfs: improve delayed refs iterations 2016-11-30 13:45:21 +01:00
delayed-ref.h Merge branch 'for-chris-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.10 2016-12-13 09:14:42 -08:00
dev-replace.c btrfs: remove root parameter from transaction commit/end routines 2016-12-06 16:07:00 +01:00
dev-replace.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
dir-item.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
disk-io.c Merge branch 'for-chris-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.10 2016-12-13 09:14:42 -08:00
disk-io.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
export.c btrfs: root->fs_info cleanup, add fs_info convenience variables 2016-12-06 16:06:59 +01:00
export.h
extent-tree.c btrfs: opencode chunk locking, remove helpers 2016-12-06 16:07:00 +01:00
extent_io.c btrfs: remove root parameter from transaction commit/end routines 2016-12-06 16:07:00 +01:00
extent_io.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
extent_map.c btrfs: Fix slab accounting flags 2016-07-26 13:52:25 +02:00
extent_map.h
file-item.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
file.c Merge branch 'for-chris-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.10 2016-12-13 09:14:42 -08:00
free-space-cache.c btrfs: opencode chunk locking, remove helpers 2016-12-06 16:07:00 +01:00
free-space-cache.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
free-space-tree.c btrfs: remove root parameter from transaction commit/end routines 2016-12-06 16:07:00 +01:00
free-space-tree.h
hash.c btrfs: advertise which crc32c implementation is being used at module load 2016-06-06 14:08:28 +02:00
hash.h btrfs: advertise which crc32c implementation is being used at module load 2016-06-06 14:08:28 +02:00
inode-item.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
inode-map.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
inode-map.h
inode.c Revert "Btrfs: adjust len of writes if following a preallocated extent" 2016-12-11 15:27:15 -08:00
ioctl.c btrfs: remove root parameter from transaction commit/end routines 2016-12-06 16:07:00 +01:00
Kconfig
locking.c
locking.h
lzo.c btrfs: use bio iterators for the decompression handlers 2016-11-30 13:45:19 +01:00
Makefile
math.h
ordered-data.c btrfs: root->fs_info cleanup, add fs_info convenience variables 2016-12-06 16:06:59 +01:00
ordered-data.h btrfs: pull node/sector/stripe sizes out of root and into fs_info 2016-12-06 16:06:58 +01:00
orphan.c
print-tree.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
print-tree.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
props.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
props.h
qgroup.c Merge branch 'for-chris-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.10 2016-12-13 09:14:42 -08:00
qgroup.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
raid56.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
raid56.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
rcu-string.h
reada.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
relocation.c Merge branch 'for-chris-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.10 2016-12-13 09:14:42 -08:00
root-tree.c btrfs: remove root parameter from transaction commit/end routines 2016-12-06 16:07:00 +01:00
scrub.c btrfs: remove root parameter from transaction commit/end routines 2016-12-06 16:07:00 +01:00
send.c btrfs: remove root parameter from transaction commit/end routines 2016-12-06 16:07:00 +01:00
send.h
struct-funcs.c btrfs: fix string and comment grammatical issues and typos 2016-05-25 22:35:14 +02:00
super.c btrfs: remove root parameter from transaction commit/end routines 2016-12-06 16:07:00 +01:00
sysfs.c btrfs: convert printk(KERN_* to use pr_* calls 2016-09-26 18:08:44 +02:00
sysfs.h
transaction.c btrfs: remove root parameter from transaction commit/end routines 2016-12-06 16:07:00 +01:00
transaction.h btrfs: remove root parameter from transaction commit/end routines 2016-12-06 16:07:00 +01:00
tree-defrag.c
tree-log.c Merge branch 'for-chris-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.10 2016-12-13 09:14:42 -08:00
tree-log.h Btrfs: fix lockdep warning on deadlock against an inode's log mutex 2016-08-25 03:58:32 -07:00
ulist.c btrfs: fix string and comment grammatical issues and typos 2016-05-25 22:35:14 +02:00
ulist.h
uuid-tree.c btrfs: remove root parameter from transaction commit/end routines 2016-12-06 16:07:00 +01:00
volumes.c btrfs: opencode chunk locking, remove helpers 2016-12-06 16:07:00 +01:00
volumes.h btrfs: opencode chunk locking, remove helpers 2016-12-06 16:07:00 +01:00
xattr.c btrfs: remove root parameter from transaction commit/end routines 2016-12-06 16:07:00 +01:00
xattr.h btrfs: Switch to generic xattr handlers 2016-05-17 19:17:09 -04:00
zlib.c btrfs: use bio iterators for the decompression handlers 2016-11-30 13:45:19 +01:00