linux-stable/fs/btrfs
Wang Xiaoguang 18513091af btrfs: update btrfs_space_info's bytes_may_use timely
This patch can fix some false ENOSPC errors, below test script can
reproduce one false ENOSPC error:
	#!/bin/bash
	dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
	dev=$(losetup --show -f fs.img)
	mkfs.btrfs -f -M $dev
	mkdir /tmp/mntpoint
	mount $dev /tmp/mntpoint
	cd /tmp/mntpoint
	xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile

Above script will fail for ENOSPC reason, but indeed fs still has free
space to satisfy this request. Please see call graph:
btrfs_fallocate()
|-> btrfs_alloc_data_chunk_ondemand()
|   bytes_may_use += 64M
|-> btrfs_prealloc_file_range()
    |-> btrfs_reserve_extent()
        |-> btrfs_add_reserved_bytes()
        |   alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
        |   change bytes_may_use, and bytes_reserved += 64M. Now
        |   bytes_may_use + bytes_reserved == 128M, which is greater
        |   than btrfs_space_info's total_bytes, false enospc occurs.
        |   Note, the bytes_may_use decrease operation will be done in
        |   end of btrfs_fallocate(), which is too late.

Here is another simple case for buffered write:
                    CPU 1              |              CPU 2
                                       |
|-> cow_file_range()                   |-> __btrfs_buffered_write()
    |-> btrfs_reserve_extent()         |   |
    |                                  |   |
    |                                  |   |
    |    .....                         |   |-> btrfs_check_data_free_space()
    |                                  |
    |                                  |
    |-> extent_clear_unlock_delalloc() |

In CPU 1, btrfs_reserve_extent()->find_free_extent()->
btrfs_add_reserved_bytes() do not decrease bytes_may_use, the decrease
operation will be delayed to be done in extent_clear_unlock_delalloc().
Assume in this case, btrfs_reserve_extent() reserved 128MB data, CPU2's
btrfs_check_data_free_space() tries to reserve 100MB data space.
If
	100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
		data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
		data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
btrfs_check_data_free_space() will try to allcate new data chunk or call
btrfs_start_delalloc_roots(), or commit current transaction in order to
reserve some free space, obviously a lot of work. But indeed it's not
necessary as long as decreasing bytes_may_use timely, we still have
free space, decreasing 128M from bytes_may_use.

To fix this issue, this patch chooses to update bytes_may_use for both
data and metadata in btrfs_add_reserved_bytes(). For compress path, real
extent length may not be equal to file content length, so introduce a
ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and
btrfs_add_reserved_bytes(), it's becasue bytes_may_use is increased by
file content length. Then compress path can update bytes_may_use
correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT, RESERVE_ALLOC
and RESERVE_FREE.

As we know, usually EXTENT_DO_ACCOUNTING is used for error path. In
run_delalloc_nocow(), for inode marked as NODATACOW or extent marked as
PREALLOC, we also need to update bytes_may_use, but can not pass
EXTENT_DO_ACCOUNTING, because it also clears metadata reservation, so
here we introduce EXTENT_CLEAR_DATA_RESV flag to indicate btrfs_clear_bit_hook()
to update btrfs_space_info's bytes_may_use.

Meanwhile __btrfs_prealloc_file_range() will call
btrfs_free_reserved_data_space() internally for both sucessful and failed
path, btrfs_prealloc_file_range()'s callers does not need to call
btrfs_free_reserved_data_space() any more.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:26 -07:00
..
tests btrfs: tests, require fs_info for root 2016-07-26 13:53:18 +02:00
acl.c btrfs: Replace -ENOENT by -ERANGE in btrfs_get_acl() 2016-07-26 13:52:25 +02:00
async-thread.c btrfs: plumb fs_info into btrfs_work 2016-07-26 13:53:15 +02:00
async-thread.h btrfs: plumb fs_info into btrfs_work 2016-07-26 13:53:15 +02:00
backref.c btrfs: backref: Fix soft lockup in __merge_refs function 2016-08-25 03:58:16 -07:00
backref.h
btrfs_inode.h Merge branch 'cleanups-4.7' into for-chris-4.7-20160525 2016-05-25 22:51:03 +02:00
check-integrity.c btrfs: Use correct format specifier 2016-06-17 18:32:40 +02:00
check-integrity.h
compression.c Btrfs: fix BUG_ON in btrfs_submit_compressed_write 2016-07-26 13:52:25 +02:00
compression.h btrfs: move btrfs_compression_type to compression.h 2016-03-11 17:12:46 +01:00
ctree.c btrfs: btrfs_abort_transaction, drop root parameter 2016-07-26 13:54:26 +02:00
ctree.h btrfs: update btrfs_space_info's bytes_may_use timely 2016-08-25 03:58:26 -07:00
dedupe.h btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize 2016-07-26 13:52:25 +02:00
delayed-inode.c btrfs: btrfs_abort_transaction, drop root parameter 2016-07-26 13:54:26 +02:00
delayed-inode.h Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes 2016-06-25 06:20:10 -07:00
delayed-ref.c btrfs: qgroup: Refactor btrfs_qgroup_insert_dirty_extent() 2016-08-25 03:58:21 -07:00
delayed-ref.h Btrfs: remove unused function btrfs_add_delayed_qgroup_reserve() 2016-08-03 11:02:51 +01:00
dev-replace.c btrfs: btrfs_test_opt and friends should take a btrfs_fs_info 2016-07-26 13:53:16 +02:00
dev-replace.h btrfs: refactor btrfs_dev_replace_start for reuse 2016-04-28 10:59:13 +02:00
dir-item.c
disk-io.c btrfs: waiting on qgroup rescan should not always be interruptible 2016-08-25 03:58:20 -07:00
disk-io.h btrfs: tests, require fs_info for root 2016-07-26 13:53:18 +02:00
export.c BTRFS: support NFSv2 export 2015-10-06 06:55:23 -07:00
export.h
extent-tree.c btrfs: update btrfs_space_info's bytes_may_use timely 2016-08-25 03:58:26 -07:00
extent_io.c Btrfs: fix eb memory leak due to readpage failure 2016-07-26 13:52:25 +02:00
extent_io.h btrfs: update btrfs_space_info's bytes_may_use timely 2016-08-25 03:58:26 -07:00
extent_map.c btrfs: Fix slab accounting flags 2016-07-26 13:52:25 +02:00
extent_map.h btrfs: cleanup, stop casting for extent_map->lookup everywhere 2016-01-15 19:22:28 +01:00
file-item.c Btrfs: fix __MAX_CSUM_ITEMS 2016-08-03 14:08:37 -07:00
file.c btrfs: update btrfs_space_info's bytes_may_use timely 2016-08-25 03:58:26 -07:00
free-space-cache.c btrfs: btrfs_abort_transaction, drop root parameter 2016-07-26 13:54:26 +02:00
free-space-cache.h btrfs: fix string and comment grammatical issues and typos 2016-05-25 22:35:14 +02:00
free-space-tree.c btrfs: btrfs_abort_transaction, drop root parameter 2016-07-26 13:54:26 +02:00
free-space-tree.h Btrfs: implement the free space B-tree 2015-12-17 12:16:47 -08:00
hash.c btrfs: advertise which crc32c implementation is being used at module load 2016-06-06 14:08:28 +02:00
hash.h btrfs: advertise which crc32c implementation is being used at module load 2016-06-06 14:08:28 +02:00
inode-item.c btrfs: rename btrfs_std_error to btrfs_handle_fs_error 2016-04-28 10:36:54 +02:00
inode-map.c btrfs: update btrfs_space_info's bytes_may_use timely 2016-08-25 03:58:26 -07:00
inode-map.h Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots 2016-01-15 19:25:02 +01:00
inode.c btrfs: update btrfs_space_info's bytes_may_use timely 2016-08-25 03:58:26 -07:00
ioctl.c btrfs: waiting on qgroup rescan should not always be interruptible 2016-08-25 03:58:20 -07:00
Kconfig
locking.c btrfs: cleanup, remove stray return statements 2016-01-07 14:30:52 +01:00
locking.h
lzo.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
Makefile Btrfs: add free space tree sanity tests 2015-12-17 12:16:47 -08:00
math.h
ordered-data.c btrfs: Fix slab accounting flags 2016-07-26 13:52:25 +02:00
ordered-data.h Btrfs: fix race setting block group readonly during device replace 2016-05-30 12:58:21 +01:00
orphan.c
print-tree.c btrfs: teach print_leaf about temporary item subtypes 2016-02-11 16:15:43 +01:00
print-tree.h
props.c btrfs: simpilify btrfs_subvol_inherit_props 2016-07-26 13:54:22 +02:00
props.h
qgroup.c btrfs: qgroup: Refactor btrfs_qgroup_insert_dirty_extent() 2016-08-25 03:58:21 -07:00
qgroup.h btrfs: qgroup: Refactor btrfs_qgroup_insert_dirty_extent() 2016-08-25 03:58:21 -07:00
raid56.c btrfs: fix string and comment grammatical issues and typos 2016-05-25 22:35:14 +02:00
raid56.h Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation 2015-08-09 07:34:26 -07:00
rcu-string.h
reada.c Btrfs: fix race between readahead and device replace/removal 2016-05-30 12:58:18 +01:00
relocation.c btrfs: update btrfs_space_info's bytes_may_use timely 2016-08-25 03:58:26 -07:00
root-tree.c btrfs: btrfs_abort_transaction, drop root parameter 2016-07-26 13:54:26 +02:00
scrub.c btrfs: plumb fs_info into btrfs_work 2016-07-26 13:53:15 +02:00
send.c Btrfs: send, don't bug on inconsistent snapshots 2016-08-01 07:31:41 +01:00
send.h Btrfs: use linux/sizes.h to represent constants 2016-01-07 14:38:02 +01:00
struct-funcs.c btrfs: fix string and comment grammatical issues and typos 2016-05-25 22:35:14 +02:00
super.c btrfs: btrfs_abort_transaction, drop root parameter 2016-07-26 13:54:26 +02:00
sysfs.c btrfs: add missing bytes_readonly attribute file in sysfs 2016-07-26 13:52:25 +02:00
sysfs.h btrfs: sysfs: introduce helper for syncing bits with sysfs files 2016-01-21 18:50:40 +01:00
transaction.c btrfs: btrfs_abort_transaction, drop root parameter 2016-07-26 13:54:26 +02:00
transaction.h btrfs: add btrfs_trans_handle->fs_info pointer 2016-07-26 13:54:26 +02:00
tree-defrag.c Btrfs: fix locking bugs when defragging leaves 2015-12-18 02:51:32 +00:00
tree-log.c btrfs: qgroup: Fix qgroup incorrectness caused by log replay 2016-08-25 03:58:23 -07:00
tree-log.h Btrfs: fix unreplayable log after snapshot delete + parent dir fsync 2016-03-01 08:23:25 -08:00
ulist.c btrfs: fix string and comment grammatical issues and typos 2016-05-25 22:35:14 +02:00
ulist.h
uuid-tree.c
volumes.c btrfs: btrfs_abort_transaction, drop root parameter 2016-07-26 13:54:26 +02:00
volumes.h Merge branch 'foreign/jeffm/uapi' into for-chris-4.7-20160516 2016-05-16 15:46:29 +02:00
xattr.c switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
xattr.h btrfs: Switch to generic xattr handlers 2016-05-17 19:17:09 -04:00
zlib.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00