linux-stable/fs/btrfs
Zygo Blaxell d24a67b2d9 btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2
Now that check_extent_in_eb()'s extent offset filter can be turned off,
we need a way to do it from userspace.

Add a 'flags' field to the btrfs_logical_ino_args structure to disable
extent offset filtering, taking the place of one of the existing
reserved[] fields.

Previous versions of LOGICAL_INO neglected to check whether any of the
reserved fields have non-zero values.  Assigning meaning to those fields
now may change the behavior of existing programs that left these fields
uninitialized.  The lack of a zero check also means that new programs
have no way to know whether the kernel is honoring the flags field.

To avoid these problems, define a new ioctl LOGICAL_INO_V2.  We can
use the same argument layout as LOGICAL_INO, but shorten the reserved[]
array by one element and turn it into the 'flags' field.  The V2 ioctl
explicitly checks that reserved fields and unsupported flag bits are zero
so that userspace can negotiate future feature bits as they are defined.

Since the memory layouts of the two ioctls' arguments are compatible,
there is no need for a separate function for logical_to_ino_v2 (contrast
with tree_search_v2 vs tree_search where the layout and code are quite
different).  A version parameter and an 'if' statement will suffice.

Now that we have a flags field in logical_ino_args, add a flag
BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET to get the behavior we want,
and pass it down the stack to iterate_inodes_from_logical.

Motivation and background, copied from the patchset cover letter:

Suppose we have a file with one extent:

    root@tester:~# zcat /usr/share/doc/cpio/changelog.gz > /test/a
    root@tester:~# sync

Split the extent by overwriting it in the middle:

    root@tester:~# cat /dev/urandom | dd bs=4k seek=2 skip=2 count=1 conv=notrunc of=/test/a

We should now have 3 extent refs to 2 extents, with one block unreachable.
The extent tree looks like:

    root@tester:~# btrfs-debug-tree /dev/vdc -t 2
    [...]
            item 9 key (1103101952 EXTENT_ITEM 73728) itemoff 15942 itemsize 53
                    extent refs 2 gen 29 flags DATA
                    extent data backref root 5 objectid 261 offset 0 count 2
    [...]
            item 11 key (1103175680 EXTENT_ITEM 4096) itemoff 15865 itemsize 53
                    extent refs 1 gen 30 flags DATA
                    extent data backref root 5 objectid 261 offset 8192 count 1
    [...]

and the ref tree looks like:

    root@tester:~# btrfs-debug-tree /dev/vdc -t 5
    [...]
            item 6 key (261 EXTENT_DATA 0) itemoff 15825 itemsize 53
                    extent data disk byte 1103101952 nr 73728
                    extent data offset 0 nr 8192 ram 73728
                    extent compression(none)
            item 7 key (261 EXTENT_DATA 8192) itemoff 15772 itemsize 53
                    extent data disk byte 1103175680 nr 4096
                    extent data offset 0 nr 4096 ram 4096
                    extent compression(none)
            item 8 key (261 EXTENT_DATA 12288) itemoff 15719 itemsize 53
                    extent data disk byte 1103101952 nr 73728
                    extent data offset 12288 nr 61440 ram 73728
                    extent compression(none)
    [...]

There are two references to the same extent with different, non-overlapping
byte offsets:

    [------------------72K extent at 1103101952----------------------]
    [--8K----------------|--4K unreachable----|--60K-----------------]
    ^                                         ^
    |                                         |
    [--8K ref offset 0--][--4K ref offset 0--][--60K ref offset 12K--]
                         |
                         v
                         [-----4K extent-----] at 1103175680

We want to find all of the references to extent bytenr 1103101952.

Without the patch (and without running btrfs-debug-tree), we have to
do it with 18 LOGICAL_INO calls:

    root@tester:~# btrfs ins log 1103101952 -P /test/
    Using LOGICAL_INO
    inode 261 offset 0 root 5

    root@tester:~# for x in $(seq 0 17); do btrfs ins log $((1103101952 + x * 4096)) -P /test/; done 2>&1 | grep inode
    inode 261 offset 0 root 5
    inode 261 offset 4096 root 5   <- same extent ref as offset 0
                                   (offset 8192 returns empty set, not reachable)
    inode 261 offset 12288 root 5
    inode 261 offset 16384 root 5  \
    inode 261 offset 20480 root 5  |
    inode 261 offset 24576 root 5  |
    inode 261 offset 28672 root 5  |
    inode 261 offset 32768 root 5  |
    inode 261 offset 36864 root 5  \
    inode 261 offset 40960 root 5   > all the same extent ref as offset 12288.
    inode 261 offset 45056 root 5  /  More processing required in userspace
    inode 261 offset 49152 root 5  |  to figure out these are all duplicates.
    inode 261 offset 53248 root 5  |
    inode 261 offset 57344 root 5  |
    inode 261 offset 61440 root 5  |
    inode 261 offset 65536 root 5  |
    inode 261 offset 69632 root 5  /

In the worst case the extents are 128MB long, and we have to do 32768
iterations of the loop to find one 4K extent ref.

With the patch, we just use one call to map all refs to the extent at once:
    root@tester:~# btrfs ins log 1103101952 -P /test/
    Using LOGICAL_INO_V2
    inode 261 offset 0 root 5
    inode 261 offset 12288 root 5

The TREE_SEARCH ioctl allows userspace to retrieve the offset and
extent bytenr fields easily once the root, inode and offset are known.
This is sufficient information to build a complete map of the extent
and all of its references.  Userspace can use this information to make
better choices to dedup or defrag.

Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
[ copy background and motivation from cover letter ]
Signed-off-by: David Sterba <dsterba@suse.com>
2017-11-01 20:45:35 +01:00
..
tests btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents 2017-11-01 20:45:34 +01:00
acl.c btrfs: preserve i_mode if __btrfs_set_acl() fails 2017-08-21 17:47:42 +02:00
async-thread.c Btrfs: fix confusing worker helper info in stacktrace 2017-10-30 12:27:57 +01:00
async-thread.h btrfs: constify tracepoint arguments 2017-08-16 14:19:53 +02:00
backref.c btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents 2017-11-01 20:45:34 +01:00
backref.h btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents 2017-11-01 20:45:34 +01:00
btrfs_inode.h btrfs: separate defrag and property compression 2017-08-16 16:12:05 +02:00
check-integrity.c btrfs: Use bd_dev to generate index when dev_state_hashtable add items. 2017-10-30 12:28:01 +01:00
check-integrity.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
compression.c btrfs: allow setting zlib compression level via :9 2017-11-01 20:45:34 +01:00
compression.h btrfs: allow to set compression level for zlib 2017-11-01 20:45:29 +01:00
ctree.c btrfs: pass root to various extent ref mod functions 2017-10-30 12:28:00 +01:00
ctree.h btrfs: allow to set compression level for zlib 2017-11-01 20:45:29 +01:00
dedupe.h btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize 2016-07-26 13:52:25 +02:00
delayed-inode.c btrfs: increase ctx->pos for delayed dir index 2017-08-18 16:36:20 +02:00
delayed-inode.h btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t 2017-04-18 14:07:23 +02:00
delayed-ref.c btrfs: remove type argument from comp_tree_refs 2017-10-30 12:28:00 +01:00
delayed-ref.h btrfs: remove delayed_ref_node from ref_head 2017-10-30 12:28:00 +01:00
dev-replace.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:54:01 -07:00
dev-replace.h btrfs: constify device path passed to relevant helpers 2017-02-28 14:26:07 +01:00
dir-item.c btrfs: fix validation of XATTR_ITEM dir items 2017-06-29 20:06:11 +02:00
disk-io.c btrfs: Replace opencoded sizes with their symbolic constants 2017-10-30 12:28:01 +01:00
disk-io.h btrfs: use named constant for bdev blocksize 2017-08-16 16:12:04 +02:00
export.c btrfs: Check name_len before reading btrfs_get_name 2017-06-21 19:16:04 +02:00
export.h
extent-tree.c btrfs: add assertions for releasing trans handle reservations 2017-10-30 12:28:01 +01:00
extent_io.c btrfs: get rid of sector_t and use u64 offset in submit_extent_page 2017-10-30 12:28:00 +01:00
extent_io.h Btrfs: remove bio_flags which indicates a meta block of log-tree 2017-10-30 12:27:56 +01:00
extent_map.c btrfs: convert extent_map.refs from atomic_t to refcount_t 2017-04-18 14:07:23 +02:00
extent_map.h btrfs: convert extent_map.refs from atomic_t to refcount_t 2017-04-18 14:07:23 +02:00
file-item.c Merge branch 'for-4.13-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux 2017-07-05 16:41:23 -07:00
file.c btrfs: cleanup extent locking sequence 2017-10-30 12:28:01 +01:00
free-space-cache.c btrfs: drop newlines from strings when using btrfs_* helpers 2017-08-16 16:12:02 +02:00
free-space-cache.h btrfs: free-space-cache, clean up unnecessary root arguments 2017-02-17 12:03:56 +01:00
free-space-tree.c btrfs: Clean up unused variables in free-space-tree.c 2017-10-30 12:27:59 +01:00
free-space-tree.h btrfs: expose internal free space tree routine only if sanity tests are enabled 2017-08-18 16:36:29 +02:00
hash.c crypto: Work around deallocated stack frame reference gcc bug on sparc. 2017-06-08 17:36:03 +08:00
hash.h
inode-item.c btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
inode-map.c btrfs: qgroup: Introduce extent changeset for qgroup reserve functions 2017-06-29 20:17:02 +02:00
inode-map.h
inode.c btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents 2017-11-01 20:45:34 +01:00
ioctl.c btrfs: add a flags argument to LOGICAL_INO and call it LOGICAL_INO_V2 2017-11-01 20:45:35 +01:00
Kconfig Btrfs: add a extent ref verify tool 2017-10-30 12:28:00 +01:00
locking.c
locking.h
lzo.c btrfs: allow to set compression level for zlib 2017-11-01 20:45:29 +01:00
Makefile Btrfs: add a extent ref verify tool 2017-10-30 12:28:00 +01:00
math.h
ordered-data.c btrfs: fix integer overflow in calc_reclaim_items_nr 2017-06-29 20:17:02 +02:00
ordered-data.h btrfs: fix integer overflow in calc_reclaim_items_nr 2017-06-29 20:17:02 +02:00
orphan.c
print-tree.c Btrfs: add one more sanity check for shared ref type 2017-08-21 17:47:43 +02:00
print-tree.h btrfs: get fs_info from eb in btrfs_print_tree, remove argument 2017-08-16 16:12:03 +02:00
props.c Merge branch 'zstd-minimal' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2017-09-14 17:30:49 -07:00
props.h
qgroup.c btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents 2017-11-01 20:45:34 +01:00
qgroup.h btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges 2017-06-29 20:17:02 +02:00
raid56.c Btrfs: fix memory leak in raid56 2017-10-30 12:27:57 +01:00
raid56.h btrfs: take an fs_info directly when the root is not used otherwise 2016-12-06 16:06:59 +01:00
rcu-string.h
reada.c btrfs: remove unused member err from reada_extent 2017-06-19 18:25:59 +02:00
ref-verify.c Btrfs: add a extent ref verify tool 2017-10-30 12:28:00 +01:00
ref-verify.h Btrfs: add a extent ref verify tool 2017-10-30 12:28:00 +01:00
relocation.c btrfs: pass root to various extent ref mod functions 2017-10-30 12:28:00 +01:00
root-tree.c btrfs: Clean up dead code in root-tree 2017-10-30 12:27:56 +01:00
scrub.c btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents 2017-11-01 20:45:34 +01:00
send.c btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents 2017-11-01 20:45:34 +01:00
send.h btrfs: fix send ioctl on 32bit with 64bit kernel 2017-10-30 12:27:59 +01:00
struct-funcs.c btrfs: struct-funcs, constify readers 2017-08-16 14:19:53 +02:00
super.c btrfs: allow setting zlib compression level via :9 2017-11-01 20:45:34 +01:00
sysfs.c btrfs: prefix sysfs attribute struct names 2017-10-30 12:27:58 +01:00
sysfs.h btrfs: prefix sysfs attribute struct names 2017-10-30 12:27:58 +01:00
transaction.c Btrfs: only check delayed ref usage in should_end_transaction 2017-10-30 12:28:00 +01:00
transaction.h btrfs: remove unused qgroup members from btrfs_trans_handle 2017-04-18 14:07:25 +02:00
tree-checker.c btrfs: tree-checker: use %zu format string for size_t 2017-10-30 12:27:59 +01:00
tree-checker.h btrfs: Move leaf and node validation checker to tree-checker.c 2017-10-30 12:27:58 +01:00
tree-defrag.c
tree-log.c btrfs: pass root to various extent ref mod functions 2017-10-30 12:28:00 +01:00
tree-log.h btrfs: Make btrfs_del_inode_ref take btrfs_inode 2017-02-14 15:50:54 +01:00
ulist.c btrfs: ulist: rename ulist_fini to ulist_release 2017-02-17 12:03:50 +01:00
ulist.h btrfs: ulist: rename ulist_fini to ulist_release 2017-02-17 12:03:50 +01:00
uuid-tree.c btrfs: return the actual error value from from btrfs_uuid_tree_iterate 2016-12-19 18:08:15 +01:00
volumes.c btrfs: remove BUG_ON in btrfs_rm_dev_replace_free_srcdev() 2017-11-01 20:45:34 +01:00
volumes.h btrfs: declare btrfs_report_missing_device() static 2017-10-30 12:27:59 +01:00
xattr.c btrfs: Check name_len with boundary in verify dir_item 2017-06-21 19:16:04 +02:00
xattr.h
zlib.c btrfs: allow to set compression level for zlib 2017-11-01 20:45:29 +01:00
zstd.c btrfs: allow to set compression level for zlib 2017-11-01 20:45:29 +01:00